CA3140571A1 - Method and apparatus for clickhouse-based crowd selection - Google Patents

Method and apparatus for clickhouse-based crowd selection Download PDF

Info

Publication number
CA3140571A1
CA3140571A1 CA3140571A CA3140571A CA3140571A1 CA 3140571 A1 CA3140571 A1 CA 3140571A1 CA 3140571 A CA3140571 A CA 3140571A CA 3140571 A CA3140571 A CA 3140571A CA 3140571 A1 CA3140571 A1 CA 3140571A1
Authority
CA
Canada
Prior art keywords
tag
query
clickhouse
type
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3140571A
Other languages
French (fr)
Inventor
Qiao XIE
Zhiwei Wang
Zhaogui Yang
Weijian Yu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
10353744 Canada Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 10353744 Canada Ltd filed Critical 10353744 Canada Ltd
Publication of CA3140571A1 publication Critical patent/CA3140571A1/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0242Determining effectiveness of advertisements
    • G06Q30/0244Optimization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471Distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Computational Linguistics (AREA)
  • Accounting & Taxation (AREA)
  • Fuzzy Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Physics (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and apparatus for Clickhouse-based crowd selection relate to big data technologies, and satisfy requirements of the business side for fast and efficient computation of crowd packages.
The method involves: configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items; based on feature types of tag values, partitioning the CH
table into aggregation tables of different types; acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and by means of multithreading executing the Clickhouse query clauses parallelly, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results. The apparatus implements the described method.

Description

METHOD AND APPARATUS FOR CLICKHOUSE-BASED CROWD SELECTION
BACKGROUND OF THE INVENTION
Technical Field [0001] The present invention relates to the technical field of big data, and more particularly to a method and apparatus for Clickhouse-based crowd selection.
Description of Related Art
[0002] Currently, precise selection of crowd from massive internet data is critical to successful placement of advertising information. Usually, this can be achieved by tagging user characteristic behaviors, using the obtained tags as criteria to screen the target crowd, and placing real-time advertisement. Since advertising campaigns are usually audience specific, criteria used for screening different crowds are highly diverse.
[0003] Conventionally, off-line compute engines such as Elasticsearch and spark are used to determine this selection. However, as the business side has increasingly demanding requirements for computation performance in terms of crowd packages, these existing compute engines are becoming incompetent to satisfy users when it comes to quasi real-time performance. Besides, the data volume of selection criteria has now reached a scale of hundreds of millions, making data pulling extremely time-consuming, and this is adverse to business needs.
SUMMARY OF THE INVENTION
[0004] The objective of the present invention is to provide a method and an apparatus for Clickhouse-based crowd selection, which provide the business side with fast and efficient computation of crowd packages.

Date recue / Date received 202 1-1 1-26
[0005] To achieve the foregoing objective, in a first aspect, the present invention provides a method for Clickhouse-based crowd selection, which comprises:
[0006] configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items;
[0007] partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values;
[0008] acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and
[0009] executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results.
[0010] Preferably, the step of configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items comprises:
[0011] configurating the tag information through a visual interface of a business system, wherein the tag information includes three kinds of configuration information items and the corresponding tag original-table data, wherein the three kinds of configuration information items are tag codes, tag names, and tag values; and
[0012] automatically creating spark tasks to process the tag original-table data, and introducing the tag original-table data into the CH table in the Clickhouse engine in formats of tag codes, tag names, and tag values.
[0013] More preferably, the step of based on feature types of tag values, partitioning the CH
table into aggregation tables of different types comprises:
[0014] dividing the feature types of the tag values into an int type, a double type, a string type, and a date type; and
[0015] according to the feature types of the tag values in the tag original-table data, partitioning Date recue / Date received 202 1-1 1-26 the CH table correspondingly into an it-type tag-value aggregation table, a double-type tag-value aggregation table, a string-type tag-value aggregation table, and a date-type tag-value aggregation table.
[0016] Further, after the step of partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values, the method further comprises:
[0017] distributing the aggregation tables across nodes of the Clickhouse engine, for performing distributed query.
[0018] Preferably, the step of acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses comprises:
[0019] selecting the query tags through the visual interface of the business system to form the sql query criteria, and sending the sql query criteria to kafka; and
[0020] consuming sql query criteria data, receiving drop tables through SparkStreaming tasks, and converting the sql query criteria into the Clickhouse query clauses.
[0021] Preferably, the step of executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results comprises:
[0022] using the Spark tasks to execute the Clickhouse query clauses parallelly by means of multithreading, while acquiring detail data from the corresponding aggregation tables and storing the detail data into the Hive table, so as to obtain the crowd selection query results through computation.
[0023] Preferably, after the step of executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results, the method further comprises:

Date recue / Date received 202 1-1 1-26
[0024] storing the detail data of the Hive table into redis, for the business system to use according to business needs.
[0025] As compared to the prior art, the method for Clickhouse-based crowd selection of the present invention has the following beneficial effects:
[0026] With the disclosed method for Clickhouse-based crowd selection, a user is enabled to configure tag information through a back end of a business system and create spark tasks.
Then the user can introduce original-table data of tags into a CH table of a Clickhouse engine according to configured information items. Afterward, according to feature types of tag values, a CH table in a Clickhouse engine can be partitioned into different types of aggregation table. Subsequently, the user can provide query tags to generate sql query criteria.
The criteria are then converted into Clickhouse query clauses. By means of multithreading, the Clickhouse query clauses are executed parallelly. Query detail data are acquired from aggregation tables of the corresponding types, and stored into a Hive table, so that crowd selection query results can be obtained.
[0027] It is thus clear that the present invention uses a concept of Clickhouse + Spark computation to introduce the computation process for criteria of tag combinations into a Clickhouse engine, and use the combination of the Spark compute engine and the Clickhouse compute engine to generate the crowd package, thereby shortening the time required by computation of detail data satisfying the criteria of tag combinations from minutes to seconds.
[0028] In a second aspect, the present invention provides an apparatus for Clickhouse-based crowd selection, which is applicable to the method for Clickhouse-based crowd selection of the foregoing technical scheme. The apparatus comprises:
[0029] a configuring unit, for configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items;
[0030] a table-partitioning unit, for partitioning the CH table into aggregation tables of different Date recue / Date received 202 1-1 1-26 types on the basis of feature types of tag values;
[0031] a query unit, for acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and
[0032] a processing unit, for executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results.
[0033] Preferably, the table-partitioning unit is for dividing the feature types of the tag values into an int type, a double type, a string type, and a date type, according to the feature types of the tag values in the tag original-table data, partitioning the CH table correspondingly into an it-type tag-value aggregation table, a double-type tag-value aggregation table, a string-type tag-value aggregation table, and a date-type tag-value aggregation table.
[0034] As compared to the prior art, the disclosed apparatus for Clickhouse-based crowd selection provides beneficial effects that are similar to those provided by the disclosed method for Clickhouse-based crowd selection as enumerated above, and thus no repetitions are made herein.
[0035] In a third aspect, the present invention provides a computer-readable storage medium, storing thereon a computer program. When the computer program is executed by a processor, it implements the steps of the method for Clickhouse-based crowd selection as described previously.
[0036] As compared to the prior art, the disclosed computer-readable storage medium provides beneficial effects that are similar to those provided by the disclosed method for Clickhouse-based crowd selection as enumerated above, and thus no repetitions are made herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0037] The accompanying drawings are provided herein for better understanding of the present invention and form a part of this disclosure. The illustrative embodiments and their Date recue / Date received 202 1-1 1-26 descriptions are for explaining the present invention and by no means form any improper limitation to the present invention, wherein:
[0038] FIG. 1 is a flowchart of the method for Clickhouse-based crowd selection of one embodiment of the present invention;
[0039] FIG. 2 is a schematic drawing illustrating that a CH table is partitioned into four types of aggregation tables according to an embodiment of the present invention; and
[0040] FIG. 3 is another flowchart of the method for Clickhouse-based crowd selection of the embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0041] To make the foregoing objectives, features, and advantages of the present invention clearer and more understandable, the following description will be directed to some embodiments as depicted in the accompanying drawings to detail the technical schemes disclosed in these embodiments. It is, however, to be understood that the embodiments referred herein are only a part of all possible embodiments and thus not exhaustive. Based on the embodiments of the present invention, all the other embodiments can be conceived without creative labor by people of ordinary skill in the art, and all these and other embodiments shall be embraced in the scope of the present invention.
Embodiment 1
[0042] Referring to FIG. 1 through FIG. 3, the present embodiment provides a method for Clickhouse-based crowd selection that comprises:
[0043] configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items;
partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values; acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and by means of multithreading executing the Clickhouse query clauses parallelly, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive Date recue / Date received 202 1-1 1-26 table, and obtaining crowd selection query results.
[0044] With the disclosed method for Clickhouse-based crowd selection, a user is enabled to configure tag information through a back end of a business system and create spark tasks.
Then the user can introduce original-table data of tags into a CH table of a Clickhouse engine according to configured information items. Afterward, according to feature types of tag values, a CH table in a Clickhouse engine can be partitioned into different types of aggregation table. Subsequently, the user can provide query tags to generate sql query criteria.
The criteria are then converted into Clickhouse query clauses. By means of multithreading, the Clickhouse query clauses are executed parallelly. Query detail data are acquired from aggregation tables of the corresponding types, and stored into a Hive table, so that crowd selection query results can be obtained.
[0045] It is thus clear that the present invention uses a concept of Clickhouse + Spark computation to introduce the computation process for criteria of tag combinations into a Clickhouse engine, and use the combination of the Spark compute engine and the Clickhouse compute engine to generate the crowd package, thereby shortening the time required by computation of detail data satisfying the criteria of tag combinations from minutes to seconds.
[0046] In the foregoing embodiment, the step of configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items comprises:
[0047] configurating the tag information through a visual interface of a business system, wherein the tag information includes three kinds of configuration information items and the corresponding tag original-table data, wherein the three kinds of configuration information items are tag codes, tag names, and tag values; and automatically creating spark tasks to process the tag original-table data, and introducing the tag original-table data into the CH
table in the Clickhouse engine in formats of tag codes, tag names, and tag values.

Date recue / Date received 202 1-1 1-26
[0048] In a particular implementation, the information items may include three types, namely tag codes, tag names, and tag values. For instance, if it is desired to know preferences of male members about some merchandise item, the tag code, tag name, and tag value of the information items are set as labelname, labelvalue, and userid, respectively.
Then the tag original-table data are introduced into the CH table in the formats of the foregoing information items. For an example where one of the entries of the CH table is labelname =A, then labelvalue =male(1), userid=[1,2,3,4].
[0049] As a further example, in order to query a crowd package of 18-year-old male members, a user may, through the back end of the business system, visually configure a tag A=gender (labelname=A, labelvalue=1 representing male), and a tag B= age (labelname=B, labelvalue=18), so as to obtain the sql query criteria of "age=18 && gender =male." Then, data of male members and data of 18-year-old members are acquired from the corresponding aggregation tables. These data are assembled and computed before fed back to the user as the query results for crowd selection.
[0050] In the foregoing embodiment, the step of based on feature types of tag values, partitioning the CH table into aggregation tables of different types comprises:
[0051] dividing the feature types of the tag values into an int type, a double type, a string type, and a date type; and according to the feature types of the tag values in the tag original-table data, partitioning the CH table correspondingly into an it-type tag-value aggregation table, a double-type tag-value aggregation table, a string-type tag-value aggregation table, and a date-type tag-value aggregation table.
[0052] As shown in FIG. 2, in a particular implementation, through the back end of the business system, the commonly used tag values are divided into four types, namely the int type, the double type, the string type, and the date type, so as to satisfy diverse business needs about query criteria. Then according to the feature types of the tag values in the tag original-table data, the CH table can correspondingly be partitioned into the it-type tag-value aggregation Date recue / Date received 202 1-1 1-26 table, the double-type tag-value aggregation table, the string-type tag-value aggregation table, and the date-type tag-value aggregation table. This is to say that all the tag values in the int-type tag-value aggregation table are of the int type, and all the tag values in the double-type tag-value aggregation table are of the double type, and so on.
[0053] For example, for checking the access count, the relevant tag values are obviously of the int type, and the conditional operators for crowd package data are count days>= '15' and count days<'30'. In other words, it is to query for the access count generated between the date of 15 and the date of 30.
[0054] For example, for checking the coordinate locations of users, the relevant tag values are obviously of the double type. This type is suitable for queries of coordinate locations, and helps select a member crowd in a certain latitude-longitude coordinate range.
[0055] For example, for checking member levels of users, the relevant tag values are obviously of the string type. This type is favorable to precise query matching.
[0056] For example, for checking access dates of users, the relevant tag values are obviously of the date type. This type is suitable for comparison of dates, such as criteria for crowd packages like a first-for-purchase time later than a certain date.
[0057] To satisfy business needs, in the present embodiment, four types of CH
aggregation tables are introduced, with the following table structures:
Field Type Description labelname String Tag Code labelvalue String/Int64/double/Date Tag Value uv AggregateFunction(groupBitmap, Bitmap generated from unique UInt64) aggregation function numbers of members Date recue / Date received 202 1-1 1-26
[0058] As can be seen, all labelname fields are of the String type. This is usually used to express a tag code. The labelvalue fields may be of the String type, the Int type, the double type, or the Date type, usually for expressing a tag value. uv is usually used to express a tag name.
[0059] With the four types of aggregation tables, data can be acquired from corresponding types of CH aggregation tables according to business needs automatically for subsequent computation, thereby significantly improving computation performance of the system.
[0060] For example, for tags related to the number of days users sleep for, the criteria are described as users sleeping for more than a certain number of days or fewer than a certain number of days, i.e., count days>= '15' and count days<'30'. The aggregation tables of the Int type are correspondingly selected for data acquiring and to support data operators.
[0061] In the foregoing embodiment, after the step of partitioning the CH
table into aggregation tables of different types on the basis of feature types of tag values, the method further comprises: distributing the aggregation tables across nodes of the Clickhouse engine, for performing distributed query.
[0062] In a particular implementation, since the aggregation tables are local tables distributed across nodes of the Clickhouse engine, a distributed table directing to the aggregation tables at the nodes may be used to realize distributed query computation. As demonstrated in tests, when tag data were introduced into a Clickhouse engine from an original table (such as HDFS), the standalone performance was 1 million/sec, showing linear expansion in terms of performance. It was also found through actual measurement that 50 billion data could be introduced into the CH table in one hour. The measurement was performed in a Clickhouse cluster environment, with the following machine configuration: 32 cores, 256G, 4T drive, using a cluster of 6 machines.
Date recue / Date received 202 1-1 1-26
[0063] In the foregoing embodiment, the step of acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses comprises:
[0064] selecting the query tags through the visual interface of the business system to form the sql query criteria, and sending the sql query criteria to kafka; and consuming sql query criteria data, receiving drop tables through SparkStreaming tasks, and converting the sql query criteria into the Clickhouse query clauses.
[0065] In the foregoing embodiment, the step of executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results comprises:
[0066] using the Spark tasks to execute the Clickhouse query clauses parallelly by means of multithreading, while acquiring detail data from the corresponding aggregation tables and storing the detail data into the Hive table, so as to obtain the crowd selection query results through computation.
[0067] In the foregoing embodiment, after the step of executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results, the method further comprises:
[0068] storing the detail data of the Hive table into redis, for the business system to use according to business needs.
[0069] Referring to FIG. 3, for easy understanding, an example is used to explain the scheme as discussed in the foregoing embodiment:
[0070] A user may select query tags through the page of the business system to generate sql query criteria and send them to kafka. SparkStreaming consumes the corresponding kafka data and converts sql into Clickhouse query clauses that are then stored into Mysql. Spark Date recue / Date received 202 1-1 1-26 tasks read Clickhouse query clauses from Mysql. Afterward, by means of multithreading, the Clickhouse query clauses are executed parallelly to query detail data from the CH aggregation tables corresponding to the distributed nodes. The detail data are then stored into a Hive table for the business system to query and use, thereby outputting crowd selection query results.
In addition, the detail data in the hive table may be transferred to redis according to business needs, so as to provide the business system with other query services.
[0071] To sum up, the present embodiment has the following innovations:
1. The present invention uses a concept of Clickhouse + Spark computation to introduce the computation process for criteria of tag combinations into a Clickhouse engine, and use the combination of the Spark compute engine and the Clickhouse compute engine to generate the crowd package, thereby shortening the time required by computation of detail data satisfying the criteria of tag combinations from minutes to seconds.
2. Addition or removal of tags can be visually configured through the back end of the business system, thereby eliminating hassles about complicated operations for adding or removing tags.
3. The present invention leverages the benefits of Clickhouse to solve issues about aggregation performance related to large volume of data, thereby providing better user experience.
4. The present invention saves hardware resources. According to early-stage assessment, Clickhouse only needs half of the hardware computation resources required by elasticsearch. As to storage capacity, for identical data, one Clickhouse machine is equivalent to 27 elasticsearch machines.
Embodiment 2
[0072] The present embodiment provides a Clickhouse-based crowd selection apparatus, which comprises:
[0073] a configuring unit, for configurating tag information to create spark tasks, and Date recue / Date received 202 1-1 1-26 introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items;
[0074] a table-partitioning unit, for based on feature types of tag values, partitioning the CH
table into aggregation tables of different types;
[0075] a query unit, for acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and
[0076] a processing unit, for executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results.
[0077] Preferably, the table-partitioning unit is for dividing the feature types of the tag values into an int type, a double type, a string type, and a date type, according to the feature types of the tag values in the tag original-table data, partitioning the CH table correspondingly into an it-type tag-value aggregation table, a double-type tag-value aggregation table, a string-type tag-value aggregation table, and a date-type tag-value aggregation table.
[0078] As compared to the prior art, the disclosed apparatus for Clickhouse-based crowd selection provides beneficial effects that are similar to those provided by the method for Clickhouse-based crowd selection as enumerated above, and thus no repetitions are made herein.
Embodiment 3
[0079] The present embodiment provides a computer-readable storage medium, storing thereon a computer program. When the computer program is executed by a processor, it implements the steps of the method for Clickhouse-based crowd selection as described previously.
[0080] As compared to the prior art, the disclosed computer-readable storage medium provides beneficial effects that are similar to those provided by the disclose method as enumerated above, and thus no repetitions are made herein.

Date recue / Date received 202 1-1 1-26
[0081] As will be appreciated by people of ordinary skill in the art, implementation of all or a part of the steps of the method of the present invention as described previously may be realized by having a program instruct related hardware components. The program may be stored in a computer-readable storage medium, and the program is about performing the individual steps of the methods described in the foregoing embodiments. The storage medium may be a ROM/RAM, a magnetic disk, an optical disk, a memory card or the like.
[0082] The present invention has been described with reference to the preferred embodiments and it is understood that the embodiments are not intended to limit the scope of the present invention. Moreover, as the contents disclosed herein should be readily understood and can be implemented by a person skilled in the art, all equivalent changes or modifications which do not depart from the concept of the present invention should be encompassed by the appended claims. Hence, the scope of the present invention shall only be defined by the appended claims.

Date recue / Date received 202 1-1 1-26

Claims (10)

What is claimed is:
1. A method for Clickhouse-based crowd selection, comprising:
configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items;
partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values;
acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results.
2. The method of claim 1, wherein the step of configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items comprises:
configurating the tag information through a visual interface of a business system, wherein the tag information includes three kinds of configuration information items and the corresponding tag original-table data, wherein the three kinds of configuration information items are tag codes, tag names, and tag values; and automatically creating spark tasks to process the tag original-table data, and introducing the tag Date recue / Date received 2021-11-26 original-table data into the CH table in the Clickhouse engine in formats of tag codes, tag names, and tag values.
3. The method of claim 2, wherein the step of partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values comprises:
dividing the feature types of the tag values into an int type, a double type, a string type, and a date type; and partitioning the CH table correspondingly into an int-type tag-value aggregation table, a double-type tag-value aggregation table, a string-type tag-value aggregation table, and a date-type tag-value aggregation table according to the feature types of the tag values in the tag original-table data.
4. The method of claim 3, wherein after the step of partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values, the method further comprises:
distributing the aggregation tables across nodes of the Clickhouse engine, for performing distributed query.
5. The method of claim 4, wherein the step of acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses comprises:
selecting the query tags through the visual interface of the business system to form the sql query criteria, and sending the sql query criteria to kafka; and consuming sql query criteria data, receiving drop tables through SparkStreaming tasks, and converting the sql query criteria into the Clickhouse query clauses.

Date recue / Date received 2021-11-26
6. The method of any of claims 1 through 5, wherein the step of executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results comprises:
using the Spark tasks to execute the Clickhouse query clauses parallelly by means of multithreading, while acquiring detail data from the corresponding aggregation tables and storing the detail data into the Hive table, so as to obtain the crowd selection query results through computation.
7. The method of claim 1, wherein after the step of executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results, the method further comprises:
storing the detail data of the Hive table into redis, for the business system to use according to business needs.
8. An apparatus for Clickhouse-based crowd selection, comprising:
a configuring unit, for configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items;
a table-partitioning unit, for partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values;
a query unit, for acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and Date recue / Date received 2021-11-26 a processing unit, for executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results.
9. The apparatus of claim 8, wherein the table-partitioning unit is for dividing the feature types of the tag values into an int type, a double type, a string type, and a date type, and partitioning the CH table correspondingly into an int-type tag-value aggregation table, a double-type tag-value aggregation table, a string-type tag-value aggregation table, and a date-type tag-value aggregation table according to the feature types of the tag values in the tag original-table data.
10. A computer-readable storage medium storing therein a computer program, wherein the computer program when executed by a processor performing a method as described in any of claims 1 through 7.

Date recue / Date received 2021-11-26
CA3140571A 2020-11-26 2021-11-26 Method and apparatus for clickhouse-based crowd selection Pending CA3140571A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011347988.0 2020-11-26
CN202011347988.0A CN112396462B (en) 2020-11-26 2020-11-26 Crowd circling method and device based on click house

Publications (1)

Publication Number Publication Date
CA3140571A1 true CA3140571A1 (en) 2022-05-26

Family

ID=74605283

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3140571A Pending CA3140571A1 (en) 2020-11-26 2021-11-26 Method and apparatus for clickhouse-based crowd selection

Country Status (2)

Country Link
CN (1) CN112396462B (en)
CA (1) CA3140571A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115840772A (en) * 2022-11-11 2023-03-24 中电金信软件有限公司 Passenger group data statistical method and device, electronic equipment and storage medium
CN118069680A (en) * 2024-02-02 2024-05-24 天翼爱音乐文化科技有限公司 Data processing method and system, electronic equipment and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590923A (en) * 2021-07-28 2021-11-02 深圳市酷开网络科技股份有限公司 Crowd delineating task splitting method, device, equipment and storage medium
CN113724007A (en) * 2021-08-31 2021-11-30 平安国际智慧城市科技股份有限公司 Patient population selection method, device, equipment and computer-readable storage medium
CN114549042B (en) * 2022-01-10 2022-10-18 北京元年科技股份有限公司 Cost sharing method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462317A (en) * 2014-12-01 2015-03-25 苏州朗米尔照明科技有限公司 Data screening method based on tag
CN109937413B (en) * 2016-07-20 2023-04-21 深圳市东信时代信息技术有限公司 Processing method and system for massive crowd characteristic data
CN110968582B (en) * 2019-11-01 2022-12-30 苏宁云计算有限公司 Crowd generation method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115840772A (en) * 2022-11-11 2023-03-24 中电金信软件有限公司 Passenger group data statistical method and device, electronic equipment and storage medium
CN118069680A (en) * 2024-02-02 2024-05-24 天翼爱音乐文化科技有限公司 Data processing method and system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112396462B (en) 2022-11-22
CN112396462A (en) 2021-02-23

Similar Documents

Publication Publication Date Title
CA3140571A1 (en) Method and apparatus for clickhouse-based crowd selection
CN103177061B (en) Unique value estimation in partition table
US9251212B2 (en) Profiling in a massive parallel processing environment
CN109597974B (en) Report generation method and device
JP2011054159A (en) System to modify website for organic search optimization
CN106940721B (en) Data processing method and system for user-defined selection box
CN112115113B (en) Data storage system, method, device, equipment and storage medium
CN108021629B (en) Method, device and system for generating advertisement material data
CN112256977A (en) Componentized commodity recommendation method, computing device and computer storage medium
CN112732763A (en) Data aggregation method and device, electronic equipment and medium
CN109033157B (en) Complex data retrieval method and system based on user-defined search condition tree
CN117472965A (en) Database data query method and device, electronic equipment and storage medium
CN110297945B (en) Data information processing method and system based on XBRL
US20190340179A1 (en) Result set output criteria
CN108874856B (en) Page generation system, method and server
CN112948441B (en) Multi-dimensional data collection method and equipment for financial data
CN108197164A (en) Business data storage method and device
CN114840551A (en) Database table processing method and device, electronic equipment and storage medium
CN114201157A (en) Method and system for customizing target service module by low code
CN113836379A (en) Intelligent recommendation method and system based on customer image
CN112783633A (en) Data updating system and method based on resource mutual exclusion scheduling model
CN111427922A (en) Data analysis method, system, device and storage medium based on distributed architecture
CN112784195A (en) Page data publishing method and system
CN113485799B (en) Method, device, equipment and storage medium for executing same timing task by multiple libraries
US20080120265A1 (en) System and method for processing data elements

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20220916

EEER Examination request

Effective date: 20220916

EEER Examination request

Effective date: 20220916

EEER Examination request

Effective date: 20220916

EEER Examination request

Effective date: 20220916