CN116257571A - Index statistics method, device and index statistics system for Clickhouse - Google Patents

Info

Publication number
CN116257571A
Authority
CN
China
Prior art keywords
data
statistics
index
counted
clickhouse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310201570.6A
Other languages
Chinese (zh)
Inventor
金鑫
林毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DBAPPSecurity Co Ltd
Priority to CN202310201570.6A
Publication of CN116257571A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to an index statistics method, an index statistics device and an index statistics system for Clickhouse. The index statistics method for Clickhouse comprises the following steps: based on a pre-established window view, performing aggregation statistics on the acquired data to be counted within a preset time interval to obtain an index statistical result; and writing the index statistical result into a preset data table for storage. With this method, index statistics calculation and result storage can both be realized on Clickhouse without introducing other middleware, which reduces system complexity as well as the maintenance and deployment costs of middleware.

Description

Index statistics method, device and index statistics system for Clickhouse
Technical Field
The present application relates to the field of data statistics, and in particular, to an index statistics method, apparatus, and index statistics system for Clickhouse.
Background
Currently, index statistics calculation for massive data is usually implemented with Flink stream computing combined with Elasticsearch. After Flink performs extraction, transformation, loading and similar operations on the real-time stream data, the statistical result is written into a message queue; an ETL data collection tool then fetches the statistical result from the message queue and stores it in a non-relational database such as Elasticsearch or Hive. This approach needs to combine several kinds of middleware to complete the index statistics calculation, so the system complexity is high, and the maintenance and deployment costs of the middleware are high.
No effective solution has yet been proposed for the problems, in the related art, of high system complexity and high middleware maintenance and deployment costs in index statistics calculation.
Disclosure of Invention
This embodiment provides an index statistics method, an index statistics device and an index statistics system for Clickhouse, to solve the problems in the related art that the required system is highly complex and that the maintenance and deployment costs of the middleware are high.
In a first aspect, in this embodiment, there is provided an index statistics method for Clickhouse, including:
based on a pre-established window view, carrying out aggregation statistics on the acquired data to be counted in a preset time interval to obtain an index statistical result;
and writing the index statistical result into a preset data table for storage.
In some embodiments, before performing aggregation statistics on the acquired data to be counted in a preset time interval based on a pre-established window view to obtain an index statistical result, the method further includes:
collecting the data to be counted from a message queue based on a first data collecting tool; the first data acquisition tool is based on an ETL technology.
In some embodiments, the collecting the data to be counted from the message queue based on the first data collecting tool includes:
and acquiring the data to be counted from the message queue through the first data acquisition tool based on preset configuration information.
In some embodiments, before collecting the data to be counted from the message queue based on the first data collecting tool, the method further comprises:
and transmitting the data to be counted in the preset data source to the message queue through a second data acquisition tool at the front end.
In some embodiments, the performing aggregate statistics on the obtained data to be counted in the preset time interval based on the pre-established window view to obtain an index statistical result includes:
and carrying out aggregation statistics on the acquired data to be counted in a preset time interval based on calculation logic in a pre-established window view to obtain an index statistical result.
In some embodiments, the performing aggregate statistics on the obtained data to be counted in a preset time interval based on calculation logic in a pre-established window view to obtain an index statistical result includes:
and based on calculation logic in a pre-established window view, carrying out data screening on the data to be counted, and carrying out aggregation statistics on the screened data to be counted in a preset time interval to obtain the index statistical result.
In a second aspect, in this embodiment, there is provided an index statistics apparatus for Clickhouse, including: a statistics module and a storage module; wherein:
the statistics module is used for carrying out aggregation statistics on the acquired data to be counted in a preset time interval based on a window view established in advance to obtain an index statistics result;
the storage module is used for writing the index statistical result into a preset data table for storage.
In a third aspect, in this embodiment, there is provided an index statistics system, including: a terminal device and a server device; the terminal equipment is in communication connection with the server equipment;
the terminal equipment is used for initiating an index statistics request;
the server device is configured to perform the index statistics method for Clickhouse described in the first aspect above.
In a fourth aspect, in this embodiment, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the index statistics method for Clickhouse described in the first aspect.
In a fifth aspect, in this embodiment, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the index statistics method for Clickhouse described in the first aspect above.
Compared with the related art, the index statistical method, the index statistical device and the index statistical system for the Clickhouse provided by the embodiment are used for carrying out aggregation statistics on the acquired data to be counted in a preset time interval based on a window view established in advance to obtain an index statistical result; and writing the index statistical result into a preset data table for storage. According to the method, index statistics calculation and result storage can be realized based on Clickhouse, other middleware is not required to be introduced, the complexity of a system can be further reduced, and the maintenance cost and the deployment cost of the middleware are reduced.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a hardware configuration block diagram of a terminal for the index statistics method for Clickhouse of the present embodiment;
FIG. 2 is a flow chart of the index statistics method for Clickhouse of the present embodiment;
FIG. 3a is a schematic diagram of a process of index statistics calculation in the related art;
FIG. 3b is a schematic diagram showing the process of index statistics calculation in the present embodiment;
FIG. 4 is a flow chart of the index statistics method of the preferred embodiment;
FIG. 5 is a block diagram showing the constitution of an index statistic apparatus for Clickhouse of the present embodiment;
FIG. 6 is a schematic structural diagram of the index statistics system of the present embodiment.
Detailed Description
For a clearer understanding of the objects, technical solutions and advantages of the present application, the present application is described and illustrated below with reference to the accompanying drawings and examples.
Unless defined otherwise, technical or scientific terms used herein shall have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these," and the like in this application do not denote a limitation of quantity and may be singular or plural. The terms "comprising," "including," "having," and any variations thereof, as used in the present application, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference to "a plurality" in this application means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. Typically, the character "/" indicates that the associated objects are in an "or" relationship. The terms "first," "second," "third," and the like, as referred to in this application, merely distinguish similar objects and do not represent a particular ordering of objects.
The method embodiments provided in the present embodiment may be executed in a terminal, a computer, or similar computing device. For example, running on a terminal, fig. 1 is a block diagram of the hardware structure of the terminal for the index statistics method for Clickhouse of the present embodiment. As shown in fig. 1, the terminal may include one or more (only one is shown in fig. 1) processors 102 and a memory 104 for storing data, wherein the processors 102 may include, but are not limited to, a microprocessor MCU, a programmable logic device FPGA, or the like. The terminal may also include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and is not intended to limit the structure of the terminal. For example, the terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the index statistics method for Clickhouse in the present embodiment. The processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above-described method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. The network includes a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
In this embodiment, there is provided an index statistical method for a Clickhouse, and fig. 2 is a flowchart of the index statistical method for a Clickhouse of this embodiment, as shown in fig. 2, the flowchart includes the steps of:
step S210, based on a pre-established window view, aggregation statistics is carried out on the acquired data to be counted in a preset time interval, and an index statistical result is obtained.
The window view is a Clickhouse view that computes statistics in real time over the data falling within each clock-aligned time period, based on a configured window time field. For example, if the field collectorReceiptTime in the data to be counted, which represents the time at which the collector received the data, is set as the window time field, the window view performs data statistics at preset time intervals based on that field. As another example, if the window time is set to 5 minutes, statistics are computed every 5 minutes on clock-aligned boundaries, i.e., at minute 5, minute 10, minute 15, and so on. For instance, if the current time is 5:12 and the window time is 5 minutes, the next statistics run occurs at 5:15 and covers the data from 5:10 to 5:15. Clickhouse is an open-source column-oriented database for online analytical processing (OLAP) queries that can generate analytical data reports in real time using Structured Query Language (SQL) queries. The data to be counted is real-time data requiring index statistics, such as raw traffic data.
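The clock-aligned window arithmetic described above can be illustrated with a minimal Python sketch. It is not Clickhouse code: the function name `tumbling_window` and the sample date are hypothetical, and windows are assumed to be aligned to the Unix epoch, which matches the 5:10 to 5:15 example in the text.

```python
from datetime import datetime, timedelta
from typing import Tuple

EPOCH = datetime(1970, 1, 1)


def tumbling_window(ts: datetime, size: timedelta) -> Tuple[datetime, datetime]:
    """Return the [start, end) bounds of the clock-aligned window containing ts."""
    # Number of whole windows that have elapsed since the epoch.
    index = (ts - EPOCH) // size
    start = EPOCH + index * size
    return start, start + size


# Current time 5:12 with a 5-minute window: the window is 5:10 to 5:15,
# so the next statistics run happens at 5:15.
start, end = tumbling_window(datetime(2023, 3, 1, 5, 12), timedelta(minutes=5))
print(start.time(), end.time())
```

A timestamp that falls exactly on a boundary, such as 5:15, belongs to the next window (5:15 to 5:20) in this sketch.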
The window view may be created by writing Clickhouse SQL statements, and window calculation statements are written according to the statistical requirement to implement the calculation function of the window view. For example, the following SQL statement creates a window view and defines the window calculation needed to compute the Web access egress traffic:
CREATE WINDOW VIEW wv_log_webSumBytesOut
to `ailpha_statistics_all`
AS
select now() as `@TIMESTAMP`,
    TUMBLE_END(w_id) as ENDTIME, 'info:72.118.1.14,1.datanode2' as ENGINEINFO, '' as ETL_ENGINEINFO, 0 as EVENTCOUNT,
    bitOr(dateTimeToSnowflake(now()), CRC32(concat(destHostName, SOURCEFROM))) as EVENTID, '36a2af98dcc95946' as filterHash, 0 as FROMCUSTOMSTRATEGY,
    '' as HTTPVERSION, '' as LOGTYPE, '' as MAILTITLE,
    'webSumBytesOut' as METRICID, ['webSumBytesOut'] as METRICIDS,
    concat('destHostName:', destHostName, '_##_') as METRICKEY, '' as MODELNAME, 'metric' as MODELTYPE,
    '' as NAME, 0 as NEEDRELOADREDIS, now() as OCCURREDTIME, now() as OCCURRENCETIME, toString(now()) as OCCURRENCETIMESTR,
    TUMBLE_START(w_id) as STARTTIME,
    -- the index itself: total outbound bytes per destination host and window
    sum(bytesOut) as stat_sum_bytesOut_webSumBytesOut,
    0 as ISVECTOR,
    [''] as ATTACKNAME,
    1 as ISWINDOWVIEW,
    '' as APPPROTOCOL,
    'log' as SOURCEFROM
from `ailpha-baas-log`
where appProtocol in ('http', 'https')
group by destHostName,
    -- 1-minute tumbling window on the collector receipt time
    TUMBLE(`collectorReceiptTime`, INTERVAL '1' MINUTE) as w_id
order by STARTTIME desc
limit 2000;
In the above SQL statement, "CREATE WINDOW VIEW wv_log_webSumBytesOut" creates a window view named "wv_log_webSumBytesOut"; "to `ailpha_statistics_all`" means that the data of the window view is written into the table named "ailpha_statistics_all" in the Clickhouse database; the remainder is the specific calculation logic of the window view, expressed as an aggregate query. More specifically, the following portion of the SQL statement:
CREATE WINDOW VIEW wv_log_webSumBytesOut
to `ailpha_statistics_all`
AS select ...
from `ailpha-baas-log`;
means that data satisfying the set condition is queried from the table named "ailpha-baas-log" and then written into the "ailpha_statistics_all" table. After the data satisfying the set condition is queried, it is grouped by destination domain name (destHostName), sorted by start time in descending order, and the first 2000 rows are taken. The set condition is that the application (app) protocol is http or https. In this example, the collector receipt time is set as the window time field and the window time is specified as 1 minute, so the calculation is performed once per minute. The expression "sum(bytesOut) as stat_sum_bytesOut_webSumBytesOut" means that, after the data satisfying the set condition is screened from the data to be counted, the "bytesOut" field of that data is summed every 1 minute to obtain the index statistical result.
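To make the calculation logic concrete, the following Python sketch mimics, over an in-memory list, what the window view computes: it keeps only http/https rows, groups them by destination host and 1-minute window, and sums bytesOut. It is an illustration under stated assumptions rather than how Clickhouse executes the view. The function names and sample rows are hypothetical, and the ordering and 2000-row limit are omitted for brevity.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=1)
EPOCH = datetime(1970, 1, 1)


def window_start(ts):
    """Start of the 1-minute tumbling window that contains ts."""
    return EPOCH + ((ts - EPOCH) // WINDOW) * WINDOW


def web_sum_bytes_out(rows):
    """Sum bytesOut per (destHostName, window start) for http/https rows only."""
    totals = defaultdict(int)
    for row in rows:
        if row["appProtocol"] not in ("http", "https"):
            continue  # corresponds to the where clause
        key = (row["destHostName"], window_start(row["collectorReceiptTime"]))
        totals[key] += row["bytesOut"]  # corresponds to sum(bytesOut)
    return dict(totals)


# Hypothetical sample rows standing in for the `ailpha-baas-log` table.
rows = [
    {"destHostName": "example.com", "appProtocol": "http", "bytesOut": 100,
     "collectorReceiptTime": datetime(2023, 3, 1, 5, 10, 5)},
    {"destHostName": "example.com", "appProtocol": "https", "bytesOut": 50,
     "collectorReceiptTime": datetime(2023, 3, 1, 5, 10, 40)},
    {"destHostName": "example.com", "appProtocol": "http", "bytesOut": 7,
     "collectorReceiptTime": datetime(2023, 3, 1, 5, 11, 10)},
    {"destHostName": "other.org", "appProtocol": "dns", "bytesOut": 999,
     "collectorReceiptTime": datetime(2023, 3, 1, 5, 10, 20)},
]
result = web_sum_bytes_out(rows)
```

Here the two rows received during minute 5:10 are added together, the row from minute 5:11 forms its own window, and the non-Web row is excluded.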
Compared with Flink-based index calculation in the related art, which requires the Java or Scala programming languages with their higher learning cost, this embodiment can meet preset index statistics requirements using SQL, which has stronger semantic expressiveness and a lower learning cost. As a domain-specific language (DSL), SQL is self-describing and can be widely used in the development, operation and maintenance, and testing stages. Implementing index statistics with the window view capability of Clickhouse is therefore easier to popularize in practice than a Flink-based application, and is more lightweight and cheaper to put into production.
Step S220, writing the index statistical result into a preset data table for storage.
The preset data table is likewise a table created in Clickhouse. Therefore, after calculating the index data in real time with the window view of Clickhouse, this embodiment can store the calculated data directly in Clickhouse without using other middleware. Compared with the related-art scheme, in which the calculated data is obtained through Flink-based real-time stream computing and stored in a non-relational database such as Elasticsearch, this embodiment reduces the maintenance and deployment costs of middleware and reduces system complexity.
In particular, this embodiment may use a lightweight data collection tool, such as the Extract-Transform-Load (ETL) tool Vector, to send the data to be counted from the message queue to Clickhouse. ETL is the process of extracting, transforming, and loading data from a source end to a destination end.
FIG. 3a is a schematic diagram of the index statistics calculation process in the related art; FIG. 3b is a schematic diagram of the index statistics calculation process in this embodiment. As can be seen from FIG. 3a, in the related art, front-end probes send raw traffic data to a distributed message queue such as Kafka; for example, probe 1 sends the raw traffic data of device 1 and device 2 to topic 1 in the Kafka message queue, and probe 2 sends the raw traffic data of device 2 to topic 2 in the message queue. Flink then performs index statistics on the data in topic 1 based on rule model 1 and writes the calculation result into topic 11 of Kafka, and performs index statistics on the data in topic 2 based on rule model 2 and writes the calculation result into topic 12 of Kafka. Thereafter, the data from the different Kafka topics are stored in Elasticsearch by data collection tools such as Logstash or Flink. For example, the calculation result in topic 11 is stored in index 1 on Elasticsearch by processing module 1, and the calculation result in topic 12 is stored in index 2 on Elasticsearch by processing module 2.
As can be seen from FIG. 3b, the process in this embodiment of collecting the data to be counted from the different devices of the data source into different Kafka topics through the front-end data collection tool (the probe) is similar to that of FIG. 3a and is not repeated here. The difference from the related art shown in FIG. 3b is that this embodiment uses Vector to send the data of the different Kafka topics to different window views of Clickhouse for window calculation, and then writes the results into different target tables of Clickhouse. For example, Vector sends the data in topic 1 through processing module 1 to window view 1 of Clickhouse for index statistics, and the result is written into target table 1; Vector sends the data in topic 2 through processing module 2 to window view 2 of Clickhouse for index statistics, and the result is written into target table 2. Comparing FIG. 3a and FIG. 3b, it is apparent that this embodiment eliminates the maintenance and deployment of the Flink and Elasticsearch middleware, and can meet the index statistics requirement with lower system complexity and lower maintenance and deployment costs.
In steps S210 to S220, aggregation statistics are performed on the acquired data to be counted within a preset time interval based on a pre-established window view to obtain an index statistical result, and the index statistical result is written into a preset data table for storage. With this method, index statistics calculation and result storage can both be realized on Clickhouse without introducing other middleware, which reduces system complexity as well as the maintenance and deployment costs of middleware.
Additionally, in one embodiment, before performing aggregation statistics on the acquired data to be counted in a preset time interval based on a pre-established window view to obtain an index statistical result, the method may further include the following steps:
step S230, collecting data to be counted from a message queue based on a first data collecting tool; the first data acquisition tool is based on ETL technology. Wherein the message queue may be Kafka and the first data collection tool may be a lightweight ETL tool Vector. In this embodiment, the Vector obtains the data to be counted from Kafka and sends the data to the Clickhouse to perform index statistics processing, so that the complexity of the system can be reduced.
Additionally, in one embodiment, based on the step S230, the collecting the data to be counted from the message queue based on the first data collecting tool may specifically include: based on preset configuration information, collecting data to be counted from a message queue through a first data collection tool.
The configuration information indicates the type of the database to which the data is to be sent, the input data source configuration, the name of the connected database, the endpoint of the connection, the table to which the data is to be output, and the like. In this embodiment, Vector is configured to clean and convert the data of a given Kafka topic and then store it in a given Clickhouse table. Illustratively, the configuration information may include the following:
[sinks.ch_event_sink]
type="clickhouse"
inputs=["event_side_out.event"]
database="dbapp"
endpoint="http://1.clickhouse1:8123"
table="ailpha-baas-event"
[sinks.ch_event_alarm_sink]
type="clickhouse"
inputs=["event_side_out.alarm"]
database="dbapp"
endpoint="http://1.clickhouse1:8123"
table="ailpha-baas-alarm"
In the configuration information, type represents the type of the database, which is Clickhouse in the above example; inputs represents the input data source configuration, for example "event_side_out.alarm" for the second sink; database represents the name of the connected database, which is "dbapp" in the example; endpoint represents the endpoint of the connection; and table indicates the Clickhouse table to which the data from the data source specified by inputs is to be output.
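As a small illustration of how the five configuration keys fit together, the sketch below renders one sink section as text from its parameters. The helper `render_clickhouse_sink` is hypothetical and is not part of Vector. It only formats the keys shown in the example above (type, inputs, database, endpoint, table) and performs no validation.

```python
def render_clickhouse_sink(name, inputs, database, endpoint, table):
    """Render one [sinks.<name>] section using only the keys shown in the text."""
    quoted_inputs = ", ".join('"{}"'.format(item) for item in inputs)
    lines = [
        "[sinks.{}]".format(name),
        'type = "clickhouse"',                    # type of the target database
        "inputs = [{}]".format(quoted_inputs),    # upstream data source(s)
        'database = "{}"'.format(database),       # name of the connected database
        'endpoint = "{}"'.format(endpoint),       # endpoint of the connection
        'table = "{}"'.format(table),             # table that receives the data
    ]
    return "\n".join(lines)


# Reproduces the second sink from the example configuration.
alarm_sink = render_clickhouse_sink(
    name="ch_event_alarm_sink",
    inputs=["event_side_out.alarm"],
    database="dbapp",
    endpoint="http://1.clickhouse1:8123",
    table="ailpha-baas-alarm",
)
print(alarm_sink)
```

Generating each section from the same function keeps sinks for different topics consistent, which matters when every topic is routed to its own table.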
Additionally, in one embodiment, before collecting the data to be counted from the message queue based on the first data collecting tool, the method may further include the steps of:
and step S240, transmitting the data to be counted in the preset data source to the message queue through a second data acquisition tool at the front end. The second data acquisition tool may be a front-end probe, which is configured to send original data of a preset data source, that is, the data to be counted, to a certain topic of the Kafka distributed message queue, so as to realize real-time accurate transmission of the data.
In addition, in one embodiment, based on the step S210, based on the window view established in advance, aggregate statistics is performed on the acquired data to be counted in a preset time interval to obtain an index statistical result, which specifically may include the following steps:
step S211, based on the calculation logic in the window view established in advance, carrying out aggregation statistics on the acquired data to be counted in a preset time interval to obtain an index statistical result.
Further, in one embodiment, based on the step S211, based on the calculation logic in the pre-established window view, aggregate statistics is performed on the obtained data to be counted in the preset time interval to obtain an index statistical result, which may specifically include: based on the calculation logic in the window view established in advance, data screening is carried out on the data to be counted, and aggregation statistics is carried out on the screened data to be counted in a preset time interval, so that an index statistical result is obtained. According to the embodiment, based on the window view of the Clickhouse, the SQL statement is utilized to meet specific statistical requirements, so that the learning cost of index statistics can be reduced, the light weight of the index statistics is realized, the complexity of a system is reduced, and the cost for maintaining and deploying middleware is reduced.
Specifically, the data to be counted may be screened according to a condition set in the calculation logic. For example, if the data to be counted is raw traffic data and the Web access egress traffic needs to be calculated, the set condition is to screen out the data whose app protocol is http or https, group it by destination domain name, sort it by start time in descending order, and take the first 2000 rows, thereby obtaining the screened data to be counted. Statistics are then computed according to the aggregation defined in the calculation logic, for example by summing a certain field.
The present embodiment is described and illustrated below by way of preferred embodiments.
Fig. 4 is a flowchart of the index statistics method of the present preferred embodiment. As shown in fig. 4, the index statistical method includes the following steps:
step S401, sending the raw traffic data to a preset topic of the Kafka distributed message queue through a front-end probe;
step S402, transmitting the raw traffic data to Clickhouse based on the lightweight ETL tool Vector;
step S403, by means of configuration, Vector cleans and converts the raw traffic data of the Kafka topic and stores it in a preset table of Clickhouse;
step S404, according to the specific business calculation rules determined by preset requirements, Clickhouse calculates the Web access egress traffic from the data to be counted to obtain the index statistical result;
step S405, based on the syntax of the window view, storing the index statistical result in a pre-specified table of Clickhouse.
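Steps S401 to S405 can be pictured end to end with the following in-memory Python sketch. It is a simplified stand-in and does not involve the real components: a deque plays the role of the Kafka topic, plain lists play the roles of the Clickhouse tables, and the cleaning rule (dropping records without bytesOut and lower-casing the protocol) is a hypothetical example. Unlike a real window view, which emits results as each window closes, the sketch aggregates in a single batch.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def probe_send(topic, raw_records):
    """S401: the front-end probe appends raw traffic records to a topic."""
    topic.extend(raw_records)


def vector_load(topic, source_table):
    """S402-S403: drain the topic, clean each record, store it in a table."""
    while topic:
        record = topic.popleft()
        if record.get("bytesOut") is None:  # hypothetical cleaning rule
            continue
        cleaned = dict(record, appProtocol=record["appProtocol"].lower())
        source_table.append(cleaned)


def window_view(source_table, target_table, size=timedelta(minutes=1)):
    """S404-S405: aggregate per host and window, write to the target table."""
    sums = defaultdict(int)
    for row in source_table:
        if row["appProtocol"] in ("http", "https"):
            elapsed = row["collectorReceiptTime"] - EPOCH
            start = EPOCH + (elapsed // size) * size
            sums[(row["destHostName"], start)] += row["bytesOut"]
    for (host, start), total in sorted(sums.items()):
        target_table.append(
            {"host": host, "start": start, "end": start + size, "sum_bytesOut": total}
        )


topic, source_table, target_table = deque(), [], []
probe_send(topic, [
    {"destHostName": "a.example", "appProtocol": "HTTP", "bytesOut": 10,
     "collectorReceiptTime": datetime(2023, 3, 1, 5, 10, 5)},
    {"destHostName": "a.example", "appProtocol": "https", "bytesOut": 32,
     "collectorReceiptTime": datetime(2023, 3, 1, 5, 10, 50)},
    {"destHostName": "a.example", "appProtocol": "http", "bytesOut": None,
     "collectorReceiptTime": datetime(2023, 3, 1, 5, 10, 55)},
    {"destHostName": "b.example", "appProtocol": "ssh", "bytesOut": 99,
     "collectorReceiptTime": datetime(2023, 3, 1, 5, 10, 30)},
])
vector_load(topic, source_table)
window_view(source_table, target_table)
```

In this run the record without bytesOut is dropped during cleaning, the ssh record is stored but excluded from the Web statistics, and the target table receives a single row for a.example covering 5:10 to 5:11.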
In this embodiment, an indicator statistics device for click house is further provided, and the device is used to implement the foregoing embodiments and preferred embodiments, and will not be described in detail. The terms "module," "unit," "sub-unit," and the like as used below may refer to a combination of software and/or hardware that performs a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementations in hardware, or a combination of software and hardware, are also possible and contemplated.
Fig. 5 is a block diagram of the index statistics device 50 for Clickhouse of the present embodiment. As shown in fig. 5, the index statistics device 50 for Clickhouse includes a statistics module 52 and a storage module 54, wherein: the statistics module 52 is configured to perform aggregate statistics on the acquired data to be counted in a preset time interval based on a pre-established window view, so as to obtain an index statistical result; the storage module 54 is configured to write the index statistical result into a preset data table for storage.
The index statistics device 50 for Clickhouse performs both the calculation of index statistics and the storage of the results within Clickhouse itself, without introducing other middleware, which reduces system complexity as well as the maintenance and deployment costs that such middleware would bring.
The above-described respective modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented in hardware, the various modules described above may be located in the same processor; or the above modules may be located in different processors in any combination.
An index statistics system 60 is also provided in this embodiment. Fig. 6 is a schematic diagram of the index statistics system 60 according to the present embodiment. As shown in fig. 6, the index statistics system 60 includes a terminal device 62 and a server device 64, wherein the terminal device 62 is communicatively connected to the server device 64; the terminal device 62 is used for initiating an index statistics request; the server device 64 is configured to perform the index statistics method for Clickhouse provided by any of the embodiments described above.
There is also provided in this embodiment an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic device may further include a transmission device and an input/output device, each of which is connected to the processor.
Optionally, in the present embodiment, the above-described processor may be configured to execute the following steps by means of the computer program:
S1, carrying out aggregation statistics on the acquired data to be counted in a preset time interval based on a pre-established window view to obtain an index statistical result;
S2, writing the index statistical result into a preset data table for storage.
It should be noted that, specific examples in this embodiment may refer to examples described in the foregoing embodiments and alternative implementations, and are not described in detail in this embodiment.
In addition, in combination with the index statistical method for Clickhouse provided in the above embodiment, a storage medium may be provided to implement this embodiment. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any of the index statistics methods of the above embodiments for Clickhouse.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present application, are within the scope of the present application in light of the embodiments provided herein.
It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
It is evident that the drawings are only examples or embodiments of the present application, and that a person skilled in the art can apply the present application to other similar situations on the basis of these drawings without inventive effort. In addition, it should be appreciated that while the development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as an admission of insufficient detail.
The term "embodiment" in this application means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. It will be clear or implicitly understood by those of ordinary skill in the art that the embodiments described in this application can be combined with other embodiments without conflict.
The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the patent. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. An index statistics method for Clickhouse, comprising:
based on a pre-established window view, carrying out aggregation statistics on the acquired data to be counted in a preset time interval to obtain an index statistical result;
and writing the index statistical result into a preset data table for storage.
2. The index statistical method for Clickhouse according to claim 1, wherein before performing aggregation statistics for a preset time interval on the acquired data to be counted based on a pre-established window view, the method further comprises:
collecting the data to be counted from a message queue based on a first data collection tool, wherein the first data collection tool is based on ETL technology.
3. The index statistics method for Clickhouse according to claim 2, wherein the collecting the data to be counted from a message queue based on the first data collection tool comprises:
collecting the data to be counted from the message queue through the first data collection tool based on preset configuration information.
4. The index statistics method for Clickhouse according to claim 2, wherein before collecting the data to be counted from a message queue based on the first data collection tool, the method further comprises:
transmitting the data to be counted in a preset data source to the message queue through a second data collection tool at the front end.
5. The index statistics method for Clickhouse according to any one of claims 1 to 4, wherein the performing aggregation statistics on the acquired data to be counted in a preset time interval based on a pre-established window view to obtain an index statistical result comprises:
and carrying out aggregation statistics on the acquired data to be counted in a preset time interval based on calculation logic in a pre-established window view to obtain an index statistical result.
6. The index statistics method for Clickhouse according to claim 5, wherein the performing aggregation statistics on the acquired data to be counted in a preset time interval based on the calculation logic in the pre-established window view to obtain an index statistical result comprises:
and based on calculation logic in a pre-established window view, carrying out data screening on the data to be counted, and carrying out aggregation statistics on the screened data to be counted in a preset time interval to obtain the index statistical result.
7. An index statistics apparatus for Clickhouse, comprising: a statistics module and a storage module; wherein:
the statistics module is used for carrying out aggregation statistics on the acquired data to be counted in a preset time interval based on a window view established in advance to obtain an index statistics result;
the storage module is used for writing the index statistical result into a preset data table for storage.
8. An index statistical system, comprising: a terminal device and a server device; the terminal equipment is in communication connection with the server equipment;
the terminal equipment is used for initiating an index statistics request;
the server device is configured to perform the index statistics method for Clickhouse of any one of claims 1 to 6.
9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the index statistics method for Clickhouse of any one of claims 1 to 6.
10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of the index statistics method for Clickhouse of any of claims 1 to 6.
CN202310201570.6A 2023-02-28 2023-02-28 Index statistics method, device and index statistics system for Clickhouse Pending CN116257571A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310201570.6A CN116257571A (en) 2023-02-28 2023-02-28 Index statistics method, device and index statistics system for Clickhouse

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310201570.6A CN116257571A (en) 2023-02-28 2023-02-28 Index statistics method, device and index statistics system for Clickhouse

Publications (1)

Publication Number Publication Date
CN116257571A true CN116257571A (en) 2023-06-13

Family

ID=86679020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310201570.6A Pending CN116257571A (en) 2023-02-28 2023-02-28 Index statistics method, device and index statistics system for Clickhouse

Country Status (1)

Country Link
CN (1) CN116257571A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116629805A (en) * 2023-06-07 2023-08-22 浪潮智慧科技有限公司 Water conservancy index service method, equipment and medium for distributed flow batch integration
CN116629805B (en) * 2023-06-07 2023-12-01 浪潮智慧科技有限公司 Water conservancy index service method, equipment and medium for distributed flow batch integration

Similar Documents

Publication Publication Date Title
CN106250424B (en) A kind of searching method, the apparatus and system of log context
US9529895B2 (en) Method and system for discovering dynamic relations among entities
CN105786993B (en) Application program function plug-in recommendation method and device
CN108073625B (en) System and method for metadata information management
CN104809130B (en) Method, equipment and the system of data query
US20190197140A1 (en) Automation of sql tuning method and system using statistic sql pattern analysis
US20200042424A1 (en) Method, apparatus and system for processing log data
CN106941493A (en) A kind of network security situation awareness result output intent and device
US9641405B2 (en) System and method for sequencing per-hop data in performance-monitored network environments
CN109033188A (en) A kind of metadata acquisition method, apparatus, server and computer-readable medium
CN112732663A (en) Log information processing method and device
CN105574032A (en) Rule matching operation method and device
CN116257571A (en) Index statistics method, device and index statistics system for Clickhouse
CN111694793A (en) Log storage method and device and log query method and device
CN114817389A (en) Data processing method, data processing device, storage medium and electronic equipment
CN106648722A (en) Flume receiving side data processing method and device based on big data
CN105095228A (en) Method and apparatus for monitoring social information
US9338255B1 (en) System and method for correlating end-user experience data and backend-performance data
CN112000866B (en) Internet data analysis methods, devices, electronic devices and media
CN112507265A (en) Method and device for anomaly detection based on tree structure and related products
CN112181929A (en) Cloud management platform log processing method and device, electronic device and storage medium
CN107526808B (en) Real-time data processing method and device
CN115757570A (en) Log data analysis method and device, electronic equipment and medium
CN112612673B (en) Analysis method and device for dial testing log, storage medium and electronic device
CN113434612A (en) Data statistical method and device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination