CN112269802A - Method and system for frequent deletion, modification and check optimization based on Clickhouse - Google Patents

Method and system for frequent deletion, modification and check optimization based on Clickhouse Download PDF

Info

Publication number
CN112269802A
CN112269802A CN202011111740.4A CN202011111740A CN112269802A CN 112269802 A CN112269802 A CN 112269802A CN 202011111740 A CN202011111740 A CN 202011111740A CN 112269802 A CN112269802 A CN 112269802A
Authority
CN
China
Prior art keywords
data
cluster
mongodb
clickhouse
application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011111740.4A
Other languages
Chinese (zh)
Inventor
周小润
刘跃红
管正爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yinsheng Payment Service Co Ltd
Original Assignee
Yinsheng Payment Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yinsheng Payment Service Co Ltd filed Critical Yinsheng Payment Service Co Ltd
Priority to CN202011111740.4A priority Critical patent/CN112269802A/en
Publication of CN112269802A publication Critical patent/CN112269802A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24532Query optimisation of parallel queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/219Managing data history or versioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2365Ensuring data consistency and integrity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275Synchronous replication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Abstract

The invention discloses a method and a system for frequent deletion, modification and check optimization based on Clickhouse, relating to the technical field of data processing; the method comprises the following steps: s1, synchronizing data, analyzing and storing the real-time operation log data in the synchronous original database to a kafka cluster and mongodb; s2, consumption of running data, wherein the Application end Application is respectively connected with the kafka cluster, the mongodb cluster and the clickhouse cluster, when the Application end Application consumes the running data, the Application end Application firstly queries the mongodb data, and if the query is not available, the Application end Application queries the clickhouse cluster data; s3, operation on the query data; s4, timing synchronization of incremental data; the invention has the beneficial effects that: the method can support the operations of quick real-time insertion and update, and support high-concurrency query aiming at real-time data.

Description

Method and system for frequent deletion, modification and check optimization based on Clickhouse
Technical Field
The invention relates to the technical field of big data processing, in particular to a method and a system for frequent deletion, modification and check optimization based on Clickhouse.
Background
At present, big data are widely applied to the fields of finance, electronic commerce, logistics, enterprise government affairs and the like, and are applied to the industrial internet in recent years, and with the acceleration of the internet process of everything in the 5G era, the storage of mass data, the real-time query of mass data and the OLAP analysis bring challenges. Clickhouse is being widely applied to real-time processing and OLAP online analysis with its outstanding query and write performance, some systems can store values of individual columns individually, but cannot process analysis queries efficiently due to optimization of other scenarios, such as HBase, BigTable, Cassandra and HyperTable. In these systems, throughput of about one hundred thousand lines per second can be achieved, but not as high as hundreds of millions of lines per second as in Clickhouse.
Clickhouse was developed by yandex. Has been used continuously for many years as a core component of the system. To date, the system has over 13 trillion records in a clickwouse, and over 200 billions of events are processed per day. It allows dynamic queries and generation of reports directly from the raw data. The method is widely applied to scenes such as real-time reports, online real-time analysis, user images, intelligent recommendation systems and the like. However, there are inherent disadvantages in the use process of Clickhouse, which face some technical points to be solved:
1. transactions are not supported, true deletion/updating is not supported;
2. high concurrency is not supported, official recommendations qps of 100, the number of connections can be increased by modifying the configuration file, but in case the server is good enough;
3. the method is not suitable for real-time insertion or small-batch insert, update and delete operations, because the ClickHouse bottom layer can continuously carry out asynchronous data combination, the query performance can be influenced, and the method needs to be avoided as much as possible when real-time data writing is carried out.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a method and a system for frequently deleting, modifying and checking optimization based on Clickhouse, which can support the operations of quick real-time insertion and updating and support high-concurrency query aiming at real-time data.
The technical scheme adopted by the invention for solving the technical problems is as follows: in a method for frequent censorship optimization based on Clickhouse, the improvement comprising the steps of:
s1, synchronizing data, analyzing and storing the real-time operation log data in the synchronous original database to a kafka cluster and mongodb;
s2, consumption of running data, wherein the Application end Application is respectively connected with the kafka cluster, the mongodb cluster and the clickhouse cluster, when the Application end Application consumes the running data, the Application end Application firstly queries the mongodb data, and if the query is not available, the Application end Application queries the clickhouse cluster data;
s3, operating the query data, and directly operating mongodb when the Application end queries the data in mongodb and updates or deletes the data; when the Application end does not inquire data in mongodb and inquires data in the clickhouse cluster, writing the inquiry records of the Application back to mongodb for inquiry;
and S4, synchronizing the timing of the incremental data, wherein mongodb is connected with a clickhouse cluster, and mongodb synchronizes the incremental data to the clickhouse cluster.
Further, in step S1, the original database is an Oracle database or Mysql database.
Further, in step S1, data synchronization is implemented by using OGG or Canal.
Further, in step S2, when the Application queries data of the last three days, the query is sent to mongodb.
Further, in step S2, when the Application queries the history data, the Application goes to the clickhouse cluster for querying.
Further, in step S4, the incremental data is synchronized to the clickhouse cluster by timing, so as to ensure the integrity of the clickhouse cluster data.
On the other hand, the invention also discloses a system for frequently deleting, modifying and checking optimization based on Clickhouse, which is improved in that the system comprises an original database, a kafka cluster, mongodb, Application at an Application end and a Clickhouse cluster;
the original database is used for storing real-time operation log data;
the kafka cluster and the mongodb are respectively connected with the original database, and the real-time operation log data in the original database are analyzed and stored in the kafka cluster and the mongodb;
the Application end is respectively connected with the kafka cluster, the mongodb cluster and the clickhouse cluster, and is used for realizing data query on the mongodb cluster and data query on the clickhouse cluster;
the clickhouse cluster is connected with the mongodb, and the mongodb synchronizes incremental data to the clickhouse cluster in a timing mode.
In the above structure, the original database is an Oracle database or a Mysql database.
In the above structure, the Application queries mongodb's data as data for the last three days of operation.
In the above structure, the Application queries the data of the clickhouse cluster as historical data.
The invention has the beneficial effects that: the method and the system for frequent deletion, modification and check optimization based on the Clickhouse can support rapid real-time insertion and update operations, support high-concurrency query aiming at real-time data, rapidly respond to query requests and achieve millisecond-level response.
Drawings
Fig. 1 is a schematic flow chart of a method for frequent censoring and searching optimization based on Clickhouse in the present invention.
Fig. 2 is a schematic diagram of a system for frequent censoring optimization based on Clickhouse according to the present invention.
Fig. 3 is a schematic flow chart of incremental data synchronization based on the Clickhouse frequent censoring optimization method of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The conception, the specific structure, and the technical effects produced by the present invention will be clearly and completely described below in conjunction with the embodiments and the accompanying drawings to fully understand the objects, the features, and the effects of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and those skilled in the art can obtain other embodiments without inventive effort based on the embodiments of the present invention, and all embodiments are within the protection scope of the present invention. In addition, all the connection/connection relations referred to in the patent do not mean that the components are directly connected, but mean that a better connection structure can be formed by adding or reducing connection auxiliary components according to specific implementation conditions. All technical characteristics in the invention can be interactively combined on the premise of not conflicting with each other.
Referring to fig. 1, the invention discloses a method for frequent censorship optimization based on Clickhouse, and specifically, the method comprises the following steps:
s1, synchronizing data, analyzing and storing the real-time operation log data in the synchronous original database to a kafka cluster and mongodb;
in this embodiment, in step S1, the original database is an Oracle database or Mysql database; and, adopt OGG or Canal to realize the synchronization of the data;
s2, consumption of running data, wherein the Application end Application is respectively connected with the kafka cluster, the mongodb cluster and the clickhouse cluster, when the Application end Application consumes the running data, the Application end Application firstly queries the mongodb data, and if the query is not available, the Application end Application queries the clickhouse cluster data;
in the embodiment, when the Application terminal queries data of the last three days, the Application terminal queries the mongdb; when the Application at the Application end inquires the historical data, inquiring in the clickhouse cluster; it should be noted that the historical data may include the data of the last three days, or may not include the data of the last three days, and may be determined according to actual conditions, and in the case of relatively high real-time performance, the historical data is generally the data including the data of the last three days.
S3, operating the query data, and directly operating mongodb when the Application end queries the data in mongodb and updates or deletes the data; when the Application end does not inquire data in mongodb and inquires data in the clickhouse cluster, writing the inquiry records of the Application back to mongodb for inquiry;
and S4, synchronizing the timing of the incremental data, wherein mongodb is connected with a clickhouse cluster, and mongodb synchronizes the incremental data to the clickhouse cluster.
And the integrity of the clickhouse cluster data is ensured by timing and synchronizing the incremental data to the clickhouse cluster.
In the present embodiment, as shown in fig. 3, for the timing synchronization of the incremental data in step S4, the present invention provides a specific embodiment, wherein the incremental data is first imported into the temporary table "table name _ tmp
Confirming whether the polling is finished or not, writing a formal table into a temporary table if the polling is finished, then confirming whether the polling is finished or not, naming the formal table as a table name _ tmp _ tmp and naming the temporary table as the table name _ tmp if the polling is finished
And renaming to a formal table and deleting a table name _ tmp _ tmp to finish the synchronization of the incremental data.
Referring to fig. 2, the invention further discloses a system for frequent pruning, searching and optimizing based on Clickhouse, which comprises an original database, a kafka cluster 10, a mongodb20, an Application30 and a Clickhouse cluster 40;
the original database is used for storing real-time operation log data; in this embodiment, the original database is an Oracle database or a Mysql database;
the kafka cluster 10 and the mongodb20 are respectively connected with the original database, and the real-time operation log data in the original database are analyzed and stored in the kafka cluster 10 and the mongodb 20;
the Application30 is respectively connected with the kafka cluster 10, the mongodb20 and the clickhouse cluster 40, and the Application30 is used for realizing data query on the mongodb20 and data query on the clickhouse cluster 40; referring to fig. 3, the Application30 queries mongodb20 data as data of the last three days of operation; the Application30 queries the data of the clickhouse cluster 40 as historical data;
the clickhouse cluster 40 is connected with mongodb20, and mongodb20 synchronizes incremental data to the clickhouse cluster 40 in a timing mode.
The steps of the business process are disclosed by the method and the system for frequently deleting, modifying, checking and optimizing based on the Clickhouse. Since the client frequently performs high-frequency operations of adding and deleting when performing real-time query, but the Clickhouse cluster 40 is not friendly to the deleting operation and affects performance greatly, and even if the client inserts a single high-frequency insert, the performance of the Clickhouse cluster 40 is also affected seriously, such operations should be avoided and processed in batches, and since real-time data needs to have a place to record the operations and needs to meet the requirement of real-time performance, the performance of the small data amount (three-day data) mongodb is excellent; on one hand, real-time processing can be achieved, and performance of the Clickhouse cluster 40 is not affected.
How the invention solves the problem that the Clickhouse cluster 40 is not suitable for the high concurrency scene is explained below, since most of data needing to be operated by most of real-time applications are hot data of about three days in the last day, mongodb is introduced, and since the mongodb has better support for the concurrent query and the small data volume supports millisecond-level response, the problem can be perfectly solved.
Further, we describe in detail how the system of the present invention can achieve high concurrency millisecond level responses, as described above, most requests will be processed by mongodb20, which is millisecond level; in addition, the remaining small part of requests are sent to the Clickhouse cluster 40, at this time, the concurrent queries are not very high, and the Clickhouse cluster 40 also responds to the queries in about a few milliseconds, so that the millisecond-level response of the whole system is ensured.
Data needs to be synchronized by timing to remove mongodb20, the synchronization period depends on a service scene, and when a Clickhouse cluster 40 needs to be used for online analysis of OLAP (on-line analytical processing), the synchronization operation can be accelerated, for example, once synchronization is completed for 2000 pieces; if the service only needs to go out of the T +1 data service, the service can be synchronized once a day.
In addition, the direct multi-table operation can cause the merge partition to affect the unstable table performance of the external service; therefore, the incremental data is firstly imported into the temporary table and then the formal table data is written back into the temporary table, and the data is renamed into the formal table to provide services for the outside in a rename mode after the data import is finished.
Therefore, through the description above, the method and the system for frequent deletion, modification, and check optimization based on Clickhouse of the present invention can support operations of fast real-time insertion and update, support high concurrent query for real-time data, quickly respond to query requests, and can achieve millisecond-level response.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for frequent deletion, modification and optimization based on Clickhouse is characterized by comprising the following steps:
s1, synchronizing data, analyzing and storing the real-time operation log data in the synchronous original database to a kafka cluster and mongodb;
s2, consumption of running data, wherein the Application end Application is respectively connected with the kafka cluster, the mongodb cluster and the clickhouse cluster, when the Application end Application consumes the running data, the Application end Application firstly queries the mongodb data, and if the query is not available, the Application end Application queries the clickhouse cluster data;
s3, operating the query data, and directly operating mongodb when the Application end queries the data in mongodb and updates or deletes the data; when the Application end does not inquire data in mongodb and inquires data in the clickhouse cluster, writing the inquiry records of the Application back to the preposed mongodb for inquiry;
and S4, synchronizing the timing of the incremental data, wherein mongodb is connected with a clickhouse cluster, and mongodb synchronizes the incremental data to the clickhouse cluster.
2. The method as claimed in claim 1, wherein in step S1, the original database is an Oracle database or Mysql database.
3. The method according to claim 1, wherein in step S1, data synchronization is implemented by using OGG or Canal.
4. The method as claimed in claim 1, wherein in step S2, when the Application queries data of the last three days, the query is sent to mongodb.
5. The method as claimed in claim 4, wherein in step S2, when the Application queries the history data, the Application queries the Clickhouse cluster.
6. The method according to claim 1, wherein in step S4, the integrity of Clickhouse cluster data is ensured by timing and synchronizing incremental data to the Clickhouse cluster.
7. A system based on Clickhouse frequent deletion, modification and check optimization is characterized by comprising an original database, a kafka cluster, mongodb, Application side Application and a Clickhouse cluster;
the original database is used for storing real-time operation log data;
the kafka cluster and the mongodb are respectively connected with the original database, and the real-time operation log data in the original database are analyzed and stored in the kafka cluster and the mongodb;
the Application end is respectively connected with the kafka cluster, the mongodb cluster and the clickhouse cluster, and is used for realizing data query on the mongodb cluster and data query on the clickhouse cluster;
the clickhouse cluster is connected with the mongodb, and the mongodb synchronizes incremental data to the clickhouse cluster in a timing mode.
8. The system of claim 7, wherein the original database is an Oracle database or Mysql database.
9. The system of claim 7, wherein the Application queries mongodb's data as data operating on the last three days.
10. The system of claim 7, wherein the Application queries Clickhouse cluster data as historical data.
CN202011111740.4A 2020-10-16 2020-10-16 Method and system for frequent deletion, modification and check optimization based on Clickhouse Pending CN112269802A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011111740.4A CN112269802A (en) 2020-10-16 2020-10-16 Method and system for frequent deletion, modification and check optimization based on Clickhouse

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011111740.4A CN112269802A (en) 2020-10-16 2020-10-16 Method and system for frequent deletion, modification and check optimization based on Clickhouse

Publications (1)

Publication Number Publication Date
CN112269802A true CN112269802A (en) 2021-01-26

Family

ID=74337241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011111740.4A Pending CN112269802A (en) 2020-10-16 2020-10-16 Method and system for frequent deletion, modification and check optimization based on Clickhouse

Country Status (1)

Country Link
CN (1) CN112269802A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988916A (en) * 2021-03-05 2021-06-18 杭州天阙科技有限公司 Full and incremental synchronization method, device and storage medium for Clickhouse
CN113010189A (en) * 2021-03-17 2021-06-22 恒安嘉新(北京)科技股份公司 Database installation method, device, equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988916A (en) * 2021-03-05 2021-06-18 杭州天阙科技有限公司 Full and incremental synchronization method, device and storage medium for Clickhouse
CN112988916B (en) * 2021-03-05 2023-06-16 杭州天阙科技有限公司 Full and incremental synchronization method, apparatus and storage medium for Clickhouse
CN113010189A (en) * 2021-03-17 2021-06-22 恒安嘉新(北京)科技股份公司 Database installation method, device, equipment and storage medium
CN113010189B (en) * 2021-03-17 2023-07-25 恒安嘉新(北京)科技股份公司 Database installation method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
EP1475724B1 (en) System and method for synchronization in a client/server environment
CN107818115B (en) Method and device for processing data table
EP0617814B1 (en) Open office directory database views
US6134543A (en) Incremental maintenance of materialized views containing one-to-one lossless joins
US6125360A (en) Incremental maintenance of materialized views containing one-to-N lossless joins
US8924365B2 (en) System and method for range search over distributive storage systems
EP3170109B1 (en) Method and system for adaptively building and updating column store database from row store database based on query demands
US5920857A (en) Efficient optimistic concurrency control and lazy queries for B-trees and other database structures
WO2021184761A1 (en) Data access method and apparatus, and data storage method and device
EP3401807B1 (en) Synopsis based advanced partition elimination
CN112269802A (en) Method and system for frequent deletion, modification and check optimization based on Clickhouse
CN116108057B (en) Distributed database access method, device, equipment and storage medium
CN111651519A (en) Data synchronization method, data synchronization device, electronic device, and storage medium
CN112612799A (en) Data synchronization method and terminal
CN113495872A (en) Transaction processing method and system in distributed database
US9390131B1 (en) Executing queries subject to different consistency requirements
CN110704442A (en) Real-time acquisition method and device for big data
Qi et al. The consistency analysis of secondary index on distributed ordered tables
Kiraz et al. Iot data storage: Relational & non-relational database management systems performance comparison
CN112527900A (en) Method, device, equipment and medium for database multi-copy reading consistency
CN115718787B (en) Data table data synchronization method, query method, electronic device and storage medium
CN115952200B (en) MPP architecture-based multi-source heterogeneous data aggregation query method and device
CN117390040B (en) Service request processing method, device and storage medium based on real-time wide table
CN117131023B (en) Data table processing method, device, computer equipment and readable storage medium
US11914655B2 (en) Mutation-responsive documentation generation based on knowledge base

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination