CN114661706A - Clickhouse data writing plug-in method based on jlogstash - Google Patents

Clickhouse data writing plug-in method based on jlogstash

Info

Publication number
CN114661706A
CN114661706A (application CN202011534160.6A)
Authority
CN
China
Prior art keywords
data
clickhouse
database
thread
writing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011534160.6A
Other languages
Chinese (zh)
Inventor
钱奕辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yunzhe Technology Co ltd
Original Assignee
Hangzhou Yunzhe Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yunzhe Technology Co ltd filed Critical Hangzhou Yunzhe Technology Co ltd
Priority to CN202011534160.6A
Publication of CN114661706A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Clickhouse data writing plug-in method based on jlogstash, which comprises the following steps: classifying data according to a specified keyword, with each class of data corresponding to a table of the Clickhouse database; judging whether the database and the data table exist in Clickhouse and, if not, creating them; verifying the fields of the data to be written and, if a new field is found, triggering a field addition operation on the Clickhouse table; having the producer thread deliver the verified data to the corresponding processor thread, where it waits to be consumed; performing a health check on the Clickhouse cluster after each round of consumption by the consumer thread; and, when the consumer thread consumes data, starting the Clickhouse write flow under two limits, time and count. The invention automatically modifies table fields when fields are added or removed, effectively improves the speed of the data writing process, and supports the writing of irregular data.

Description

Clickhouse data writing plug-in method based on jlogstash
Technical Field
The invention relates to the technical field of data writing plug-ins, and in particular to a Clickhouse data writing plug-in method based on jlogstash.
Background
Real-time collection and analysis of logs and performance data is an important means of understanding the operating state of a company's business and of locating and analyzing faults, and among the many implementations in use today the ELK scheme is the most common. The ELK scheme uses Logstash, Elasticsearch, and Kibana as its technology stack to collect, analyze, and display data. The applicant has developed its own jlogstash framework based on the open-source Logstash project, using Java as the development language and improving performance to roughly five times that of the original (Ruby) version.
Besides being displayed, the collected data also needs to be analyzed and processed, and here the performance of Elasticsearch becomes a bottleneck, so the OLAP database Clickhouse was introduced. Clickhouse is a column-oriented database that uses locally attached storage as its storage scheme, which greatly improves IO performance, and it uses SQL as its query language, which greatly reduces the cost of adoption. At the same time, Clickhouse natively supports distributed deployment, including a high-availability scheme of shards and replicas, which guarantees linear cluster scaling and cluster stability during operation.
The Clickhouse write plug-ins currently on the market only support writing data after the user has created the database and data tables in advance; they support writing to a single database table only, not to multiple tables at the same time; the written data format is fixed; and the table structure cannot be changed once the write plug-in has started. No effective solution to these technical problems has been proposed so far.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides a Clickhouse data writing plug-in method based on jlogstash to overcome the above technical problems. During the writing process, the field names and field types of each piece of data are examined; if fields have been added or removed, the table fields are modified automatically without interrupting the overall data writing process, so irregular data can be written without restarting the write plug-in.
In order to achieve the above object, the present invention provides a Clickhouse data writing plug-in method based on jlogstash, comprising the following steps:
(1) classifying data according to a specified keyword, with each class of data corresponding to a table of the Clickhouse database;
(2) judging whether the database and the data table exist in Clickhouse and, if not, creating them; related table information is prepared in advance, and if the keyword has corresponding information in the preset data, that preset information is read and used for the creation;
(3) performing field verification on the data to be written: the table field information of the database is cached locally, each record is verified field by field, and if a new field is found, a Clickhouse table field addition operation is triggered;
(4) the producer thread delivers the verified data to the corresponding processor thread, where it waits to be consumed by that processor thread;
(5) after each round of consumption, the consumer thread performs a health check on the Clickhouse cluster; if a node in the cluster is found to be offline, writing to that node is stopped, and no data is written to it until it returns to normal;
(6) when the consumer thread consumes data, it is limited by both time and count, and whichever limit is triggered first starts the Clickhouse write flow.
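Step (6) above is effectively a flush trigger with two limits. The following minimal Java sketch shows one way such a time-or-count trigger could be arranged; the class and parameter names (BatchConsumer, maxBatchSize, flushIntervalMs) and the placeholder batch-write method are illustrative assumptions rather than the plug-in's actual code.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Minimal sketch of the time-or-count flush trigger described in step (6). */
public class BatchConsumer implements Runnable {
    private final BlockingQueue<Map<String, Object>> queue;   // filled by the producer thread
    private final int maxBatchSize;                           // count limit (assumed name)
    private final long flushIntervalMs;                       // time limit (assumed name)
    private final List<Map<String, Object>> buffer = new ArrayList<>();

    public BatchConsumer(BlockingQueue<Map<String, Object>> queue,
                         int maxBatchSize, long flushIntervalMs) {
        this.queue = queue;
        this.maxBatchSize = maxBatchSize;
        this.flushIntervalMs = flushIntervalMs;
    }

    @Override
    public void run() {
        long lastFlush = System.currentTimeMillis();
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // Wait for the next record, but never longer than the remaining time window.
                long remaining = flushIntervalMs - (System.currentTimeMillis() - lastFlush);
                Map<String, Object> record = queue.poll(Math.max(remaining, 1), TimeUnit.MILLISECONDS);
                if (record != null) {
                    buffer.add(record);
                }
                boolean countReached = buffer.size() >= maxBatchSize;
                boolean timeReached = System.currentTimeMillis() - lastFlush >= flushIntervalMs;
                if (!buffer.isEmpty() && (countReached || timeReached)) {
                    writeBatchToClickhouse(buffer);   // whichever limit fires first starts the write
                    buffer.clear();
                    lastFlush = System.currentTimeMillis();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void writeBatchToClickhouse(List<Map<String, Object>> batch) {
        // Placeholder: in the real plug-in this would issue a batch INSERT into the local table.
    }
}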
In a preferred embodiment of the present invention, during the creation of the database and the data table, data writing to the Clickhouse database is paused until the creation thread has completed and returned a success message, after which writing continues.
In a preferred embodiment of the present invention, while table fields are being added to the Clickhouse database, data verification is paused until the modification thread has completed and returned a success message, after which verification continues.
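The two pause-and-resume behaviours described above could, for instance, be implemented with a read-write lock: ordinary inserts share the read lock, while the thread that creates a table or adds a column takes the write lock, so all inserts block until the DDL statement has returned success. The sketch below is only an illustration under that assumption; the class name SchemaGate and its methods are invented for this example.

import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Sketch of the "pause writing while DDL runs" behaviour. Normal inserts share the read
 * lock; the creation/modification thread takes the write lock, so inserts block until
 * the DDL statement has completed and returned success.
 */
public class SchemaGate {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Called around every normal data write. */
    public void withWritePermit(Runnable insert) {
        lock.readLock().lock();
        try {
            insert.run();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Called by the thread that creates a database/table or adds a column. */
    public void withSchemaChange(Runnable ddl) {
        lock.writeLock().lock();
        try {
            ddl.run();          // e.g. CREATE TABLE ... or ALTER TABLE ... ADD COLUMN
        } finally {
            lock.writeLock().unlock();
        }
    }
}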
Compared with the prior art, the invention has the following technical effects:
(1) A preset table structure is provided: during the initialization stage of the writer, a built-in file is read to load preset table information, each entry of which corresponds to the table name, table structure, engine, and settings of one table. When data from a new user enters the data link, the writer suspends all writing work for that new user, completes the database and table creation in Clickhouse according to the preset table information, and then releases the write lock so that data starts to be written normally;
(2) Multi-table writing is provided: unlike many open-source Clickhouse writers, which set the target table name in a configuration item, the business needs to separate data according to its data set and write it into different tables. For this situation, a separate thread is opened for each table, with each thread responsible for the data writing work of one table, achieving thread-level isolation and avoiding data confusion. The main thread receives data uniformly and distributes it to the different write threads through queues; the write threads consume their queues at a fixed interval and in fixed quantities and then write the data into Clickhouse in batches. Because Clickhouse data writing is batch-based, the writer's internal data-flow queue supports dual control by time and data count: when the configured condition is reached the data is consumed, and until then the current queue blocks. Data is written directly into the local table, which, compared with writing into the distributed table, effectively avoids the consistency problem of data written through the distributed table;
(3) A dynamic schema (dynamic table structure) is provided: before each piece of data is written into the processing queue, a field verification step is performed, in which every field of each record is checked against the locally stored table structure. When a field not present in the cache is found, the Clickhouse table structure is fetched again to avoid the cache inconsistency caused by multiple jlogstash nodes and multiple threads, and the check is repeated. If the second check still identifies a new field, the current flow is paused and a table modification action is triggered to add the new field to the current table; after the modification is complete, the verification of the current record continues, and once all checks have passed the data is handed to the table's processor thread. The method therefore has high practical and popularization value.
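As a rough illustration of effect (3), the sketch below checks an incoming field against a locally cached table schema, reloads the schema from Clickhouse once on a miss (to rule out a stale cache across jlogstash nodes and threads), and only then issues an ALTER TABLE ... ADD COLUMN. It uses the standard java.sql API over a Clickhouse JDBC connection; the class name DynamicSchema, the type argument, and the helper method are assumptions made for this example, not the plug-in's actual code.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of the dynamic-schema check: refresh the cache once before altering the table. */
public class DynamicSchema {
    private final Connection conn;                                   // Clickhouse JDBC connection
    private final Map<String, Set<String>> cache = new ConcurrentHashMap<>();

    public DynamicSchema(Connection conn) {
        this.conn = conn;
    }

    public synchronized void ensureColumn(String table, String field, String clickhouseType)
            throws SQLException {
        Set<String> columns = cache.computeIfAbsent(table, this::loadColumns);
        if (columns.contains(field)) {
            return;                                                  // field already known locally
        }
        // First miss: reload from Clickhouse in case another node already added the column.
        columns = loadColumns(table);
        cache.put(table, columns);
        if (columns.contains(field)) {
            return;
        }
        // Still unknown after the second check: pause the flow and extend the table.
        try (Statement st = conn.createStatement()) {
            st.execute("ALTER TABLE " + table + " ADD COLUMN `" + field + "` " + clickhouseType);
        }
        columns.add(field);
    }

    private Set<String> loadColumns(String table) {
        Set<String> cols = ConcurrentHashMap.newKeySet();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("DESCRIBE TABLE " + table)) {
            while (rs.next()) {
                cols.add(rs.getString(1));                           // first column is the field name
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
        return cols;
    }
}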
Drawings
Fig. 1 is a schematic flowchart of embodiment 1 of the jlogstash-based Clickhouse data writing plug-in method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of embodiment 2 of the jlogstash-based Clickhouse data writing plug-in method according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the invention; they show only the components related to the invention rather than the number, shape, and size of the components in an actual implementation, where the type, quantity, and proportion of components may vary and the layout may be more complicated.
Some exemplary embodiments of the invention have been described for illustrative purposes, and it is to be understood that the invention may be practiced otherwise than as specifically described.
In this embodiment, a Clickhouse data writing plug-in method based on jlogstash includes the following steps:
(1) classifying data according to a specified keyword, with each class of data corresponding to a table of the Clickhouse database;
(2) judging whether the database and the data table exist in Clickhouse and, if not, creating them; related table information is prepared in advance, and if the keyword has corresponding information in the preset data, that preset information is read and used for the creation;
(3) performing field verification on the data to be written: the table field information of the database is cached locally, each record is verified field by field, and if a new field is found, a Clickhouse table field addition operation is triggered;
(4) the producer thread delivers the verified data to the corresponding processor thread, where it waits to be consumed by that processor thread;
(5) after each round of consumption, the consumer thread performs a health check on the Clickhouse cluster; if a node in the cluster is found to be offline, writing to that node is stopped, and no data is written to it until it returns to normal (the keyword routing and health check of steps (1), (4) and (5) are sketched after the embodiment notes below);
(6) when the consumer thread consumes data, it is limited by both time and count, and whichever limit is triggered first starts the Clickhouse write flow; this effectively avoids the situation where frequent writes of small data volumes put excessive merge pressure on the Clickhouse cluster and cause the write speed to drop sharply.
In some embodiments, during the creation of the database and the data table, data writing to the Clickhouse database is paused until the creation thread has completed and returned a success message, after which writing continues.
In some embodiments, while table fields are being added to the Clickhouse database, data verification is paused until the modification thread has completed and returned a success message, after which verification continues.
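To make steps (1), (4) and (5) more concrete, the sketch below routes each verified record to a per-table queue keyed by the specified keyword and probes a node with a trivial query before writing to it. The class names, the routing-key field, and the SELECT 1 health probe are illustrative assumptions; a Clickhouse JDBC driver is assumed to be on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch of steps (1), (4) and (5): keyword routing to per-table queues plus a node health probe. */
public class TableRouter {
    // One queue (and, in the plug-in, one processor thread) per target table.
    private final Map<String, BlockingQueue<Map<String, Object>>> queues = new ConcurrentHashMap<>();
    private final String routingKey;   // the "specified keyword" field, e.g. "dataSetName" (assumed)

    public TableRouter(String routingKey) {
        this.routingKey = routingKey;
    }

    /** Producer side, steps (1) and (4): classify by keyword and hand over to that table's queue. */
    public void dispatch(Map<String, Object> record) throws InterruptedException {
        String table = String.valueOf(record.get(routingKey));
        queues.computeIfAbsent(table, t -> new LinkedBlockingQueue<>()).put(record);
    }

    /** Consumer side, step (5): probe a node before writing and treat it as offline on failure. */
    public static boolean isNodeHealthy(String jdbcUrl) {
        try (Connection c = DriverManager.getConnection(jdbcUrl);
             Statement st = c.createStatement()) {
            st.execute("SELECT 1");
            return true;
        } catch (SQLException e) {
            return false;   // node is skipped; no data is written to it until it recovers
        }
    }
}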
Example 1
As shown in fig. 1, this example includes a jlogstash core runtime module, an input module, a filter module, an output plug-in module, and a Clickhouse cluster module. The core runtime module ensures that the project starts and runs; the plug-in modules customize how data is processed as it flows through, implementing the ETL process; and the Clickhouse data storage server is responsible for receiving the logs written by the output.clickhouse plug-in and persisting them to disk for later use.
The method comprises the following specific steps:
(1) upload the jlogstash core runtime package;
(2) set up the jlogstash run script;
(3) upload the output.clickhouse plug-in package.
Example 2
As shown in fig. 2, using the write plug-in of the present invention to write data into Clickhouse automatically and intelligently involves the following specific steps:
(1) upload the plug-in package to the server where jlogstash is located and place it in the specified location as required;
(2) modify the yaml configuration file to set the cluster name, cluster addresses, write interval, and other information;
(3) modify the yaml configuration file to set the table engine, partition information, table name, table fields, and other information (a configuration-loading sketch follows step (4));
(4) start jlogstash through the script and start the task.
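Steps (2) and (3) above configure the plug-in through yaml files. Under the assumption that those files are parsed with a standard YAML library such as SnakeYAML, a loading sketch might look as follows; the file names and keys shown are hypothetical and do not reflect the plug-in's real configuration schema.

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

import org.yaml.snakeyaml.Yaml;

/** Sketch: reading the two yaml configuration files mentioned in steps (2) and (3). */
public class PluginConfigLoader {

    @SuppressWarnings("unchecked")
    public static Map<String, Object> load(Path yamlFile) throws Exception {
        try (InputStream in = Files.newInputStream(yamlFile)) {
            return (Map<String, Object>) new Yaml().load(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // File names and keys below are hypothetical, not the plug-in's real schema.
        Map<String, Object> cluster = load(Path.of("clickhouse-cluster.yaml"));
        Map<String, Object> tables  = load(Path.of("clickhouse-tables.yaml"));
        System.out.println("cluster address: " + cluster.get("clusterAddress"));
        System.out.println("write interval : " + cluster.get("writeIntervalMs"));
        System.out.println("table presets  : " + tables.get("tables"));
    }
}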
The invention has the beneficial effects that:
(1) Preset table structure: during the initialization stage of the writer, a built-in file is read to load preset table information, each entry of which corresponds to the table name, table structure, engine, settings, and other information of one table. When data from a new user enters the data link, the writer suspends all writing work for that new user and completes the database and table creation work in Clickhouse according to the read preset table information; after this work is finished, the write lock is released and the data starts to be written normally;
For example, suppose a data set has three fields a, b, and c, but the first piece of data arriving on the data link contains only fields a and b, while the second contains only fields a and c. This causes a serious problem: when the first piece of data arrives, the Clickhouse table is created according to its data fields, but when the second piece arrives a new field appears, and an ADD COLUMN operation must then be performed on the table structure.
(2) Multi-table writing: unlike many open-source Clickhouse writers, which set the target table name in a configuration item, the business needs to separate data by data set and write it into different tables; a separate thread is therefore opened for each table, achieving thread-level isolation and avoiding data confusion. The main thread receives data uniformly and distributes it to the different write threads through queues; the write threads consume their queues at a fixed interval and in fixed quantities and then write the data into Clickhouse in batches. Because Clickhouse data writing is batch-based, the number of Parts is reduced, which greatly decreases Clickhouse's background asynchronous Merge operations and avoids the "Too many parts" exception. The writer's internal data-flow queue supports dual control by time and data count: when the configured condition is met the data is consumed, and until then the current queue blocks. Data is written directly into the local table, which, compared with writing into the distributed table, effectively avoids the consistency problem of data written through the distributed table; that consistency problem cannot be perceived during writing, because the process of data moving from the distributed table down to the local tables is completed by the Clickhouse internals and is not directly connected to the write operation, so the writer cannot tell whether the data it wrote has actually landed in Clickhouse.
(3) Dynamic Schema (dynamic table structure): before each piece of data is written into the processing queue, a field verification step is performed, in which every field of each record is checked against the locally stored table structure. When a field not present in the cache is found, the Clickhouse table structure is fetched once more to avoid the cache inconsistency caused by multiple jlogstash nodes and multiple threads, and the check is repeated. If the second check still identifies the field as new, the current flow is paused and a table modification action is triggered, that is, a new table field is added to the current table; after the modification is complete, the verification of the current record continues, and once all checks have passed the data is delivered to the table's processor thread;
For example, a piece of data has two tags and its original storage format is {"a1":111, "b1":222}; during writing, the tags are flattened and processed into the following format:
[flattened-format example shown as an image in the original filing, Figure RE-GDA0002982291110000101]
If an empty value is encountered, it is automatically filled with null.
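A minimal sketch of this flattening, assuming the tags arrive as a nested map under a key named "tags" and are expanded into prefixed columns: the helper below copies the record, expands each known tag column, and leaves null where a record does not carry that tag. The key name, the "tag." prefix, and the class name are illustrative assumptions.

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch of flattening a record's tag map into columns, filling absent tags with null. */
public class TagFlattener {

    @SuppressWarnings("unchecked")
    public static Map<String, Object> flatten(Map<String, Object> record,
                                              Set<String> knownTagColumns) {
        Map<String, Object> flat = new LinkedHashMap<>(record);
        Object tags = flat.remove("tags");                        // e.g. {"a1":111, "b1":222}
        Map<String, Object> tagMap = (tags instanceof Map)
                ? new HashMap<>((Map<String, Object>) tags)
                : new HashMap<>();
        for (String column : knownTagColumns) {                   // every tag column the table has
            flat.put("tag." + column, tagMap.get(column));        // absent tags become null
        }
        return flat;
    }
}

With known tag columns a1, b1, and c1 and the record above, the flattened output would carry tag.a1 = 111, tag.b1 = 222, and tag.c1 = null.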
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical idea of the invention shall be covered by the claims of the present invention.

Claims (3)

1. A Clickhouse data writing plug-in method based on jlogstash, characterized by comprising the following steps:
(1) classifying data according to a specified keyword, with each class of data corresponding to a table of the Clickhouse database;
(2) judging whether the database and the data table exist in Clickhouse and, if not, creating them; related table information is prepared in advance, and if the keyword has corresponding information in the preset data, that preset information is read and used for the creation;
(3) performing field verification on the data to be written: the table field information of the database is cached locally, each record is verified field by field, and if a new field is found, a Clickhouse table field addition operation is triggered;
(4) the producer thread delivers the verified data to the corresponding processor thread, where it waits to be consumed by that processor thread;
(5) after each round of consumption, the consumer thread performs a health check on the Clickhouse cluster; if a node in the cluster is found to be offline, writing to that node is stopped, and no data is written to it until it returns to normal;
(6) when the consumer thread consumes data, it is limited by both time and count, and whichever limit is triggered first starts the Clickhouse write flow.
2. The method as claimed in claim 1, wherein, during the creation of the database and the data table, data writing to the Clickhouse database is paused until the creation thread has completed and returned a success message, after which writing continues.
3. The method as claimed in claim 1, wherein, while table fields are being added to the Clickhouse database, data verification is paused until the modification thread has completed and returned a success message, after which verification continues.
CN202011534160.6A 2020-12-23 2020-12-23 Clickhouse data writing plug-in method based on jlogstash Pending CN114661706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011534160.6A CN114661706A (en) 2020-12-23 2020-12-23 Clickhouse data writing plug-in method based on jlogstash

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011534160.6A CN114661706A (en) 2020-12-23 2020-12-23 Clickhouse data writing plug-in method based on jlogstash

Publications (1)

Publication Number Publication Date
CN114661706A true CN114661706A (en) 2022-06-24

Family

ID=82025554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011534160.6A Pending CN114661706A (en) 2020-12-23 2020-12-23 Clickhouse data writing plug-in method based on jlogstash

Country Status (1)

Country Link
CN (1) CN114661706A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116149969A (en) * 2023-04-04 2023-05-23 湖南中青能科技有限公司 Database model matching anomaly monitoring and processing method
CN116149969B (en) * 2023-04-04 2023-06-20 湖南中青能科技有限公司 Database model matching anomaly monitoring and processing method

Similar Documents

Publication Publication Date Title
US9792327B2 (en) Self-described query execution in a massively parallel SQL execution engine
US8099631B2 (en) Call-stacks representation for easier analysis of thread dump
US11526465B2 (en) Generating hash trees for database schemas
US10198346B1 (en) Test framework for applications using journal-based databases
CN111858759B (en) HTAP database system based on consensus algorithm
US20100169289A1 (en) Two Phase Commit With Grid Elements
US10133767B1 (en) Materialization strategies in journal-based databases
CN112860777B (en) Data processing method, device and equipment
US20110202564A1 (en) Data store switching apparatus, data store switching method, and non-transitory computer readable storage medium
US9390111B2 (en) Database insert with deferred materialization
Matallah et al. Experimental comparative study of NoSQL databases: HBASE versus MongoDB by YCSB
CN114661706A (en) Clickhouse data writing plug-in method based on jlogstack
CN117171108B (en) Virtual model mapping method and system
Qian et al. An evaluation of Lucene for keywords search in large-scale short text storage
CN113962597A (en) Data analysis method and device, electronic equipment and storage medium
US20120323946A1 (en) Data framework to enable rich processing of data from any arbitrary data source
CN113918535A (en) Data reading method, device, equipment and storage medium
Raman et al. BoDS: A benchmark on data sortedness
CN107622123B (en) ASM file system-oriented file analysis method
US6963957B1 (en) Memory paging based on memory pressure and probability of use of pages
CN108769137A (en) Distributed structure/architecture data storing and reading method and device based on multigroup framework
CN111459931A (en) Data duplication checking method and data duplication checking device
Harter Emergent properties in modular storage: A study of apple desktop applications, Facebook messages, and docker containers
Kumarasinghe et al. Performance comparison of NoSQL databases in pseudo distributed mode: Cassandra, MongoDB & Redis
Sandberg High performance querying of time series market data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination