CN114116739A

CN114116739A - System for inserting key value pair data into columnar database and implementation method

Info

Publication number: CN114116739A
Application number: CN202111324503.0A
Authority: CN
Inventors: 李沅泽; 赵子墨; 董晨晨; 李照川; 孙永超; 郭亚琨
Original assignee: Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Current assignee: Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date: 2021-11-10
Filing date: 2021-11-10
Publication date: 2022-03-01
Anticipated expiration: 2041-11-10
Also published as: CN114116739B

Abstract

The invention discloses a system and an implementation method for inserting key-value pair data into a columnar database, belonging to the technical field of computer software. The technical problem to be solved is how to split and store the data rapidly and accurately so as to facilitate subsequent portrait (profile) data analysis. The technical scheme is as follows: the system comprises a key-value pair data processing program module, a columnar database loading program module, and an AppImage packaging and deployment module. The key-value pair data processing program module splits and cleans the key-value pair data, processes it into a format that can be imported directly into the columnar storage database, and stores it in a message queue. The columnar database loading program module takes the data from the message queue and imports it into the columnar storage database. The AppImage packaging and deployment module packages the key-value pair data processing program module and the columnar database loading program module separately, generating two AppImage files.

Description

System for inserting key value pair data into columnar database and implementation method

Technical Field

The invention relates to the technical field of computer software, in particular to a system for inserting key value pair data into a columnar database and an implementation method.

Background

At present, event-tracking (buried-point) data from online services, gateway logs, IoT device signal data and the like are all in key-value pair format. Processing and storing such data in a specific format plays a vital role in research and development, operations, and decision-making.

Key-value pair data consists of two parts, a key and a value. It takes many forms: its hierarchy may be two, three, or more levels deep, and a stored value may be a simple character string, a list, or an object. JSON (JavaScript Object Notation) is one of the most widely used data transmission formats in industry today. It is a lightweight data exchange format, based on a subset of ECMAScript (the JavaScript specification standardized by Ecma International), that stores and represents data as text completely independent of any programming language. Before JSON emerged, data was commonly transferred as XML. Because XML is a plain-text format, it is suitable for exchanging data over a network. XML itself is not complex, but together with a large stack of associated specifications such as DTD, XSD, XPath, and XSLT, it became increasingly complicated to use. Because JSON is very simple, it quickly became popular on the Web and was adopted as an ECMA standard. Almost all programming languages have libraries that parse JSON, and JavaScript can use JSON directly because JSON parsing is built into the language. Serializing a JavaScript object to JSON turns the object into a JSON-formatted string that can be passed to other computers over the network. A received JSON-formatted string only needs to be deserialized into a JavaScript object, after which it can be used directly in JavaScript.
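As a minimal illustration of the serialization round trip described above, the following Python sketch uses the standard-library json module. The record contents (device, id, signal) are hypothetical sample data and are not taken from the invention.

```python
import json

# Hypothetical nested key-value record (sample data for illustration only).
record = {"device": {"id": "sensor-01", "signal": [3, 5, 8]}}

# Serialize: the object becomes a JSON-formatted string that can be sent over a network.
text = json.dumps(record)

# Deserialize: the receiver turns the string back into a native object.
restored = json.loads(text)
```

The same round trip is available in most languages, which is what makes JSON convenient as the interchange format between the acquisition tools and the processing module.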

Columnar storage is another way of organizing data at the lowest level of a database: it stores data by column, as opposed to row-based storage. Compared with traditional row-based storage, columnar storage can add new fields dynamically, so its structure is more flexible and more space-saving. Storing one row of data in row mode requires only one primary key, whereas storing the same row in column mode requires multiple primary keys. Row-based storage holds only business data, whereas columnar storage holds column names in addition to the business data. Compared with other conventional databases, HBase has a further characteristic: its update operation is not an update in the conventional sense (replacing or overwriting the original data with new data). Instead, it inserts new data while retaining the old data, and the old and new data differ only in their timestamps. This makes it easy to retrieve multiple versions, that is, data from different points in time.
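The versioned-update behaviour described above can be sketched with a small in-memory model. This is an illustration only, not HBase's implementation or API; the class VersionedCell and its methods are hypothetical names.

```python
from typing import Any, List, Optional, Tuple


class VersionedCell:
    """Toy model of one cell that keeps every written version."""

    def __init__(self) -> None:
        self._versions: List[Tuple[int, Any]] = []  # (timestamp, value) pairs

    def put(self, value: Any, timestamp: int) -> None:
        # An "update" appends a new version instead of overwriting the old one.
        self._versions.append((timestamp, value))
        self._versions.sort(key=lambda pair: pair[0])

    def get(self, at: Optional[int] = None) -> Any:
        # Return the newest version, or the newest one not later than `at`.
        candidates = [v for ts, v in self._versions if at is None or ts <= at]
        if not candidates:
            raise KeyError("no version at or before the requested timestamp")
        return candidates[-1]


cell = VersionedCell()
cell.put("offline", timestamp=100)
cell.put("online", timestamp=200)
```

Here cell.get() returns the latest value, while cell.get(at=150) returns the value that was current at timestamp 150, which mirrors reading data from different periods.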

At present, online services generate a great amount of key-value pair data every day. How to split and store this data rapidly and accurately, so as to facilitate subsequent portrait data analysis, is therefore a technical problem that needs to be solved.

Disclosure of Invention

The technical task of the invention is to provide a system and an implementation method for inserting key-value pair data into a columnar database, so as to solve the problem of how to split and store the data rapidly and accurately and to facilitate subsequent portrait data analysis.

The technical task of the invention is achieved as follows: a system for inserting key-value pair data into a columnar database, the system comprising:

a key-value pair data processing program module, used for splitting and cleaning the key-value pair data, processing it into a format that can be imported directly into the columnar storage database, and storing it in a message queue, the message queue being used for data transmission;

a columnar database loading program module, used for taking data from the message queue and importing it into the columnar storage database;

and an AppImage packaging and deployment module, used for packaging the key-value pair data processing program module and the columnar database loading program module separately to generate two AppImage files.
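The division of work between the processing module and the loading module can be sketched as two functions joined by a queue. This is a minimal sketch under stated assumptions: queue.Queue stands in for the message queue, a dictionary stands in for the columnar storage database, and the names process_record, load_records, and column_store are hypothetical.

```python
import json
import queue
from typing import Dict, List

# Stand-in for the message queue; a real deployment would use a message broker.
message_queue = queue.Queue()

# Stand-in for the columnar store: column name -> list of values.
column_store: Dict[str, List[str]] = {}


def process_record(raw: str) -> None:
    """Processing module: parse one flat JSON record and enqueue (key, value) pairs."""
    pairs = [(key, str(value)) for key, value in json.loads(raw).items()]
    message_queue.put(pairs)


def load_records() -> int:
    """Loading module: drain the queue and append each value to its column."""
    loaded = 0
    while not message_queue.empty():
        for key, value in message_queue.get():
            column_store.setdefault(key, []).append(value)
        loaded += 1
    return loaded


process_record('{"user": "u1", "page": "home"}')
process_record('{"user": "u2", "page": "cart"}')
count = load_records()
```

Because the two functions share only the queue, each can be packaged and deployed on its own, which matches the decision to build two separate AppImage files.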

Preferably, the packaging environment requirements of the AppImage packaging and deployment module are as follows:

firstly, a Linux environment: a Linux system with a full (maximal) installation;

secondly, the AppImage packaging tools linuxdeployqt and patchelf installed in the Linux system;

and thirdly, the development environments of the relevant programming languages installed in the Linux system.

Preferably, the AppImage packaging and deployment module comprises:

a creating submodule, used for creating a folder by entering "mkdir output" on the command line;

a copy submodule, used for copying the key-value pair data processing program module or the columnar database loading program module into the output folder;

and an execution submodule, used for executing the linuxdeployqt command to complete the packaging.
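The three submodule actions above can be written out as a command plan. The sketch below only builds the commands and does not run them. The binary paths, the per-module folder names, the -p option of mkdir, and the -appimage option passed to linuxdeployqt are assumptions for illustration and should be checked against the installed tool version.

```python
from typing import List


def packaging_commands(module_binary: str, out_dir: str = "output") -> List[List[str]]:
    """Return the commands for packaging one module, without running them."""
    target = f"{out_dir}/{module_binary.rsplit('/', 1)[-1]}"
    return [
        ["mkdir", "-p", out_dir],                # creating submodule
        ["cp", module_binary, target],           # copy submodule
        ["linuxdeployqt", target, "-appimage"],  # execution submodule (assumed option)
    ]


# Hypothetical binary names for the two program modules; one plan per module.
plans = [
    packaging_commands(name, out_dir=f"output_{i}")
    for i, name in enumerate(["build/kv_processor", "build/columnar_loader"])
]
```

On a machine with the tools installed, each command list could be passed to subprocess.run(cmd, check=True). Running the plan once per module yields the two AppImage files.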

Preferably, the key-value pair data processing program module comprises:

a formatting submodule, used for performing preliminary JSON formatting on the data acquired by a data acquisition tool and outputting it to message queue topic A;

and a splitting and cleaning submodule, used for splitting and cleaning the JSON data in message queue topic A according to fixed logic.

Preferably, the data acquisition tool is Filebeat, Logstash, or Sqoop; when data is acquired, an additional character string is spliced onto it as a flag bit identifying the data source.
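A minimal sketch of the flag-bit idea follows. The source does not say how the additional character string is attached, so placing it on its own first line is an assumption, as are the function names tag_with_source and split_source and the sample flag value.

```python
import json

FLAG_SEPARATOR = "\n"  # assumed: the flag occupies the first line of the message


def tag_with_source(flag: str, payload: dict) -> str:
    """Splice a source flag in front of the JSON-formatted payload."""
    return f"{flag}{FLAG_SEPARATOR}{json.dumps(payload)}"


def split_source(message: str) -> tuple:
    """Recover (flag, payload) from a tagged message."""
    flag, body = message.split(FLAG_SEPARATOR, 1)
    return flag, json.loads(body)


message = tag_with_source("gateway_log", {"request": {"id": "r-1", "status": 200}})
flag, payload = split_source(message)
```

This works only if the flag itself contains no line break. json.dumps emits a single line by default, so the first line break always marks the end of the flag.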

Preferably, the logic applied to the JSON data is as follows:

firstly, reading the flag bit of the first line of the JSON data (the additional character string spliced on during data acquisition to identify the data source) as the first large-class data;

secondly, reading the character string preceding the combination of a colon and a curly brace as the second large-class data;

and thirdly, reading the number and positions of the curly braces and square brackets following the second large-class data to determine the nesting level of the JSON data.
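The three reading steps above can be sketched as a single function. This is one possible interpretation, not the invention's actual code: it assumes the flag sits on the first line, treats the first key followed by a colon and an opening curly brace as the second large-class data, and reports the deepest nesting reached inside that key's object. The function name split_message and the sample message are hypothetical.

```python
import re
from typing import Tuple


def split_message(message: str) -> Tuple[str, str, int]:
    """Return (first_class, second_class, nesting_depth) for one tagged message."""
    # Step 1: the flag on the first line is the first large-class data.
    first_class, body = message.split("\n", 1)

    # Step 2: the key in front of the first colon-and-brace pair is the second large-class data.
    match = re.search(r'"([^"]+)"\s*:\s*\{', body)
    if match is None:
        raise ValueError("no key followed by ':{' found")
    second_class = match.group(1)

    # Step 3: track braces and square brackets after that key to find the nesting depth.
    depth = max_depth = 0
    in_string = escaped = False
    for char in body[match.end() - 1:]:  # start at the opening brace
        if in_string:
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
        elif char in "{[":
            depth += 1
            max_depth = max(max_depth, depth)
        elif char in "}]":
            depth -= 1
            if depth == 0:
                break  # the second-class object is closed
    return first_class.strip(), second_class, max_depth


sample = 'iot_signal\n{"id": {"sensor": {"readings": [1, 2]}, "site": "A"}}'
result = split_message(sample)
```

Braces and brackets that occur inside string values are skipped, so they do not distort the level count.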

Preferably, the rules for cleaning the JSON data are as follows:

firstly, calibrating different key values according to the second large-class data of each source, so that only calibrated key values are matched and uncalibrated key values are filtered out automatically;

secondly, retrieving according to the second large-class data, splitting the data at each level of the second large-class data that is matched, reading the keys and values, placing them into a two-dimensional array, and packaging and outputting the array to message queue topic B.
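A minimal sketch of the two cleaning rules follows, assuming a hand-written calibration table and dotted key paths for nested levels. The names CALIBRATED_KEYS, flatten, and clean are hypothetical, and publishing the result to message queue topic B is omitted.

```python
import json
from typing import Any, Dict, List, Set

# Hypothetical calibration table: second large-class data -> keys that are kept.
CALIBRATED_KEYS: Dict[str, Set[str]] = {"id": {"sensor", "site"}}


def flatten(node: Any, prefix: str = "") -> List[List[str]]:
    """Walk every level and return [key, value] rows for the leaf values."""
    if isinstance(node, dict):
        rows: List[List[str]] = []
        for key, value in node.items():
            rows.extend(flatten(value, f"{prefix}.{key}" if prefix else key))
        return rows
    return [[prefix, json.dumps(node) if isinstance(node, list) else str(node)]]


def clean(second_class: str, body: Dict[str, Any]) -> List[List[str]]:
    """Keep calibrated keys under `second_class`, flattened into a two-dimensional array."""
    allowed = CALIBRATED_KEYS.get(second_class, set())
    kept = {k: v for k, v in body.get(second_class, {}).items() if k in allowed}
    return flatten(kept)


body = {"id": {"sensor": {"name": "s1", "readings": [1, 2]}, "site": "A", "debug": "x"}}
rows = clean("id", body)
```

The uncalibrated key debug is dropped, and a second large-class value with no calibration entry yields an empty array, so nothing unrecognized reaches the loading stage.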

Preferably, the columnar database loading program module comprises:

a calling submodule, used for calling the data insertion interface of the columnar storage database (e.g., HBase);

an import submodule, used for importing the data in message queue topic B into the column family and the corresponding data column according to the first large-class data and the second large-class data;

and an establishing submodule, used for automatically establishing a global index according to the second large-class data; the columnar storage database provides a fast data retrieval and query service.
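The import and indexing steps can be sketched with in-memory stand-ins. Mapping the first large-class data to the column family and the second large-class data to the column qualifier is one reading of the text, not a confirmed detail. The row keys, the dictionaries table and global_index, and the function load are hypothetical, and no real HBase client is called.

```python
from collections import defaultdict
from typing import Dict, List, Set

# In-memory stand-ins: row key -> {"family:qualifier": value}, and index value -> row keys.
table: Dict[str, Dict[str, str]] = defaultdict(dict)
global_index: Dict[str, Set[str]] = defaultdict(set)


def load(row_key: str, first_class: str, second_class: str, rows: List[List[str]]) -> None:
    """Write one cleaned record and index it by its second large-class data."""
    for key, value in rows:
        # Column family from the first class; qualifier from the second class and key.
        table[row_key][f"{first_class}:{second_class}.{key}"] = value
    global_index[second_class].add(row_key)


load("row-001", "iot_signal", "id", [["sensor.name", "s1"], ["site", "A"]])
load("row-002", "iot_signal", "id", [["site", "B"]])
```

Looking up global_index["id"] returns every row key carrying that second large-class data, which is the kind of retrieval the global index is meant to speed up.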

An implementation method for inserting key-value pair data into a columnar database, wherein the method is based on AppImage package management, uses a columnar storage database, and automatically parses, processes, and imports key-value pair data of different formats; the method comprises the following specific steps:

S1, in a Linux environment, packaging the key-value pair data processing program module and the columnar database loading program module into two separate AppImage packages by using the AppImage packaging tools linuxdeployqt and patchelf;

S2, using the Filebeat, Logstash, or Sqoop data acquisition tool to perform preliminary JSON formatting on the acquired data and output it to message queue topic A, and splitting and cleaning the JSON data in message queue topic A according to fixed logic;

S3, the columnar database loading program module calling the data insertion interface of the columnar storage database (e.g., HBase), importing the data in message queue topic B into the column family and the corresponding data column according to the first large-class data and the second large-class data, and automatically establishing a global index according to the second large-class data.

Preferably, the logic applied to the JSON data in step S2 is as follows:

reading the number and positions of the curly braces and square brackets following the second large-class data to determine the nesting level of the JSON data;

the rules for cleaning the JSON data in step S2 are as follows:

The system and implementation method for inserting key-value pair data into a columnar database have the following advantages:

(I) the invention processes and stores key-value pair data, such as event-tracking (buried-point) data, gateway logs, and IoT device signal data, in a specific format, and is lightweight, highly versatile, and easy to deploy;

(II) key-value pair data can be parsed, processed, and imported into the columnar database automatically, avoiding the large labor and time costs of performing this series of tasks manually;

(III) the key-value pair data processing program module can process key-value pair data of different formats as required, and is highly versatile;

(IV) the invention is based on AppImage package management, so it can be deployed quickly and effectively and is compatible with different Linux distributions;

(V) the invention uses a message queue for data exchange, so that data is transmitted in a timely manner and is not lost.

Drawings

The invention is further described below with reference to the accompanying drawings.

FIG. 1 is a flow diagram of an implementation of a method for key-value pair data insertion into a columnar database.

Detailed Description

The system and method for inserting key-value pair data into a columnar database according to the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.

Example 1:

the system for inserting key-value pair data into a columnar database of the present invention, the system comprising,

The packaging environment requirements of the AppImage packaging and deployment module in this embodiment are as follows:

firstly, a Linux environment: a Linux system with a full (maximal) installation;

The AppImage packaging and deployment module in this embodiment comprises,

The key-value pair data processing program module in this embodiment comprises,

The data acquisition tool in this embodiment is Filebeat, Logstash, or Sqoop; when Filebeat, Logstash, or Sqoop acquires data, an additional character string is spliced onto it as a flag bit identifying the data source.

The logic applied to the JSON data in this embodiment is as follows:

and thirdly, reading the number and positions of the curly braces and square brackets following the second large-class data to determine the nesting level of the JSON data. An example of the rule for obtaining the second large-class data:

wherein the first id is the second large-class data.

The rules for cleaning the JSON data in this embodiment are as follows:

The columnar database loading program module in this embodiment comprises,

Example 2:

As shown in FIG. 1, the implementation method for inserting key-value pair data into a columnar database according to the present invention is based on AppImage package management, uses a columnar storage database, and automatically parses, processes, and imports key-value pair data of different formats; the method comprises the following specific steps:

S1, in a Linux environment, packaging the key-value pair data processing program module and the columnar database loading program module by using the AppImage packaging tools linuxdeployqt and patchelf, specifically: entering "mkdir output" on the command line to create a folder, copying the key-value pair data processing program module or the columnar database loading program module into the output folder, and executing the linuxdeployqt command to complete the packaging; this is executed twice to generate two AppImage files;

The logic applied to the JSON data in step S2 of this embodiment is as follows:

The rules for cleaning the JSON data in step S2 of this embodiment are as follows:

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A system for inserting key-value pair data into a columnar database, the system comprising,

a key-value pair data processing program module, used for splitting and cleaning the key-value pair data, processing it into a format that can be imported directly into the columnar storage database, and storing it in a message queue;

2. The system for inserting key-value pair data into a columnar database according to claim 1, wherein the packaging environment requirements of the AppImage packaging and deployment module are as follows:

firstly, a Linux environment: a Linux system with a full (maximal) installation;

3. The system for key-value pair data insertion into a columnar database according to claim 1, wherein the AppImage packaging and deployment module comprises,

4. The system for inserting key-value pair data into a columnar database according to claim 1, wherein the key-value pair data processing program module comprises,

and a splitting and cleaning submodule, used for splitting and cleaning the JSON data in message queue topic A according to logic.

5. The system for inserting key-value pair data into a columnar database according to claim 4, wherein the data acquisition tool is Filebeat, Logstash, or Sqoop, and when Filebeat, Logstash, or Sqoop acquires data, an additional character string is spliced onto it as a flag bit identifying the data source.

6. The system for inserting key-value pair data into a columnar database according to claim 4, wherein the logic applied to the JSON data is as follows:

reading the flag bit of the first line of the JSON data as the first large-class data;

7. The system for inserting key-value pair data into a columnar database according to any one of claims 4-6, wherein the rules for cleaning the JSON data are as follows:

8. The system for inserting key-value pair data into a columnar database according to claim 7, wherein the columnar database loading program module comprises,

a calling submodule, used for calling the data insertion interface of the columnar storage database;

an establishing submodule, used for automatically establishing a global index according to the second large-class data; wherein the columnar storage database provides a data retrieval and query service.

9. An implementation method for inserting key-value pair data into a columnar database, characterized in that the method is based on AppImage package management, uses a columnar storage database, and automatically parses, processes, and imports key-value pair data of different formats; the method comprises the following specific steps:

S2, using the Filebeat, Logstash, or Sqoop data acquisition tool to perform preliminary JSON formatting on the acquired data and output it to message queue topic A, and splitting and cleaning the JSON data in message queue topic A according to logic;

S3, the columnar database loading program module calling the data insertion interface of the columnar storage database, importing the data in message queue topic B into the column family and the corresponding data column according to the first large-class data and the second large-class data, and automatically establishing a global index according to the second large-class data.

10. The implementation method for inserting key-value pair data into a columnar database according to claim 9, wherein the logic applied to the JSON data in step S2 is as follows: