CN112988666B - Distributed log condition query method and system based on cuckoo filter - Google Patents

Distributed log condition query method and system based on cuckoo filter

Info

Publication number
CN112988666B
Authority
CN
China
Prior art keywords
data
condition
query
cold
character string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110300026.8A
Other languages
Chinese (zh)
Other versions
CN112988666A (en)
Inventor
李肯立
夏禹
余思洋
周旭
刘楚波
肖国庆
段明星
张家豪
巢婉琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Kuangan Network Technology Co ltd
Original Assignee
Hunan Kuangan Network Technology Co ltd
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Kuangan Network Technology Co ltd, Hunan University filed Critical Hunan Kuangan Network Technology Co ltd
Priority to CN202110300026.8A priority Critical patent/CN112988666B/en
Publication of CN112988666A publication Critical patent/CN112988666A/en
Application granted granted Critical
Publication of CN112988666B publication Critical patent/CN112988666B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/144 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed log condition query method based on a cuckoo filter, comprising the following steps: acquiring a condition query request sent by a client and querying a pre-constructed hot database according to the request; judging whether the amount of data returned is lower than the amount the request asks for; if so, processing the condition query request to obtain an identification character string and, using the identification character string as a key, querying a cold condition cache layer to judge whether a value corresponding to the key exists there; if not, executing the query in the pre-constructed sub-tables of a cold database according to the condition query request to obtain a condition query result; processing the condition query result to generate a JSON character string; and forming a key-value pair with the identification character string as the key and the JSON character string as the value, and storing the key-value pair in the cold condition cache layer.

Description

Distributed log condition query method and system based on cuckoo filter
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a distributed log condition query method and system based on a cuckoo filter.
Background
With the rapid popularization of the internet of things technology, terminal equipment in the field of industrial control generates huge amounts of log data every day to perform security analysis evaluation and statistical management on the terminal equipment. How to conveniently and quickly carry out targeted condition query in a huge industrial control log data set which is stored for months or even years becomes a common and serious problem which needs to be faced by an industrial control management system.
Existing log condition query methods include splitting the data across multiple databases (database splitting), splitting a database's tables in advance before querying (table splitting), and caching data. Database splitting divides a single database into several databases according to the system's services or how likely the data is to be queried; table splitting divides a single data table into several tables by the same criteria. For databases holding tens of millions, hundreds of millions, or more records, database splitting effectively reduces the query pressure on any single database and avoids the bottleneck of its processing capacity; table splitting reduces the time spent on multi-field matching and sorting during a condition query; and caching reduces the number of condition queries executed directly against the database, because data is fetched from the cache layer instead.
However, each of these existing methods has a non-negligible drawback. First, splitting the database in advance speeds up only some condition queries: when the database holding less data cannot supply enough results, the other, larger databases must still be accessed frequently, which makes queries slow. Second, with table splitting, whether several threads query the tables concurrently or the tables are joined before the condition query runs, the system suffers performance problems once there are too many tables. Third, when a data cache is added to the system, an ordinary cache can only return the value for a specific key quickly and cannot serve a condition query that matches on multiple fields; and caching each complete condition query result as a whole usually requires a great deal of time to keep the cached data complete, which greatly affects overall system performance.
Disclosure of Invention
In view of the above defects and improvement needs of the prior art, the invention provides a distributed log condition query method and system based on a cuckoo filter. It aims to solve the following technical problems: splitting the database in advance leads to long query times; splitting tables in advance degrades system performance when there are too many sub-tables; and adding a data cache either cannot serve condition queries that match on multiple fields or spends so much time maintaining the cached data that overall system performance suffers.
In order to achieve the above object, according to an aspect of the present invention, there is provided a distributed log condition query method based on a cuckoo filter, which is applied in an industrial control system, the distributed log condition query method including the following steps:
(1) acquiring a condition query request sent by a client, performing data query in a pre-constructed hot database according to the condition query request, judging whether the total amount of queried data is lower than the data amount corresponding to the condition query request, if so, processing the condition query request to obtain an identification character string, and entering the step (2), otherwise, returning the queried data to the client, and ending the process.
(2) Taking the identification character string obtained in the step (1) as a key, performing data query in a cold condition cache layer to judge whether a value corresponding to the key exists in the cold condition cache layer, if so, returning the query result to the client and ending the process, otherwise, entering the step (3);
(3) Executing query operation in a pre-constructed cold database data sublist according to the conditional query request acquired in the step (1) to obtain a conditional query result;
(4) processing the conditional query result obtained in the step (3) to generate a JSON character string, forming a key-value pair by taking the identification character string obtained in the step (1) as the key and the JSON character string as the value, and storing the key-value pair in the cold condition cache layer;
(5) mapping the identification character string obtained in the step (1) into fingerprint data of one byte, and storing the fingerprint data into a cuckoo filter.
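Steps (1) to (5) can be read as a tiered lookup. The following Python sketch illustrates that control flow under stated assumptions: `hot_query` and `cold_query` are hypothetical callables that return lists of records, `cache` is a plain dict standing in for the cold condition cache layer, and `remember` is a hypothetical callback that records the identification string in the cuckoo filter. None of these names come from the patent, and returning the cold result to the caller is assumed rather than stated in the steps.

```python
import json
from typing import Callable, Dict, List

Record = Dict[str, object]
Query = Callable[[Dict[str, object]], List[Record]]


def conditional_query(
    request: Dict[str, object],
    wanted: int,
    id_str: str,
    hot_query: Query,
    cold_query: Query,
    cache: Dict[str, str],
    remember: Callable[[str], None],
) -> List[Record]:
    # Step (1): query the small hot database first.
    rows = hot_query(request)
    if len(rows) >= wanted:
        return rows

    # Step (2): look the identification string up in the cache layer.
    cached = cache.get(id_str)
    if cached is not None:
        return json.loads(cached)

    # Step (3): fall back to the cold database sub-tables.
    rows = cold_query(request)

    # Step (4): store the result as a JSON string under the identification string.
    cache[id_str] = json.dumps(rows)

    # Step (5): record the identification string in the cuckoo filter.
    remember(id_str)
    return rows
```

In this sketch the cold database is reached at most once per distinct identification string; a repeated request under the same condition is answered from `cache`.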
Preferably, the hot database is constructed by:
(a) acquiring log data stored in an industrial control system;
(b) acquiring, according to the generation time field of each piece of log data, the 100,000 pieces of log data closest to the current time of the industrial control system, and storing them into the hot database.
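Selecting the most recent records by generation time can be sketched as follows. This is only an illustration: the field name `occur_time` and the function name are assumptions, and the capacity is left as a parameter rather than fixed.

```python
import heapq
from typing import Dict, Iterable, List


def build_hot_set(logs: Iterable[Dict[str, object]], capacity: int) -> List[Dict[str, object]]:
    """Return the `capacity` records with the latest generation time, newest first."""
    # `occur_time` is a hypothetical name for the generation time field.
    return heapq.nlargest(capacity, logs, key=lambda record: record["occur_time"])
```

`heapq.nlargest` avoids sorting the whole log set when the capacity is small relative to the total number of records.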
Preferably, the process of processing the conditional query request in step (1) to obtain the identification string includes the following sub-steps:
(1-1) obtaining the type of the log data corresponding to the condition query request;
(1-2) acquiring a condition matching value requested by a corresponding field in a condition query request according to a field condition query range and a field sequence allowed by a client to the log data of the type, judging whether the condition matching value is NULL, if so, setting the condition matching value to be a character string 'NULL', then entering the step (1-3), otherwise, keeping the condition matching value unchanged, and then entering the step (1-3);
(1-3) processing the type to which the log data belongs and the condition matching value obtained in the step (1-2) into character strings, and splicing the character strings by using the "&" symbol in sequence according to the field sequence obtained in the step (1-2) to obtain an identification character string.
Preferably, a cold condition in step (2) refers to a condition query request for which the hot database cannot supply the requested amount of data, so that the cold database must be queried. The cold condition cache layer is built on Redis: the identification character string of the condition query request serves as the key, the corresponding query result from the cold database serves as the value, and the resulting key-value pair is stored in Redis.
Preferably, the cold database data sub-table in step (3) is constructed by the following steps:
(a) the method comprises the steps of obtaining a cold database, wherein the cold database is formed by all log data received from the beginning of receiving log data by an industrial control system to the current time of the industrial control system;
(b) acquiring the month to which each type of log data belongs according to the generation time field of each type of log data in the cold database;
(c) storing each type of log data into the month sub-table of that type in the cold database, according to the month obtained in the step (b).
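Routing records to per-type, per-month sub-tables can be sketched as below. The table naming scheme `<type>_<YYYYMM>` and both function names are assumptions made for illustration; the text does not specify how sub-tables are named.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def month_table(log_type: str, occur_time: datetime) -> str:
    """Name of the month sub-table for one record (hypothetical naming scheme)."""
    return f"{log_type}_{occur_time:%Y%m}"


def split_by_month(
    records: Iterable[Tuple[str, datetime, dict]],
) -> Dict[str, List[dict]]:
    """Group (log_type, occur_time, payload) records by their month sub-table."""
    tables: Dict[str, List[dict]] = defaultdict(list)
    for log_type, occur_time, payload in records:
        tables[month_table(log_type, occur_time)].append(payload)
    return dict(tables)
```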
Preferably, the cuckoo filter in step (5) is constructed by:
(a) allocating len data buckets, wherein len represents the length of the cuckoo filter and satisfies len = 2^n (n ∈ N+);
(b) 4 storage locations are provided in each data bucket, each storage location occupying 1 byte for storing fingerprint information identifying a string.
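Because each bucket holds 4 one-byte fingerprints, the table occupies 4 × len bytes. A small sketch of how n might be chosen for an expected number of identification strings follows; the target load factor and both function names are assumptions, as the text gives no sizing rule.

```python
import math

SLOTS_PER_BUCKET = 4  # from the text
BYTES_PER_SLOT = 1    # from the text


def choose_n(expected_items: int, load: float = 0.9) -> int:
    """Smallest n >= 1 such that 2**n buckets hold expected_items at the given load.

    The default load factor is an assumption, not a value from the text.
    """
    if expected_items < 1 or not 0 < load <= 1:
        raise ValueError("expected_items must be >= 1 and 0 < load <= 1")
    buckets_needed = math.ceil(expected_items / (SLOTS_PER_BUCKET * load))
    return max(1, (buckets_needed - 1).bit_length())


def table_bytes(n: int) -> int:
    """Memory taken by the fingerprint table with len = 2**n buckets."""
    return (1 << n) * SLOTS_PER_BUCKET * BYTES_PER_SLOT
```

For example, 1,000 expected strings at a load factor of 1.0 need 250 buckets, which rounds up to len = 2^8 = 256 and a 1,024-byte table.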
Preferably, step (5) comprises the sub-steps of:
(5-1) mapping the identification string id_Str to obtain 1 byte of fingerprint information fp, namely fp = fingerPrint(id_Str), wherein fingerPrint() is a mapping function;
(5-2) acquiring a bucket serial number pos according to the identification character string id_Str and the cuckoo filter length len, namely:
pos = hash(id_Str) % len;
(5-3) judging, according to the bucket serial number obtained in the step (5-2), whether any of the 4 positions in the pos-th bucket is vacant, if so, entering the step (5-4), otherwise, entering the step (5-5);
(5-4) storing the fingerprint information fp obtained in the step (5-1) into the vacant position in the pos-th bucket, and ending the process;
(5-5) recording the fingerprint information fp as fp_1, randomly selecting one of the 4 positions in the pos-th bucket, taking out the fingerprint information fp_2 stored at that position, and storing fp_1 into that position;
(5-6) recording the position of the pos-th bucket as pos_1, and obtaining a new position pos_2 by an XOR operation between pos_1 and fp_2, namely:
pos_2 = pos_1 ⊕ fp_2;
(5-7) judging whether the bucket at the new position pos_2 has a vacancy, if so, storing the fingerprint information fp_2 into the position pos_2, and ending the process, otherwise, recording the fingerprint information fp_2 as fp, and then returning to the step (5-5).
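Sub-steps (5-1) to (5-7) can be sketched in Python as follows. Several details are assumptions, not taken from the text: SHA-256 supplies both the fingerprint and the bucket hash, the relocation loop is capped at `MAX_KICKS`, n is required to be at least 8 so that pos ⊕ fp stays below len, and on returning to (5-5) the eviction continues from bucket pos_2.

```python
import hashlib
import random
from typing import List, Optional

SLOTS_PER_BUCKET = 4  # from the text
MAX_KICKS = 500       # assumption: the text gives no relocation limit


def _digest(id_str: str) -> bytes:
    return hashlib.sha256(id_str.encode("utf-8")).digest()


def finger_print(id_str: str) -> int:
    """(5-1) Map an identification string to a 1-byte fingerprint (0..255)."""
    return _digest(id_str)[0]


def bucket_index(id_str: str, length: int) -> int:
    """(5-2) pos = hash(id_Str) % len, using digest bytes not used for fp."""
    return int.from_bytes(_digest(id_str)[1:9], "big") % length


class CuckooFilter:
    def __init__(self, n: int) -> None:
        # len = 2^n buckets; n >= 8 is assumed so pos XOR fp (fp < 256) < len.
        if n < 8:
            raise ValueError("this sketch assumes n >= 8")
        self.length = 1 << n
        self.buckets: List[List[Optional[int]]] = [
            [None] * SLOTS_PER_BUCKET for _ in range(self.length)
        ]

    def _try_put(self, pos: int, fp: int) -> bool:
        """Store fp in the first vacant slot of bucket pos, if there is one."""
        bucket = self.buckets[pos]
        for i, slot in enumerate(bucket):
            if slot is None:
                bucket[i] = fp
                return True
        return False

    def insert(self, id_str: str) -> bool:
        fp = finger_print(id_str)
        pos = bucket_index(id_str, self.length)
        if self._try_put(pos, fp):                      # (5-3), (5-4)
            return True
        for _ in range(MAX_KICKS):
            slot = random.randrange(SLOTS_PER_BUCKET)   # (5-5) pick a victim
            evicted = self.buckets[pos][slot]
            self.buckets[pos][slot] = fp
            pos = pos ^ evicted                         # (5-6) pos_2 = pos_1 XOR fp_2
            if self._try_put(pos, evicted):             # (5-7)
                return True
            fp = evicted                                # repeat from (5-5)
        return False  # table too full; the last evicted fingerprint is dropped
```

A caller would typically treat a `False` return as a signal to rebuild the filter with a larger n.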
According to another aspect of the present invention, a distributed log conditional query system based on a cuckoo filter is provided, which is applied in an industrial control system, and includes:
the first module is used for acquiring a condition query request sent by a client, performing data query in a pre-constructed hot database according to the condition query request, judging whether the total amount of queried data is lower than the data amount corresponding to the condition query request, if so, processing the condition query request to obtain an identification character string, entering the second module, otherwise, returning the queried data to the client, and ending the process.
The second module is used for taking the identification character string obtained by the first module as a Key (Key), performing data query on the cold condition cache layer to judge whether a Value (Value) corresponding to the Key exists in the cold condition cache layer, if so, returning a query result to the client, ending the process, otherwise, entering the third module;
the third module is used for executing query operation in a pre-constructed cold database data sub-table according to the condition query request acquired by the first module to obtain a condition query result;
and the fourth module is used for processing the conditional query data result obtained by the third module to generate a JSON character string, forming a key-value pair by taking the identification character string obtained by the first module as a key and the JSON character string as a value, and storing the key-value pair in the cold condition cache layer.
And the fifth module is used for mapping the identification character string obtained by the first module into fingerprint data of one byte and storing the fingerprint data into the cuckoo filter.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) Because the invention adopts the step (1) and the step (3), the whole database is divided into a hot part and a cold part, a cold condition cache layer is provided, and a query visits the hot database, the cold condition cache layer and the cold database in that order. On the one hand, this exploits the fact that recent log data is queried most often and that the hot database is small, which greatly reduces the time spent on multi-field matching and multi-field sorting during a condition query. On the other hand, when the hot database cannot speed up a query, the cold condition cache layer answers it quickly, which avoids the long time a cold database query would take and improves the overall query performance of the system.
(2) Because the step (2) and the step (4) are adopted, on the one hand, the cold database is split into sub-tables, which keeps the query pressure on each table low, avoids running multi-condition queries against one huge log data set, and allows several threads to query the cold data tables concurrently. On the other hand, the condition query results obtained from the cold database are stored dynamically in the cold condition cache layer, which effectively reduces the number of condition queries against the cold database. In fact, most queries that the hot database cannot satisfy need to reach the cold database only once; later queries under the same condition get their result from the cold condition cache layer alone. The cold database can therefore be split into fewer sub-tables, which avoids the system performance problems caused by too many sub-tables.
(3) Because the invention adopts the steps (2) to (5), the cold condition cache layer built on Redis uses the identification character string of the condition query request as the key and stores, under that key, the condition query result that the hot database could not supply, so that later queries under the same condition are answered quickly; and the cuckoo filter efficiently optimizes the process of propagating new log data to the cold condition cache layer, so that the data in the cold condition cache layer stays complete without undue performance impact on the system.
Drawings
FIG. 1 is a flow chart of a distributed conditional query method based on a cuckoo filter according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides a distributed condition query method based on a cuckoo filter which, in one implementation, comprises a cold condition cache layer, a hot data layer, a cold data layer and a cuckoo filter layer. Its main idea is that, when the system executes a condition query, it first queries the hot data layer, which holds little data; it queries the cold condition cache layer only when the hot data layer's result does not reach the amount of data requested; if the cold condition cache layer holds no record of that condition query, it queries the cold data layer, stores the cold data layer's result in the cold condition cache layer, and updates the cuckoo filter.
The advantages of the invention are as follows. Distributed hot-cold separation and table splitting are applied to a huge log data set, which reduces the query pressure on any single database or table and shortens the execution time of some condition queries. A cold condition cache layer built on Redis stores the condition query results that the hot database cannot supply, so that later queries under the same condition are answered quickly. This optimization of condition queries costs some efficiency when new log data is inserted: after being stored in the cold database, each new piece of log data must be checked to decide whether it also needs to be stored in the cold condition cache, in order to keep the cached data complete. The method therefore introduces a cuckoo filter to make this check efficient at insertion time.
As shown in fig. 1, the invention provides a distributed log condition query method based on a cuckoo filter, which is applied to an industrial control system and comprises the following steps:
(1) acquiring a condition query request sent by a client, performing data query in a pre-constructed hot database according to the condition query request, judging whether the total amount of queried data is lower than the data amount corresponding to the condition query request, if so, processing the condition query request to obtain an identification character string, and entering the step (2), otherwise, returning the queried data to the client, and ending the process;
specifically, the data in the present invention is industrial control log data.
Because the most recent industrial control log data is generally the data the system uses most often, the hot database in this step is constructed as follows:
(a) acquiring log data stored in an industrial control system;
(b) acquiring, according to the generation time field of each piece of log data, the 100,000 pieces of log data closest to the current time of the industrial control system, and storing them into the hot database.
Specifically, the process of processing the conditional query request in this step to obtain the identification string includes the following sub-steps:
(1-1) obtaining the type of the log data corresponding to the condition query request;
(1-2) acquiring a condition matching value requested by a corresponding field in a condition query request according to a field condition query range and a field sequence allowed by a client to the log data of the type, judging whether the condition matching value is NULL, if so, setting the condition matching value to be a character string 'NULL', then entering the step (1-3), otherwise, keeping the condition matching value unchanged, and then entering the step (1-3);
(1-3) processing the type to which the log data belongs and the condition matching value obtained in the step (1-2) into character strings, and splicing the character strings by using the "&" symbol in sequence according to the field sequence obtained in the step (1-2) to obtain an identification character string.
Take the following condition query request as an example: the request asks for 50 pieces of log data of the firewall policy type (firewall_policy_log), requires the source IP field (source_ip) to be 192.168.10.10 and the target port field (target_port) to be 5320, and places no requirement on the other fields. In the condition query offered by the client, the permitted fields of firewall policy log data, in order, are "target_ip", "target_port", "source_ip" and "source_port". The request is therefore processed into the identification string "firewall_policy_log&NULL&5320&192.168.10.10&NULL".
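The worked example above can be reproduced with a short Python sketch of sub-steps (1-1) to (1-3). The function name and the representation of a request as a mapping from field name to value are assumptions made for illustration.

```python
from typing import Mapping, Optional, Sequence


def build_id_string(
    log_type: str,
    field_order: Sequence[str],
    conditions: Mapping[str, Optional[object]],
) -> str:
    """Join the log type and each field's match value with '&', in field order."""
    parts = [log_type]
    for name in field_order:
        value = conditions.get(name)
        # A field without a requested match value becomes the string "NULL".
        parts.append("NULL" if value is None else str(value))
    return "&".join(parts)


fields = ["target_ip", "target_port", "source_ip", "source_port"]
request = {"source_ip": "192.168.10.10", "target_port": 5320}
print(build_id_string("firewall_policy_log", fields, request))
# prints firewall_policy_log&NULL&5320&192.168.10.10&NULL
```

Because the field order is fixed per log type, two requests with the same conditions always produce the same string, regardless of the order in which the client supplied the fields.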
(2) Taking the identification character string obtained in the step (1) as a key, performing data query in a cold condition cache layer to judge whether a value corresponding to the key exists in the cold condition cache layer, if so, returning the query result to the client and ending the process, otherwise, entering the step (3);
Specifically, a cold condition in this step refers to a condition query request for which the hot database cannot supply the requested amount of data, so that the cold database must be queried. The cold condition cache layer is built on Redis: the identification character string of the condition query request serves as the key, the corresponding query result from the cold database serves as the value, and the resulting key-value pair is stored in Redis.
(3) Executing query operation in a pre-constructed cold database data sublist according to the conditional query request acquired in the step (1) to obtain a conditional query result;
specifically, the cold database data sub-table mentioned in this step is constructed by the following steps:
(a) the method comprises the steps of obtaining a cold database, wherein the cold database is formed by all log data received from the beginning of receiving log data by an industrial control system to the current time of the industrial control system;
(b) acquiring the belonged month of each type of log data according to the generation TIME field (OCCUR _ TIME) of each type of log data in the cold database;
(c) storing each type of log data into the month sub-table of that type in the cold database, according to the month obtained in the step (b).
The advantage of this step is that it keeps all log data in the cold database, constructs the hot database as a data subset of the cold database, avoiding the frequent occurrence of the situation where data in the hot database needs to be constantly transferred to the cold database after losing warmth, at the cost of relatively small data redundancy.
(4) Processing the conditional query data result obtained in the step (3) to generate a JSON character string, forming a key value pair by taking the identification character string obtained in the step (1) as a key and the JSON character string as a value, and storing the key value pair in a cold condition cache layer;
(5) mapping the identification character string obtained in the step (1) into fingerprint data of one byte, and storing the fingerprint data into a cuckoo filter;
specifically, the cuckoo filter mentioned in this step is constructed by the following steps:
(a) allocating len data buckets, wherein len represents the length of the cuckoo filter and satisfies len = 2^n (n ∈ N+);
(b) 4 storage locations are provided in each data bucket, each storage location occupying 1 byte for storing fingerprint information identifying a string.
Specifically, the present step includes the following substeps:
(5-1) mapping the identification string id_Str to obtain 1 byte of fingerprint information fp, namely fp = fingerPrint(id_Str), wherein fingerPrint() is a mapping function;
(5-2) acquiring a bucket serial number pos (an integer) according to the identification string id_Str and the cuckoo filter length len;
specifically, this step adopts the formula pos = hash(id_Str) % len;
(5-3) judging, according to the bucket serial number obtained in the step (5-2), whether any of the 4 positions in the pos-th bucket is vacant, if so, entering the step (5-4), otherwise, entering the step (5-5);
(5-4) storing the fingerprint information fp obtained in the step (5-1) into the vacant position in the pos-th bucket, and ending the process;
(5-5) recording the fingerprint information fp as fp_1, randomly selecting one of the 4 positions in the pos-th bucket, taking out the fingerprint information fp_2 stored at that position, and storing fp_1 into that position;
(5-6) recording the position of the pos-th bucket as pos_1, and performing an XOR operation between pos_1 and fp_2 to obtain a new position pos_2, i.e. pos_2 = pos_1 ⊕ fp_2;
(5-7) judging whether the bucket at the new position pos_2 has a vacancy, if so, storing the fingerprint information fp_2 into the position pos_2, and ending the process, otherwise, recording the fingerprint information fp_2 as fp, and then returning to the step (5-5).
The advantage of this step is that it optimizes the decision, made whenever new log data is inserted, of whether that data also needs to be stored in the cold condition cache layer. An identification character string matching the new data is generated and passed through the step (5-1) and the step (5-2) to obtain the fingerprint information fp and the bucket serial number pos. It then suffices to check whether the same fingerprint information fp exists in the pos-th bucket: the data needs to be stored in the cold condition cache layer only if it does. This avoids the performance cost of connecting to and accessing the cold condition cache layer for every new piece of log data just to make that decision.
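The membership check described here can be sketched as follows. The hash functions are assumptions (SHA-256, chosen for illustration), as is the requirement that the table has 2^n buckets with n ≥ 8. The text mentions only the pos-th bucket; this sketch also checks the alternate bucket pos ⊕ fp, because a fingerprint relocated by steps (5-5) to (5-7) would otherwise be missed. Since fingerprints are one byte long, a positive answer can be a false positive and only means the cache may hold the condition.

```python
import hashlib
from typing import List, Optional


def _digest(id_str: str) -> bytes:
    return hashlib.sha256(id_str.encode("utf-8")).digest()


def finger_print(id_str: str) -> int:
    """1-byte fingerprint of an identification string (hash choice is an assumption)."""
    return _digest(id_str)[0]


def bucket_index(id_str: str, length: int) -> int:
    """pos = hash(id_Str) % len (hash choice is an assumption)."""
    return int.from_bytes(_digest(id_str)[1:9], "big") % length


def may_contain(buckets: List[List[Optional[int]]], id_str: str) -> bool:
    """True if id_str's fingerprint is in its bucket or its alternate bucket.

    Assumes len(buckets) is a power of two and at least 256, so that
    pos XOR fp is always a valid bucket index.
    """
    fp = finger_print(id_str)
    pos = bucket_index(id_str, len(buckets))
    alt = pos ^ fp
    return fp in buckets[pos] or fp in buckets[alt]
```

A new log record would be written to the cold condition cache layer only when `may_contain` returns `True` for an identification string that the record matches; a `False` result rules the cache out without any network round trip.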
In summary, the main idea of the present invention is that when a system executes a condition query, the system preferentially queries a hot data layer with a small data size, queries a cold condition cache layer only when a query result of the hot data layer does not satisfy the data size of a query request, queries the cold data layer if a request history of the condition query does not exist in the cold condition cache layer, stores the query result of the cold data layer in the cold condition cache layer, and updates a cuckoo filter. And when the new log data are transmitted to the system, optimizing the process of ensuring the integrity of data of the cold condition cache layer according to the cuckoo filter.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A distributed log condition query method based on a cuckoo filter is applied to an industrial control system, and is characterized by comprising the following steps:
(1) acquiring a condition query request sent by a client, performing data query in a pre-constructed hot database according to the condition query request, judging whether the total amount of queried data is lower than the data amount corresponding to the condition query request, if so, processing the condition query request to obtain an identification character string, and entering the step (2), otherwise, returning the queried data to the client, and ending the process;
(2) taking the identification character string obtained in the step (1) as a key, and performing data query on the cold condition cache layer to judge whether a value corresponding to the key exists in the cold condition cache layer, if so, returning a query result to the client, ending the process, otherwise, entering the step (3);
(3) executing a query operation in a pre-constructed cold database data sub-table according to the conditional query request acquired in the step (1) to obtain a conditional query result;
(4) processing the conditional query data result obtained in the step (3) to generate a JSON character string, forming a key value pair by taking the identification character string obtained in the step (1) as a key and the JSON character string as a value, and storing the key value pair in a cold condition cache layer;
(5) mapping the identification character string obtained in the step (1) into fingerprint data of one byte, and storing the fingerprint data into a cuckoo filter.
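The five steps of claim 1 can be sketched as one function. This is an illustrative outline under stated assumptions: the hot and cold databases are modeled as callables, the cold condition cache layer as a dict, and the cuckoo filter as a plain set standing in for the real structure; the `limit` field and all function names are hypothetical.

```python
import json


def conditional_query(request, hot_db, cold_cache, cold_db, filter_set, make_id):
    """Tiered lookup: hot database, then cold condition cache, then cold database.

    request    -- dict with a hypothetical 'limit' field (requested data amount)
    hot_db     -- callable returning a list of rows for the request
    cold_cache -- dict standing in for the key-value cold condition cache layer
    cold_db    -- callable returning a list of rows for the request
    filter_set -- set standing in for the cuckoo filter
    make_id    -- callable building the identification character string
    """
    rows = hot_db(request)
    if len(rows) >= request["limit"]:
        return rows                        # step (1): hot data is sufficient
    key = make_id(request)
    cached = cold_cache.get(key)
    if cached is not None:
        return json.loads(cached)          # step (2): cache hit
    result = cold_db(request)              # step (3): query the cold database
    cold_cache[key] = json.dumps(result)   # step (4): store JSON under the key
    filter_set.add(key)                    # step (5): record in the filter
    return result
```

A second call with the same request is then answered from the cache without reaching the cold database.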
2. The cuckoo-filter-based distributed log conditional query method of claim 1, wherein the hot database is constructed by:
(a) acquiring log data stored in an industrial control system;
(b) acquiring, according to the generation time field of each log data entry, the 10 thousands of log data entries closest to the current time of the industrial control system, and storing them into the hot database.
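Selecting the most recent entries by generation time can be sketched as below. The field name `generated_at` and the `capacity` parameter are assumptions made for illustration; the claim itself fixes the number of entries.

```python
import heapq


def build_hot_set(logs, capacity):
    """Return the `capacity` entries with the latest generation time, newest first."""
    return heapq.nlargest(capacity, logs, key=lambda log: log["generated_at"])
```

`heapq.nlargest` avoids sorting the whole log history when only a small, recent slice is kept in the hot database.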
3. The cuckoo-filter-based distributed log conditional query method according to claim 1 or 2, wherein the processing of the conditional query request in step (1) to obtain the identification string comprises the following sub-steps:
(1-1) obtaining the type of the log data corresponding to the condition query request;
(1-2) acquiring a condition matching value requested by a corresponding field in a condition query request according to a field condition query range and a field sequence allowed by a client to the log data of the type, judging whether the condition matching value is NULL, if so, setting the condition matching value to be a character string 'NULL', then entering the step (1-3), otherwise, keeping the condition matching value unchanged, and then entering the step (1-3);
(1-3) processing the type to which the log data belongs and the condition matching value obtained in the step (1-2) into character strings, and splicing the character strings by using the "&" symbol in sequence according to the field sequence obtained in the step (1-2) to obtain an identification character string.
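Sub-steps (1-1) to (1-3) amount to a deterministic string key. A minimal sketch follows, assuming the log type is placed first and that the field names shown in the usage note are hypothetical.

```python
def build_identification_string(log_type, field_order, conditions):
    """Join the log type and the per-field condition values with '&'.

    field_order -- fields the client may query on, in their fixed order
    conditions  -- dict of field -> requested value; missing or None means no condition
    """
    parts = [str(log_type)]            # assumption: the type comes first
    for field in field_order:
        value = conditions.get(field)
        # Step (1-2): an absent condition is represented by the string "NULL".
        parts.append("NULL" if value is None else str(value))
    return "&".join(parts)
```

For example, `build_identification_string("login", ["user", "ip", "level"], {"user": "alice", "level": 3})` yields `login&alice&NULL&3`, so two requests with the same conditions always map to the same cache key.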
4. The cuckoo-filter-based distributed log conditional query method of claim 3, wherein the cold condition in step (2) refers to a conditional query request that fails to obtain the requested amount of data from the hot database and therefore needs to query the cold database; the cold condition cache layer is constructed with Redis, wherein the identification character string of a condition query request serves as the key and the corresponding query result of the condition query request in the cold database serves as the value, and the resulting key-value pair is stored in Redis.
5. The cuckoo-filter-based distributed log conditional query method of claim 4, wherein the cold database data sub-table in step (3) is constructed by the following steps:
(a) obtaining a cold database, wherein the cold database is formed by all log data received from the time the industrial control system began receiving log data up to the current time of the industrial control system;
(b) acquiring the month to which each type of log data belongs according to the generation time field of each type of log data in the cold database;
(c) storing the log data of each type into the corresponding month sub-table of that type in the cold database, according to the month to which the log data belongs as obtained in the step (b).
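Routing each entry to a per-type, per-month sub-table can be sketched as follows. The table naming scheme `<type>_<YYYYMM>` and the field names `type` and `generated_at` are assumptions; the claim does not specify them.

```python
from collections import defaultdict
from datetime import datetime


def month_table_name(log_type: str, generated_at: datetime) -> str:
    """Hypothetical sub-table name: log type plus year and month."""
    return f"{log_type}_{generated_at:%Y%m}"


def partition_by_month(logs):
    """Group log entries by the sub-table they would be stored in."""
    tables = defaultdict(list)
    for log in logs:
        tables[month_table_name(log["type"], log["generated_at"])].append(log)
    return dict(tables)
```

A query restricted to a time range then only needs to touch the sub-tables of the months that range covers.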
6. The cuckoo-filter-based distributed log conditional query method of claim 5, wherein the cuckoo filter in step (5) is constructed by the following steps:
(a) allocating len data buckets, wherein len represents the length of the cuckoo filter and satisfies the formula len = 2^n (n ∈ N+);
(b) 4 storage locations are provided in each data bucket, each storage location occupying 1 byte for storing fingerprint information identifying a string.
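The storage layout of claim 6 can be sketched as a flat byte array. The class name and the use of byte value 0 to mark an empty slot are assumptions made for this illustration.

```python
class FilterTable:
    """len = 2**n buckets, 4 one-byte slots per bucket, stored contiguously."""

    SLOTS = 4
    EMPTY = 0  # assumption: byte value 0 marks a vacant slot

    def __init__(self, n: int):
        if n < 1:
            raise ValueError("n must be a positive integer")
        self.length = 1 << n                       # len = 2^n
        self.slots = bytearray(self.length * self.SLOTS)

    def bucket(self, pos: int) -> bytearray:
        """Return a copy of the 4 slots of bucket pos."""
        start = pos * self.SLOTS
        return self.slots[start:start + self.SLOTS]

    def size_in_bytes(self) -> int:
        return len(self.slots)
```

With n = 10 the table has 1024 buckets and occupies 4096 bytes, which illustrates why the filter is cheap to keep in memory compared with the cache it guards.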
7. The cuckoo-filter-based distributed log conditional query method of claim 6, wherein step (5) comprises the sub-steps of:
(5-1) mapping the identification string id_Str to obtain 1 byte of fingerprint information fp, namely fp = fingerprint(id_Str), wherein fingerprint() is a mapping function;
(5-2) acquiring a bucket serial number pos according to the identification character string id_Str and the cuckoo filter length len, namely:
pos=hash(id_Str)%len;
(5-3) judging, according to the bucket serial number obtained in the step (5-2), whether any of the 4 positions in the pos-th bucket is vacant, if so, entering the step (5-4), otherwise, entering the step (5-5);
(5-4) storing the fingerprint information fp obtained in the step (5-1) into a vacant position in the pos-th bucket, and ending the process;
(5-5) recording the fingerprint information fp as fp_1, randomly selecting one of the 4 positions in the pos-th bucket, taking out the fingerprint information fp_2 stored at that position, and storing the fingerprint information fp_1 into that position;
(5-6) denoting the position of the pos-th bucket as pos_1, and obtaining a new position pos_2 by an XOR operation between pos_1 and fp_2, namely:
pos_2=pos_1⊕fp_2;
(5-7) judging whether a vacancy exists in the bucket at the new position pos_2, if so, storing the fingerprint information fp_2 into the position pos_2, and ending the process, otherwise, recording the fingerprint information fp_2 as fp, and then returning to the step (5-5).
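Sub-steps (5-1) to (5-7) can be sketched as the insertion routine below. It is an illustration under several assumptions: SHA-256 and MD5 stand in for the unspecified fingerprint() and hash() functions, buckets are modeled as Python lists of at most 4 entries, the XOR result is masked so it stays inside a power-of-two table, and a `MAX_KICKS` bound is added because the claim's loop has no stated termination condition.

```python
import hashlib
import random

SLOTS_PER_BUCKET = 4
MAX_KICKS = 500  # assumption: upper bound on relocations, not in the claim


def fingerprint(id_str: str) -> int:
    """Hypothetical 1-byte fingerprint (first byte of SHA-256, never 0)."""
    return hashlib.sha256(id_str.encode("utf-8")).digest()[0] or 1


def bucket_index(id_str: str, length: int) -> int:
    """pos = hash(id_Str) % len, with MD5 as a stand-in hash function."""
    digest = hashlib.md5(id_str.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % length


def insert(buckets, id_str, rng=random):
    """Insert the fingerprint of id_str; return False if the kick limit is hit."""
    length = len(buckets)                        # must be a power of two
    fp = fingerprint(id_str)                     # step (5-1)
    pos = bucket_index(id_str, length)           # step (5-2)
    if len(buckets[pos]) < SLOTS_PER_BUCKET:     # step (5-3)
        buckets[pos].append(fp)                  # step (5-4)
        return True
    for _ in range(MAX_KICKS):
        slot = rng.randrange(SLOTS_PER_BUCKET)   # step (5-5): pick a victim
        fp, buckets[pos][slot] = buckets[pos][slot], fp
        pos = (pos ^ fp) & (length - 1)          # step (5-6), masked to the table
        if len(buckets[pos]) < SLOTS_PER_BUCKET: # step (5-7)
            buckets[pos].append(fp)
            return True
    return False  # the last evicted fingerprint is dropped in this sketch
```

Because XOR with the same fingerprint is its own inverse, a relocated fingerprint only ever moves between two candidate buckets, which is what keeps later lookups cheap.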
8. A distributed log condition query system based on a cuckoo filter is applied to an industrial control system, and is characterized by comprising:
the first module is used for acquiring a condition query request sent by a client, performing data query in a pre-constructed hot database according to the condition query request, judging whether the total amount of queried data is lower than the data amount corresponding to the condition query request, if so, processing the condition query request to obtain an identification character string, entering the second module, otherwise, returning the queried data to the client, and ending the process;
the second module is used for performing data query on the cold condition cache layer by taking the identification character string obtained by the first module as a key so as to judge whether a value corresponding to the key exists in the cold condition cache layer, if so, returning a query result to the client, and if not, entering the third module;
the third module is used for executing query operation in a pre-constructed cold database data sub-table according to the condition query request acquired by the first module to obtain a condition query result;
the fourth module is used for processing the result of the conditional query data obtained by the third module to generate a JSON character string, forming a key-value pair by taking the identification character string obtained by the first module as a key and the JSON character string as a value, and storing the key-value pair in the cold condition cache layer;
and the fifth module is used for mapping the identification character string obtained by the first module into fingerprint data of one byte and storing the fingerprint data into the cuckoo filter.
CN202110300026.8A 2021-03-22 2021-03-22 Distributed log condition query method and system based on cuckoo filter Active CN112988666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110300026.8A CN112988666B (en) 2021-03-22 2021-03-22 Distributed log condition query method and system based on cuckoo filter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110300026.8A CN112988666B (en) 2021-03-22 2021-03-22 Distributed log condition query method and system based on cuckoo filter

Publications (2)

Publication Number Publication Date
CN112988666A CN112988666A (en) 2021-06-18
CN112988666B true CN112988666B (en) 2022-04-22

Family

ID=76332726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110300026.8A Active CN112988666B (en) 2021-03-22 2021-03-22 Distributed log condition query method and system based on cuckoo filter

Country Status (1)

Country Link
CN (1) CN112988666B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9298726B1 (en) * 2012-10-01 2016-03-29 Netapp, Inc. Techniques for using a bloom filter in a duplication operation
CN111563222A (en) * 2020-05-07 2020-08-21 安徽龙讯信息科技有限公司 Content operation supervision system based on intensive website platform
US10756757B2 (en) * 2016-06-03 2020-08-25 Dell Products L.P. Maintaining data deduplication reference information
CN111797134A (en) * 2020-06-23 2020-10-20 北京小米松果电子有限公司 Data query method and device of distributed database and storage medium
CN112054864A (en) * 2014-12-01 2020-12-08 谷歌有限责任公司 System and method for identifying users watching television advertisements

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190370225A1 (en) * 2017-08-16 2019-12-05 Mapr Technologies, Inc. Tiered storage in a distributed file system
US11178246B2 (en) * 2018-08-25 2021-11-16 Panzura, Llc Managing cloud-based storage using a time-series database
US11467967B2 (en) * 2018-08-25 2022-10-11 Panzura, Llc Managing a distributed cache in a cloud-based distributed computing environment

Also Published As

Publication number Publication date
CN112988666A (en) 2021-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220701

Address after: 410000 No. 102, Heguang Road, Xianghu street, Furong district, Changsha City, Hunan Province

Patentee after: Hunan Kuangan Network Technology Co.,Ltd.

Address before: Yuelu District City, Hunan province 410082 Changsha Lushan Road No. 1

Patentee before: HUNAN University

Patentee before: Hunan kuang'an Network Technology Co., Ltd
