CN113900886A - Abnormal log monitoring method - Google Patents

Abnormal log monitoring method

Publication number
CN113900886A
Authority
CN
China
Prior art keywords
bit
adopting
monitoring method
elements
hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111039082.7A
Other languages
Chinese (zh)
Inventor
宋勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Co Ltd
Priority to CN202111039082.7A
Publication of CN113900886A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates in particular to an abnormal log monitoring method. The method comprises: storing all known elements to form a set R, and using a Bloom Filter, a probabilistic algorithm, to judge whether an element x exists in the set R; separating the data into different regions following the double-layer bucket design idea, and then sorting the elements in each region with the Bit-map method; and using the inverted index retrieval method to look up and store the set R that corresponds to the element x in the document set and the position of the element x in the set R. The method can promptly detect abnormal log information in a high-concurrency, large-data-volume log environment, effectively collect and classify log information, help staff find problems in the shortest time, and ensure normal and stable operation of the system.

Description

Abnormal log monitoring method
Technical Field
The invention relates to the technical field of system monitoring, in particular to an abnormal log monitoring method.
Background
Improving service quality and ensuring the safe operation of the network are important research topics in network management. Network management needs the support of network log data, and log data collection technology has therefore become an important area of research.
The system log is a critical component. It records information about hardware, software, and system problems, and includes the system log, the application log, and the security log. Originally, the main users of logs were software engineers, who investigated problems by reading log information, because system log information is critical for determining the root cause of a failure or for narrowing down the scope of an attack on the system. The system log lets engineers quickly learn all events that preceded a fault or attack, and can be used to find the cause of an error or the traces left by an attacker. It is also critical to develop a good system logging policy for a virtualized environment, because system logs need to be correlated with many different external components.
To address large-scale log monitoring and log data collection, several solutions are currently popular, including Fluentd, Logstash, Flume, and Scribe; Alibaba uses LogAgent internally, and Alibaba Cloud uses LogTail. Among these products, Fluentd holds a clear lead and has been accepted into the CNCF (Cloud Native Computing Foundation); the Unified Logging Layer it proposes greatly reduces the complexity of log collection and analysis as a whole. Fluentd's view is that most existing log formats are weakly structured, a consequence of log data originally being written for humans, who are very good at parsing it and were its primary consumers. Fluentd therefore aims to reduce the complexity of log collection and ingestion by unifying the log storage format. Suppose log data arrives in M formats and the log collection agent feeds N kinds of storage at the back end. If each storage system has to parse all M log formats itself, the total complexity is M × N; if the log collection agent unifies the log format, the total complexity is M + N. This is the core idea of Fluentd, and its plug-in mechanism is a further strength. Logstash is similar to Fluentd; it belongs to the ELK technology stack and is widely used in the industry.
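The M × N versus M + N argument above can be checked with a small count. The following is only an illustrative sketch: the function name parsers_needed and the example values 5 and 4 are assumptions made for this illustration and do not come from Fluentd or from the patent.

```python
def parsers_needed(m_formats: int, n_stores: int, unified: bool) -> int:
    """Count the format adapters needed to connect M log formats to N storage back ends."""
    if unified:
        # M input parsers convert into one common format,
        # and N output writers consume that common format.
        return m_formats + n_stores
    # Without a unified format, every storage back end parses every input format itself.
    return m_formats * n_stores


if __name__ == "__main__":
    print(parsers_needed(5, 4, unified=False))  # 5 * 4 = 20
    print(parsers_needed(5, 4, unified=True))   # 5 + 4 = 9
```

With 5 formats and 4 back ends, the unified layer cuts the number of adapters from 20 to 9, and the gap widens as either number grows.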
Given the large volume of log data, an effective system logging strategy helps technicians organize the log structure better and locate log information accurately. Such a strategy can send warning information to the user as soon as a fault occurs, helping to find the problem in the shortest time. Today, large numbers of machines process log data day and night for offline and online analysis systems, which generate readable reports to assist human decision making.
The invention provides an abnormal log monitoring method aimed at monitoring abnormal conditions and collecting abnormal information data during the operation of an e-government system. Its goal is to accurately find abnormal information produced during operation when the system generates a large volume of service data, its log information reaches the TB level, and the logs are distributed across different nodes.
Disclosure of Invention
To remedy the defects of the prior art, the invention provides a simple and efficient abnormal log monitoring method.
The invention is realized by the following technical scheme:
An abnormal log monitoring method, characterized in that it optimizes and improves the network monitoring point deployment algorithm and the log data collection method used in log data collection, and specifically comprises the following steps:
first, storing all known elements to form a set R, and using a Bloom Filter (BF), a probabilistic algorithm, to judge whether an element x exists in the set R;
second, separating the data into different regions following the double-layer bucket design idea, and then sorting the elements in each region with the Bit-map method;
third, using the inverted index retrieval method to look up and store the set R that corresponds to the element x in the document set and the position of the element x in the set R.
In the first step, the Bloom Filter (BF) uses a hash-table-style data structure to map the URL (Uniform Resource Locator) of each element in the set R to a bit in a binary bit array (bitmap array); if the bit corresponding to the element x is already set to 1, the element x is taken to exist in the set R.
Hashing has a collision problem: two different URLs may produce the same value under the same hash function. To reduce collisions, the first step uses several different hash functions, each yielding its own mapping, to judge whether the element x exists in the set R;
when the mapping obtained from any one hash function indicates that the element x does not exist in the set R, the element x is determined not to exist in the set R;
only when the mappings obtained from all hash functions indicate that the element x exists in the set R is the element x determined to exist in the set R.
In the first step, the Bloom Filter (BF) judges whether the element x exists in the set R as follows:
1) the Bloom Filter (BF) stores information in a binary bit array of m bits; in the initial state every bit is 0, that is, all elements of the bit array are set to 0;
2) each element of the set R = {x1, x2, …, xn} is mapped into the range {1, …, m} by k mutually independent hash functions;
3) to judge whether the element x belongs to the set R, the k hash functions are applied to x to obtain k hash values; if the bits at all positions hash_i(x) are 1 (i and k are natural numbers, 1 ≤ i ≤ k), that is, all k positions are set to 1, the element x is considered an element of the set R; otherwise the element x is considered not to be an element of the set R.
In step 2), when any element y is added to the Bloom Filter (BF), the k hash functions are applied to obtain k hash values, and the corresponding bits of the binary bit array are set to 1; that is, the position hash_i(y) given by the i-th hash function is set to 1 (1 ≤ i ≤ k).
In step 2), when a position is set to 1 more than once, only the first setting takes effect; later settings have no further effect.
In the second step, following the double-layer bucket design idea, the large volume of data to be processed is divided repeatedly, narrowing the range step by step, until it forms data units that can each be processed independently; when elements need to be sorted, each data unit is processed separately with the Bit-map method.
In the second step, when n of the m elements (m and n are natural numbers, m > n) need to be sorted, sorting with the Bit Map algorithm proceeds as follows:
1) allocate a space of m bits using the Bit-map method, and set all m bits to 0;
2) traverse the n elements to be sorted in turn, and set the bit corresponding to each element to 1.
The beneficial effects of the invention are: the abnormal log monitoring method can promptly detect abnormal log information in a high-concurrency, large-data-volume log environment, effectively collect and classify log information, help staff find problems in the shortest time, and ensure normal and stable operation of the system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of a sorting method using a Bit Map algorithm according to the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the embodiment of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Monitoring and logging are among the most critical infrastructure in large distributed systems. Without monitoring, there is no way to know how a service is running: whether any machine in the cluster is down, whether the machines' CPU usage and load are normal, whether the website's traffic is normal, and whether the service's error rate is within a tolerable range. In short, monitoring lets us know the operation and availability of a website in real time. Therefore, for checking the condition of the network and the system, the collection and processing of logs has become one of the standards for measuring whether a system is running stably.
Bloom Filter (BF) is a space-efficient randomized data structure that uses a bit array to represent a set very compactly and to determine whether an element belongs to the set. It is a fast probabilistic algorithm for testing whether an element exists in a set. A Bloom Filter may give a false positive but never a false negative. That is, if the Bloom Filter judges that an element is not in the set, the element is certainly not in the set; if it judges that the element is in the set, the judgment is wrong with a certain probability. Bloom Filters are therefore unsuitable for "zero error" applications. In applications that can tolerate a low error rate, a Bloom Filter saves a great deal of space compared with other common approaches (such as hashing or binary search). Its advantages are space efficiency and query time far better than those of ordinary algorithms; its disadvantages are a certain false positive rate and the difficulty of deleting elements.
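The "certain probability" of a wrong positive answer can be estimated with the standard textbook approximation below. It is not stated in the patent and is given here only as a sketch, under the assumption that the k hash functions are independent and uniformly distributed; m is the number of bits, n the number of inserted elements, and k the number of hash functions.

```latex
% Assumptions (not from the patent): independent, uniform hash functions.
\begin{align}
  \Pr[\text{a given bit is still } 0] &= \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}, \\
  p_{\text{false positive}} &\approx \left(1 - e^{-kn/m}\right)^{k}, \\
  k_{\text{opt}} &= \frac{m}{n}\,\ln 2 .
\end{align}
```

For example, with m/n = 10 bits per element and k = 7, the estimate is (1 - e^{-0.7})^7, which is about 0.008, that is, a false positive rate of roughly 0.8%.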
For the problem of network monitoring point deployment, considering that monitoring points already placed in the original network are not easily changed after the topology of a distributed network is expanded, the incremental selection method for adding network monitoring points is optimized.
The abnormal log monitoring method optimizes and improves the network monitoring point deployment algorithm and the log data collection method used in log data collection, and specifically comprises the following steps:
first, storing all known elements to form a set R, and using a Bloom Filter (BF), a probabilistic algorithm, to judge whether an element x exists in the set R;
second, separating the data into different regions following the double-layer bucket design idea, and then sorting the elements in each region with the Bit-map method;
third, using the inverted index retrieval method to look up and store the set R that corresponds to the element x in the document set and the position of the element x in the set R.
To determine whether an element x is in a set, the first method that comes to mind is to store all known elements as a set R and then compare x with the elements of R one by one; this can be implemented with a data structure such as a linked list. However, as the number of elements in R grows, so does the memory they occupy. If tens of millions of different web pages need to be downloaded, the required memory may fill the address space of the entire process. Even if MD5 or UUID methods are used to convert URLs into fixed-length short strings, the memory usage is still considerable.
In the first step, the Bloom Filter (BF) uses a hash-table-style data structure to map the URL (Uniform Resource Locator) of each element in the set R to a bit in a binary bit array (bitmap array); if the bit corresponding to the element x is already set to 1, the element x is taken to exist in the set R.
Hashing has a collision problem: two different URLs may produce the same value under the same hash function. To reduce collisions, the first step uses several different hash functions, each yielding its own mapping, to judge whether the element x exists in the set R;
when the mapping obtained from any one hash function indicates that the element x does not exist in the set R, the element x is determined not to exist in the set R;
only when the mappings obtained from all hash functions indicate that the element x exists in the set R is the element x determined to exist in the set R.
In the first step, the Bloom Filter (BF) judges whether the element x exists in the set R as follows:
1) the Bloom Filter (BF) stores information in a binary bit array of m bits; in the initial state every bit is 0, that is, all elements of the bit array are set to 0;
2) each element of the set R = {x1, x2, …, xn} is mapped into the range {1, …, m} by k mutually independent hash functions;
when any element y is added to the Bloom Filter (BF), the k hash functions are applied to obtain k hash values, and the corresponding bits of the binary bit array are set to 1; that is, the position hash_i(y) given by the i-th hash function is set to 1 (i and k are natural numbers, 1 ≤ i ≤ k);
in step 2), when a position is set to 1 more than once, only the first setting takes effect; later settings have no further effect.
3) to judge whether the element x belongs to the set R, the k hash functions are applied to x to obtain k hash values; if the bits at all positions hash_i(x) are 1 (1 ≤ i ≤ k), that is, all k positions are set to 1, the element x is considered an element of the set R; otherwise the element x is considered not to be an element of the set R.
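Steps 1) to 3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the class name BloomFilter, the use of salted hashlib.blake2b digests as the k hash functions, and the zero-based positions 0 … m-1 (instead of 1 … m) are all choices made for this sketch, since the patent does not name concrete hash functions.

```python
import hashlib


class BloomFilter:
    """Bit array of m bits queried through k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int) -> None:
        if m <= 0 or k <= 0:
            raise ValueError("m and k must be positive")
        self.m = m
        self.k = k
        # Step 1: m bits, all initially 0 (rounded up to whole bytes).
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str):
        """Yield the k bit positions hash_i(item), here numbered 0 .. m-1."""
        data = item.encode("utf-8")
        for i in range(self.k):
            # Assumption: a different salt per index i stands in for
            # k mutually independent hash functions.
            digest = hashlib.blake2b(
                data, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: str) -> None:
        # Step 2: set the k mapped bits to 1. Setting a bit that is
        # already 1 changes nothing, so repeated settings are harmless.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Step 3: the item is reported present only if all k bits are 1.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


if __name__ == "__main__":
    bf = BloomFilter(m=1024, k=3)
    bf.add("http://example.com/a")
    print("http://example.com/a" in bf)  # True: added items are never missed
```

A membership test on an element that was never added usually returns False, but it may return True when other elements happen to have set all k of its bits; that is the false positive case described above.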
Double-layer bucketing is an algorithm design idea. When a large volume of data cannot be handled directly with a direct addressing table, it can be divided into small units, which are then processed according to some strategy to achieve the goal. In the second step, following the double-layer bucket design idea, the large volume of data to be processed is divided repeatedly, narrowing the range step by step, until it forms data units that can each be processed independently; when elements need to be sorted, each data unit is processed separately with the Bit-map method.
The range is reduced over multiple passes; "double-layer" is only one example, and divide and conquer is the underlying principle. The same idea can sometimes also be used in reverse, when a large data set has to be built up from small ranges of data.
For example, suppose the number of non-repeating integers must be found among 2.5 million integers, and the memory space is not sufficient to hold all 2.5 million integers. Much as in the pigeonhole principle, since there are 2^32 possible integer values, the 2^32 values can be divided into 2^8 = 256 regions (for example, with one file representing each region); the data are separated into the different regions, and each region is then processed with the Bit Map algorithm. Provided enough disk space is available, this solution is easy to implement.
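The partition-then-bitmap idea in this example can be sketched as follows. Several things here are assumptions made for illustration: the function name count_non_repeating and its parameters value_bits and bucket_bits are hypothetical, in-memory lists stand in for the per-region files, and "non-repeating" is interpreted as "occurring exactly once" (counting distinct values instead would need only the first bitmap).

```python
from collections import defaultdict


def count_non_repeating(values, value_bits: int = 32, bucket_bits: int = 8) -> int:
    """Count values that occur exactly once, using regions plus per-region bitmaps."""
    low_bits = value_bits - bucket_bits
    region_size = 1 << low_bits

    # Layer 1: route each value to a region by its high bits.
    # In the patent's example each region would be a file on disk.
    regions = defaultdict(list)
    for v in values:
        if not 0 <= v < (1 << value_bits):
            raise ValueError(f"value {v} does not fit in {value_bits} bits")
        regions[v >> low_bits].append(v & (region_size - 1))

    total = 0
    for offsets in regions.values():
        # Layer 2: two bitmaps per region, one for "seen at least once"
        # and one for "seen more than once".
        seen = bytearray((region_size + 7) // 8)
        repeated = bytearray((region_size + 7) // 8)
        for off in offsets:
            byte, mask = off >> 3, 1 << (off & 7)
            if seen[byte] & mask:
                repeated[byte] |= mask
            else:
                seen[byte] |= mask
        # Bits set in `seen` but not in `repeated` mark values seen exactly once.
        total += sum(bin(s & ~r & 0xFF).count("1") for s, r in zip(seen, repeated))
    return total


if __name__ == "__main__":
    # Small demo: 8-bit values split into 4 regions.
    print(count_non_repeating([1, 2, 2, 3, 200, 200, 255], value_bits=8, bucket_bits=2))
```

In the demo the values 1, 3, and 255 occur exactly once, so the call returns 3. With the default 32-bit values and 256 regions, each region covers 2^24 values, so each of its bitmaps occupies 2 MB.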
In the second step, when n of the m elements (m and n are natural numbers, m > n) need to be sorted, sorting with the Bit Map algorithm proceeds as follows:
1) allocate a space of m bits using the Bit-map method, and set all m bits to 0;
2) traverse the n elements to be sorted in turn, and set the bit corresponding to each element to 1.
For example, suppose 5 elements (4, 7, 2, 5, 3) in the range 0-7 are to be sorted (assuming the elements contain no duplicates). Representing 8 numbers requires only 8 bits (1 byte), so first a space of 1 byte is allocated and all of its bits are set to 0.
Then the 5 elements are traversed. The first element is 4, so the bit corresponding to 4 is set to 1 (this can be done with *(p + (i / 8)) |= (0x01 << (i % 8)); the operation depends on Big-endian versus Little-endian ordering, and Big-endian is assumed by default here). Because counting starts from zero, it is the fifth bit that is set to 1 (as shown in FIG. 1).
Next the second element, 7, is processed and the eighth bit is set to 1; then the third element, and so on until all elements have been processed and their corresponding bits set to 1. The state of the bits in memory at that point is shown in FIG. 1. Scanning the bits in order from position 0 then yields the elements in sorted order: 2, 3, 4, 5, 7.
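The worked example with the elements (4, 7, 2, 5, 3) can be reproduced with a short sketch. The function name bitmap_sort is hypothetical, and the sketch stores bit i at the i-th least significant position of its byte, which is one possible bit ordering and may differ from the ordering drawn in FIG. 1.

```python
def bitmap_sort(values, m: int):
    """Sort distinct integers in the range 0 .. m-1 with an m-bit bitmap."""
    # Step 1: allocate m bits, all 0 (rounded up to whole bytes).
    bits = bytearray((m + 7) // 8)

    # Step 2: set the bit that corresponds to each element.
    for v in values:
        if not 0 <= v < m:
            raise ValueError(f"value {v} is outside 0..{m - 1}")
        bits[v // 8] |= 0x01 << (v % 8)

    # Read-out: scanning positions in increasing order gives sorted output.
    return [i for i in range(m) if bits[i // 8] & (0x01 << (i % 8))]


if __name__ == "__main__":
    print(bitmap_sort([4, 7, 2, 5, 3], m=8))  # [2, 3, 4, 5, 7]
```

Because each value is represented by a single bit, duplicates collapse into one entry; this matches the example's assumption that the elements do not repeat.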
An index is a data structure for fast data lookup; a hash table, binary search, and block search can all be regarded as forms of index. The value of an index lies in obtaining the most relevant, most complete, and deepest data set in the shortest time. The most commonly used indexes are based on a sequential list, a hash table, or a B+ tree. The inverted index (Inverted Index) is used mainly in the field of information retrieval, where it is the most commonly used index storage structure. It stores, for each word in a document set, the documents that contain the word and the positions at which the word occurs in each file, and is therefore also called a reverse index. Its counterpart is the forward index, which stores the word list of each file and records the word positions. The following files and their sentence contents serve as an example.
[Figure BDA0003248398170000071: example files and their sentence contents (image not reproduced)]
Using the inverted index retrieval method, the files that correspond to each element (algorithm, data, mathematics) in the document set, and the position of the element x in each file, are looked up and stored.
[Figures BDA0003248398170000072 and BDA0003248398170000081: inverted index results for the example files (images not reproduced)]
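Since the example tables are not reproduced here, the following sketch illustrates the inverted index idea with made-up data. The function name build_inverted_index and the three sample documents are hypothetical and are not the files shown in the patent's figures; only the query words (algorithm, data, mathematics) are taken from the text.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map each word to the documents that contain it and its word positions there."""
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Positions are zero-based word offsets within each document.
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    # Convert the nested defaultdicts to plain dicts for predictable lookups.
    return {word: dict(postings) for word, postings in index.items()}


if __name__ == "__main__":
    # Hypothetical documents, used only for illustration.
    docs = {
        "doc1": "data structure and algorithm",
        "doc2": "algorithm and mathematics",
        "doc3": "mathematics of data",
    }
    index = build_inverted_index(docs)
    for word in ("algorithm", "data", "mathematics"):
        print(word, index[word])
```

A forward index would be the opposite mapping, from each document to its word list; the inverted form answers "which documents contain this word, and where" with a single dictionary lookup.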
The above-described embodiment is only one specific embodiment of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.

Claims (8)

1. An abnormal log monitoring method, characterized in that it optimizes and improves the network monitoring point deployment algorithm and the log data collection method used in log data collection, and specifically comprises the following steps:
first, storing all known elements to form a set R, and using a Bloom Filter, a probabilistic algorithm, to judge whether an element x exists in the set R;
second, separating the data into different regions following the double-layer bucket design idea, and then sorting the elements in each region with the Bit-map method;
third, using the inverted index retrieval method to look up and store the set R that corresponds to the element x in the document set and the position of the element x in the set R.
2. The abnormal log monitoring method according to claim 1, wherein: in the first step, the Bloom Filter uses a hash-table-style data structure to map the URL of each element in the set R to a bit in the binary bit array, and if the bit of the binary bit array corresponding to the element x is already set to 1, the element x is taken to exist in the set R.
3. The abnormal log monitoring method according to claim 2, wherein: in the first step, several different hash functions are used, each yielding its own mapping, to judge whether the element x exists in the set R;
when the mapping obtained from any one hash function indicates that the element x does not exist in the set R, the element x is determined not to exist in the set R;
only when the mappings obtained from all hash functions indicate that the element x exists in the set R is the element x determined to exist in the set R.
4. The abnormal log monitoring method according to claim 3, wherein: in the first step, the Bloom Filter judges whether the element x exists in the set R as follows:
1) the Bloom Filter stores information in a binary bit array of m bits; in the initial state every bit is 0, that is, all elements of the bit array are set to 0;
2) each element of the set R = {x1, x2, …, xn} is mapped into the range {1, …, m} by k mutually independent hash functions;
3) to judge whether the element x belongs to the set R, the k hash functions are applied to x to obtain k hash values; if the bits at all positions hash_i(x) are 1, that is, all k positions are set to 1, the element x is considered an element of the set R; otherwise the element x is considered not to be an element of the set R.
5. The abnormal log monitoring method according to claim 4, wherein: in step 2), when any element y is added to the Bloom Filter, the k hash functions are applied to obtain k hash values, and the corresponding bits of the binary bit array are set to 1; that is, the position hash_i(y) given by the i-th hash function is set to 1.
6. The abnormal log monitoring method according to claim 5, wherein: in step 2), when a position is set to 1 more than once, only the first setting takes effect; later settings have no further effect.
7. The abnormal log monitoring method according to claim 1, wherein: in the second step, following the double-layer bucket design idea, the large volume of data to be processed is divided repeatedly, narrowing the range step by step, until it forms data units that can each be processed independently; when elements need to be sorted, each data unit is processed separately with the Bit-map method.
8. The abnormal log monitoring method according to claim 1 or 7, wherein: in the second step, when n of the m elements need to be sorted, sorting with the Bit Map algorithm proceeds as follows:
1) allocate a space of m bits using the Bit-map method, and set all m bits to 0;
2) traverse the n elements to be sorted in turn, and set the bit corresponding to each element to 1.
Application CN202111039082.7A (priority date 2021-09-06, filing date 2021-09-06): Abnormal log monitoring method, pending, published as CN113900886A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111039082.7A CN113900886A (en) 2021-09-06 2021-09-06 Abnormal log monitoring method

Publications (1)

Publication Number Publication Date
CN113900886A true CN113900886A (en) 2022-01-07

Family

ID=79188744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111039082.7A Pending CN113900886A (en) 2021-09-06 2021-09-06 Abnormal log monitoring method

Country Status (1)

Country Link
CN (1) CN113900886A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117520410A (en) * 2023-11-03 2024-02-06 华青融天(北京)软件股份有限公司 Service data processing method, device, electronic equipment and computer readable medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101799783A (en) * 2009-01-19 2010-08-11 中国人民大学 Data storing and processing method, searching method and device thereof
EP2487610B1 (en) * 2011-02-10 2019-01-16 Deutsche Telekom AG A method for generating a randomized data structure for representing sets, based on bloom filters
CN102821164A (en) * 2012-08-31 2012-12-12 河海大学 Efficient parallel-distribution type data processing system
CN105577455A (en) * 2016-03-07 2016-05-11 达而观信息科技(上海)有限公司 Method and system for performing real-time UV statistic of massive logs

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
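Anonymous (佚名): "Principles and Applications of Bitmap" (Bitmap的原理和应用), pages 1, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/67920410> *
樊重俊 et al.: "Big Data Analysis and Applications" (大数据分析与应用), vol. 1, 31 January 2016, 立信会计出版社, pages: 247 - 251 *
苏高: "Marketing and Business Analysis in the Big Data Era" (大数据时代的营销与商业分析), vol. 1, 31 October 2014, 中国铁道出版社, pages: 311 - 312 *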

Similar Documents

Publication Publication Date Title
US11182366B2 (en) Comparing data stores using hash sums on disparate parallel systems
CN109213756B (en) Data storage method, data retrieval method, data storage device, data retrieval device, server and storage medium
JP6716727B2 (en) Streaming data distributed processing method and apparatus
US20100312749A1 (en) Scalable lookup service for distributed database
CN102648468B (en) Table search device, table search method, and table search system
CN102609446B (en) Distributed Bloom filter system and application method thereof
CN102142032B (en) Method and system for reading and writing data of distributed file system
CN105069111A (en) Similarity based data-block-grade data duplication removal method for cloud storage
CN104077423A (en) Consistent hash based structural data storage, inquiry and migration method
CN106407224A (en) Method and device for file compaction in KV (Key-Value)-Store system
CN112434000A (en) Small file merging method, device and equipment based on HDFS
Moia et al. Similarity digest search: A survey and comparative analysis of strategies to perform known file filtering using approximate matching
CN107180034A (en) The group system of MySQL database
CN107391761A (en) A kind of data managing method and device based on data de-duplication technology
CN113900886A (en) Abnormal log monitoring method
CN110598467A (en) Memory data block integrity checking method
CN111078975B (en) Multi-node incremental data acquisition system and acquisition method
Feng et al. An efficient caching mechanism for network-based url filtering by multi-level counting bloom filters
Blustein et al. Bloom filters. a tutorial, analysis, and survey
CN116226139A (en) Distributed storage and processing method and system suitable for large-scale ocean data
CN106709045B (en) Node selection method and device in distributed file system
CN116318800A (en) BGP route data monitoring method and device and electronic equipment
CN106250440B (en) Document management method and device
CN114880297A (en) Distributed data deduplication method and system based on fingerprints
Phyu et al. Using Bloom filter array (BFA) to speed up the lookup in distributed storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination