CN114741368A - Log data statistical method based on artificial intelligence and related equipment - Google Patents


Info

Publication number
CN114741368A
CN114741368A (application CN202210378426.5A)
Authority
CN
China
Prior art keywords
data
data set
target
log data
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210378426.5A
Other languages
Chinese (zh)
Inventor
冯洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202210378426.5A
Publication of CN114741368A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10 File systems; File servers
    • G06F 16/18 File system types
    • G06F 16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F 16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22 Indexing; Data structures therefor; Storage structures
    • G06F 16/221 Column-oriented storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an artificial-intelligence-based log data statistics method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: receiving, by a search system, a log data statistics request, and verifying the statistical request; if the verification passes, searching, by the search system, the log data obtained from the server according to the statistical request to obtain a first target data set; dividing the first target data set according to a preset threshold to obtain a second target data set; compressing the second target data set according to preset logic to obtain a target index data set; and performing data statistics according to the target index data set to obtain target log data. On top of saving log data storage space, the constructed index values allow the log data to be counted rapidly, improving the statistical efficiency for large-scale log data.

Description

Log data statistical method based on artificial intelligence and related equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a log data statistics method and apparatus, an electronic device, and a storage medium based on artificial intelligence.
Background
Elasticsearch (ES) is a distributed full-text search server built on the underlying Lucene technology; through mechanisms that improve data storage and filtering performance, it can achieve fast queries to a certain extent.
Log analysis and statistics are an important part of the work of a log system, and many log systems in the industry store their logs in an Elasticsearch cluster. However, when performing statistical analysis on large-scale log data, the Elasticsearch cluster may respond slowly or return errors outright, which greatly reduces the statistical efficiency on large-scale log data.
Disclosure of Invention
In view of the foregoing, it is necessary to provide an artificial-intelligence-based log data statistics method and related apparatus to address the technical problem of how to improve the statistical efficiency of large-scale log data, where the related apparatus includes an artificial-intelligence-based log data statistics apparatus, an electronic device, and a storage medium.
The application provides a log data statistical method based on artificial intelligence, which comprises the following steps:
receiving, by a search system, a log data statistics request, and verifying the statistical request;
if the verification is passed, the search system searches the log data acquired from the server according to the statistical request to acquire a first target data set;
dividing the first target data set according to a preset threshold value to obtain a second target data set;
compressing the second target data set according to preset logic to obtain a target index data set;
and carrying out data statistics according to the target index data set to obtain target log data.
Therefore, the log data are classified and stored, and the index value is constructed after the log data are compressed according to the preset logic, so that the log data can be rapidly counted by using the constructed index value on the basis of saving the storage space of the log data, and the counting efficiency of the large-scale log data is improved.
In some embodiments, the receiving, by a search system, a log data statistics request and verifying the statistical request comprises:
setting encoding labels for log data of different data types according to a preset mode;
and judging whether the data type in the statistical request contains a corresponding encoding label or not based on the encoding label so as to determine whether the statistical request is qualified or not, and if so, passing the verification.
Therefore, whether the statistical request is qualified or not can be judged through the set coding label, so that the accuracy of the statistical request of a user is ensured, and the waste of system resources caused by abnormal statistical requests is prevented.
In some embodiments, if the verification passes, the searching, by the search system, for the log data obtained from the server according to the statistical request to obtain the first target data set includes:
the search system collects corresponding log data based on the data type, the time range and the value range of the log data, and stores the collected log data in a column mode to serve as the first target data set.
Therefore, the search system can quickly acquire corresponding log data from the server according to the statistical request given by the user, and provides accurate data support for the subsequent process.
In some embodiments, the dividing the first target data set according to the preset threshold to obtain the second target data set includes:
judging the data volume of the first target data set according to a preset threshold value to obtain a judgment result;
partitioning the first target data set based on the judgment result to obtain a partitioned data set;
and performing batch division on each partition data in the partition data set to obtain the second target data set.
Therefore, by further dividing the data in the first target data set, concurrent statistics can be simultaneously performed on a plurality of pieces of log data in the partition data set in the subsequent process, so that the statistical efficiency of the log data is improved.
In some embodiments, the partitioning the first target dataset based on the determination to obtain a partitioned dataset comprises:
if the data volume of the first target data set is smaller than a preset threshold value, taking the first target data set as the partition data set;
and if the data volume of the first target data set is larger than a preset threshold, dividing the first target data set by taking the preset threshold as a unit to obtain the partitioned data set.
Therefore, when a large amount of log data is processed, the search range of the log data in the subsequent process can be effectively reduced by partitioning the first target data set, and the statistical efficiency is further improved.
In some embodiments, the batch partitioning each partition data in the partition data sets to obtain the second target data set includes:
sorting the data of the partitions from large to small according to the data quantity of the data in the same partition to obtain a sorting data table;
calculating cosine similarity between every two adjacent data in the sorting data table according to a cosine similarity algorithm;
and dividing each partition data in the partition data set in batches according to a user-defined clustering algorithm and cosine similarity between each adjacent data in the sorting data table to obtain the second target data set.
Therefore, data with high similarity can be arranged together, corresponding index values can be generated conveniently in the subsequent process, corresponding and relevant log data can be rapidly counted according to the index values, and the counting efficiency is improved.
In some embodiments, the compressing the second target data set according to the preset logic to obtain the target index data set includes:
compressing the data in the second target data set according to a preset logic to obtain a compressed data set;
and converting the data in the compressed data set according to a compression algorithm to construct the target index data set.
Therefore, the log data can be compressed, so that a corresponding target index data set is constructed on the basis of effectively reducing the storage space, and the rapid statistics of the log data by using the indexes is realized.
The embodiment of the present application further provides a log data statistics device based on artificial intelligence, the device includes:
the verification unit is used for receiving a log data statistical request according to a search system and verifying the statistical request;
the acquisition unit is used for searching the log data acquired from the server by the search system according to the statistical request to acquire a first target data set if the verification is passed;
the dividing unit is used for dividing the first target data set according to a preset threshold value to obtain a second target data set;
the compression unit is used for compressing the second target data set according to preset logic to obtain a target index data set;
and the statistical unit is used for carrying out data statistics according to the target index data set so as to obtain target log data.
An embodiment of the present application further provides an electronic device, where the electronic device includes:
a memory storing at least one instruction;
and the processor executes the instructions stored in the memory to realize the artificial intelligence based log data statistical method.
The embodiment of the present application further provides a computer-readable storage medium, where at least one instruction is stored in the computer-readable storage medium, and the at least one instruction is executed by a processor in an electronic device to implement the artificial intelligence based log data statistics method.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of an artificial intelligence based statistical method of log data to which the present application relates.
Fig. 2 is a flow chart of a preferred embodiment of the present application for dividing the first target data set according to a predetermined threshold to obtain a second target data set.
FIG. 3 is a functional block diagram of a preferred embodiment of an artificial intelligence based log data statistics apparatus according to the present application.
Fig. 4 is a schematic structural diagram of an electronic device according to a preferred embodiment of the artificial intelligence based log data statistical method.
Fig. 5 is a schematic structural diagram of a global dictionary table and a batch dictionary table according to the present application.
Fig. 6 is a schematic structural diagram of a B-tree index to which the present application relates.
Detailed Description
For a clearer understanding of the objects, features and advantages of the present application, reference is made to the following detailed description of the present application along with the accompanying drawings and specific examples. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict. In the following description, numerous specific details are set forth to provide a thorough understanding of the present application, and the described embodiments are merely a subset of the embodiments of the present application and are not intended to be a complete embodiment.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The embodiment of the present application provides an artificial intelligence based log data statistical method, which can be applied to one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The electronic device may be any electronic product capable of performing human-computer interaction with a client, for example, a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an Internet Protocol Television (IPTV), an intelligent wearable device, and the like.
The electronic device may also include a network device and/or a client device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network servers.
The Network where the electronic device is located includes, but is not limited to, the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
FIG. 1 is a flow chart of a preferred embodiment of the log data statistical method based on artificial intelligence according to the present application. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
And S10, receiving the log data statistical request according to the search system, and verifying the statistical request.
In an alternative embodiment, the search system may use ClickHouse, a columnar database management system that can be used for online analytical processing (OLAP). OLAP is a primary application of data warehouse systems; it supports complex analytical operations, emphasizes decision support, and provides intuitive, understandable query results.
In this alternative embodiment, unlike online transaction processing (OLTP) scenarios such as adding to a shopping cart, placing an order, or paying in an e-commerce setting, which require large numbers of in-place insert, update, and delete operations, a data analysis (OLAP) scenario generally involves batch-importing data and then exploring it flexibly along any dimension, generating insights with BI tools, producing reports, and so on. After data is written once, it is mined and analyzed from various angles until information such as business value and business trends is found. This is a process of trial and error, constant adjustment, and continuous optimization, in which data is read far more often than it is written, and it requires the underlying database to be designed specifically around this characteristic.
In this optional embodiment, because ClickHouse is a columnar database, unlike the MySQL databases used online and locally, its query speed is very high and its data storage capacity very large: queries over billions of rows can return results in seconds, and using ClickHouse reflects the high efficiency of the system. ClickHouse does not support modifying data, but it is very well suited to storing user log information, because log data is incremental and does not require modification.
In an alternative embodiment, receiving a log data statistics request pursuant to a search system, and validating the statistics request includes:
and S101, setting encoding labels for log data of different data types according to a preset mode.
In an alternative embodiment, encoding labels may be set for log data of different data types according to a preset manner; the encoding labels may be numbers, symbols, or letters, which the present scheme does not restrict.
S102, judging whether the data type in the statistical request contains a corresponding encoding label based on the encoding label, thereby determining whether the statistical request is qualified, and if so, passing the verification.
In this optional embodiment, after the encoding labels are set for the different types of log data, whether the data type in the statistical request includes the corresponding encoding label can be determined based on those labels, thereby determining whether the current statistical request is qualified. If it is, the verification passes and the search system accepts the statistical request; if not, the verification fails and the search system directly rejects the request.
Therefore, whether the statistical request is qualified or not can be judged through the set coding label, so that the accuracy of the statistical request of a user is ensured, and the waste of system resources caused by abnormal statistical requests is prevented.
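As a rough sketch of the label check in S101-S102 (not the patent's actual implementation), the check might look like the following; the tag table, request shape, and label values here are all illustrative assumptions:

```python
# Preset encoding labels for each supported log data type
# (the concrete types and labels are assumed for illustration).
ENCODING_LABELS = {
    "security_detection": "A1",
    "network_traffic": "B2",
    "protocol_audit": "C3",
}

def validate_request(request: dict) -> bool:
    """Return True only if every data type in the request carries its
    preset encoding label; otherwise the request is rejected."""
    for data_type in request.get("data_types", []):
        expected = ENCODING_LABELS.get(data_type)
        # Unknown type, or missing/mismatched label -> request fails verification.
        if expected is None or request.get("labels", {}).get(data_type) != expected:
            return False
    return True

ok = validate_request({"data_types": ["network_traffic"],
                       "labels": {"network_traffic": "B2"}})
bad = validate_request({"data_types": ["network_traffic"],
                        "labels": {"network_traffic": "XX"}})
```

A request whose label matches the preset table is accepted; any other request is rejected before it can consume search-system resources.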
And S11, if the verification is passed, the search system searches the log data acquired from the server according to the statistical request to acquire a first target data set.
In this optional embodiment, a user may specify the data type, the corresponding time range, and the data value range of the log data to be counted through the client of the search system to generate the statistical request, and then send the statistical request to the server of the search system, so as to preliminarily determine the overall range and the corresponding data amount of the log data to be counted.
In this optional embodiment, after receiving the user's statistical request, the search system may ingest the server's log data into ClickHouse in real time via Kafka (an open-source stream-processing platform) and store it in columnar form as the first target data set, according to the data type, time range, and value range specified in the request. In addition, ClickHouse may also store offline log data; this part of the log data stream is ingested offline to ensure that ClickHouse holds the full N days of log data, where N is typically 15 days in this system.
In this alternative embodiment, Kafka is a distributed, partition-supporting, multi-copy distributed message system, and its greatest characteristic is that it can process a large amount of data in real time, and has the advantages of high throughput, low latency, scalability, durability, reliability, fault tolerance, and high concurrency, so as to meet various demand scenarios, such as log collection, user activity tracking, streaming processing, and the like.
In this alternative embodiment, the log data may be different types of log data generated by the network security device, such as a security detection log, a network traffic log, a protocol audit log, and a third party device input log.
In this alternative embodiment, the reason why the obtained log data is stored in a column is that:
in the row storage mode, data are stored continuously according to rows, data of all columns are stored in one block, columns not participating in calculation are read out completely at IO, and reading operation is amplified seriously. In the column storage mode, only the columns participating in calculation need to be read, so that IO cost is greatly reduced, and query is accelerated.
The data in the same column belong to the same type, and the compression effect is remarkable. Column storage usually has a compression ratio as high as ten times or even higher, so that a large amount of storage space is saved, and the storage cost is reduced; a higher compression ratio means a smaller datasize, and reading the corresponding data from the disk takes less time; the high compression ratio also means that the memory with the same size can store more data, and the caching effect of the system is better. Therefore, compared with the line storage, when providing the data query service, the ClickHouse is less affected by the data scale, has better performance of providing the query service with large data volume, and can improve the query efficiency.
Therefore, the search system can quickly acquire corresponding log data from the server according to the statistical request given by the user, and provides accurate data support for the subsequent process.
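The storage-cost argument above can be illustrated with a small, self-contained sketch; the log fields are made up, and zlib stands in for ClickHouse's actual columnar codecs:

```python
import zlib

# 1000 synthetic log records with a timestamp, a level, and a message.
rows = [{"ts": 1650000000 + i, "level": "INFO", "msg": "user login"}
        for i in range(1000)]

# Row layout: a statistic over "level" must still scan whole records.
row_bytes = "\n".join(f'{r["ts"]},{r["level"]},{r["msg"]}' for r in rows).encode()

# Column layout: the same statistic reads only the "level" column,
# and same-typed, repetitive values compress far better.
level_col = "\n".join(r["level"] for r in rows).encode()

ratio_rows = len(zlib.compress(row_bytes)) / len(row_bytes)
ratio_col = len(zlib.compress(level_col)) / len(level_col)
```

Here the single column is both smaller to read and compresses to a better ratio than the full row data, mirroring the IO and compression advantages described above.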
S12, dividing the first target data set according to a preset threshold to obtain a second target data set.
Referring to fig. 2, in an alternative embodiment, the dividing the first target data set according to the preset threshold to obtain the second target data set includes:
s121, judging the data volume of the first target data set according to a preset threshold value to obtain a judgment result.
In this optional embodiment, a preset threshold may be set to 1T, and the determination result is obtained by comparing the preset threshold with the data amount of the first target data set, if the data amount of the first target data set is greater than the preset threshold, the determination result is a partition, and if the data amount of the first target data set is less than the preset threshold, the determination result is a non-partition.
And S122, partitioning the first target data set based on the judgment result to obtain a partitioned data set.
In this optional embodiment, if the data amount of the first target data set is smaller than a preset threshold, the first target data set is used as the partition data set; and if the data volume of the first target data set is larger than a preset threshold, dividing the first target data set by taking the preset threshold as a unit to obtain the partitioned data set.
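A minimal sketch of the threshold comparison and partitioning in S121-S122 follows; the 1T threshold from the text is scaled down to 4 records for illustration, and treating data volume exactly equal to the threshold as "split" is an assumption, since the text only specifies the strictly-smaller and strictly-larger cases:

```python
PRESET_THRESHOLD = 4  # stand-in for the 1T threshold in the text

def partition(dataset: list, threshold: int = PRESET_THRESHOLD) -> list:
    """Split a dataset into partitions of at most `threshold` records."""
    if len(dataset) < threshold:
        # Below the threshold: the whole set is one partition.
        return [dataset]
    # Otherwise: slice into threshold-sized partitions.
    return [dataset[i:i + threshold] for i in range(0, len(dataset), threshold)]

parts = partition(list(range(10)))  # 10 records, threshold 4 -> 3 partitions
```

The last partition may be smaller than the threshold, since it holds whatever remains after the full-sized slices.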
And S123, performing batch division on each partition data in the partition data set to obtain the second target data set.
In this optional embodiment, each partition in the partitioned data set is divided into batches as follows: the data in each partition is sorted by data volume from largest to smallest to obtain a sorted data table; the cosine similarity between each pair of adjacent entries in the sorted table is computed with a cosine similarity algorithm; and each partition is then divided into batches according to a custom clustering algorithm and those adjacent cosine similarities, yielding the second target data set. Batches are commonly used in database batch operations to improve performance; for example, a batch size of 1000 means 1000 records are processed per database interaction.
In this optional embodiment, the main process of dividing each partitioned data in the partitioned data set into batches according to the customized clustering algorithm and the cosine similarity between each adjacent data in the sorting data table to obtain the second target data set includes:
and in the log data in the same partition, taking any log data which are not accessed as a central point in sequence, and expanding the central point according to a preset cosine similarity threshold, wherein the step length of the expansion is 1. For one log data, if the cosine similarity between the log data and the adjacent log data is greater than a preset cosine similarity threshold, clustering is started by taking the log data point as a center, if the nearby log data point is less than the preset similarity threshold, the log data point is marked as a noise log data point, and the preset cosine similarity threshold can be 0.6;
after clustering starts, for each log data point adjacent to the current cluster, the average cosine similarity between that point and all log data points already in the cluster is computed; if this average is greater than the preset cosine similarity threshold, clustering continues outward with the same step length, and points whose average is not less than the threshold are brought into the cluster;
the above steps are repeated until all log data points have been visited and each is marked as belonging to a cluster or as a noise point. All noise points are treated as one cluster category, and together with the other clusters obtained, the partition's data is divided into batches: the data corresponding to each cluster category forms one batch, and all the batched log data constitutes the second target data set.
Illustratively, 100 log data are stored in the current partition, and after custom clustering, 5 clusters and 10 noise log data points are obtained, at this time, the 10 noise log data points are classified into the same class, and the obtained 5 clusters have 6 cluster classes, so that the current partition is divided into 6 batches, and the second target data set is formed according to the batches of all the partitions subjected to batch division.
Therefore, by further dividing the data in the first target data set, concurrent statistics can be performed on a plurality of pieces of log data in the partition data set in the subsequent process, so that the statistical efficiency of the log data is improved.
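The custom clustering above is DBSCAN-like and more involved than can be shown briefly; a much-simplified sketch of the batching idea (sort by magnitude, then cut into a new batch wherever adjacent cosine similarity drops below the 0.6 threshold) might look like this, with log records reduced to toy 2-D vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def split_batches(partition, threshold=0.6):
    """Sort vectors by magnitude (largest first), then start a new batch
    whenever adjacent cosine similarity falls below the threshold."""
    if not partition:
        return []
    ordered = sorted(partition, key=lambda v: sum(x * x for x in v), reverse=True)
    batches = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cosine(prev, cur) >= threshold:
            batches[-1].append(cur)  # similar to its neighbor: same batch
        else:
            batches.append([cur])    # dissimilar: open a new batch
    return batches

# (3,4) and (6,8) are parallel (similarity 1.0); (4,-3) is orthogonal to them.
vectors = [(3, 4), (6, 8), (4, -3)]
batches = split_batches(vectors)
```

This omits the noise-point handling and cluster-growth steps of the text's algorithm but shows how adjacent-similarity cuts group similar data into the same batch.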
S13, compressing the second target data set according to a predetermined logic to obtain a target index data set.
In an optional embodiment, compressing the second target data set according to a preset logic to obtain the target index data set includes:
s131, compressing the data in the second target data set according to a preset logic to obtain a compressed data set.
In this alternative embodiment, since the data in the second target data set has been batched, each batch may contain duplicate log data, and the character string of each log record has a corresponding global ID stored in the global dictionary table. As shown in fig. 5, there is one global dictionary for the entire second target data set and one batch dictionary per batch.
In this alternative embodiment, a batch dictionary table may be created within each batch. This table stores the global IDs of all log data in the batch, and each global ID corresponds to a batch ID. In this way, the character string of a log record is mapped to a global ID through the global dictionary table and then to a batch ID through the batch dictionary table. Each batch therefore no longer stores the actual character strings of the log data but only the batch IDs corresponding to them, which completes the compression of the second target data set; the compressed global dictionary table serves as the compressed data set. A column that stored log data strings is thus converted into a column of 32-bit integer values, greatly reducing the data footprint.
Illustratively, to query the value actually represented by the 2nd element in batch dictionary 0 in fig. 5, the element's value 2 is first used as a batch ID to look up the batch dictionary table, yielding the global ID 4; the global dictionary table is then queried with 4, yielding the character string "ij". The log data corresponding to the 2nd element in batch dictionary 0 is therefore "ij".
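The two-level dictionary mapping described above can be sketched as follows. This is an illustrative reconstruction, not the actual implementation; all function names and sample strings are hypothetical.

```python
def build_dictionaries(batches):
    """Assign each distinct log string a global ID, then encode each
    batch with small local batch IDs that point at global IDs."""
    global_dict = {}                       # string -> global ID
    encoded_batches = []
    for batch in batches:
        batch_dict = []                    # position (batch ID) -> global ID
        seen = {}                          # global ID -> batch ID
        encoded = []
        for s in batch:
            gid = global_dict.setdefault(s, len(global_dict))
            if gid not in seen:
                seen[gid] = len(batch_dict)
                batch_dict.append(gid)
            encoded.append(seen[gid])      # store a small integer, not the string
        encoded_batches.append((batch_dict, encoded))
    return global_dict, encoded_batches

def decode(global_dict, batch_dict, batch_id):
    """Reverse the two-level mapping: batch ID -> global ID -> string."""
    inverse = {gid: s for s, gid in global_dict.items()}
    return inverse[batch_dict[batch_id]]
```

For example, with batches `[["ab", "cd", "ab"], ["ef", "ij", "ef", "ij"]]`, the second batch is encoded as `[0, 1, 0, 1]` with batch dictionary `[2, 3]`, and decoding batch ID 1 returns `"ij"`, mirroring the lookup in the paragraph above.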
S132, converting the data in the compressed data set according to a compression algorithm to construct the target index data set.
In this alternative embodiment, the obtained compressed data set may be converted using the compression algorithm Bit-Vector Encoding. Its core idea is to convert all occurrences of one attribute value in a column into a binary pair: the value, and a bitmap marking the positions in the column where that value appears. With Bit-Vector Encoding, a column containing two distinct values can be represented by two simple binary pairs; in general, one column is converted into several such pairs, and the column is managed by building a B-Tree index over these pairs.
Illustratively, a column of log data stored in batch 1 is (1000, 2000, 2000, 1000, 1000, 2000, 1000); after conversion by the compression algorithm Bit-Vector Encoding, the obtained binary pairs are (1000, 1001101) and (2000, 0110010).
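A minimal sketch of Bit-Vector Encoding as described here, assuming the generic value/bitmap representation rather than any particular engine's implementation:

```python
def bit_vector_encode(column):
    """Turn a column into (value, bitmap) pairs: bit i of a value's
    bitmap is 1 exactly where column[i] equals that value."""
    bitmaps = {}
    for i, v in enumerate(column):
        bitmaps.setdefault(v, ["0"] * len(column))[i] = "1"
    return [(v, "".join(bits)) for v, bits in bitmaps.items()]
```

Applied to the column (1000, 2000, 2000, 1000, 1000, 2000, 1000), this yields the pairs (1000, "1001101") and (2000, "0110010") from the example.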
In this alternative embodiment, the B-Tree index is the most common index structure; the index created by the search system by default is a B-Tree index. A B-Tree index is a balanced tree structure with 3 basic components: a root node, branch nodes, and leaf nodes. The root node sits at the top of the index structure, the leaf nodes at the bottom, and the branch nodes in between. A leaf node contains entries that point directly to data rows in the table; a branch node contains entries pointing to other branch nodes or leaf nodes in the index; and each B-Tree index has exactly one root node, which is in effect the branch node at the top of the tree. The organization of a B-Tree index resembles a tree in which the main data is concentrated in the leaf nodes; each leaf node contains the value of the index column and the physical address ROWID of the corresponding record row. As shown in fig. 6, the data converted by the compression algorithm can be reached through the physical address ROWID, and the B-Tree index built over the corresponding data in the compressed data set is obtained as the target index data set.
Therefore, the log data can be compressed, so that a corresponding target index data set is constructed on the basis of effectively reducing the storage space, and the rapid statistics of the log data by using the indexes is realized.
And S14, performing data statistics according to the target index data set to obtain target log data.
In this optional embodiment, the log data to be counted may be quickly queried according to the index value in the obtained target index data set, so as to complete the counting.
Illustratively, as shown in fig. 6, suppose the index value of the log data to be counted is 1019. First, 1019 is compared with the values 1001 and 1013 at the root node; since 1019 is greater than 1013, the search descends to the right child node. There, 1019 is compared with the keys 1013, 1017, and 1021; since 1019 lies between 1017 and 1021, the search descends to the corresponding middle child node, whose keys 1017, 1018, and 1019 are compared with 1019 to locate the leaf entry 1019. The actual log data is then obtained through the corresponding physical address ROWID.
Therefore, when log data statistics is carried out, the log data can be quickly matched and searched according to the indexes in the target index data set, and therefore the corresponding log data can be obtained.
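The root-to-leaf walk can be sketched with a toy multi-way tree; the node layout, key values, and ROWID strings below are hypothetical illustrations, not the search system's actual structures.

```python
import bisect

class Node:
    """Toy B-tree node: internal nodes keep children, leaves keep ROWIDs."""
    def __init__(self, keys, children=None, rowids=None):
        self.keys = keys            # sorted key values in this node
        self.children = children    # None for leaf nodes
        self.rowids = rowids        # leaf only: one ROWID per key

def search(node, key):
    """Descend from the root, picking the child whose key range covers `key`."""
    while node.children is not None:
        node = node.children[bisect.bisect_right(node.keys, key)]
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.rowids[i]       # physical address (ROWID) of the record
    return None

# A tree shaped like the walk in the example: root -> right child -> leaf 1019.
leaf = Node([1017, 1018, 1019], rowids=["r17", "r18", "r19"])
right = Node([1013, 1017, 1021], children=[
    Node([1010], rowids=["a"]),
    Node([1013, 1014], rowids=["c", "d"]),
    leaf,
    Node([1021, 1022], rowids=["e", "f"])])
root = Node([1001, 1013], children=[
    Node([999], rowids=["x"]),
    Node([1001, 1002], rowids=["y", "z"]),
    right])
```

With this layout, `search(root, 1019)` follows root → right child → middle leaf and returns `"r19"`, after which the record would be fetched by that ROWID.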
Referring to fig. 3, fig. 3 is a functional block diagram of a preferred embodiment of the log data statistics apparatus based on artificial intelligence according to the present application. The artificial intelligence based log data statistical apparatus 11 includes a verification unit 110, an acquisition unit 111, a division unit 112, a compression unit 113, and a statistical unit 114. A module/unit as referred to herein is a series of computer readable instruction segments capable of being executed by the processor 13 and performing a fixed function, and is stored in the memory 12. In the present embodiment, the functions of the modules/units will be described in detail in the following embodiments.
In an alternative embodiment, the verification unit 110 is configured to receive a statistical request of log data according to a search system and verify the statistical request.
In an optional embodiment, the receiving a log data statistics request according to a search system, and verifying the statistics request includes:
setting encoding labels for log data of different data types according to a preset mode;
and judging, based on the encoding label, whether the data type in the statistical request contains the corresponding encoding label, so as to determine whether the statistical request is qualified; if so, the verification passes.
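A minimal sketch of this encoding-label check; the tag values, data-type names, and request layout are hypothetical, not taken from the scheme.

```python
# Preset encoding labels per data type (illustrative values).
ENCODING_LABELS = {
    "security_detection": "A1",
    "network_traffic":    "A2",
    "protocol_audit":     "A3",
}

def verify_request(request):
    """A statistics request passes only if every requested data type
    carries the encoding label registered for it; otherwise it is
    unqualified and rejected."""
    types = request.get("types", {})
    for data_type, label in types.items():
        if ENCODING_LABELS.get(data_type) != label:
            return False            # wrong or missing label: reject
    return bool(types)              # an empty request is also rejected
```

A request such as `{"types": {"network_traffic": "A2"}}` would pass, while one with an unregistered label would be rejected outright.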
In an alternative embodiment, the search system may use the ClickHouse system, a columnar database management system that can be used for online analytical processing (OLAP). OLAP is a main application of data warehouse systems; it supports complex analysis operations, emphasizes decision support, and provides intuitive, understandable query results.
In this alternative embodiment, unlike online transaction processing (OLTP) scenarios such as adding to a shopping cart, placing an order, or paying in an e-commerce setting, which require a large number of in-place insert, update, and delete operations, a data analysis (OLAP) scenario generally involves batch data import followed by flexible exploration along any dimension, BI tool insights, report generation, and the like. After data is written once, it is mined and analyzed from various angles until business value, business trends, and similar information are found. This is a process of trial and error, constant adjustment, and continuous optimization, in which data is read far more often than it is written, and this characteristic requires the underlying database to be designed specifically for it.
In this optional embodiment, because ClickHouse is a columnar database, unlike the MySQL databases commonly used online and locally, its query speed is very high and it can store very large volumes of data, returning results within seconds even for queries over billions of rows; using ClickHouse thus makes the system highly efficient. ClickHouse does not support modifying data, but it is very well suited to storing user log information, because log information is incremental data that does not require modification.
In an alternative embodiment, log data of different data types may be given encoding labels in a preset manner; the encoding labels may be numbers, symbols, or letters, which the present scheme does not restrict.
In this optional embodiment, after the encoding tags are set for the different types of log data, whether the data type in the statistical request includes the corresponding encoding tag may be determined based on the encoding tags, so as to determine whether the current statistical request is qualified. If so, the verification passes and the search system accepts the statistical request; if not, the verification fails and the search system directly rejects the statistical request.
In an optional embodiment, the obtaining unit 111 is configured to, if the verification is passed, search the log data obtained from the server by the search system according to the statistical request to obtain the first target data set.
In this optional embodiment, a user may specify the data type, the corresponding time range, and the data value range of the log data to be counted by using the client of the search system to generate the statistical request, and then send the statistical request to the server of the search system, thereby preliminarily determining the overall range and the corresponding data amount of the log data to be counted.
In this optional embodiment, after receiving the user's statistical request, the search system may stream the server's log data from Kafka (an open-source stream processing platform) into ClickHouse in real time for columnar storage as the first target data set, according to the data type, time range, and data range specified in the request. In addition, ClickHouse may also store offline log data; this part of the log data stream is accessed in an offline manner to ensure that ClickHouse holds the full amount of log data for N days, where the period is usually 15 days in the system.
In this alternative embodiment, Kafka is a distributed, partition-supporting, multi-copy distributed message system, and its greatest characteristic is that it can process a large amount of data in real time, and has the advantages of high throughput, low latency, scalability, durability, reliability, fault tolerance, and high concurrency, so as to meet various demand scenarios, such as log collection, user activity tracking, streaming processing, and the like.
In this alternative embodiment, the log data may be different types of log data generated by the network security device, such as a security detection log, a network traffic log, a protocol audit log, and a third party device input log.
In this alternative embodiment, the reasons why the obtained log data is stored in columns are as follows:
In row storage mode, data is stored contiguously by row and all columns of a row are kept in one block, so columns not involved in a calculation are still read in their entirety during IO, severely amplifying read operations. In column storage mode, only the columns involved in the calculation need to be read, which greatly reduces IO cost and speeds up queries.
Data in the same column is of the same type, so the compression effect is remarkable. Column storage usually achieves compression ratios of ten times or even higher, saving a large amount of storage space and reducing storage cost. A higher compression ratio means a smaller data size, so reading the corresponding data from disk takes less time; it also means that the same amount of memory can hold more data, improving the system's caching effect. Therefore, compared with row storage, ClickHouse is less affected by data scale when providing data query services, performs better on large-data-volume queries, and can improve query efficiency.
In an alternative embodiment, the dividing unit 112 is configured to divide the first target data set according to a preset threshold to obtain a second target data set.
In an optional embodiment, the dividing the first target data set according to the preset threshold to obtain the second target data set includes:
judging the data volume of the first target data set according to a preset threshold value to obtain a judgment result;
partitioning the first target data set based on the judgment result to obtain a partitioned data set;
and carrying out batch division on each partition data in the partition data set to obtain the second target data set.
In this optional embodiment, a preset threshold may be set to 1 TB. The threshold is compared with the data volume of the first target data set to obtain the determination result: if the data volume of the first target data set is greater than the preset threshold, the result is "partition"; if it is less than the preset threshold, the result is "no partition".
In this optional embodiment, if the data amount of the first target data set is smaller than a preset threshold, the first target data set is used as the partition data set; if the data volume of the first target data set is larger than a preset threshold, dividing the first target data set by taking the preset threshold as a unit to obtain the partitioned data set.
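The threshold-based partitioning above can be sketched as follows; the size function and the small numeric threshold are illustrative stand-ins (the text uses a 1 TB threshold on real data volumes).

```python
def partition(dataset, sizeof, threshold):
    """Split `dataset` into partitions whose summed size stays within
    `threshold`; a data set already under the threshold stays as a
    single partition."""
    partitions, current, current_size = [], [], 0
    for item in dataset:
        if current and current_size + sizeof(item) > threshold:
            partitions.append(current)      # close the full partition
            current, current_size = [], 0
        current.append(item)
        current_size += sizeof(item)
    if current:
        partitions.append(current)
    return partitions
```

For instance, `partition([3, 4, 2, 5, 1], lambda x: x, 7)` yields `[[3, 4], [2, 5], [1]]`, while a data set under the threshold is returned as one partition.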
In this optional embodiment, each partition in the partitioned data set is divided into batches as follows: the data in the same partition is sorted by data volume in descending order to obtain a sorted data table; the cosine similarity between each pair of adjacent entries in the sorted table is computed with a cosine similarity algorithm; and each partition is then divided into batches according to a custom clustering algorithm and these adjacent-pair similarities, yielding the second target data set. Batches are typically used within database batch operations to improve performance; for example, a batch size of 1000 means 1000 records are processed per database interaction.
In this optional embodiment, the main process of dividing each partition in the partitioned data set into batches according to the custom clustering algorithm and the cosine similarity between adjacent entries in the sorted data table, so as to obtain the second target data set, is as follows:
among the log data in the same partition, each unvisited log data entry is taken in turn as a center point, and the center point is expanded with a step length of 1 according to a preset cosine similarity threshold. For a given log data entry, if its cosine similarity with the adjacent entry is greater than the preset threshold, clustering starts with that entry as the center; if a nearby entry falls below the preset threshold, it is marked as a noise log data point. The preset cosine similarity threshold may be 0.6;
after clustering starts, the average cosine similarity between each log data point adjacent to the current cluster and all log data points already in the cluster is calculated; if this average is greater than the preset cosine similarity threshold, clustering continues outward with the same step length, and the log data points whose average is not below the threshold are brought into the cluster;
the above steps are repeated until all log data points have been visited, each marked as belonging to a cluster or as a noise point. All noise points are taken as one cluster class and, together with the other clusters obtained, used to divide the partitioned data into batches: the data corresponding to each cluster class forms one batch, and all log data after batch division constitutes the second target data set.
Illustratively, 100 log data entries are stored in the current partition. After custom clustering, 5 clusters and 10 noise log data points are obtained; the 10 noise points are grouped into one additional class, giving 6 cluster classes in total, so the current partition is divided into 6 batches. The second target data set is then formed from the batches of all partitions after batch division.
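A simplified one-dimensional sketch of the custom clustering described above. It assumes the items are already in sorted order, grows each cluster to the right only, and (for brevity) checks similarity against the immediately preceding point rather than the full in-cluster average — a toy illustration, not the scheme's exact algorithm.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def batch_by_clustering(vectors, threshold=0.6):
    """Grow clusters over adjacent items with step length 1 while the
    similarity stays above the threshold; all noise points form one
    extra batch, mirroring the batch-division rule in the text."""
    n = len(vectors)
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        j = i
        # expand to the right while the neighbour stays similar enough
        while (j + 1 < n and labels[j + 1] is None
               and cosine(vectors[j], vectors[j + 1]) >= threshold):
            j += 1
        if j > i:
            for k in range(i, j + 1):
                labels[k] = cluster
            cluster += 1
        else:
            labels[i] = "noise"     # no similar neighbour: noise point
    batches = [[k for k in range(n) if labels[k] == c] for c in range(cluster)]
    noise = [k for k in range(n) if labels[k] == "noise"]
    if noise:
        batches.append(noise)       # all noise points become one batch
    return batches
```

On the sample vectors `[(1, 0), (1, 0.1), (0, 1), (0, 1.05), (1, -1)]` this produces the batches `[[0, 1], [2, 3], [4]]`: two clusters plus one batch holding the lone noise point, matching the "noise points form one extra class" rule of the example.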
In an optional embodiment, the compressing unit 113 is configured to compress the second target data set according to a preset logic to obtain a target index data set.
In an optional embodiment, the compressing the second target data set according to a preset logic to obtain a target index data set includes:
compressing the data in the second target data set according to preset logic to obtain a compressed data set;
and converting the data in the compressed data set according to a compression algorithm to construct the target index data set.
In this alternative embodiment, since the data in the second target data set is batched, each batch may contain duplicate log data. The character string of each log data entry has a corresponding global ID stored in a global dictionary table; as shown in fig. 5, there is one global dictionary for the entire second target data set and one batch dictionary for each batch.
In this alternative embodiment, a batch dictionary table may be created in each batch. The table stores the global IDs corresponding to all log data in the batch, and each global ID corresponds to a batch ID. With this two-level (secondary) dictionary, the character string of a log data entry is mapped first to a global ID through the global dictionary table and then to a batch ID through the batch dictionary table. Each batch therefore no longer stores the real log data strings, only the batch IDs corresponding to them, which completes the compression of the second target data set; the compressed result, together with the global dictionary table, serves as the compressed data set. In this way, a column that stored log data strings is converted into a column of 32-bit integer values, greatly reducing the data space.
Illustratively, to query the value actually represented by the 2nd element in batch dictionary 0 in fig. 5, the element's value 2 is first used as a batch ID to look up the batch dictionary table, yielding the global ID 4; the global dictionary table is then queried with 4, yielding the character string "ij". The log data corresponding to the 2nd element in batch dictionary 0 is therefore "ij".
In this alternative embodiment, the obtained compressed data set may be converted using the compression algorithm Bit-Vector Encoding. Its core idea is to convert all occurrences of one attribute value in a column into a binary pair: the value, and a bitmap marking the positions in the column where that value appears. With Bit-Vector Encoding, a column containing two distinct values can be represented by two simple binary pairs; in general, one column is converted into several such pairs, and the column is managed by building a B-Tree index over these pairs.
Illustratively, a column of log data stored in batch 1 is (1000, 2000, 2000, 1000, 1000, 2000, 1000); after conversion by the compression algorithm Bit-Vector Encoding, the obtained binary pairs are (1000, 1001101) and (2000, 0110010).
In this alternative embodiment, the B-Tree index is the most common index structure; the index created by the search system by default is a B-Tree index. A B-Tree index is a balanced tree structure with 3 basic components: a root node, branch nodes, and leaf nodes. The root node sits at the top of the index structure, the leaf nodes at the bottom, and the branch nodes in between. A leaf node contains entries that point directly to data rows in the table; a branch node contains entries pointing to other branch nodes or leaf nodes in the index; and each B-Tree index has exactly one root node, which is in effect the branch node at the top of the tree. The organization of a B-Tree index resembles a tree in which the main data is concentrated in the leaf nodes; each leaf node contains the value of the index column and the physical address ROWID of the corresponding record row. As shown in fig. 6, the data converted by the compression algorithm can be reached through the physical address ROWID, and the B-Tree index built over the corresponding data in the compressed data set is obtained as the target index data set.
In an alternative embodiment, the statistical unit 114 is configured to perform data statistics according to the target index data set to obtain target log data.
In this optional embodiment, the log data to be counted may be quickly queried according to the index value in the obtained target index data set, so as to complete the counting.
Illustratively, as shown in fig. 6, suppose the index value of the log data to be counted is 1019. First, 1019 is compared with the values 1001 and 1013 at the root node; since 1019 is greater than 1013, the search descends to the right child node. There, 1019 is compared with the keys 1013, 1017, and 1021; since 1019 lies between 1017 and 1021, the search descends to the corresponding middle child node, whose keys 1017, 1018, and 1019 are compared with 1019 to locate the leaf entry 1019. The actual log data is then obtained through the corresponding physical address ROWID.
According to the technical scheme, the log data can be classified and stored, the index value is constructed after the log data are compressed according to the preset logic, so that the log data can be rapidly counted by utilizing the constructed index value on the basis of saving the storage space of the log data, and the counting efficiency of the large-scale log data is improved.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 1 comprises a memory 12 and a processor 13. The memory 12 is used for storing computer readable instructions, and the processor 13 is used for executing the computer readable instructions stored in the memory to implement the artificial intelligence based log data statistical method according to any one of the above embodiments.
In an alternative embodiment, the electronic device 1 further comprises a bus, a computer program stored in said memory 12 and executable on said processor 13, such as an artificial intelligence based log data statistics program.
Fig. 4 only shows the electronic device 1 with the memory 12 and the processor 13, and it will be understood by a person skilled in the art that the structure shown in fig. 4 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
In conjunction with fig. 1, the memory 12 in the electronic device 1 stores a plurality of computer-readable instructions to implement an artificial intelligence based statistical method of log data, and the processor 13 executes the plurality of instructions to implement:
receiving a log data statistical request according to a search system, and verifying the statistical request;
if the verification is passed, the search system searches the log data acquired from the server according to the statistical request to acquire a first target data set;
dividing the first target data set according to a preset threshold value to obtain a second target data set;
compressing the second target data set according to preset logic to obtain a target index data set;
and carrying out data statistics according to the target index data set to obtain target log data.
Specifically, the processor 13 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the instruction, which is not described herein again.
It will be understood by those skilled in the art that the schematic diagram is only an example of the electronic device 1, and does not constitute a limitation to the electronic device 1, the electronic device 1 may have a bus-type structure or a star-shaped structure, the electronic device 1 may further include more or less hardware or software than those shown in the figures, or different component arrangements, for example, the electronic device 1 may further include an input and output device, a network access device, etc.
It should be noted that the electronic device 1 is only an example; other existing or future electronic products that can be adapted to the present application shall also fall within the scope of protection of the present application and are incorporated herein by reference.
Memory 12 includes at least one type of readable storage medium, which may be non-volatile or volatile. The readable storage medium includes flash memory, removable hard disks, multimedia cards, card type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory 12 may in some embodiments be an internal storage unit of the electronic device 1, for example a removable hard disk of the electronic device 1. The memory 12 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the electronic device 1. The memory 12 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of an artificial intelligence based log data statistics program, etc., but also for temporarily storing data that has been output or is to be output.
The processor 13 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 13 is a Control Unit (Control Unit) of the electronic device 1, connects various components of the whole electronic device 1 by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (for example, executing an artificial intelligence based log data statistics program and the like) stored in the memory 12 and calling data stored in the memory 12.
The processor 13 executes an operating system of the electronic device 1 and various installed application programs. The processor 13 executes the application program to implement the steps of the above-mentioned artificial intelligence based log data statistical method embodiments, such as the steps shown in fig. 1 to 2.
Illustratively, the computer program may be partitioned into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to accomplish the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the electronic device 1. For example, the computer program may be divided into a verification unit 110, an acquisition unit 111, a division unit 112, a compression unit 113, a statistics unit 114.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device) or a processor (processor) to execute parts of the artificial intelligence based log data statistics method according to the embodiments of the present application.
The integrated modules/units of the electronic device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the processes in the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and executed by a processor, to implement the steps of the embodiments of the methods described above.
Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), random-access Memory and other Memory, etc.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each block containing information on a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one arrow is shown in FIG. 4, but this does not indicate only one bus or one type of bus. The bus is arranged to enable connection communication between the memory 12 and at least one processor 13 or the like.
The present application further provides a computer-readable storage medium (not shown), in which computer-readable instructions are stored, and the computer-readable instructions are executed by a processor in an electronic device to implement the artificial intelligence based log data statistical method according to any of the foregoing embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the specification may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present application without departing from their spirit and scope.

Claims (10)

1. An artificial intelligence based log data statistical method, characterized in that the method comprises:
receiving a log data statistical request according to a search system, and verifying the statistical request;
if the verification is passed, the search system searches the log data acquired from the server according to the statistical request to acquire a first target data set;
dividing the first target data set according to a preset threshold value to obtain a second target data set;
compressing the second target data set according to preset logic to obtain a target index data set;
and carrying out data statistics according to the target index data set to obtain target log data.
2. The artificial intelligence based log data statistical method of claim 1, wherein the receiving a log data statistical request according to a search system and verifying the statistical request comprises:
setting encoding labels for log data of different data types according to a preset mode;
and judging whether the data type in the statistical request contains a corresponding encoding label or not based on the encoding label so as to determine whether the statistical request is qualified or not, and if so, passing the verification.
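The verification step of claim 2 can be sketched as follows. This is a minimal illustrative reading, not the patent's implementation: it assumes a preset mapping from data types to encoding labels, and a request is deemed qualified only if every data type it references carries the matching label. All names (`TYPE_LABELS`, `verify_request`, the label strings) are hypothetical.

```python
# Preset mapping of data types to encoding labels (illustrative values).
TYPE_LABELS = {
    "access_log": "AL01",
    "error_log": "EL02",
    "audit_log": "AU03",
}

def verify_request(request: dict) -> bool:
    """Pass verification only if every (type, label) pair matches the preset mapping."""
    for data_type, label in request.get("types", {}).items():
        if TYPE_LABELS.get(data_type) != label:
            return False
    return True

verify_request({"types": {"access_log": "AL01"}})  # -> True
verify_request({"types": {"error_log": "XX99"}})   # -> False
```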
3. The artificial intelligence based log data statistical method of claim 1, wherein the searching, by the search system if the verification is passed, the log data obtained from the server according to the statistical request to obtain the first target data set comprises:
the search system collects corresponding log data based on the data type, the time range and the value range of the log data, and stores the collected log data in a column mode to serve as the first target data set.
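The collection step of claim 3 filters log records by data type, time range, and value range, then stores the result column-wise. A minimal sketch, assuming each log record is a dict with `type`, `timestamp`, and `value` fields (field names are assumptions, not taken from the patent):

```python
def collect_first_target_set(logs, data_type, t_start, t_end, v_min, v_max):
    """Filter logs by type, time range, and value range; store column-wise."""
    rows = [r for r in logs
            if r["type"] == data_type
            and t_start <= r["timestamp"] <= t_end
            and v_min <= r["value"] <= v_max]
    # Column-oriented storage: one list per field, rather than a list of records.
    return {field: [r[field] for r in rows] for field in ("timestamp", "value")}
```

Column-wise storage is a natural fit here because the later statistics touch one field at a time, so each column can be scanned (and compressed) independently.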
4. The artificial intelligence based log data statistical method of claim 1, wherein the dividing the first target data set according to a preset threshold to obtain a second target data set comprises:
judging the data volume of the first target data set according to a preset threshold value to obtain a judgment result;
partitioning the first target data set based on the judgment result to obtain a partitioned data set;
and carrying out batch division on each partition data in the partition data set to obtain the second target data set.
5. The artificial intelligence based log data statistical method of claim 4, wherein the partitioning the first target data set based on the judgment result to obtain a partitioned data set comprises:
if the data volume of the first target data set is smaller than a preset threshold value, taking the first target data set as the partition data set;
if the data volume of the first target data set is larger than a preset threshold, dividing the first target data set by taking the preset threshold as a unit to obtain the partitioned data set.
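The threshold judgment of claims 4 and 5 reduces to a simple rule: a data set no larger than the threshold forms a single partition; a larger one is split into threshold-sized partitions. A minimal sketch under that reading (the function name is illustrative):

```python
def partition(data, threshold):
    """Split data into partitions of at most `threshold` items."""
    if len(data) <= threshold:
        return [data]  # small data set: the whole set is one partition
    # large data set: divide using the preset threshold as the unit
    return [data[i:i + threshold] for i in range(0, len(data), threshold)]

partition(list(range(10)), 4)  # -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```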
6. The artificial intelligence based log data statistical method of claim 5, wherein the dividing each partition data in the partition data set in batches to obtain the second target data set comprises:
sorting the data in a same partition in descending order of data quantity to obtain a sorted data table;
calculating, according to a cosine similarity algorithm, the cosine similarity between every two adjacent data entries in the sorted data table;
and dividing each partition data in the partition data set into batches according to a user-defined clustering algorithm and the cosine similarity between adjacent data entries in the sorted data table, so as to obtain the second target data set.
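The batching in claim 6 can be sketched as below, under two assumptions not stated in the patent: each record is a numeric feature vector, and the "user-defined clustering algorithm" is stood in for by a simple cut-off rule that starts a new batch whenever the similarity between adjacent sorted records drops below a threshold.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def batch_partition(records, cutoff=0.9):
    """Sort records by magnitude (descending), then cut batches where
    adjacent cosine similarity falls below `cutoff`."""
    if not records:
        return []
    ordered = sorted(records, key=lambda v: sum(abs(x) for x in v), reverse=True)
    batches, current = [], [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cosine(prev, cur) >= cutoff:
            current.append(cur)   # similar to its neighbour: same batch
        else:
            batches.append(current)
            current = [cur]       # dissimilar: start a new batch
    batches.append(current)
    return batches
```

With `[[1, 0], [1, 0.01], [0, 1]]` the first two near-parallel vectors land in one batch and the orthogonal third vector in another.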
7. The artificial intelligence based log data statistical method of claim 1, wherein the compressing the second target data set according to preset logic to obtain a target index data set comprises:
compressing the data in the second target data set according to a preset logic to obtain a compressed data set;
and converting the data in the compressed data set according to a compression algorithm to construct the target index data set.
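A hedged sketch of claim 7's compression-and-indexing step: each batch of the second target data set is compressed and the compressed blocks are arranged into an index keyed by batch id, so that statistics can later decompress only the batches they need. The patent does not name a codec; `zlib` over a JSON serialization is used here purely as an example.

```python
import json
import zlib

def build_index(second_target_set):
    """Compress each batch and key the compressed blocks by batch id."""
    index = {}
    for batch_id, batch in enumerate(second_target_set):
        raw = json.dumps(batch).encode("utf-8")
        index[batch_id] = zlib.compress(raw)
    return index

def read_batch(index, batch_id):
    """Decompress and deserialize a single batch on demand."""
    return json.loads(zlib.decompress(index[batch_id]))
```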
8. An artificial intelligence based log data statistical apparatus, characterized in that the apparatus comprises:
the verification unit is used for receiving a log data statistical request according to a search system and verifying the statistical request;
the acquisition unit is used for searching the log data acquired from the server by the search system according to the statistical request to acquire a first target data set if the verification is passed;
the dividing unit is used for dividing the first target data set according to a preset threshold value to obtain a second target data set;
the compression unit is used for compressing the second target data set according to preset logic to obtain a target index data set;
and the statistical unit is used for carrying out data statistics according to the target index data set so as to obtain target log data.
9. An electronic device, characterized in that the electronic device comprises:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the artificial intelligence based log data statistical method of any of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon, which, when executed by a processor, implement the artificial intelligence based log data statistical method of any of claims 1 to 7.
CN202210378426.5A 2022-04-12 2022-04-12 Log data statistical method based on artificial intelligence and related equipment Pending CN114741368A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210378426.5A CN114741368A (en) 2022-04-12 2022-04-12 Log data statistical method based on artificial intelligence and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210378426.5A CN114741368A (en) 2022-04-12 2022-04-12 Log data statistical method based on artificial intelligence and related equipment

Publications (1)

Publication Number Publication Date
CN114741368A true CN114741368A (en) 2022-07-12

Family

ID=82280804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210378426.5A Pending CN114741368A (en) 2022-04-12 2022-04-12 Log data statistical method based on artificial intelligence and related equipment

Country Status (1)

Country Link
CN (1) CN114741368A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115455088A (en) * 2022-10-24 2022-12-09 建信金融科技有限责任公司 Data statistical method, device, equipment and storage medium
CN117078139A (en) * 2023-10-16 2023-11-17 国家邮政局邮政业安全中心 Cross-border express supervision method, system, electronic equipment and storage medium
CN117078139B (en) * 2023-10-16 2024-02-09 国家邮政局邮政业安全中心 Cross-border express supervision method, system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111488363B (en) Data processing method, device, electronic equipment and medium
CN108268586B (en) Data processing method, device, medium and computing equipment across multiple data tables
GB2513472A (en) Resolving similar entities from a database
CN114741368A (en) Log data statistical method based on artificial intelligence and related equipment
CN111258966A (en) Data deduplication method, device, equipment and storage medium
US20090077078A1 (en) Methods and systems for merging data sets
CN110659282B (en) Data route construction method, device, computer equipment and storage medium
CN112115152B (en) Data increment updating and inquiring method and device, electronic equipment and storage medium
CN115146865A (en) Task optimization method based on artificial intelligence and related equipment
CN112579586A (en) Data processing method, device, equipment and storage medium
Franke et al. Parallel Privacy-preserving Record Linkage using LSH-based Blocking.
US10599614B1 (en) Intersection-based dynamic blocking
CN114510487A (en) Data table merging method, device, equipment and storage medium
CN103345527B (en) Intelligent data statistical system
CN116719822B (en) Method and system for storing massive structured data
CN114372060A (en) Data storage method, device, equipment and storage medium
CN113806492A (en) Record generation method, device and equipment based on semantic recognition and storage medium
CN116150185A (en) Data standard extraction method, device, equipment and medium based on artificial intelligence
CN115687352A (en) Storage method and device
CN115293809A (en) Typhoon and rainstorm risk rating method based on artificial intelligence and related equipment
CN115221174A (en) Data grading storage method, device, equipment and medium based on artificial intelligence
CN114818686A (en) Text recommendation method based on artificial intelligence and related equipment
CN109063097B (en) Data comparison and consensus method based on block chain
CN114564501A (en) Database data storage and query methods, devices, equipment and medium
CN112131215B (en) Bottom-up database information acquisition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination