CN118152355A - Log acquisition management method and system - Google Patents


Info

Publication number: CN118152355A
Application number: CN202410341243.5A
Authority: CN (China)
Prior art keywords: data, model, log, abnormal, feature
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 喻斌, 朱立, 肖通新, 朱淘淘, 张帆, 罗梓铭, 姜川, 黄强, 杨美华, 姚道金
Current Assignee: Jiangxi Kingroad Technology Development Co ltd
Original Assignee: Jiangxi Kingroad Technology Development Co ltd
Application filed by Jiangxi Kingroad Technology Development Co ltd
Priority to: CN202410341243.5A
Publication of: CN118152355A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of log management, and in particular to a log acquisition management method and system, comprising the following steps: based on historical log data, a long short-term memory (LSTM) network algorithm is adopted to analyze the time-series characteristics of the log data, and the time-series characteristics of anomaly patterns are captured by learning the long-term dependencies and pattern changes of the time-series data, generating a time-series feature model. By combining an LSTM network, an autoencoder, logistic regression, random forest and gradient boosting decision tree algorithms, and information entropy calculation, the invention realizes deep learning of time-series features, accurate recognition of anomaly patterns, and dynamic compression of log data in the field of log data analysis. The data processing flow is optimized, analysis precision and efficiency are improved, and storage requirements are reduced; complex patterns can be rapidly recognized when processing large amounts of log data, data storage and transmission efficiency is markedly improved, and an efficient and economical solution is provided for log management systems.

Description

Log acquisition management method and system
Technical Field
The invention relates to the technical field of log management, in particular to a log acquisition management method and system.
Background
The field of log management technology relates to methods and techniques for collecting, analyzing, storing, and processing log data from various systems and applications, with the core objective of improving data transparency, security, and operation-and-maintenance efficiency while ensuring compliance and depth of data analysis. In modern IT infrastructure, log management not only helps technical teams monitor and maintain system health, but also provides a valuable data source for security analysis, fault diagnosis, and business intelligence.
Log collection management methods focus primarily on efficiently collecting, preprocessing, and normalizing log data from distributed systems and diverse data sources. The purpose is to ensure the quality and consistency of log data, providing a reliable basis for subsequent log analysis, monitoring, and alerting. By adopting advanced data processing algorithms and models, such as long short-term memory networks and autoencoders, log acquisition management aims to deeply mine time-series features and anomaly patterns in log data and to classify and compress the data effectively. This improves the efficiency and accuracy of log data processing, reduces storage and analysis costs, strengthens monitoring of the system's running state and security threats, and provides data support for decision making.
Traditional methods face problems such as low processing efficiency and high storage cost when handling large-scale, high-dimensional log data. Lacking effective data reduction and feature extraction mechanisms, they cannot fully capture long-term dependencies and pattern changes in time-series data, so the accuracy of anomaly detection and classification is low; lacking a dynamically adjusted data compression strategy, data storage and transmission are inefficient and cannot keep pace with rapidly growing data processing demands. Traditional methods also adopt a fixed processing flow in actual operation and lack adaptability to data characteristics, which increases the complexity and cost of data processing and management.
Disclosure of Invention
The invention aims to remedy the defects in the prior art and provides a log acquisition management method and system.
In order to achieve the above purpose, the present invention adopts the following technical scheme. A log acquisition management method includes the following steps:
S1: based on historical log data, a long short-term memory (LSTM) network algorithm is adopted to analyze the time-series characteristics of the log data, and the time-series characteristics of anomaly patterns are captured by learning the long-term dependencies and pattern changes of the time-series data, generating a time-series feature model;
S2: based on the time-series feature model, an autoencoder is adopted for unsupervised learning, and dimensionality reduction and key feature extraction of the multi-dimensional log data are performed by minimizing the reconstruction error between input and output during training, generating an abnormal feature vector model;
S3: based on the abnormal feature vector model, anomaly patterns are analyzed by logistic regression; whether a log is abnormal is judged by assigning weights to the feature vectors and calculating a probability score, generating a logical anomaly judgment model;
S4: based on the logical anomaly judgment model, an InfluxDB time-series database is used together with an autoregressive integrated moving average (ARIMA) and time-series prediction method to analyze the temporal patterns and trends of the log data, generating a time trend analysis model;
S5: based on the time trend analysis model, combined with the logical anomaly judgment results, random forest and gradient boosting decision tree (GBDT) algorithms are adopted to classify log data and optimize anomaly detection, generating an anomaly detection and classification model;
S6: based on the anomaly detection and classification model, the log data processed by that model is dynamically compressed using an information entropy calculation method, and the compression strategy is adjusted according to the information content and rate of change of the data, generating a dynamic log compression model.
As a further scheme of the invention, the time-series feature model includes a time dependency graph of abnormal behaviors, time-series patterns of normal behaviors, and statistical features of the time-series data. The abnormal feature vector model includes the extracted key abnormal feature vectors, feature differences between abnormal and normal logs, and feature reconstruction error indices for anomaly detection. The logical anomaly judgment model includes a probability scoring model for abnormal logs, feature weight assignment rules, and a logical threshold for anomaly judgment. The time trend analysis model includes a time trend prediction curve for the log data, seasonal and periodic pattern analysis results, and early-warning indices for anomalies in temporal patterns. The anomaly detection and classification model includes enhanced log classification, improved anomaly detection sensitivity, and the ability to recognize differentiated types of abnormal behavior. The dynamic log compression model includes an information-entropy-driven dynamic compression rate adjustment mechanism, measures for optimizing post-compression storage efficiency, and the ability to restore compressed data.
As a further scheme of the invention, based on historical log data, an LSTM network algorithm is adopted to analyze the time-series characteristics of the log data, and the time-series characteristics of anomaly patterns are captured by learning the long-term dependencies and pattern changes of the time-series data; the step of generating the time-series feature model is specifically:
S101: based on the historical log data, a data cleaning method is adopted to remove irrelevant items and outliers, and the time stamps in the data are normalized so that the time-series data are presented in a uniform format, generating a processed data set;
S102: based on the processed data set, feature engineering techniques are adopted to extract key time-series features; by analyzing the statistical attributes of the time-series data, the periodic variations and long-term trends of the data are analyzed, generating a feature extraction set;
S103: based on the feature extraction set, an LSTM network algorithm is adopted to learn the long-term dependencies of the time series; model training is performed by setting network structure parameters and using a back-propagation algorithm to optimize the loss function, generating the time-series feature model.
As a further scheme of the invention, based on the time-series feature model, an autoencoder is adopted for unsupervised learning, and dimensionality reduction and key feature extraction of the multi-dimensional log data are performed by minimizing the reconstruction error between input and output during training; the step of generating the abnormal feature vector model is specifically:
S201: based on the time-series feature model, a Z-score standardization method is adopted to standardize the feature values; by computing each feature value's deviation from the feature-set mean, the data are transformed into a space with zero mean and unit variance, generating a standardized feature set;
S202: based on the standardized feature set, an autoencoder neural network is constructed; back-propagation and gradient descent algorithms are adopted to minimize the error between the reconstructed output and the original input data, learning a compressed representation of the data while retaining the original information, generating a trained autoencoder model;
S203: based on the trained autoencoder model, the original data set is reduced in dimensionality; the multi-dimensional input data are converted into compressed, finite-dimensional feature vectors, and the model parameters are adjusted and optimized around the key information of the input data, generating the abnormal feature vector model.
As a further scheme of the invention, based on the abnormal feature vector model, anomaly patterns are analyzed by logistic regression, and whether a log is abnormal is judged by assigning weights to the feature vectors and calculating a probability score; the step of generating the logical anomaly judgment model is specifically:
S301: based on the abnormal feature vector model, interval (min-max) scaling is adopted for normalization; by computing each feature's position within its value range, the feature values are mapped to between 0 and 1, generating a normalized feature vector set;
S302: based on the normalized feature vector set, cross entropy is used as the loss function to optimize the model parameters; the feature vectors are weighted and summed, a Sigmoid function converts the weighted sum into a probability output, and the weights are iteratively adjusted by gradient descent, generating an optimized logistic regression model;
S303: based on the optimized logistic regression model, unknown log data are given an anomaly probability score and compared against a preset threshold: if a log's probability score exceeds the threshold, it is marked as abnormal; if it is below the threshold, it is judged normal, generating the logical anomaly judgment model.
As a further scheme of the invention, based on the logical anomaly judgment model, an InfluxDB time-series database is used together with an autoregressive integrated moving average (ARIMA) and time-series prediction method to analyze the temporal patterns and trends of the log data; the step of generating the time trend analysis model is specifically:
S401: based on the logical anomaly judgment model, log data are processed with data aggregation techniques: missing values are filled, outliers are removed, and the data are resampled, generating a processed time-series data set;
S402: based on the processed time-series data set, an ARIMA model is applied for analysis; the differencing order of the model is determined to keep the data stationary, and autoregressive and moving-average terms are selected to capture the dependencies and random fluctuations of the time series, generating a time-series analysis model;
S403: based on the time-series analysis model, potential trends and patterns are identified by predicting and analyzing data points over a future time period, and the model's predicted values are used to analyze the periodicity, trends, and abnormal behaviors of the data, generating the time trend analysis model.
As a further scheme of the invention, based on the time trend analysis model, combined with the logical anomaly judgment results, random forest and gradient boosting decision tree (GBDT) algorithms are adopted to classify log data and optimize anomaly detection; the step of generating the anomaly detection and classification model is specifically:
S501: based on the time trend analysis model, principal component analysis is adopted to reduce the dimensionality of the data features; by transforming the original features, the dimensionality of the data is reduced while the variability of the original data is retained, generating a dimension-reduced feature data set;
S502: based on the dimension-reduced feature data set, initial classification and anomaly detection are performed with a random forest algorithm; multiple decision trees are built and their prediction results integrated, improving the accuracy and robustness of the model and generating a random forest classification and anomaly detection model;
S503: based on the random forest classification and anomaly detection model, a GBDT algorithm is adopted for model optimization; decision trees are gradually added to reduce the model's prediction residuals, capturing the patterns and relationships in the data and generating the anomaly detection and classification model.
As a further scheme of the invention, based on the anomaly detection and classification model, the log data processed by that model is dynamically compressed using an information entropy calculation method, and the compression strategy is adjusted according to the information content and rate of change of the data; the step of generating the dynamic log compression model is specifically:
S601: based on the anomaly detection and classification model, an information entropy calculation method is adopted to evaluate the information content of the data set: the frequency of data points is analyzed, the information content of the data set is quantified, the complexity of the data is determined, and key information points in the data are identified, generating a data information content evaluation result;
S602: based on the data information content evaluation result, a dynamic coding algorithm is applied to formulate a compression strategy, and the coding length and compression rate are dynamically adjusted according to the information content and rate of change of the data, generating a dynamic compression strategy;
S603: based on the dynamic compression strategy, a compression algorithm is adopted to dynamically compress the log data; following the parameters defined in the strategy, including coding length and compression rate, the data content is compressed as it changes in real time, generating the dynamic log compression model.
As a further scheme of the invention, a log acquisition management system is provided for executing the above log acquisition management method, comprising a data processing and feature extraction module, a time-series analysis module, an abnormal-vector generation module, an anomaly judgment module, a trend prediction module, an anomaly detection and classification module, and a dynamic compression module.
As a further scheme of the invention, the data processing and feature extraction module, based on historical log data, adopts a data normalization algorithm to remove irrelevant items and outliers, applies uniform formatting to the time stamps in the data, performs statistical attribute analysis with feature engineering techniques, extracts key time-series features, and generates a feature extraction set;
the time-series analysis module, based on the feature extraction set, adopts an LSTM network algorithm to learn the long-term dependencies and pattern changes of the time-series data, sets the network structure parameters, optimizes the loss function with a back-propagation algorithm, and generates a time-series feature model;
the abnormal-vector generation module, based on the time-series feature model, adopts an autoencoder for unsupervised learning, performs dimensionality reduction and key feature extraction by minimizing the reconstruction error between input and output, and generates an abnormal feature vector model;
the anomaly judgment module, based on the abnormal feature vector model, adopts logistic regression to assign weights to the feature vectors and calculate probability scores, optimizes the parameters with cross entropy as the loss function, and generates a logical anomaly judgment model;
the trend prediction module, based on the logical anomaly judgment model, combines InfluxDB with the ARIMA method to analyze the temporal patterns and trends of the log data, uses the ARIMA model to analyze and predict the data, and generates a time trend analysis model;
the anomaly detection and classification module, based on the time trend analysis model, classifies log data and optimizes anomaly detection with random forest and GBDT algorithms: multiple decision trees are built and their predictions integrated by the random forest algorithm, and the model is then optimized by the GBDT algorithm, generating an anomaly detection and classification model;
the dynamic compression module, based on the anomaly detection and classification model, adopts an information entropy calculation method to evaluate the information content of the data set, adjusts the compression strategy with a dynamic coding algorithm, dynamically compresses the log data, and generates a dynamic log compression model.
Compared with the prior art, the invention has the following advantages and positive effects:
By adopting an LSTM network algorithm, an autoencoder, logistic regression, random forest and gradient boosting decision tree algorithms, and an information entropy calculation method, the invention realizes deep learning of time-series features, efficient identification of anomaly patterns, and dynamic compression of log data in the field of log data analysis and management. This combined application of advanced algorithms optimizes the data processing flow, improves the accuracy and efficiency of data analysis, and reduces storage space requirements. When processing large-scale log data, complex data patterns can be rapidly learned and recognized, and data storage and transmission efficiency is markedly improved. The dynamic compression model intelligently adjusts the compression strategy according to the information content and rate of change of the data, further optimizing storage and processing efficiency and providing a more efficient and economical data processing approach for log management systems.
Drawings
FIG. 1 is a schematic workflow diagram of the present invention;
FIG. 2 is an S1 refinement flowchart of the present invention;
FIG. 3 is an S2 refinement flowchart of the present invention;
FIG. 4 is an S3 refinement flowchart of the present invention;
FIG. 5 is an S4 refinement flowchart of the present invention;
FIG. 6 is an S5 refinement flowchart of the present invention;
FIG. 7 is an S6 refinement flowchart of the present invention;
FIG. 8 is a system flow diagram of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In the description of the present invention, it should be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention. Furthermore, in the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Example 1
Referring to FIG. 1, the present invention provides a technical solution, a log acquisition management method, comprising the following steps:
S1: based on historical log data, a long short-term memory (LSTM) network algorithm is adopted to analyze the time-series characteristics of the log data, and the time-series characteristics of anomaly patterns are captured by learning the long-term dependencies and pattern changes of the time-series data, generating a time-series feature model;
S2: based on the time-series feature model, an autoencoder is adopted for unsupervised learning, and dimensionality reduction and key feature extraction of the multi-dimensional log data are performed by minimizing the reconstruction error between input and output during training, generating an abnormal feature vector model;
S3: based on the abnormal feature vector model, anomaly patterns are analyzed by logistic regression; whether a log is abnormal is judged by assigning weights to the feature vectors and calculating a probability score, generating a logical anomaly judgment model;
S4: based on the logical anomaly judgment model, an InfluxDB time-series database is used together with an autoregressive integrated moving average (ARIMA) and time-series prediction method to analyze the temporal patterns and trends of the log data, generating a time trend analysis model;
S5: based on the time trend analysis model, combined with the logical anomaly judgment results, random forest and gradient boosting decision tree (GBDT) algorithms are adopted to classify log data and optimize anomaly detection, generating an anomaly detection and classification model;
S6: based on the anomaly detection and classification model, the log data processed by that model is dynamically compressed using an information entropy calculation method, and the compression strategy is adjusted according to the information content and rate of change of the data, generating a dynamic log compression model.
The time-series feature model includes a time dependency graph of abnormal behaviors, time-series patterns of normal behaviors, and statistical features of the time-series data. The abnormal feature vector model includes the extracted key abnormal feature vectors, feature differences between abnormal and normal logs, and feature reconstruction error indices for anomaly detection. The logical anomaly judgment model includes a probability scoring model for abnormal logs, feature weight assignment rules, and a logical threshold for anomaly judgment. The time trend analysis model includes a time trend prediction curve for the log data, seasonal and periodic pattern analysis results, and early-warning indices for anomalies in temporal patterns. The anomaly detection and classification model includes enhanced log classification, improved anomaly detection sensitivity, and the ability to recognize differentiated types of abnormal behavior. The dynamic log compression model includes an information-entropy-driven dynamic compression rate adjustment mechanism, measures for optimizing post-compression storage efficiency, and the ability to restore compressed data.
In step S1, the input data are the processed historical log data, and an LSTM algorithm is used to capture and learn long-term dependencies and pattern changes in the time-series data. The LSTM network performs deep learning on the log data by setting appropriate network layers (input layer, hidden layers, output layer) and tuning parameters (learning rate, number of iterations, number of hidden-layer nodes). Through its gating mechanism (forget gate, input gate, output gate), the LSTM regulates the flow of information and effectively addresses long-range dependency problems in time-series data. The generated time-series feature model reflects the time-series characteristics and anomaly patterns in the log data, providing a basis for the next stage of anomaly detection.
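For concreteness, the sketch below trains a small LSTM on sliding windows of synthetic log features. TensorFlow/Keras, the window length of 32, the 64-unit layer, and the training settings are illustrative assumptions, not values disclosed by the patent:

```python
import numpy as np
import tensorflow as tf

WINDOW, N_FEATURES = 32, 8                    # assumed window length / feature count

def make_windows(series: np.ndarray, window: int):
    """Slice a (time, features) array into overlapping training windows."""
    x = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]                       # target: the step after each window
    return x, y

series = np.random.rand(1000, N_FEATURES).astype("float32")  # stand-in for log features
x, y = make_windows(series, WINDOW)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(WINDOW, N_FEATURES)),  # gated memory cells
    tf.keras.layers.Dense(N_FEATURES),                           # next-step estimate
])
model.compile(optimizer="adam", loss="mse")   # backpropagation on a squared-error loss
model.fit(x, y, epochs=5, batch_size=64, verbose=0)
# large prediction errors on new windows hint at anomalous temporal patterns
```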
In step S2, the autoencoder performs dimensionality reduction and key feature extraction of the multi-dimensional data through its encoder and decoder structure. In the encoder, the network converts the high-dimensional input into a low-dimensional representation layer by layer; in the decoder, the network attempts to reconstruct the original input, and the reconstruction error (the difference between input and output) is the basis for training. By adjusting the network structure (number of layers, number of nodes) and training parameters (learning rate, loss function), the autoencoder can efficiently extract key features and reduce data dimensionality. The generated abnormal feature vector model captures the key features of anomaly patterns and provides the necessary data basis for subsequent anomaly judgment.
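A minimal autoencoder sketch of this step follows; the 8-to-3 bottleneck, layer choices, and training settings are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

features = np.random.rand(5000, 8).astype("float32")  # stand-in for time-series features

inputs = tf.keras.Input(shape=(8,))
code = tf.keras.layers.Dense(3, activation="relu")(inputs)     # encoder: 8 -> 3
outputs = tf.keras.layers.Dense(8)(code)                       # decoder: 3 -> 8
autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")              # minimize reconstruction error
autoencoder.fit(features, features, epochs=10, batch_size=128, verbose=0)

encoder = tf.keras.Model(inputs, code)                         # reuse the trained encoder half
vectors = encoder.predict(features, verbose=0)                 # compressed feature vectors
errors = np.mean((autoencoder.predict(features, verbose=0) - features) ** 2, axis=1)
# entries with unusually large reconstruction error are candidate anomalies
```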
In step S3, logistic regression assigns weights to the feature vectors and calculates probabilities to judge whether log data are abnormal. The model is trained by setting appropriate logistic regression parameters (regression coefficients and intercept) and adopting a suitable optimization algorithm (such as gradient descent). The model estimates parameter values by maximizing the likelihood function, determining the importance of each feature for anomaly judgment. The trained logistic regression model can evaluate new log data, calculate the probability that it is anomalous, and on this basis form the logical anomaly judgment model. This not only accurately judges whether log data are abnormal, but also provides an effective tool for system-level anomaly detection and prevention.
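The step can be illustrated with scikit-learn; the synthetic labels and the 0.5 threshold are assumptions, and scikit-learn's solver stands in for the gradient-descent training described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                     # anomaly feature vectors
y = (X[:, 0] + 0.5 * X[:, 1] > 1.0).astype(int)    # synthetic abnormal/normal labels

clf = LogisticRegression()                         # weighted sum + sigmoid, cross-entropy loss
clf.fit(X, y)

THRESHOLD = 0.5                                    # assumed decision threshold
scores = clf.predict_proba(X)[:, 1]                # P(entry is abnormal)
is_abnormal = scores > THRESHOLD
print(f"flagged {is_abnormal.sum()} of {len(X)} entries as abnormal")
```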
In step S4, log data are stored and managed in an InfluxDB time-series database and analyzed with an autoregressive integrated moving average (ARIMA) model and other time-series prediction methods. The ARIMA model fits the time-series data by determining the parameters of its autoregressive part, differencing order, and moving-average part (the number of autoregressive terms, the number of differencing passes, and the number of moving-average terms), and can analyze and capture the seasonal and trend features of the data. Combined with other time-series prediction techniques, such as exponential smoothing or seasonal decomposition, the model's predictive power is further improved. By analyzing the temporal patterns and trends of the log data, the time trend analysis model is generated; it reveals the time-dependent structure and underlying trends of the data, providing important temporal information for subsequent anomaly detection and classification.
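The ARIMA portion of this step might look as follows with statsmodels; the (2, 1, 2) order and the synthetic hourly series are assumptions, and the InfluxDB storage layer is omitted:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

idx = pd.date_range("2024-01-01", periods=24 * 14, freq="h")   # two weeks, hourly
counts = pd.Series(                                            # synthetic hourly log volume
    100 + 20 * np.sin(np.arange(len(idx)) * 2 * np.pi / 24)
    + np.random.default_rng(0).normal(0, 5, len(idx)),
    index=idx,
)

fit = ARIMA(counts, order=(2, 1, 2)).fit()   # AR terms, differencing order, MA terms
forecast = fit.forecast(steps=24)            # next 24 hours: trend + daily periodicity
print(forecast.head())
```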
In step S5, random forest and gradient boosting decision tree (GBDT) algorithms are adopted. The random forest enhances model stability and accuracy by constructing multiple decision trees and integrating their results; each tree is trained on a randomly selected subset of data and features, reducing the risk of overfitting. The GBDT algorithm further optimizes the model: at each step it builds a decision tree that reduces the residual of the previous step, gradually improving prediction accuracy. Used together, the two algorithms can effectively classify multi-dimensional log data and detect anomalies. The generated anomaly detection and classification model not only accurately identifies and classifies abnormal log data, but also performs fine-grained anomaly detection according to the characteristics of the log data.
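A side-by-side sketch of the two ensemble learners with scikit-learn follows; the synthetic data and hyperparameters are assumptions, and in the patent's pipeline the GBDT stage refines the forest's results rather than being trained independently:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 10))                     # dimension-reduced log features
y = (X[:, :3].sum(axis=1) > 1.5).astype(int)        # synthetic anomaly labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
gbdt = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("random forest accuracy:", rf.score(X_te, y_te))
print("gbdt accuracy:         ", gbdt.score(X_te, y_te))
```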
In step S6, information entropy is used to measure the uncertainty and complexity of the data and to evaluate the information content and rate of change of the log data. By calculating the information entropy of the log data output by the anomaly detection and classification model, the importance and change characteristics of the data can be determined. The compression strategy is then adjusted dynamically: a lower compression ratio is used for information-rich, frequently changing data, and a higher compression ratio for low-information, stable data, saving storage space while maintaining data integrity and usability. The generated dynamic log compression model adjusts the compression strategy to the actual characteristics of the data, improving compression efficiency while keeping the log data valid and queryable.
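One way to realize such an entropy-driven strategy is sketched below; the zlib levels and the entropy breakpoints are illustrative assumptions, not the patent's exact scheme:

```python
import math
import zlib
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """H = -sum(p_i * log2 p_i) over byte frequencies."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def compress_window(window: bytes) -> bytes:
    h = shannon_entropy(window)
    # information-rich, fast-changing windows: light compression;
    # repetitive, stable windows: aggressive compression
    level = 1 if h > 6.0 else 9 if h < 3.0 else 6
    return zlib.compress(window, level)

window = b"2024-03-25 INFO service heartbeat ok\n" * 100
print(round(shannon_entropy(window), 2), len(window), "->", len(compress_window(window)))
```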
Referring to FIG. 2, based on historical log data, an LSTM network algorithm is adopted to analyze the time-series characteristics of the log data, and the time-series characteristics of anomaly patterns are captured by learning the long-term dependencies and pattern changes of the time-series data; the step of generating the time-series feature model is specifically:
S101: based on the historical log data, a data cleaning method is adopted to remove irrelevant items and outliers, and the time stamps in the data are normalized so that the time-series data are presented in a uniform format, generating a processed data set;
S102: based on the processed data set, feature engineering techniques are adopted to extract key time-series features; by analyzing the statistical attributes of the time-series data, the periodic variations and long-term trends of the data are iteratively analyzed, generating a feature extraction set;
S103: based on the feature extraction set, an LSTM network algorithm is adopted to learn the long-term dependencies of the time series; model training is performed by setting network structure parameters and using a back-propagation algorithm to optimize the loss function, generating the time-series feature model.
In sub-step S101, the historical log data are optimized through data cleaning and normalization. Data cleaning removes irrelevant items and outliers with a data screening method, ensuring that the data set contains only information useful for subsequent analysis; techniques include identifying and excluding entries unrelated to the log content, such as invalid timestamps and malformed entries. The time stamps in the data set are normalized to a uniform time format, for example converting all timestamps to the same time zone or a uniform date format, ensuring that the time-series data remain consistent and comparable in subsequent analysis. The resulting processed data set is a clean and consistent time-series data set that lays the foundation for subsequent feature extraction and model training.
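A pandas sketch of this cleaning and timestamp normalization, assuming pandas 2.x and illustrative column names:

```python
import pandas as pd

raw = pd.DataFrame({
    "ts": ["2024-03-25 08:00:00", "2024/03/25 09:15", "not-a-time"],
    "msg": ["service started", "disk usage 91%", "???"],
})

# parse mixed timestamp formats into one UTC format, coercing unparseable entries to NaT
raw["ts"] = pd.to_datetime(raw["ts"], errors="coerce", utc=True, format="mixed")
clean = raw.dropna(subset=["ts"]).sort_values("ts")   # drop irregular rows, order by time
print(clean)
```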
In sub-step S102, key time-series features are extracted from the processed data set with feature engineering techniques. This involves in-depth analysis of the statistical attributes of the time-series data, such as trend, seasonality, periodicity, and random fluctuation. Periodic variations and long-term trends are extracted with time-series analysis methods such as moving averages or exponential smoothing, and iterative analysis, including the use of autocorrelation and partial autocorrelation functions, accurately identifies and quantifies periodic patterns in the data. This not only reveals the inherent structure of the data, but also provides the necessary input for effective application of the LSTM algorithm. The generated feature extraction set is a refined data set containing the time-series features critical to model training.
In sub-step S103, a long short-term memory (LSTM) network algorithm performs long-term dependency learning on the time-series data in the feature extraction set. The structural parameters of the LSTM network are set, including the number of layers, the number of neurons, and the type of activation function. A key capability of LSTM networks is processing and memorizing input data over long spans through their gating mechanism of forget, input, and output gates. Model training uses a back-propagation algorithm, adjusting the network weights to optimize a loss function such as mean squared error or cross-entropy loss. During training, the LSTM learns the long-term dependencies and pattern changes in the time series and effectively captures anomalous pattern characteristics in the data; the generated time-series feature model accurately reflects the time-series features and potential anomaly patterns of the log data, providing strong support for subsequent data analysis and decision making.
Referring to FIG. 3, based on the time-series feature model, an autoencoder is adopted for unsupervised learning, and dimensionality reduction and key feature extraction of the multi-dimensional log data are performed by minimizing the reconstruction error between input and output during training; the step of generating the abnormal feature vector model is specifically:
S201: based on the time-series feature model, a Z-score standardization method is adopted to standardize the feature values; by computing each feature value's deviation from the feature-set mean, the data are transformed into a space with zero mean and unit variance, generating a standardized feature set;
S202: based on the standardized feature set, an autoencoder neural network is constructed; back-propagation and gradient descent algorithms are adopted to minimize the error between the reconstructed output and the original input data, learning a compressed representation of the data while retaining the original information, generating a trained autoencoder model;
S203: based on the trained autoencoder model, the original data set is reduced in dimensionality; the multi-dimensional input data are converted into compressed, finite-dimensional feature vectors, and the model parameters are adjusted and optimized around the key information of the input data, generating the abnormal feature vector model.
In sub-step S201, the time-series feature model is processed with Z-score standardization: each feature value's deviation from the mean of the whole feature set is computed and divided by the standard deviation. The standardized data have zero mean and unit variance, which helps the autoencoder learn data features more effectively, because the standardized data share the same scale across dimensions and no feature with a large value range can disproportionately influence the learning process. The generated standardized feature set lays the foundation for subsequent autoencoder training and ensures the data can be processed uniformly and efficiently, improving the accuracy and efficiency of autoencoder learning.
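Z-score standardization in this sub-step reduces to a few lines with scikit-learn:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

features = np.array([[10.0, 200.0], [12.0, 180.0], [11.0, 260.0]])
z = StandardScaler().fit_transform(features)   # (x - mean) / std, per feature column
print(z.mean(axis=0), z.std(axis=0))           # ~[0, 0] and [1, 1]
```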
In sub-step S202, an autoencoder neural network is constructed. The autoencoder comprises two main parts: an encoder, which converts the input data into a more compact internal representation, and a decoder, which attempts to reconstruct the original input data from that compact representation. Using back-propagation and gradient descent, the error between the reconstructed output and the original input is minimized; the network learns to compress the data into a finite-dimensional feature space while preserving the key information of the original data. This amounts to adjusting the network weights and biases to minimize the difference between input and reconstructed output. The trained autoencoder model effectively captures the internal structure and key features of the data, providing a basis for subsequent abnormal feature vector generation.
In sub-step S203, the original data set is reduced in dimensionality by converting the multi-dimensional input data into compressed, finite-dimensional feature vectors. During dimensionality reduction, the encoder part of the autoencoder converts the high-dimensional data into a low-dimensional feature representation that captures the key information of the input data. By adjusting and optimizing model parameters (such as learning rate, number of layers, and neurons per layer), the model is ensured to perform the transformation effectively. The generated abnormal feature vector model not only reduces the dimensionality of the data, but also retains the key information for subsequent anomaly detection, enhancing the efficiency of data processing while improving the accuracy and reliability of anomaly detection.
Referring to FIG. 4, based on the abnormal feature vector model, anomaly patterns are analyzed by logistic regression, and whether a log is abnormal is judged by assigning weights to the feature vectors and calculating a probability score; the step of generating the logical anomaly judgment model is specifically:
S301: based on the abnormal feature vector model, interval (min-max) scaling is adopted for normalization; by computing each feature's position within its value range, the feature values are mapped to between 0 and 1, generating a normalized feature vector set;
S302: based on the normalized feature vector set, cross entropy is used as the loss function to optimize the model parameters; the feature vectors are weighted and summed, a Sigmoid function converts the weighted sum into a probability output, and the weights are iteratively adjusted by gradient descent, generating an optimized logistic regression model;
S303: based on the optimized logistic regression model, unknown log data are given an anomaly probability score and compared against a preset threshold: if a log's probability score exceeds the threshold, it is marked as abnormal; if it is below the threshold, it is judged normal, generating the logical anomaly judgment model.
In sub-step S301, the abnormal feature vector model is normalized with interval (min-max) scaling, converting the feature values into the range 0 to 1 by computing each feature value's position relative to its minimum and maximum. This normalization lets the model weigh every feature equally in the subsequent logistic regression analysis, regardless of its original scale. The normalized feature vector set reduces bias among features of different measurement scales and improves the stability and efficiency of model training, providing consistent, standardized input data for the logistic regression model and ensuring its accuracy and reliability.
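The interval scaling itself is equally compact:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

vectors = np.array([[0.2, 40.0], [0.8, 10.0], [0.5, 25.0]])
scaled = MinMaxScaler().fit_transform(vectors)  # (x - min) / (max - min), per column
print(scaled)                                   # every column now spans [0, 1]
```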
In sub-step S302, a logistic regression model is constructed with cross entropy as the loss function. The model computes a weighted sum of the feature vectors and converts it into a probability output with the Sigmoid function. The cross-entropy loss measures the difference between the model's predicted probability and the actual label, and the optimization objective is to minimize this loss. Training uses gradient descent, iteratively adjusting the weight of each feature to close the gap between predicted output and actual result. The optimized logistic regression model effectively converts the normalized input feature vectors into probability scores for abnormal versus normal, providing a basis for the next anomaly judgment.
In sub-step S303, the optimized logistic regression model scores unknown log data for anomaly probability, and whether a log is abnormal is decided by comparing each log's probability score with a preset threshold: if the score exceeds the threshold, the model marks the log as abnormal; otherwise it is considered normal. The threshold is set based on analysis of historical data and the requirements of the actual application scenario. The logical anomaly judgment model not only provides a quantitative means of anomaly detection, but also allows the sensitivity of anomaly judgment to be adjusted to actual needs, making log analysis more automated and precise and improving the efficiency and accuracy of anomaly detection.
Referring to FIG. 5, based on the logical anomaly judgment model, an InfluxDB time-series database is used together with an autoregressive integrated moving average and time-series prediction method to analyze the temporal patterns and trends of the log data; the step of generating the time trend analysis model is specifically:
S401: based on the logical anomaly judgment model, log data are processed with data aggregation techniques: missing values are filled, outliers are removed, and the data are resampled, generating a processed time-series data set;
S402: based on the processed time-series data set, an ARIMA model is applied for analysis; the differencing order of the model is determined to keep the data stationary, and autoregressive and moving-average terms are selected to capture the dependencies and random fluctuations of the time series, generating a time-series analysis model;
S403: based on the time-series analysis model, potential trends and patterns are identified by predicting and analyzing data points over a future time period, and the model's predicted values are used to analyze the periodicity, trends, and abnormal behaviors of the data, generating the time trend analysis model.
In sub-step S401, the log data are consolidated with data aggregation techniques, including filling missing values and removing outliers to ensure the integrity and accuracy of the data, and resampling, for example aggregating the data at fixed time intervals (hourly, daily), to better capture the overall trends of the time-series data. It is critical to ensure the quality and consistency of the data set so that it is suitable for subsequent time-series analysis. The processed time-series data set will be used to identify and analyze the temporal patterns and trends of the log data, providing a basis for further analysis.
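An aggregation and resampling sketch with pandas; the hourly bucket and the interpolation of empty buckets are illustrative choices:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
offsets = pd.to_timedelta(rng.integers(0, 86400, 500), unit="s")
events = pd.Series(1, index=pd.Timestamp("2024-03-25") + offsets).sort_index()

hourly = events.resample("1h").sum()               # aggregate raw events per hour
hourly = hourly.replace(0, np.nan).interpolate()   # treat empty buckets as gaps to fill
print(hourly.head())
```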
In sub-step S402, the processed time-series data set is analyzed with an autoregressive integrated moving average (ARIMA) model, a classical time-series analysis tool that captures the time dependence of the data by combining autoregressive (AR) and moving-average (MA) components. The differencing order of the model is determined so as to keep the data stationary, which means converting the data into a stationary sequence, for example by differencing the raw data one or more times. Appropriate numbers of AR and MA terms are then selected to capture the dependencies and random fluctuations of the time series. The generated time-series analysis model reveals the internal structure and dynamics of the data and provides a basis for further trend prediction.
In sub-step S403, a predictive analysis of temporal trends is performed: potential trends and patterns in the data are identified by forecasting data points over a future time period. The values predicted by the ARIMA model are used to analyze the periodicity, trends, and potential abnormal behaviors of the data, which helps in understanding how the log data change over time and provides early warning of future anomalies or trend shifts. The generated time trend analysis model not only helps explain the behavior patterns of historical data, but also provides an important reference for future data monitoring and analysis.
Referring to FIG. 6, based on the time trend analysis model, combined with the logical anomaly judgment results, random forest and gradient boosting decision tree algorithms are adopted to classify log data and optimize anomaly detection; the step of generating the anomaly detection and classification model is specifically:
S501: based on the time trend analysis model, principal component analysis is adopted to reduce the dimensionality of the data features; by transforming the original features, the dimensionality of the data is reduced while the variability of the original data is retained, generating a dimension-reduced feature data set;
S502: based on the dimension-reduced feature data set, initial classification and anomaly detection are performed with a random forest algorithm; multiple decision trees are built and their prediction results integrated, improving the accuracy and robustness of the model and generating a random forest classification and anomaly detection model;
S503: based on the random forest classification and anomaly detection model, a GBDT algorithm is adopted for model optimization; decision trees are gradually added to reduce the model's prediction residuals, capturing the patterns and relationships in the data and generating the anomaly detection and classification model.
In sub-step S501, the data features are reduced in dimensionality with principal component analysis (PCA), which converts the original data features via a linear transformation into a set of linearly uncorrelated variables, the principal components. The features that contribute most to the variability of the data set are identified, so the dimensionality of the data is reduced while the information in the original data is preserved. The procedure is to compute the covariance matrix of the data set, extract its eigenvalues and eigenvectors, sort them by eigenvalue magnitude, and select the first few principal components as the new features. The generated dimension-reduced feature data set contains the most important information of the original data with far fewer features, simplifying subsequent model computation and improving processing speed.
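A PCA sketch with scikit-learn; the 95% explained-variance target and the synthetic low-rank data are assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 20))  # 20 features, ~5 true factors

pca = PCA(n_components=0.95)          # fractional target picks the component count
X_red = pca.fit_transform(X)
print(X.shape, "->", X_red.shape, "| variance kept:", pca.explained_variance_ratio_.sum())
```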
In sub-step S502, a random forest algorithm performs preliminary classification and anomaly detection, improving model accuracy and robustness by constructing multiple decision trees and integrating their predictions. When building the forest, each decision tree is trained on a random subset of the data set; this bootstrap aggregating (bagging) approach reduces the model's dependence on specific data samples and improves its generalization. Each tree provides a prediction, and the forest integrates the results by voting or averaging, generating the random forest classification and anomaly detection model, which can effectively perform a preliminary classification of the log data and identify potential anomaly patterns.
In sub-step S503, the model is optimized with a gradient boosting decision tree (GBDT) algorithm: decision trees are added one at a time, each new tree correcting the residual errors of the model from the previous step. This further captures complex patterns and relationships in the data, particularly nonlinear and complex structures. Through repeated iterations, the residuals shrink step by step, and GBDT steadily improves the model's performance on classification and anomaly detection tasks. The final anomaly detection and classification model inherits the strengths of the random forest, while GBDT optimization improves the accuracy and reliability of its predictions, providing an effective tool for precisely identifying and classifying abnormal behavior in log data.
Referring to FIG. 7, based on the anomaly detection and classification model, the log data processed by that model is dynamically compressed using an information entropy calculation method, and the compression strategy is adjusted according to the information content and rate of change of the data; the step of generating the dynamic log compression model is specifically:
S601: based on the anomaly detection and classification model, an information entropy calculation method is adopted to evaluate the information content of the data set: the frequency of data points is analyzed, the information content of the data set is quantified, the complexity of the data is determined, and key information points in the data are identified, generating a data information content evaluation result;
S602: based on the data information content evaluation result, a dynamic coding algorithm is applied to formulate a compression strategy, and the coding length and compression rate are dynamically adjusted according to the information content and rate of change of the data, generating a dynamic compression strategy;
S603: based on the dynamic compression strategy, a compression algorithm is adopted to dynamically compress the log data; following the parameters defined in the strategy, including coding length and compression rate, the data content is compressed as it changes in real time, generating the dynamic log compression model.
In sub-step S601, the information content of the data set is evaluated with an information entropy calculation. Information entropy is a measure of randomness and uncertainty, and it quantifies the diversity and complexity of the information in the data set. The frequency of occurrence of each data point in the log data set is analyzed, and these frequencies are used to compute the overall information entropy of the data set, considering not only the probability of each data point but also the distribution characteristics of the whole data set. This accurately estimates the total information content and identifies the key information points in the data, providing the complexity and information-richness assessment needed by the subsequent dynamic compression strategy and ensuring that critical information is not lost during compression.
In sub-step S602, a dynamic coding algorithm is applied to formulate the compression strategy: the coding length and compression rate are dynamically adjusted according to the information content and rate of change of the data to compress it efficiently. The algorithm analyzes the entropy evaluation of the data set and decides the coding length and degree of compression for each portion of the data according to its complexity and change characteristics. For example, for information-rich, frequently changing portions, shorter codes and lower compression rates are used, while for low-information, stable portions, longer codes and higher compression rates can be used. This preserves the key information and characteristics of the data while compressing it and improves storage and processing efficiency.
In sub-step S603, the log data are dynamically compressed with a compression algorithm, compressing the data content as it changes in real time according to the parameters defined in the dynamic compression strategy, such as coding length and compression rate. This dynamic approach flexibly adjusts the compression strategy to the actual state of the data, effectively compressing it and reducing storage requirements while preserving data integrity and availability. The generated dynamic log compression model not only improves compression efficiency and effect, but also ensures that the key characteristics of the data are not lost during compression, providing a high-quality data source for subsequent analysis and processing.
Referring to fig. 8, a log collection management system is configured to execute the log collection management method, where the system includes a data processing and feature extraction module, a time sequence analysis module, an anomaly vector generation module, an anomaly judgment module, a trend prediction module, an anomaly detection and classification module, and a dynamic compression module.
The data processing and feature extraction module is used for removing irrelevant items and abnormal values by adopting a data normalization processing algorithm based on historical log data, carrying out unified formatting processing on time marks in the data, carrying out statistical attribute analysis on the data by using a feature engineering technology, extracting key time sequence features and generating a feature extraction set;
The time sequence analysis module adopts a long-term and short-term memory network algorithm to learn the long-term dependency relationship and the mode change of time sequence data based on the feature extraction set, sets network structure parameters, and optimizes a loss function by using a back propagation algorithm to generate a time sequence feature model;
The abnormal vector generation module performs unsupervised learning with a self-encoder based on the time sequence feature model, performing dimensionality reduction and key feature extraction via the reconstruction errors between input and output, and generates an abnormal feature vector model;
The anomaly judgment module, based on the abnormal feature vector model, adopts a logistic regression method to assign weights to the feature vectors and calculate probability scores, optimizing the parameters with cross entropy as the loss function, and generates a logical anomaly judgment model;
The trend prediction module analyzes the time patterns and trends of the log data based on the logical anomaly judgment model, combining InfluxDB with the autoregressive integrated moving average method; the autoregressive integrated moving average model is used to analyze and predict the data, generating a time trend analysis model;
The anomaly detection and classification module classifies log data and optimizes anomaly detection through random forest and gradient boosting decision tree algorithms based on the time trend analysis model: a plurality of decision trees are established through the random forest algorithm and their prediction results integrated, and the model is optimized through the gradient boosting decision tree algorithm, generating an anomaly detection and classification model;
The dynamic compression module evaluates the information content of the data set with an information entropy calculation method based on the anomaly detection and classification model, adjusts the compression strategy with a dynamic coding algorithm, and dynamically compresses the log data, generating a dynamic log compression model.
In the data processing and feature extraction module, a data normalization processing algorithm is applied to the historical log data, removing irrelevant items and outliers to ensure the quality and consistency of the data set. The time stamps in the data are uniformly formatted so that the time series data follow a single standardized format, ensuring comparability of data from different sources. Feature engineering techniques are then used for statistical attribute analysis, identifying and extracting key time series features such as trend, periodicity, and seasonality. The generated feature extraction set reflects the core attributes of the original data and provides the necessary input for subsequent time series analysis and anomaly detection.
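By way of a non-limiting sketch, this preprocessing and feature extraction might look as follows in Python with pandas; the column names ("timestamp", "latency_ms") and the one-hour windows are assumptions, not part of the method:

```python
import pandas as pd

def extract_features(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.copy()
    # Unified time format: parse heterogeneous time stamps to UTC datetimes.
    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
    # Remove irrelevant items and outliers (here: missing metrics and
    # values outside the 1st-99th percentile band).
    df = df.dropna(subset=["latency_ms"])
    lo, hi = df["latency_ms"].quantile([0.01, 0.99])
    df = df[df["latency_ms"].between(lo, hi)]
    df = df.set_index("timestamp").sort_index()
    # Statistical attributes as key time series features.
    feats = pd.DataFrame({
        "mean_1h": df["latency_ms"].rolling("1h").mean(),  # local trend
        "std_1h": df["latency_ms"].rolling("1h").std(),    # local volatility
        "hour": df.index.hour,                             # daily periodicity
    })
    return feats.dropna()
```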
In the time sequence analysis module, a long short-term memory (LSTM) network is applied; LSTM is well suited to processing time series data and can learn its long-term dependencies and pattern changes. Structural parameters of the LSTM network, such as the number of layers, the number of neurons, and the type of activation function, are set to suit the characteristics of the time series data. The model is trained with a backpropagation algorithm to optimize a loss function such as mean squared error or cross entropy. By learning the complex patterns and dependencies in the time series data, a time sequence feature model is generated that can capture and reflect the dynamic characteristics of the data, providing deep insights for anomaly detection.
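The description leaves the exact network open; the following PyTorch sketch shows the general shape of such an LSTM trained by backpropagation against a mean-squared-error loss, with layer sizes, window length, and the synthetic batch chosen purely for illustration:

```python
import torch
import torch.nn as nn

class TimeSeriesLSTM(nn.Module):
    # Maps a window of feature vectors to a prediction of the next step.
    def __init__(self, n_features: int, hidden: int = 64, layers: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):                 # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # predict from the last hidden state

model = TimeSeriesLSTM(n_features=3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                    # mean squared error loss

x = torch.randn(32, 48, 3)                # synthetic 48-step windows
y = torch.randn(32, 3)                    # synthetic next-step targets
for _ in range(10):                       # backpropagation optimizes the loss
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```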
In the anomaly vector generation module, unsupervised learning is performed with a self-encoder, an effective dimensionality-reduction tool that learns a compressed representation of the input data from which the output is reconstructed. The self-encoder is trained by minimizing the reconstruction error between input and output. The generated abnormal feature vector model accurately represents the key information of the time series data and is used for subsequent anomaly detection, providing a strong basis for identifying and classifying abnormal patterns and enhancing the analysis capability of the whole system.
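A minimal self-encoder sketch in the same vein, with illustrative dimensions (the 32 → 8 bottleneck is an assumption); the per-sample reconstruction error computed at the end is the quantity used as an anomaly feature:

```python
import torch
import torch.nn as nn

class LogAutoencoder(nn.Module):
    # Compresses a feature vector to a low-dimensional code and
    # reconstructs it from that code.
    def __init__(self, n_features: int = 32, code: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                     nn.Linear(16, code))
        self.decoder = nn.Sequential(nn.Linear(code, 16), nn.ReLU(),
                                     nn.Linear(16, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

ae = LogAutoencoder()
optimizer = torch.optim.Adam(ae.parameters(), lr=1e-3)
x = torch.randn(256, 32)                   # synthetic standardized features
for _ in range(20):                        # minimize reconstruction error
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(ae(x), x)
    loss.backward()
    optimizer.step()

# Per-sample reconstruction error: large values flag candidate anomalies.
errors = ((ae(x) - x) ** 2).mean(dim=1)
```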
In the anomaly judgment module, the data are analyzed by logistic regression: each feature vector is assigned a weight and a probability score is calculated to judge whether the log data are abnormal. Cross entropy is used as the loss function to optimize the model parameters, so the importance of each feature in the anomaly judgment is determined and translated into a probability score. The generated logical anomaly judgment model can accurately judge whether data are abnormal from the weighted feature scores, which is essential for subsequent anomaly handling and response strategies.
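A compact sketch with scikit-learn, whose LogisticRegression fits weights by minimizing the cross-entropy (log) loss; the synthetic features, labels, and the 0.5 threshold are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                     # anomaly feature vectors
y = (X[:, 0] + 0.5 * X[:, 1] > 1.2).astype(int)   # synthetic abnormal labels

clf = LogisticRegression().fit(X, y)              # cross-entropy optimized
scores = clf.predict_proba(X)[:, 1]               # probability score per record
is_abnormal = scores > 0.5                        # preset threshold
print(clf.coef_)                                  # learned feature weights
```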
In the trend prediction module, the logical anomaly judgment model, InfluxDB, and the autoregressive integrated moving average (ARIMA) method are combined to analyze the time patterns and trends of the log data. The ARIMA model analyzes and predicts the feature extraction set: by determining the differencing order of the model and selecting appropriate autoregressive and moving-average terms, the dependencies and fluctuations of the time series are accurately captured. The generated time trend analysis model reveals the time patterns and trends in the log data and plays an important role in predicting data trends and identifying potential abnormal patterns in future periods.
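An ARIMA sketch with statsmodels; the InfluxDB query is omitted and stood in for by a synthetic hourly series, and the (2, 1, 1) order is an illustrative choice rather than the patent's:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic hourly metric standing in for a series queried from InfluxDB:
# a slow upward trend plus a 24-hour cycle.
idx = pd.date_range("2024-03-01", periods=240, freq="h")
series = pd.Series(50 + np.linspace(0, 5, 240)
                   + np.sin(np.arange(240) * 2 * np.pi / 24), index=idx)

# d=1 differencing keeps the series stationary; the AR and MA terms
# capture dependency and fluctuation, as described above.
fit = ARIMA(series, order=(2, 1, 1)).fit()
forecast = fit.forecast(steps=24)   # predicted trend for the next day
print(forecast.head())
```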
In the anomaly detection and classification module, random forest and gradient boosting decision tree (GBDT) algorithms are used for data classification and anomaly-detection optimization: a plurality of decision trees are established and their results integrated, improving the accuracy and robustness of the model. The GBDT algorithm further optimizes the model by gradually adding decision trees that reduce the prediction residual, capturing more complex patterns and relationships in the data. The resulting anomaly detection and classification model has high accuracy and reliability and is very effective for classifying log data and identifying abnormal patterns.
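A side-by-side sketch of the two ensembles with scikit-learn; the data and hyperparameters are synthetic placeholders:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))               # dimension-reduced features
y = (X[:, :3].sum(axis=1) > 1.0).astype(int)  # synthetic anomaly labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Random forest: many independent trees whose votes are integrated.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
# GBDT: trees added sequentially, each reducing the prediction residual.
gbdt = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print(rf.score(X_te, y_te), gbdt.score(X_te, y_te))  # held-out accuracy
```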
In the dynamic compression module, an information entropy calculation method evaluates the information content of the data set: by analyzing the frequency and distribution of data points, the information content is quantified and the complexity and key information points of the data are determined. The dynamic coding algorithm then adjusts the compression strategy according to the information content and rate of change of the data, and the log data are dynamically compressed, with the compression ratio adapting to real-time changes in the data content. This effectively reduces storage-space requirements while retaining key information. The generated dynamic log compression model improves data storage and transmission efficiency while ensuring the integrity and availability of the data.
The present invention is not limited to the above embodiments. Equivalent changes or modifications made according to the technical disclosure described above may be applied in other fields, and any simple modification, equivalent change, or refinement made to the above embodiments according to the technical substance of the present invention still falls within the scope of the technical disclosure.

Claims (10)

1. The log acquisition management method is characterized by comprising the following steps of:
Based on historical log data, a long-term and short-term memory network algorithm is adopted to analyze the time sequence characteristics of the log data, and the time sequence characteristics of an abnormal mode are captured through learning the long-term dependence and the mode change of the time sequence data to generate a time sequence characteristic model;
Based on the time sequence feature model, performing unsupervised learning by adopting a self-encoder, performing dimension reduction and key feature extraction of multi-dimensional log data through reconstruction errors between input and output in training, and generating an abnormal feature vector model;
Based on the abnormal feature vector model, analyzing an abnormal mode by adopting a logistic regression method, judging whether the log is abnormal or not by carrying out weight distribution on the feature vector and calculating a probability score, and generating a logical abnormality judgment model;
Based on the logic anomaly judgment model, using an InfluxDB time series database combined with an autoregressive integrated moving average time series prediction method, analyzing the time patterns and trends of the log data to generate a time trend analysis model;
Based on the time trend analysis model, classifying log data and optimizing anomaly detection by adopting random forest and gradient boosting decision tree algorithms, to generate an anomaly detection and classification model;
Based on the anomaly detection and classification model, dynamically compressing the log data processed by the anomaly detection and classification model by adopting an information entropy calculation method, and adjusting the compression strategy according to the information content and rate of change of the data, to generate a dynamic log compression model.
2. The log collection management method according to claim 1, wherein the time sequence feature model comprises a time-dependence graph of abnormal behavior, time series patterns of normal behavior, and statistical features of the time series data; the abnormal feature vector model comprises the extracted key abnormal feature vectors, feature differences between abnormal and normal logs, and feature reconstruction error indexes for anomaly detection; the logical anomaly judgment model comprises a probability scoring model for abnormal logs, feature weight distribution rules, and a logical threshold for anomaly judgment; the time trend analysis model comprises a time trend prediction curve of the log data, seasonal and periodic pattern analysis results, and anomaly early-warning indexes of the time patterns; the anomaly detection and classification model comprises enhanced log classification, improved anomaly detection sensitivity, and the capability to identify different types of abnormal behavior; and the dynamic log compression model comprises an information-entropy-based dynamic compression rate adjustment mechanism, optimization measures for data storage efficiency after compression, and the restorability of compressed data.
3. The log collection management method according to claim 1, wherein based on the history log data, a long-short-term memory network algorithm is adopted to analyze time series characteristics of the log data, and the time series characteristics of the abnormal mode are captured by learning long-term dependency and mode change of the time series data, and the step of generating the time series characteristic model is specifically as follows:
Based on the historical log data, performing removal of irrelevant items and outliers by adopting a data normalization processing algorithm, normalizing the time stamps in the data and presenting the time series data in a uniform format, to generate a data processing set;
Based on the data processing set, performing extraction operation of key time sequence features by adopting a feature engineering technology, and generating a feature extraction set by analyzing the periodic variation and long-term trend of the data through analyzing the statistical attribute of the time sequence data;
Based on the feature extraction set, a long-term and short-term memory network algorithm is adopted to execute long-term dependency learning operation of the time sequence, model training is carried out by setting network structure parameters and utilizing a back propagation algorithm, and a loss function is optimized to generate a time sequence feature model.
4. The log collection management method according to claim 1, wherein based on the time series feature model, performing unsupervised learning by using a self-encoder, performing dimension reduction and key feature extraction of multi-dimensional log data by minimizing a reconstruction error between input and output in training, and generating an abnormal feature vector model comprises the steps of:
Based on the time sequence feature model, adopting a Z-score standardization method to standardize the feature values, converting the data into a space with zero mean and unit variance by calculating each feature value's deviation from the feature set's mean, to generate a standardized feature set;
Based on the standardized feature set, constructing a self-encoder neural network, adopting a back propagation and gradient descent algorithm, optimizing error between reconstruction output and original input data, learning compressed representation of the data, retaining original information, and generating a self-encoder training model;
Based on the self-encoder training model, the original data set is subjected to dimension reduction processing, the input multidimensional data is converted into a compressed finite dimension feature vector, and the model parameters are adjusted and optimized through key information of the input data to generate an abnormal feature vector model.
5. The log collection management method according to claim 1, wherein based on the abnormal feature vector model, the abnormal patterns are analyzed by adopting a logistic regression method, and the steps of judging whether a log is abnormal by performing weight distribution on the feature vectors and calculating probability scores, and generating a logical anomaly judgment model, are specifically as follows:
Based on the abnormal feature vector model, adopting an interval scaling (min-max) method for normalization, mapping each feature value to between 0 and 1 according to its position within the feature's value range, to generate a normalized feature vector set;
Based on the normalized feature vector set, cross entropy is used as a loss function, model parameters are optimized, weighted summation is carried out on a plurality of feature vectors, a Sigmoid function is applied to convert the weighted summation into probability output, and weights are iteratively adjusted through a gradient descent method, so that an optimized logistic regression model is generated;
Based on the optimized logistic regression model, performing anomaly probability scoring on unknown log data and judging whether a log is abnormal according to a preset threshold: if the probability score of a log exceeds the preset threshold, it is marked as abnormal; if the score is below the threshold, it is judged normal; thereby generating a logical anomaly judgment model.
6. The log collection management method according to claim 1, wherein based on the logic anomaly judgment model, using an InfluxDB time series database combined with an autoregressive integrated moving average time series prediction method, the steps of analyzing the time patterns and trends of the log data and generating a time trend analysis model are specifically as follows:
based on the logic anomaly judgment model, processing log data by adopting a data aggregation technology, filling missing values in the data, removing the abnormal values, resampling the data, and generating a processed time sequence data set;
Based on the processed time series data set, applying an autoregressive integrated moving average model for analysis: the differencing order of the model is determined to keep the data stationary, and autoregressive and moving-average terms are selected to capture the dependencies and random fluctuations of the time series, to generate a time series analysis model;
Based on the time series analysis model, identifying potential trends and patterns by predicting and analyzing data points in future time periods, and analyzing the periodicity, trend, and abnormal behavior of the data using the values predicted by the model, to generate a time trend analysis model.
7. The log collection management method according to claim 1, wherein based on the time trend analysis model, the steps of classifying log data and optimizing anomaly detection by adopting random forest and gradient boosting decision tree algorithms, and generating an anomaly detection and classification model, are specifically as follows:
Based on the time trend analysis model, performing dimension reduction on the data features by adopting principal component analysis, reducing the dimension of the data and keeping the variability of the original data by converting the original features, and generating a dimension reduction feature data set;
Based on the dimension reduction characteristic data set, carrying out initialized classification and anomaly detection through a random forest algorithm, establishing a plurality of decision trees, integrating prediction results, improving the accuracy and the robustness of the model, and generating a random forest classification and anomaly detection model;
Based on the random forest classification and anomaly detection model, performing model optimization by adopting a gradient boosting decision tree algorithm, gradually adding decision trees to reduce the model's prediction residual and capture the patterns and relationships in the data, to generate an anomaly detection and classification model.
8. The log collection management method according to claim 1, wherein based on the anomaly detection and classification model, the log data processed by the anomaly detection and classification model are dynamically compressed by adopting an information entropy calculation method, and the compression strategy is adjusted according to the information content and rate of change of the data; the steps of generating the dynamic log compression model are specifically as follows:
Based on the anomaly detection and classification model, evaluating the information content of the data set by an information entropy calculation method, analyzing the occurrence frequency of data points, quantifying the information content of the data set, determining the complexity of the data, and identifying key information points in the data, to generate a data information content evaluation result;
Based on the data information content evaluation result, applying a dynamic coding algorithm to formulate a compression strategy, dynamically adjusting the coding length and compression rate according to the information content and rate of change of the data, to generate a dynamic compression strategy;
Based on the dynamic compression strategy, dynamically compressing the log data with a compression algorithm, compressing the data content as it changes in real time according to the parameters defined in the strategy, including the coding length and compression rate, to generate a dynamic log compression model.
9. A log acquisition management system, configured to execute the log acquisition management method according to any one of claims 1 to 8, wherein the system comprises a data processing and feature extraction module, a time sequence analysis module, an anomaly vector generation module, an anomaly judgment module, a trend prediction module, an anomaly detection and classification module, and a dynamic compression module.
10. The log collection management system according to claim 9, wherein the data processing and feature extraction module uses a data normalization algorithm to remove irrelevant items and outliers based on historical log data, performs unified formatting processing on time stamps in the data, performs statistical attribute analysis on the data using a feature engineering technology, extracts key time sequence features, and generates a feature extraction set;
The time sequence analysis module adopts a long-term and short-term memory network algorithm to learn the long-term dependency relationship and the mode change of time sequence data based on the feature extraction set, sets network structure parameters, and optimizes a loss function by using a back propagation algorithm to generate a time sequence feature model;
The abnormal vector generation module performs unsupervised learning with a self-encoder based on the time sequence feature model, performing dimensionality reduction and key feature extraction via the reconstruction errors between input and output, to generate an abnormal feature vector model;
The anomaly judgment module, based on the abnormal feature vector model, adopts a logistic regression method to assign weights to the feature vectors and calculate probability scores, optimizing the parameters with cross entropy as the loss function, to generate a logical anomaly judgment model;
The trend prediction module analyzes the time patterns and trends of the log data based on the logical anomaly judgment model, combining InfluxDB with the autoregressive integrated moving average method; the autoregressive integrated moving average model is used to analyze and predict the data, to generate a time trend analysis model;
The anomaly detection and classification module classifies log data and optimizes anomaly detection through random forest and gradient boosting decision tree algorithms based on the time trend analysis model: a plurality of decision trees are established through the random forest algorithm and their prediction results integrated, and the model is optimized through the gradient boosting decision tree algorithm, to generate an anomaly detection and classification model;
The dynamic compression module evaluates the information content of the data set with an information entropy calculation method based on the anomaly detection and classification model, adjusts the compression strategy with a dynamic coding algorithm, and dynamically compresses the log data, to generate a dynamic log compression model.
Priority application: CN202410341243.5A, filed 2024-03-25, priority date 2024-03-25 — Log acquisition management method and system.
Publication: CN118152355A, published 2024-06-07; family ID 91292527; country: CN. Legal status: pending.
