CN115712834A

CN115712834A - Alarm false alarm detection method, device, equipment and storage medium

Info

Publication number: CN115712834A
Application number: CN202211474139.0A
Authority: CN
Inventors: 丁雄; 何帅
Original assignee: DBAPPSecurity Co Ltd
Current assignee: DBAPPSecurity Co Ltd
Priority date: 2022-11-22
Filing date: 2022-11-22
Publication date: 2023-02-24

Abstract

The application discloses a method, a device, equipment and a storage medium for detecting false alarms, relating to the technical field of machine learning. The method comprises: processing historical alarm logs using a sample selection rule determined based on the current feature dimensions of interest; training a false alarm detection model with the resulting initial training sample set, and detecting real-time alarm logs with the resulting current training model to be optimized; updating the initial training sample set according to the current log detection result, retraining the current training model to be optimized on the updated training sample set, performing a manual model optimization operation to obtain a new current training model to be optimized, and returning to the step of detecting real-time alarm logs with the current training model to be optimized. By combining manual verification with machine learning to train the false alarm detection model incrementally, the application improves the generalization ability of the model, thereby reducing false alarms and improving operating efficiency.

Description

Method, device, equipment and storage medium for false alarm detection

Technical Field

The invention relates to the technical field of machine learning, and in particular to a method, a device, equipment and a storage medium for detecting false alarms.

Background

In anomaly detection scenarios in the security field, most detection approaches rely on thresholds, expert rules and similar methods, which often produce an excessive number of alarms. With so many alarms, analysts cannot check them all manually, so real alarms are buried among false alarms; this seriously hampers locating real threats and wastes a great deal of manpower on alarm analysis. If the problem of excessive alarms is not solved, real threats cannot be found, and both material resources (platform systems and the like) and security-analysis personnel are wasted. The problem of excessive security alarms therefore urgently needs to be solved.

In current approaches, a model is trained on a fixed training set and output as a fixed model, and alarms are aggregated in a fixed manner using certain rules to reduce how often they occur. This merely reduces the number of alarms: because there are many entities and many alarm rules, a large number of alarms are still generated, and analysts cannot manually check and verify each one. In addition, current schemes do not analyze whether an alarm is a false alarm; since false alarms cannot be correctly identified, the problem of excessive false alarms is not really solved.

Disclosure of Invention

In view of this, an object of the present invention is to provide a method, an apparatus, a device and a storage medium for detecting false alarms, which improve the generalization capability of a model by incrementally training a false alarm detection model through a combination of manual verification and machine learning, thereby reducing false alarms. The specific scheme is as follows:

In a first aspect, the present application discloses a method for detecting false alarms, comprising:

acquiring historical alarm logs generated by a preset alarm system, and processing the historical alarm logs using a preset manual marking rule and a sample selection rule determined based on the current feature dimensions of interest to obtain an initial training sample set;

training a false alarm detection model with the initial training sample set based on a preset model training rule to obtain a current training model to be optimized, and detecting real-time alarm logs generated in the preset alarm system with the current training model to be optimized to obtain a corresponding current log detection result;

updating the initial training sample set according to the current log detection result to obtain an updated training sample set, and training the current training model to be optimized based on a first iterative update period and the updated training sample set to obtain a first optimized training model;

and performing a manual model optimization operation on the first optimized training model based on a second iterative update period to obtain a second training model to be optimized, taking the second training model to be optimized as the current training model to be optimized, and returning to the step of detecting real-time alarm logs generated in the preset alarm system with the current training model to be optimized.

Optionally, acquiring the historical alarm logs generated by the preset alarm system and processing them using the preset manual marking rule and the sample selection rule determined based on the current feature dimensions of interest to obtain the initial training sample set comprises:

acquiring historical alarm logs generated by the preset alarm system within a first preset time period, and extracting the information data of the corresponding parameters from the historical alarm logs according to preset extraction parameters;

and summarizing the information data of each historical alarm log to obtain a plurality of training samples, and processing the training samples using the preset manual marking rule and the sample selection rule determined based on the current feature dimensions of interest to obtain the initial training sample set.

Optionally, training the false alarm detection model with the initial training sample set based on the preset model training rule to obtain the current training model to be optimized comprises:

dividing the initial training sample set into a plurality of groups of training sets and corresponding test sets based on a preset sample distribution rule, and performing first round model training on the false alarm detection model by using a plurality of preset training algorithms and the plurality of groups of training sets to obtain a plurality of first training models;

detecting the corresponding test set by using the plurality of first training models to obtain corresponding detection results, and selecting a first preset number of first training models of which the detection results meet first preset detection conditions;

and after tuning the parameters of the selected first training models, performing a second round of model training using the plurality of groups of training sets to obtain the current training model to be optimized that meets a second preset detection condition, together with the optimized model parameters corresponding to it.

Optionally, updating the initial training sample set according to the current log detection result to obtain the updated training sample set comprises:

acquiring the current log detection results obtained by detecting the historical alarm logs within a second preset time period, together with the corresponding manual marking results, and judging whether each current log detection result is the same as the corresponding manual marking result, wherein the second preset time period is determined based on the optimized model parameters;

if so, randomly selecting a preset proportion of these historical alarm logs and adding them to the initial training sample set to obtain the updated training sample set;

if not, adding all of these historical alarm logs, labeled with their manual marking results, to the initial training sample set to obtain the updated training sample set.
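The update rule above can be read per log. A minimal sketch, assuming each reviewed log carries both the model's detection result and the analyst's manual label: logs on which the two agree are added only as a random sample of a preset proportion, while every disagreeing log is added with its manual label. The `AlarmLog` class, the `keep_ratio` value and the (features, label) sample format are hypothetical illustrations, not taken from the application.

```python
import random
from dataclasses import dataclass


@dataclass
class AlarmLog:
    log_id: str
    features: dict
    model_label: int   # current log detection result: 1 = real alarm, 0 = false alarm
    manual_label: int  # manual marking result assigned by the analyst


def update_training_set(training_set, reviewed_logs, keep_ratio=0.1, seed=None):
    """Return a new training set extended with manually reviewed logs.

    Agreeing logs are added as a random sample of `keep_ratio`;
    all disagreeing logs are added, labeled with the manual result.
    """
    rng = random.Random(seed)
    agreed = [log for log in reviewed_logs if log.model_label == log.manual_label]
    disagreed = [log for log in reviewed_logs if log.model_label != log.manual_label]

    # Randomly keep a preset proportion of the logs the model already got right.
    n_keep = round(len(agreed) * keep_ratio)
    sampled = rng.sample(agreed, n_keep)

    updated = list(training_set)  # copy, so the initial set is left unchanged
    for log in sampled + disagreed:
        # The manual marking result is treated as ground truth for new samples.
        updated.append((log.features, log.manual_label))
    return updated
```

Keeping every disagreement while down-sampling agreements focuses each retraining round on the cases the model currently gets wrong.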

Optionally, before updating the initial training sample set according to the current log detection result to obtain the updated training sample set, the method further comprises:

randomly selecting a second preset number of alarm logs to be marked from the real-time alarm logs generated in the preset alarm system;

judging, through manual checking, whether the actual label type of each alarm log to be marked is the real-alarm label type, and marking the alarm log accordingly to obtain the corresponding manual marking result, wherein the second preset number is determined based on the model training phase of the current training model to be optimized.
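The selection of logs for manual marking can be sketched as follows, assuming the second preset number is looked up from the model's training phase. The phase names and quotas in `REVIEW_QUOTA` are hypothetical placeholders, since the application does not specify them.

```python
import random

# Hypothetical mapping from training phase to the number of real-time alarm
# logs sent for manual marking; the application gives no concrete values.
REVIEW_QUOTA = {"early": 200, "stable": 50}


def pick_logs_for_review(realtime_logs, phase, seed=None):
    """Randomly choose the alarm logs to be marked for the given training phase."""
    quota = REVIEW_QUOTA[phase]
    rng = random.Random(seed)
    # Never request more logs than were actually generated.
    return rng.sample(realtime_logs, min(quota, len(realtime_logs)))
```

A larger quota early on reflects the intuition that a young model benefits most from extra manual labels; the real schedule is a deployment choice.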

Optionally, after training the current training model to be optimized based on the first iterative update period and the updated training sample set to obtain the first optimized training model, the method further comprises:

judging whether the current time is the end of the second iterative update period;

if so, performing the manual model optimization operation;

if not, taking the first optimized training model as the current training model to be optimized for the next iteration period and the updated training sample set as the initial training sample set for the next iteration period, and returning to the step of detecting the real-time alarm logs generated in the preset alarm system with the current training model to be optimized.
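The branching between the two iteration periods can be sketched as a simple loop. This is an illustration only: the four callbacks (`detect_and_review`, `update_set`, `retrain`, `manual_optimize`) are hypothetical hooks standing in for the steps described above, time is counted in whole days, and the second period is assumed to be a multiple of the first.

```python
from typing import Any, Callable


def iterate_model(
    model: Any,
    training_set: list,
    days: int,
    detect_and_review: Callable[[Any, int], list],
    update_set: Callable[[list, list], list],
    retrain: Callable[[Any, list], Any],
    manual_optimize: Callable[[Any, list], Any],
    first_period: int = 1,
    second_period: int = 30,
):
    """Run the two-period update loop for `days` days; return the final model and set."""
    for day in range(1, days + 1):
        # Detect real-time alarm logs with the current model and collect manual marks.
        reviewed = detect_and_review(model, day)
        # Update the training sample set from the detection and review results.
        training_set = update_set(training_set, reviewed)
        if day % first_period == 0:
            # End of a first period: automatic retraining with unchanged
            # features and parameters gives the first optimized model.
            model = retrain(model, training_set)
            if day % second_period == 0:
                # End of a second period: manual optimization gives the
                # second training model to be optimized.
                model = manual_optimize(model, training_set)
            # Otherwise the first optimized model is carried into the next period.
    return model, training_set
```

With the defaults, the model is retrained automatically every day and passed to manual optimization every thirtieth day, matching the one-day and one-month example given in the embodiment.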

Optionally, performing the manual model optimization operation on the first optimized training model based on the second iterative update period to obtain the second training model to be optimized comprises:

acquiring the alarm logs carrying label types that were obtained within the second iterative update period;

judging, through the manual model optimization operation, whether a new feature dimension needs to be added; if so, updating the feature dimensions and determining an optimized model training set from the alarm logs based on the updated feature dimensions;

and training the current first optimized training model with the optimized model training set to obtain the corresponding second training model to be optimized.

In a second aspect, the present application discloses a false alarm detection apparatus, comprising:

the sample set acquisition module is used for acquiring historical alarm logs generated by a preset alarm system and processing the historical alarm logs using a preset manual marking rule and a sample selection rule determined based on the current feature dimensions of interest to obtain an initial training sample set;

the model training module is used for training the false alarm detection model by utilizing the initial training sample set based on a preset model training rule to obtain a current training model to be optimized;

the log detection module is used for detecting the real-time alarm log generated in the preset alarm system through the current training model to be optimized to obtain a corresponding current log detection result;

the sample set updating module is used for updating the initial training sample set according to the current log detection result to obtain an updated training sample set;

the model iteration module is used for carrying out model training on the current training model to be optimized based on a first iteration updating period and the updated training sample set to obtain a first optimized training model;

and the model optimization module is used for performing manual model optimization operation on the first optimized training model based on a second iteration updating period to obtain a second training model to be optimized, taking the second training model to be optimized as the current training model to be optimized, and skipping to the step of detecting the real-time alarm log generated in the preset alarm system through the current training model to be optimized again.

In a third aspect, the present application discloses an electronic device, comprising:

a memory for storing a computer program;

and a processor for executing the computer program to implement the aforementioned false alarm detection method.

In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program, which when executed by a processor implements the aforementioned method for false alarm detection.

According to the method, firstly, historical alarm logs generated by a preset alarm system are obtained, and the historical alarm logs are processed by utilizing a preset artificial marking rule and a sample selection rule determined based on the current interested feature dimension to obtain an initial training sample set; training a false alarm detection model by using the initial training sample set based on a preset model training rule to obtain a current to-be-optimized training model, and detecting a real-time alarm log generated in a preset alarm system through the current to-be-optimized training model to obtain a corresponding current log detection result; updating the initial training sample set according to the current log detection result to obtain an updated training sample set, and performing model training on the current training model to be optimized based on a first iteration updating period and the updated training sample set to obtain a first optimized training model; and carrying out manual model optimization operation on the first optimized training model based on a second iteration updating period to obtain a second training model to be optimized, taking the second training model to be optimized as the current training model to be optimized, and skipping to the step of detecting the real-time alarm log generated in the preset alarm system through the current training model to be optimized. 
It can thus be seen that the alarm logs are first used to train a training model to be optimized, after which manual marking is introduced and the current training model to be optimized is continuously and iteratively updated; in addition, at a preset iteration period, the iteratively updated model undergoes a manual model optimization operation. Manual marking makes it possible to judge accurately whether an alarm log is a false alarm, adaptive training continuously improves the model's false alarm recognition and generalization capability, and the manual model optimization operation allows the model to be adjusted and iterated more precisely. The model can therefore recognize real alarms accurately, false alarms are reduced, and labor and time costs are greatly reduced along with system and resource costs.

Drawings

In order to illustrate more clearly the technical solutions of the embodiments of the present invention or of the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of a method for false alarm detection as disclosed herein;

FIG. 2 is a flow chart of a specific false alarm detection method disclosed in the present application;

FIG. 3 is a flow chart of a specific false alarm detection method disclosed herein;

FIG. 4 is a schematic structural diagram of a false alarm detection apparatus disclosed in the present application;

FIG. 5 is a block diagram of an electronic device disclosed in the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to fig. 1, an embodiment of the present application discloses a method for detecting false alarms, including:

Step S11: obtaining historical alarm logs generated by a preset alarm system, and processing the historical alarm logs using a preset manual marking rule and a sample selection rule determined based on the current feature dimensions of interest to obtain an initial training sample set.

In this embodiment, historical alarm logs generated in the preset alarm system are first obtained and marked using the preset manual marking rule: a security expert or analyst checks each historical alarm log to determine whether it is a real alarm, marking it as 1 if so and as 0 if not. A corresponding sample selection rule is then determined based on the current feature dimensions of interest, and the marked historical alarm logs are selected accordingly to obtain the initial training sample set. It should further be noted that the alarm feature dimensions that can generally be extracted are relatively few, and applying them directly to model training gives poor results, so combined feature dimensions are obtained by feature derivation to improve subsequent model training. For example, categorical features can be encoded, such as one-hot encoding an IP address to obtain corresponding feature dimensions; for time features, day, time and grade feature dimensions can be extracted. The correlation between each feature and the result is then calculated using a correlation coefficient, pairwise combinations are formed according to the magnitude of the correlation, and the top five feature dimensions are taken to form new combined feature dimensions. Training samples built on these combined feature dimensions form the initial training sample set used in model training. In this way, the authenticity of each historical alarm log is determined in advance by manual marking, and the initial training sample set is built on derived feature dimensions, which improves the model training effect.
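The feature derivation can be illustrated with a small standard-library sketch. It reads the ranking step as: compute the absolute Pearson correlation of each base feature with the manual label, keep the top five, and combine them pairwise. Using the product as the combination operator, the field names (`src_ip`, `alarm_time`, `access_count`) and day/hour as the time-derived features are all assumptions made for illustration.

```python
from datetime import datetime
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def base_features(logs):
    """One-hot encode the source IP and extract day and hour from the alarm time."""
    ips = sorted({log["src_ip"] for log in logs})
    rows = []
    for log in logs:
        ts = datetime.fromisoformat(log["alarm_time"])
        row = {f"src_ip={ip}": int(log["src_ip"] == ip) for ip in ips}
        row["day"] = ts.day
        row["hour"] = ts.hour
        row["access_count"] = log["access_count"]
        rows.append(row)
    return rows


def derive_features(rows, labels, top_k=5):
    """Rank features by |correlation| with the 1/0 label and combine the top_k pairwise."""
    names = list(rows[0])  # base feature names, fixed before new columns are added
    score = {n: abs(pearson([r[n] for r in rows], labels)) for n in names}
    top = sorted(names, key=score.get, reverse=True)[:top_k]
    for a, b in combinations(top, 2):
        for r in rows:
            # Pairwise combination by product (an assumption; the text does not say how).
            r[f"{a}*{b}"] = r[a] * r[b]
    return rows, top
```

With six base features, keeping the top five adds C(5, 2) = 10 combined columns per sample.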

Step S12: training a false alarm detection model with the initial training sample set based on a preset model training rule to obtain a current training model to be optimized, and detecting real-time alarm logs generated in the preset alarm system with the current training model to be optimized to obtain a corresponding current log detection result.

In this embodiment, after the initial training sample set is obtained, the false alarm detection model is trained on it based on the preset model training rule to obtain the current training model to be optimized. This model is then deployed in the actual preset alarm system to detect the real-time alarm logs it generates and obtain the corresponding current log detection result: a real-time alarm log detected as a real alarm is marked as model_True, and one detected as a false alarm is marked as model_False. Applying the model to the preset alarm system and detecting alarm logs in real time in this way makes it convenient to assess the model's accuracy from the log detection results and to carry out the corresponding iterative model update.
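A minimal sketch of the tagging step, assuming the deployed model is any object with a `predict` method that returns 1 for a real alarm and 0 for a false alarm, as a scikit-learn classifier trained on the 1/0 manual labels would:

```python
from typing import Iterable, List


def tag_realtime_alarms(model, feature_rows: Iterable) -> List[str]:
    """Map the model's 1/0 predictions to the model_True / model_False tags of step S12."""
    # Assumed convention: 1 = real alarm, 0 = false alarm.
    predictions = model.predict(feature_rows)
    return ["model_True" if p == 1 else "model_False" for p in predictions]
```

Keeping the model tag separate from the later manual mark is what allows the two to be compared when the training sample set is updated.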

Step S13: updating the initial training sample set according to the current log detection result to obtain an updated training sample set, and training the current training model to be optimized based on a first iterative update period and the updated training sample set to obtain a first optimized training model.

In this embodiment, the initial training sample set is updated according to the current log detection result to obtain the updated training sample set, and the current training model to be optimized is retrained based on the first iterative update period and the updated training sample set, iteratively updating the model to obtain the first optimized training model. The first iterative update period can be set according to the user's requirements; for example, if it is set to one day, the model is iteratively updated every night. Continuously updating the training set and iteratively updating the model in this way effectively improves the accuracy with which the model detects false alarm logs, improving its generalization capability and detection effect and thereby reducing false alarms.

Step S14: performing a manual model optimization operation on the first optimized training model based on a second iterative update period to obtain a second training model to be optimized, taking the second training model to be optimized as the current training model to be optimized, and returning to the step of detecting the real-time alarm logs generated in the preset alarm system with the current training model to be optimized.

In this embodiment, based on the second iterative update period, a manual model optimization operation is performed on the first optimized training model to obtain the corresponding second training model to be optimized; this model is taken as the current training model to be optimized, and the process returns to the step in step S12 of detecting the real-time alarm logs generated in the preset alarm system with the current training model to be optimized, so that the model is updated in a continuous iterative cycle. It should be noted that the optimization in this step differs from that in step S13. In step S13 the model is retrained automatically: its parameters, feature dimensions and other settings are not changed, and no manual participation is needed. The optimization in this step is performed manually. The second iterative update period can also be user-defined and can be longer than the first. For example, with a second iterative update period of one month and a first iterative update period of one day, the system automatically retrains the model on the updated training set every night; after thirty days, algorithm personnel perform the manual model optimization operation on the first optimized training model produced on the thirtieth day to obtain the second training model to be optimized, which is taken as the current training model to be optimized and deployed in the preset alarm system for the next month of automatic iterative updates.
Manual intervention in model optimization thus allows the model to be adjusted and iterated more precisely, so that it identifies real alarms more accurately and false alarms are further reduced.

As can be seen, in the embodiment, a historical alarm log generated by a preset alarm system is obtained, and the historical alarm log is processed by using a preset artificial marking rule and a sample selection rule determined based on a current interested feature dimension to obtain an initial training sample set; training a false alarm detection model by using the initial training sample set based on a preset model training rule to obtain a current training model to be optimized, and detecting a real-time alarm log generated in a preset alarm system by using the current training model to be optimized to obtain a corresponding current log detection result; updating the initial training sample set according to the current log detection result to obtain an updated training sample set, and performing model training on the current training model to be optimized based on a first iteration updating period and the updated training sample set to obtain a first optimized training model; and carrying out manual model optimization operation on the first optimized training model based on a second iteration updating period to obtain a second training model to be optimized, taking the second training model to be optimized as the current training model to be optimized, and skipping to the step of detecting the real-time alarm log generated in the preset alarm system through the current training model to be optimized. 
In this embodiment, therefore, the alarm logs are first used to train a training model to be optimized; manual marking is then introduced to update this model continuously and iteratively; and, at a preset iteration period, the iteratively updated model is additionally optimized manually. Manual marking makes it possible to judge accurately whether an alarm log is a false alarm, adaptive training continuously improves the model's false alarm recognition and generalization capability, and manual model optimization allows more precise adjustment and iteration. The model therefore recognizes real alarms more accurately, false alarms are further reduced, and system and resource overhead is lowered while labor and time costs are greatly reduced.

Based on the above embodiments, in the present application a trained false alarm detection model is first obtained and then continuously and iteratively updated to reduce false alarms. Since a trained false alarm detection model must exist before it can be iterated, the training process of the model is described in detail below.

Referring to fig. 2, the embodiment of the present application discloses a specific method for false alarm detection, which includes:

Step S21: obtaining historical alarm logs generated by the preset alarm system within a first preset duration, and extracting the information data of the corresponding parameters from the historical alarm logs according to preset extraction parameters.

In this embodiment, the historical alarm logs generated by the preset alarm system within the first preset duration are first obtained, and the key information data of the corresponding parameters are extracted from them according to the preset extraction parameters. The first preset duration can be set according to the user's requirements; for example, if it is set to one month, all historical alarm logs stored in the preset alarm system during the previous month can be extracted. The preset extraction parameters include, but are not limited to, the source IP address, destination IP address, alarm time, number of accesses, alarm name and access action. Extracting only this key information avoids the resource waste of retrieving large amounts of log information.
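Extraction of the key fields over the first preset duration might look like the sketch below, assuming raw alarm logs arrive as dictionaries with an ISO-8601 `alarm_time`. The field names in `EXTRACT_FIELDS` are hypothetical stand-ins for the parameters listed above; a real alarm system will use its own schema.

```python
from datetime import datetime, timedelta

# Hypothetical field names for source IP, destination IP, alarm time,
# number of accesses, alarm name and access action.
EXTRACT_FIELDS = ("src_ip", "dst_ip", "alarm_time", "access_count", "alarm_name", "action")


def extract_history(raw_logs, now, days=30):
    """Keep logs from the last `days` days, reduced to the preset extraction fields."""
    start = now - timedelta(days=days)
    records = []
    for log in raw_logs:
        ts = datetime.fromisoformat(log["alarm_time"])
        if start <= ts < now:
            # Missing fields become None rather than raising, so one bad log
            # does not abort the whole extraction.
            records.append({field: log.get(field) for field in EXTRACT_FIELDS})
    return records
```

Each returned dictionary corresponds to the single summarized record per historical alarm log that step S22 turns into a training sample.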

Step S22: summarizing the information data corresponding to the historical alarm logs to obtain a plurality of training samples, and processing the training samples by utilizing a preset artificial marking rule and a sample selection rule determined based on the current interested feature dimension to obtain an initial training sample set.

In this embodiment, after the key information data are obtained, the information data of each historical alarm log are summarized into one record, which serves as a training sample; the training samples are then processed using the preset manual marking rule and the sample selection rule determined based on the current feature dimensions of interest to obtain the initial training sample set.

Step S23: dividing the initial training sample set into multiple groups of training sets and corresponding test sets based on a preset sample allocation rule, and performing a first round of model training on the false alarm detection model using a plurality of preset training algorithms and the multiple groups of training sets to obtain a plurality of first training models.

In this embodiment, the initial training sample set is divided into multiple training sets and test sets based on a preset sample allocation rule, and a first round of model training is performed on the false alarm detection model using a plurality of preset training algorithms and these training sets to obtain a plurality of first training models. The preset sample allocation rule is determined based on the first preset duration. For example, if the first preset duration is one month, i.e., thirty days, the initial sample set is built from 30 days of historical alarm logs. The alarm log data of the first day is taken as a training set and that of the following 29 days as its test set; then the data of the first 2 days is taken as a training set and that of the following 28 days as its test set; and so on, so that each time the data of the first M days (0 < M < 30) form the training set and that of the remaining 30 - M days form the test set, ending with the first 29 days as the training set and the last day as the test set. This yields 29 pairs of training and test sets. The preset training algorithms, which include but are not limited to random forest, naive Bayes, support vector machine and K-nearest neighbor, are each used to perform the first round of model training on the 29 training sets; with these four algorithms, 29 × 4 = 116 first training models are obtained.
Thus, by setting up multiple groups of training and test sets of different sizes, the model is trained multiple times and the results can be compared to obtain the model with the best effect.
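The expanding-window allocation for a thirty-day history can be enumerated as follows. This sketch only builds the splits and the (split, algorithm) training plan; in practice each algorithm name would map to a classifier such as scikit-learn's `RandomForestClassifier`, `GaussianNB`, `SVC` or `KNeighborsClassifier`, and the string names used here are placeholders.

```python
# Placeholder names for the four preset training algorithms in the example.
ALGORITHMS = ("random_forest", "naive_bayes", "svm", "knn")


def expanding_day_splits(days=30):
    """Train on days 1..M and test on days M+1..days, for every M with 0 < M < days."""
    return [
        (list(range(1, m + 1)), list(range(m + 1, days + 1)))
        for m in range(1, days)
    ]


def first_round_plan(days=30):
    """Pair every split with every algorithm; each pair trains one first training model."""
    return [
        (train_days, test_days, algo)
        for train_days, test_days in expanding_day_splits(days)
        for algo in ALGORITHMS
    ]
```

For 30 days this gives 29 splits and 116 planned first training models. Splitting by day rather than at random keeps every test set strictly later than its training set, which mirrors how the deployed model only ever sees future alarms.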

Step S24: detecting the corresponding test sets with the plurality of first training models to obtain corresponding detection results, and selecting a first preset number of first training models whose detection results meet a first preset detection condition.

In this embodiment, each first training model is tested on its corresponding test set once the plurality of first training models have been obtained. Testing may consist of computing evaluation indicators for the model, including but not limited to accuracy, recall, F1-score, and AUC (Area Under the ROC Curve).

A first preset number of first training models whose detection results meet the first preset detection condition are then selected. The first preset number may be set according to the actual situation. The first preset detection condition may be defined on the evaluation indicators, for example by selecting the models with the highest accuracy or the highest AUC. Suppose the first preset number is set to 3. Step S23 produced 116 first training models in 29 groups of 4, so keeping the 3 best-performing models from each group leaves 29 × 3 = 87 models.

The evaluation indicators used to judge model quality should be chosen to suit the scenario. In the alarm false-positive scenario, accuracy and AUC can be used, and a higher accuracy and a larger AUC indicate a better model. Training several machine learning models and comparing them avoids the poor performance and weak generalization that can result from relying on a single machine learning model. The resulting model is therefore as accurate as possible.
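The screening in step S24 can be illustrated with a small, dependency-free sketch. It makes the following assumptions:
- Labels are binary, with 1 for a real alarm and 0 for a false alarm.
- AUC is computed as the probability that a randomly chosen real alarm scores higher than a randomly chosen false alarm, with ties counting one half.
- The nested-dictionary layout of `results` is a hypothetical convention.

The sketch keeps the top k algorithms per split.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, predictions):
    """Fraction of predictions equal to the true label."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


def top_k_per_split(results, k=3, metric="auc"):
    """results: {split_id: {algorithm: {"accuracy": float, "auc": float}}}.
    Keep, for each split, the k algorithms with the highest value of `metric`."""
    kept = {}
    for split_id, by_algo in results.items():
        ranked = sorted(by_algo, key=lambda a: by_algo[a][metric], reverse=True)
        kept[split_id] = ranked[:k]
    return kept
```

For example, `roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.6])` returns 0.875. With four algorithms on each of 29 splits, `top_k_per_split(..., k=3)` keeps 87 models.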

Step S25: Tune the parameters of the selected first training models, then perform a second round of model training using the plurality of groups of training sets. This yields the current training model to be optimized that meets a second preset detection condition, together with its corresponding optimization model parameters.

In this embodiment, the first training models retained after screening are tuned using a grid search method, that is, by trying different combinations of hyperparameters. A second round of model training is then performed on the tuned models using their corresponding training sets. The resulting models are tested on the corresponding test sets, and the current training model to be optimized and its optimization model parameters are selected based on these test results.

The hyperparameters include, but are not limited to, the learning rate, the number of trees, the maximum depth, and the kernel function. The optimization model parameters include, but are not limited to:
- the training set and test set corresponding to the model;
- the optimized hyperparameters;
- the number N of days of alarm logs selected for the training set.

Like the first preset detection condition, the second preset detection condition judges model quality by evaluation indicators, but it requires selecting the single best-performing model. For example, the 87 models retained in step S24 are tuned and retrained on their corresponding training sets, and the best-performing one is selected as the current training model to be optimized. Its training set, test set, optimized hyperparameters, and N are recorded as the optimization model parameters for the subsequent iterative optimization.
Tuning the parameters of each model in this way brings its performance closer to the required training effect. Comparing the tuned models then yields the best-performing one and determines the most appropriate training set, test set, and other parameters for it. This provides an optimal starting model for the subsequent iterative update process. It also prevents later model iterations from deviating from expectations because the initial model was weak.
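Step S25 can be sketched as an exhaustive grid search over the models retained in step S24. Several names here are hypothetical placeholders:
- The `evaluate` callable would train one model with the given hyperparameters and score it on its test set.
- The field names of `OptimizationParams` are illustrative.

The sketch only shows how the single best model and its parameters, including N, might be recorded.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class OptimizationParams:
    """Hypothetical record of the optimization model parameters kept after step S25."""
    algorithm: str
    train_days: List[int]
    test_days: List[int]
    hyperparams: Dict[str, object]
    score: float

    @property
    def n_days(self) -> int:
        # N: number of days of alarm logs in the selected training set.
        return len(self.train_days)


def grid(space: Dict[str, Sequence]) -> List[Dict[str, object]]:
    """Expand {"name": [values, ...]} into every combination of hyperparameter values."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def select_best(
    candidates: List[Tuple[str, List[int], List[int]]],
    spaces: Dict[str, Dict[str, Sequence]],
    evaluate: Callable[[str, List[int], List[int], Dict[str, object]], float],
) -> OptimizationParams:
    """Try every hyperparameter combination for every retained (algorithm, train, test)
    candidate and keep the single highest-scoring configuration."""
    best = None
    for algorithm, train_days, test_days in candidates:
        for hp in grid(spaces[algorithm]):
            score = evaluate(algorithm, train_days, test_days, hp)
            if best is None or score > best.score:
                best = OptimizationParams(algorithm, train_days, test_days, hp, score)
    return best
```

In practice `evaluate` might wrap an estimator from a machine learning library and return its AUC or accuracy. That choice is left open here.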

Therefore, the initial sample set is prepared before the model is iterated, in four stages:
1. The initial sample set is processed into a plurality of groups of training sets and test sets.
2. Two rounds of model training are performed on the false alarm detection model using several different machine learning algorithms.
3. The model parameters are optimized.
4. The best-performing model, found through comparison experiments, serves as the initial model for the subsequent model iteration.

As the above embodiments show, the model to be optimized is trained in two rounds. These rounds use multiple groups of training and test sets, multiple machine learning algorithms, and grid-search parameter optimization. The result is the best-performing model, which serves as the starting point for model iteration. The process of iteratively updating the current training model to be optimized is described in detail below. Referring to fig. 3, an embodiment of the present application discloses a specific method for detecting false alarms, which includes:

Step S31: Detect the real-time alarm logs generated in the preset alarm system using the current training model to be optimized, to obtain a corresponding current log detection result.

Step S32: Acquire the current log detection results and the corresponding manual marking results obtained for the historical alarm logs within a second preset duration. Judge whether the current log detection results are the same as the corresponding manual marking results. The second preset duration is determined based on the optimization model parameters.

In this embodiment, the log detection results and the corresponding manual marking results for the historical alarm logs within the second preset duration are obtained. It is then judged whether the current log detection results are the same as the corresponding manual marking results. The second preset duration comes from the optimization model parameters obtained together with the initial model to be optimized. As stated in step S25, these parameters include the number N of days of alarm logs selected for the training set. The second preset duration is N - 1 days, so the historical alarm logs of the previous N - 1 days are used for the judgment.
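The step S32 check can be sketched as follows. It makes three assumptions:
- "The previous N - 1 days" means the N - 1 calendar days before the current day.
- "The same" means that every manually reviewed log in that window agrees with the model's result.
- The record format is hypothetical.

An agreement rate is included as an alternative, closer to the similarity judgment described in this embodiment.

```python
from datetime import date, timedelta


def review_window(today, n_days):
    """Dates in the second preset duration: the N - 1 days before `today` (assumption)."""
    return [today - timedelta(days=i) for i in range(1, n_days)]


def detections_match(records):
    """records: (model_label, manual_label) pairs for manually reviewed logs in the window.
    True only if every model result equals its manual label."""
    return all(model == manual for model, manual in records)


def agreement_rate(records):
    """Share of reviewed logs where model and manual labels agree (0.0 if none reviewed)."""
    records = list(records)
    if not records:
        return 0.0
    return sum(model == manual for model, manual in records) / len(records)


window = review_window(date(2023, 1, 10), n_days=3)
print(window)  # [datetime.date(2023, 1, 9), datetime.date(2023, 1, 8)]
```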

In this embodiment, the method further includes the following steps, performed before the initial training sample set is updated according to the current log detection result:
- A second preset number of alarm logs to be marked are randomly selected from the real-time alarm logs generated in the preset alarm system.
- Each alarm log to be marked is checked manually to judge whether it is actually a real alarm, and it is marked accordingly to obtain the corresponding manual marking result.
- The second preset number is determined based on the model training phase of the current training model to be optimized.

In practice, a technician may randomly select the second preset number of alarm logs each day from the real-time alarm logs generated in the preset alarm system and analyze them manually. A real alarm is marked as human_true, and a false alarm is marked as human_false. This produces the manual marking result.
The second preset number depends on the model training phase of the current training model to be optimized:
- Initial stage: equal proportions of logs that the model detected as real alarms and logs it detected as false alarms can be selected for manual checking. This is the period of model optimization in which the most alarm logs require manual checking.
- Middle stage: more of the selected logs can be ones detected as real alarms and fewer ones detected as false alarms, and the total number of manually checked logs can be reduced appropriately.
- Later stage: mainly logs detected as real alarms are selected for manual checking. At this point the model is in actual use, and the purpose of manual checking is to find real alarms rather than to optimize the model.

During model training, the key question is how closely the model's daily log detection results agree with the manual marking results. If agreement is low, the current model to be optimized is not yet accurate enough and optimization must continue. If agreement is high, the model's accuracy basically meets the requirements of alarm log analysis, and the model can be put directly into use.
Therefore, manually checking a sample of the real-time alarm logs every day shows whether the model's log detection results are correct. The model can then be optimized continuously, which improves its recognition capability, its generalization capability, and the accuracy of false alarm detection.
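The stage-dependent sampling and the human_true/human_false marking can be sketched as below. The following parts are illustrative assumptions, not taken from the text:
- Only the 50/50 split for the initial stage comes from the text. The middle-stage and later-stage ratios in `STAGE_MIX` are assumptions.
- The `detected` field on each log is hypothetical.
- The shrinking total in later stages is left to the caller through `count`.

```python
import random

# Share of the sample drawn from logs the model detected as real alarms.
# Only the initial 0.5 is from the text; the other values are illustrative assumptions.
STAGE_MIX = {"initial": 0.5, "middle": 0.75, "later": 1.0}


def sample_for_review(logs, count, stage, rng=None):
    """logs: dicts with a 'detected' key of 'real' or 'false'.
    Return up to `count` logs, mixed according to the model training stage."""
    rng = rng or random.Random()
    real = [log for log in logs if log["detected"] == "real"]
    false = [log for log in logs if log["detected"] == "false"]
    n_real = min(len(real), round(count * STAGE_MIX[stage]))
    n_false = min(len(false), count - n_real)
    return rng.sample(real, n_real) + rng.sample(false, n_false)


def mark(log, analyst_says_real):
    """Attach the manual label used in the text: human_true or human_false."""
    return {**log, "manual": "human_true" if analyst_says_real else "human_false"}
```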

Step S33: If yes, randomly select a preset proportion of the historical alarm logs and add them to the initial training sample set to obtain an updated training sample set.

In this embodiment, the log detection results of the historical alarm logs may be the same as the manual marking results. In that case, a preset proportion of the historical alarm logs is randomly selected and added to the initial training sample set to obtain the updated training sample set. For example, if the preset proportion is set to 80%, then 80% of the historical alarm logs may be randomly selected and added.

Step S34: If not, add all of the historical alarm logs to the initial training sample set, based on the manual marking results, to obtain an updated training sample set.

In this embodiment, the log detection results may be inconsistent with the manual marking results. In that case, all of the historical alarm logs are labeled according to the manual marking results and added to the initial training sample set to obtain the updated training sample set. The alarm logs are thus judged on the basis of manual checking, and the manual marking results are used to update the initial training sample set. The model then learns from the updated training sample set, which improves its recognition capability, its generalization capability, and the accuracy of false alarm detection.
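Steps S33 and S34 can be sketched as a single update function. The 80% proportion is the example from the text. The following details are assumptions:
- The log fields `model`, `manual`, and `label` are hypothetical.
- The text does not say how to label logs that were never manually reviewed. Here such logs fall back to the model's label.

```python
import random


def update_training_set(initial_set, history_logs, matched, proportion=0.8, rng=None):
    """history_logs: dicts with a 'model' label and, if manually reviewed, a 'manual' label.
    matched: outcome of comparing model detections with manual labels (step S32)."""
    rng = rng or random.Random()
    if matched:
        # Step S33: add a random subset of the window's logs, labeled by the model.
        chosen = rng.sample(history_logs, int(len(history_logs) * proportion))
        added = [{**log, "label": log["model"]} for log in chosen]
    else:
        # Step S34: add every log, preferring the manual label where one exists.
        added = [{**log, "label": log.get("manual", log["model"])} for log in history_logs]
    return initial_set + added
```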

Step S35: Perform model training on the current training model to be optimized, based on the first iteration update period and the updated training sample set, to obtain a first optimized training model. Then judge whether the current time is the end time of the second iteration update period.

In this embodiment, after the updated training sample set is obtained, model training is performed on the current training model to be optimized based on the first iterative update cycle and the updated training sample set to obtain a first optimized training model, and then it is determined whether the current time is the end time of the second iterative update cycle.

Step S36: If not, take the first optimized training model as the current training model to be optimized for the next iteration period, and take the updated training sample set as the initial training sample set for the next iteration period. Then return to the step of detecting the real-time alarm logs generated in the preset alarm system using the current training model to be optimized.

In this embodiment, the current time may not be the end time of the second iteration update period. In that case, the first optimized training model becomes the current training model to be optimized for the next iteration period, and the updated training sample set becomes the initial training sample set for the next iteration period. The process then returns to step S31 to perform the next iterative update of the model.

For example, suppose the first iteration update period is 1 day, the second iteration update period is 30 days, and the current day is day 3. After the model iteration for day 3 is completed, the first optimized training model trained on day 3 becomes the current training model to be optimized for the iteration on day 4. The updated training sample set obtained on day 3 becomes the initial training sample set for training on day 4.
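The alternation between daily automated iterations and periodic manual optimization (steps S31 to S39) can be sketched as a loop. It assumes a first iteration period of one day and a second of 30 days, as in the example. `daily_step` and `manual_optimize` are hypothetical callables that stand in for steps S31 to S35 and steps S37 to S39.

```python
from typing import Any, Callable, List, Tuple


def run_iterations(
    model: Any,
    samples: Any,
    days: int,
    daily_step: Callable[[Any, Any], Tuple[Any, Any]],
    manual_optimize: Callable[[Any, Any], Any],
    second_period: int = 30,
) -> Tuple[Any, Any, List[Tuple[str, int]]]:
    """Run one automated iteration per day; at the end of each second iteration
    period, hand the model to the manual optimization step."""
    history = []
    for day in range(1, days + 1):
        # Steps S31-S35: detect, compare with manual labels, update samples, retrain.
        model, samples = daily_step(model, samples)
        if day % second_period == 0:
            # Steps S37-S39: manual model optimization at the end of the period.
            model = manual_optimize(model, samples)
            history.append(("manual", day))
        else:
            # Step S36: carry the model and sample set into the next day.
            history.append(("auto", day))
    return model, samples, history
```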

Step S37: If so, acquire the alarm logs carrying label types that were obtained during the second iteration update period.

In this embodiment, the current time may be the end time of the second iteration update period. In that case, the labeled alarm logs obtained during that period are acquired so that the training set can be reselected from them.

Step S38: Judge through a manual model optimization operation whether a feature dimension needs to be added. If so, update the feature dimensions and determine an optimization model training set from the alarm logs based on the updated feature dimensions.

In this embodiment, a technician decides whether a new feature dimension needs to be added by checking whether an effective feature dimension has been found in practice. If one has, the feature dimensions are updated through incremental iteration at the feature level, using feature importance analysis, feature combination, feature dimensionality reduction, and similar methods. Based on the updated feature dimensions, the alarm logs of the previous N - 1 days, the previous N days, and the previous N + 1 days are each selected as a training set to determine the optimization model training set. The remaining alarm logs are used as the optimization model test set.
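The window reselection in step S38 can be sketched as follows. The sketch assumes that "the previous K days" means the earliest K days of the labeled period, consistent with the split rule in step S23. It also assumes that a window leaving no test data is skipped. Feature-dimension updates are not modeled.

```python
def reselect_windows(labeled_days, n):
    """labeled_days: ordered day identifiers with labeled alarm logs from the last
    second iteration period. For each window size in (N - 1, N, N + 1), the first
    `size` days form the training set and the remaining days form the test set."""
    windows = {}
    for size in (n - 1, n, n + 1):
        # Skip sizes that would leave an empty training or test set.
        if 0 < size < len(labeled_days):
            windows[size] = (labeled_days[:size], labeled_days[size:])
    return windows
```

Each of the three candidate splits could then go through the same grid-search comparison used in step S25, and the best one would be kept.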

Step S39: Train the corresponding first optimized training model using the optimization model training set to obtain a corresponding second training model to be optimized. Take the second training model to be optimized as the current training model to be optimized, and return to the step of detecting the real-time alarm logs generated in the preset alarm system using the current training model to be optimized.

In this embodiment, the optimization model training set and the optimization model test set are obtained first. Multiple groups of hyperparameters are then optimized by grid search. Several machine learning algorithms, such as random forest, are used to train the currently corresponding first optimized training model on the optimization model training set, which yields a second training model to be optimized. This model is tested on the optimization model test set, and the best optimization model parameters are selected according to evaluation indicators of the test results, such as accuracy and AUC. This completes the manual optimization operation on the model.

The second training model to be optimized then becomes the current training model to be optimized, and the process returns to step S31. The second training model to be optimized and its optimization model parameters serve as the baseline for automated model training in the next second iteration update period.

Through this manual optimization operation, the training set, hyperparameters, feature dimensions, and model are tuned in turn. The system can therefore respond promptly when alarm categories are added or removed, or when alarms change because of changes in rules or detection methods. Incremental learning and iteration over both features and samples continuously improve the generalization capability of the model. They also greatly reduce manual operation, maintenance, and analysis work, saving resources as well as labor and time costs.

For the specific process of the step S31, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.

It can be seen that, in this embodiment, the model is continuously updated by combining automated iterative model training with manual model optimization. The model can adapt to the labels added or updated manually each day, so it better distinguishes real alarms from false alarms. In the later stage of model training, analysts therefore only need to handle a small number of real alarms each day, which greatly frees up their time and effort. Manual model optimization also lets the system respond promptly when alarm categories are added or removed, or when alarms change because of changes in rules or detection methods.

Referring to fig. 4, an embodiment of the present application further discloses a device for detecting false alarm, which includes:

a sample set acquisition module 11, configured to acquire historical alarm logs generated by a preset alarm system and to process them, using a preset artificial marking rule and a sample selection rule determined based on the current feature dimensions of interest, to obtain an initial training sample set;

the model training module 12 is configured to train a false alarm detection model based on a preset model training rule by using the initial training sample set to obtain a current training model to be optimized;

the log detection module 13 is configured to detect a real-time alarm log generated in the preset alarm system through the current training model to be optimized, so as to obtain a corresponding current log detection result;

a sample set updating module 14, configured to update the initial training sample set according to the current log detection result to obtain an updated training sample set;

a model iteration module 15, configured to perform model training on the current training model to be optimized based on a first iteration update period and the updated training sample set to obtain a first optimized training model;

and the model optimization module 16 is configured to perform manual model optimization operation on the first optimized training model based on a second iteration update cycle to obtain a second training model to be optimized, use the second training model to be optimized as a current training model to be optimized, and skip again to the step of detecting a real-time alarm log generated in the preset alarm system by using the current training model to be optimized.

As can be seen, in the embodiment, a historical alarm log generated by a preset alarm system is obtained, and the historical alarm log is processed by using a preset artificial marking rule and a sample selection rule determined based on a current interested feature dimension to obtain an initial training sample set; training a false alarm detection model by using the initial training sample set based on a preset model training rule to obtain a current training model to be optimized, and detecting a real-time alarm log generated in a preset alarm system by using the current training model to be optimized to obtain a corresponding current log detection result; updating the initial training sample set according to the current log detection result to obtain an updated training sample set, and performing model training on the current training model to be optimized based on a first iteration updating period and the updated training sample set to obtain a first optimized training model; and carrying out manual model optimization operation on the first optimized training model based on a second iteration updating period to obtain a second training model to be optimized, taking the second training model to be optimized as the current training model to be optimized, and skipping to the step of detecting the real-time alarm log generated in the preset alarm system through the current training model to be optimized. 
Therefore, in this embodiment, a training model to be optimized is first trained on the alarm logs. The current training model to be optimized is then iteratively updated using manually added labels. At each preset iteration period, a manual model optimization operation is performed on the iteratively updated model.

The manual labels make it possible to judge accurately whether an alarm log is a false alarm, so the model trains adaptively and its false alarm recognition capability and generalization capability improve continuously. The manual model optimization operation allows more precise adjustment and iteration of the model. As a result, the model can accurately recognize real alarms and reduce false alarms, which lowers system and resource overhead while greatly reducing labor and time costs.

In some specific embodiments, the sample set acquisition module 11 may specifically include:

an information data extraction unit, configured to acquire the historical alarm logs generated by the preset alarm system within a first preset duration, and to extract information data of the corresponding parameters from the historical alarm logs according to preset extraction parameters;

and the sample set acquisition unit is used for summarizing the information data corresponding to the historical alarm logs to obtain a plurality of training samples, and processing the training samples by utilizing a preset artificial marking rule and a sample selection rule determined based on the current interested feature dimension to obtain an initial training sample set.

In some specific embodiments, the model training module 12 may specifically include:

the first model training unit is used for dividing the initial training sample set into a plurality of groups of training sets and corresponding test sets based on preset sample distribution rules, and performing first round of model training on the false alarm detection model by utilizing a plurality of preset training algorithms and the plurality of groups of training sets to obtain a plurality of first training models;

the model detection unit is used for detecting the corresponding test set by utilizing the plurality of first training models to obtain corresponding detection results, and selecting a first preset number of first training models of which the detection results meet first preset detection conditions;

and the second model training unit is used for performing second-round model training by using the plurality of groups of training sets after the parameters of the first training model are adjusted and optimized so as to obtain the current training model to be optimized meeting second preset detection conditions and the optimized model parameters corresponding to the current training model to be optimized.

In some specific embodiments, the sample set updating module 14 may specifically include:

the result judging unit is used for acquiring the current log detection result and the corresponding manual marking result obtained by detecting the historical alarm log in a second preset time period, and judging whether the current log detection result is the same as the corresponding manual marking result or not; the second preset duration is determined based on the optimization model parameters;

a first sample set updating unit, configured to randomly select a preset proportion of the historical alarm logs and add the selected historical alarm logs to the initial training sample set to obtain the updated training sample set when the current log detection result is the same as the corresponding artificial marking result;

and the second sample set updating unit is used for adding all the historical alarm logs to the initial training sample set based on the artificial marking result to obtain the updated training sample set when the current log detection result is different from the corresponding artificial marking result.

In some specific embodiments, the apparatus for detecting false alarm may further include:

the alarm log selection module is used for randomly selecting a second preset number of alarm logs to be marked from the real-time alarm logs generated in the preset alarm system;

the artificial marking module is used for judging, through manual checking, whether each alarm log to be marked is actually a real alarm, and for marking it accordingly to obtain a corresponding manual marking result; the second preset number is determined based on the model training phase of the current training model to be optimized;

the time judgment module is used for judging whether the current time is the end time of the second iteration updating period or not;

the operation execution module is used for executing the artificial model optimization operation when the current moment is the ending moment of the second iteration updating period;

and the step skipping module is used for determining the first optimized training model as the current training model to be optimized in the next iteration cycle when the current moment is not the ending moment of the second iteration updating cycle, determining the updated training sample set as the initial training sample set in the next iteration cycle, and skipping to the step of detecting the real-time alarm log generated in the preset alarm system through the current training model to be optimized.

In some specific embodiments, the model optimization module 16 may specifically include:

the log obtaining unit is used for obtaining the alarm log carrying the label type obtained in the second iteration updating period;

the characteristic dimension updating unit is used for judging whether a characteristic dimension needs to be newly added or not through manual model optimization operation, if so, updating the characteristic dimension and determining an optimization model training set from the alarm log based on the updated characteristic dimension;

and the model optimization unit is used for training the currently corresponding first optimized training model by using the optimized model training set to obtain a corresponding second to-be-optimized training model.

Further, an electronic device is disclosed in the embodiments of the present application, and fig. 5 is a block diagram of an electronic device 20 according to an exemplary embodiment, which should not be construed as limiting the scope of the application.

Fig. 5 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present disclosure. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. The memory 22 is used for storing a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps in the alarm false alarm detection method disclosed in any one of the foregoing embodiments. In addition, the electronic device 20 in this embodiment may be specifically an electronic computer.

In this embodiment, the power supply 23 is configured to provide a working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and a communication protocol followed by the communication interface is any communication protocol that can be applied to the technical solution of the present application, and is not specifically limited herein; the input/output interface 25 is configured to obtain external input data or output data to the outside, and a specific interface type thereof may be selected according to specific application requirements, which is not specifically limited herein.

In addition, the memory 22 serves as a carrier for resource storage and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it may include an operating system 221, a computer program 222, and so on, and the storage may be transient or permanent.

The operating system 221 is used to manage and control the hardware devices on the electronic device 20 and the computer program 222, and may be Windows Server, NetWare, Unix, Linux, or the like. The computer programs 222 include a computer program that, when executed by the electronic device 20, performs the alarm false positive detection method disclosed in any of the foregoing embodiments. They may also include computer programs for performing other specific tasks.

Further, the present application also discloses a computer-readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the alarm false alarm detection method disclosed above. For the specific steps of the method, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.

The embodiments are described progressively: each embodiment focuses on its differences from the others, and the embodiments may be read together for the parts they share. The device disclosed in the embodiments corresponds to the method disclosed in the embodiments, so its description is relatively brief. The relevant details can be found in the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The technical solutions provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present application, and the descriptions of these examples are only intended to help understand the method and core ideas of the present application. Meanwhile, a person skilled in the art may, according to the idea of the present application, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims

1. An alarm false alarm detection method, characterized by comprising the following steps:

2. The method according to claim 1, wherein the obtaining of the historical alarm log generated by a preset alarm system and the processing of the historical alarm log by using a preset artificial labeling rule and a sample selection rule determined based on a current feature dimension of interest to obtain an initial training sample set comprises:

summarizing the information data corresponding to the historical alarm logs to obtain a plurality of training samples, and processing the training samples by using the preset artificial labeling rule and the sample selection rule determined based on the current feature dimension of interest to obtain the initial training sample set.

3. The method according to claim 1, wherein the training of the false alarm detection model by using the initial training sample set based on a preset model training rule to obtain the current training model to be optimized comprises:

dividing the initial training sample set into a plurality of groups of training sets and corresponding test sets based on a preset sample distribution rule, and performing first round model training on a false alarm detection model by using a plurality of preset training algorithms and the plurality of groups of training sets to obtain a plurality of first training models;

4. The alarm false alarm detection method of claim 3, wherein the updating the initial training sample set according to the current log detection result to obtain an updated training sample set comprises:

and if not, adding all the historical alarm logs to the initial training sample set based on the manual marking result to obtain the updated training sample set.

5. The alarm false alarm detection method of claim 4, wherein before the updating the initial training sample set according to the current log detection result to obtain an updated training sample set, the method further comprises:

judging, through manual checking, whether the actual label type corresponding to the alarm log to be marked is the real label type, and marking the alarm log to be marked accordingly to obtain a corresponding manual marking result; wherein the second preset number is determined based on the model training phase of the current training model to be optimized.

6. The method according to claim 1, wherein after the model training of the current training model to be optimized based on the first iterative update cycle and the updated training sample set to obtain the first optimized training model, the method further comprises:

if yes, executing the manual model optimization operation;

7. The method according to any one of claims 1 to 6, wherein the performing a manual model optimization operation on the first optimized training model based on a second iterative update cycle to obtain a second training model to be optimized includes:

obtaining the historical alarm logs carrying label types that were obtained in the second iterative update cycle;

and training the currently corresponding first optimized training model by using the optimization model training set to obtain a corresponding second training model to be optimized.

8. An alarm false alarm detection device, comprising:

a sample set updating module, configured to update the initial training sample set according to the current log detection result to obtain an updated training sample set;

9. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the alarm false positive detection method according to any one of claims 1 to 7.

10. A computer-readable storage medium for storing a computer program which, when executed by a processor, implements the alarm false positive detection method of any one of claims 1 to 7.
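In this excerpt claim 3 only spells out its first step: splitting the initial training sample set into several training/test groups and training the false alarm detection model with several algorithms (the later steps are missing). The Python below is a minimal sketch of that first round under stated assumptions, not the applicant's implementation. A stratified random split is assumed for the "preset sample distribution rule", samples are assumed to be (feature vector, label) pairs with 1 meaning false alarm, and two toy classifiers (majority-class and nearest-centroid) stand in for the unspecified "preset training algorithms". All function names are hypothetical. How the resulting first training models are compared or selected is not disclosed here, so the sketch stops at reporting each model's test accuracy.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical encoding: (feature vector, label) with 1 = false alarm, 0 = real alarm.
Sample = Tuple[List[float], int]
Model = Callable[[List[float]], int]
Trainer = Callable[[List[Sample]], Model]


def stratified_splits(samples: Sequence[Sample], n_groups: int,
                      test_ratio: float, seed: int = 0) -> List[Tuple[List[Sample], List[Sample]]]:
    """Assumed 'preset sample distribution rule': n_groups independent random
    splits, each preserving the label ratio in its training and test sets."""
    rng = random.Random(seed)
    by_label: Dict[int, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    groups = []
    for _ in range(n_groups):
        train: List[Sample] = []
        test: List[Sample] = []
        for items in by_label.values():
            shuffled = list(items)
            rng.shuffle(shuffled)
            cut = max(1, int(len(shuffled) * test_ratio))  # test share of this label
            test.extend(shuffled[:cut])
            train.extend(shuffled[cut:])
        groups.append((train, test))
    return groups


def train_majority(train: List[Sample]) -> Model:
    """Placeholder algorithm 1: always predict the most frequent training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def train_nearest_centroid(train: List[Sample]) -> Model:
    """Placeholder algorithm 2: predict the label whose mean feature vector is closest."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for x, y in train:
        running = sums.setdefault(y, [0.0] * len(x))
        sums[y] = [a + b for a, b in zip(running, x)]
        counts[y] += 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x: List[float]) -> int:
        # Squared Euclidean distance to each class centroid.
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return predict


def accuracy(model: Model, test: List[Sample]) -> float:
    return sum(model(x) == y for x, y in test) / len(test)


def first_round_training(samples: Sequence[Sample], algorithms: Dict[str, Trainer],
                         n_groups: int = 3, test_ratio: float = 0.25):
    """Train every algorithm on every training set: one first training model per
    (algorithm, group) pair, returned with its accuracy on the matching test set."""
    results = []
    for g, (train, test) in enumerate(stratified_splits(samples, n_groups, test_ratio)):
        for name, fit in algorithms.items():
            model = fit(train)
            results.append((name, g, model, accuracy(model, test)))
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic two-feature data standing in for summarized alarm-log samples.
    data = ([([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(40)]
            + [([rng.gauss(3, 1), rng.gauss(3, 1)], 1) for _ in range(40)])
    algos = {"majority": train_majority, "nearest_centroid": train_nearest_centroid}
    for name, g, _, acc in first_round_training(data, algos):
        print(f"group {g}: {name} test accuracy = {acc:.2f}")
```

Each (algorithm, group) pair yields one first training model, so three groups and two algorithms give six models. A practical system would substitute production classifiers and the application's own distribution rule for the placeholders assumed here.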