CN114936615A - Small sample log information anomaly detection method based on characterization consistency correction - Google Patents

Small sample log information anomaly detection method based on characterization consistency correction

Info

Publication number
CN114936615A
CN114936615A (application CN202210876386.7A); granted publication CN114936615B
Authority
CN
China
Prior art keywords
network
self
learning
consistency
characterization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210876386.7A
Other languages
Chinese (zh)
Other versions
CN114936615B (en)
Inventor
许扬汶
刘天鹏
韩冬
孙腾中
刘灵娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Big Data Group Co ltd
Original Assignee
Nanjing Big Data Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Big Data Group Co ltd filed Critical Nanjing Big Data Group Co ltd
Priority to CN202210876386.7A priority Critical patent/CN114936615B/en
Publication of CN114936615A publication Critical patent/CN114936615A/en
Application granted granted Critical
Publication of CN114936615B publication Critical patent/CN114936615B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/2433: Single-class perspective, e.g. one-against-all classification; novelty detection; outlier detection
    • G06F11/327: Alarm or error message display
    • G06F11/3476: Data logging
    • G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a small-sample log information anomaly detection method based on characterization consistency correction, comprising the following steps: preprocess the data, extract event features, and serialize them; iteratively train a self-supervised feature representation network, using a prototypical network to learn a feature extractor from small-sample classification tasks; construct a characterization consistency correction module, compute the characterization consistency correction functions of the prototypical network and the self-supervised feature representation network, train the two networks with these functions respectively, and use the trained self-supervised feature representation network as the embedding network; feed in test-set data to obtain classification results; and respond according to the output. The method uses the supervision signal of the prototypical network to guide the training of the self-supervised feature representation network, making it better suited to model training under small-sample conditions while improving the classification performance of the anomaly detection model in that setting.

Description

Small sample log information anomaly detection method based on characterization consistency correction
Technical Field
The invention relates to a classification and detection method, in particular to a small-sample log information anomaly detection method based on characterization consistency correction.
Background
With the development of the internet, big data, cloud computing, and other new technologies, more and more industries and scenarios are going digital. These services make everyday life more convenient and efficient, but they also open profit channels for the black- and gray-market industries, giving rise to a series of new network security problems. Faced with these ever-emerging threats, traditional detection methods can no longer meet current network-defense requirements. Anomaly detection built on artificial intelligence techniques such as neural networks has self-learning and dynamic-monitoring capabilities, qualitatively improving network security technology. A conventional neural-network detection model, however, needs a large number of samples as training data, while in practice user data is hard to collect, time-consuming to gather, and expensive to label; effective samples are therefore scarce, and it is difficult to train an efficient detection model for a small-sample task.
Self-supervised learning is an important learning paradigm. It mainly uses auxiliary tasks to mine supervision signals from large-scale unlabeled data and trains a network with this constructed supervision, so that features valuable to downstream tasks can be learned; its main methods are based on context, temporal order, and the like. Conventional self-supervised methods, however, tend to rely on a large number of training samples. In a small-sample scenario, for lack of sufficient samples, the obtained supervision signal concentrates on differences among base-class samples and ignores the valuable semantic information of novel classes. Applying such an auxiliary task directly in a small-sample scenario may learn inappropriate "shortcuts" instead of the key semantics, that is, a biased representation, which misleads the main task and degrades small-sample learning performance.
Disclosure of Invention
Purpose of the invention: the invention aims to provide a small-sample log information anomaly detection method based on characterization consistency correction that improves small-sample learning performance and can efficiently identify anomalous behavior in mobile-phone applications.
Technical scheme: the small-sample log information anomaly detection method based on characterization consistency correction according to the invention comprises the following steps:
(1) data preprocessing: parse the log information into structured log data, extract and classify event features, serialize them, and convert them into numerical vectors;
(2) iteratively train the model's feature extractor: split the preprocessed data into a training set and a test set; on the training data, use a task-based episodic training strategy, performing one small-sample classification task per episode; use a prototypical network to learn a feature extractor from the small-sample classification task; then construct a characterization consistency correction module, compute the characterization consistency correction functions of the prototypical network and the self-supervised feature representation network, and use them to train the two networks respectively, continually updating both networks' parameters during the iteration; finally use the trained self-supervised feature representation network as the embedding network;
(3) feed the preprocessed test-set data into the model, use the trained self-supervised feature representation network to compute the similarity between each test sample and each class, and take the class with the highest similarity as the classification result;
(4) respond according to the output of the prediction stage: if anomalous behavior is found, issue a warning to alert system administrators so that system security can be protected.
Preferably, the event features in step (1) are an event behavior description string_id and a security label. The event behavior description string_id falls into three categories, namely File operations (File), Process operations (Process), and Registry operations (Registry), which together cover 16 event operation behaviors. (The patent's table dividing these behaviors is given only as an image and is not reproduced here.)
Preferably, the 16 event operation behaviors are stored in order in a reference vector of size 16, and a corresponding vector matrix of size 1 × 16 is initialized. It is represented as a 16-bit binary number, each bit being the flag value of one event operation behavior executed by the program: 0 indicates the absence of that event type and 1 indicates its presence.
Preferably, the vector matrix is concatenated with the security label to form an event behavior vector of size 1 × 17. The first 16 bits are the flag values of the 16 event operation behaviors; the final bit is the security label marking normal or anomalous event behavior: 0 indicates no anomaly, 1 a file-operation anomaly, 2 a process-operation anomaly, and 3 a registry-operation anomaly.
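The serialization above can be sketched in a few lines of Python. The behavior names below are placeholders (the patent's actual 16-behavior table is an image and is not reproduced), so treat this as an illustrative encoding rather than the patent's exact field layout.

```python
# Hypothetical sketch of the step-(1) serialization: each log record becomes a
# 1 x 17 vector of 16 behavior-presence bits plus one security-label value.
BEHAVIORS = [f"behavior_{i:02d}" for i in range(16)]  # stand-in behavior names

def encode_event(behaviors_present, security_label):
    """behaviors_present: iterable of behavior names; security_label: 0-3."""
    present = set(behaviors_present)
    vec = [1 if b in present else 0 for b in BEHAVIORS]
    vec.append(security_label)  # final value: 0 normal, 1 file, 2 process, 3 registry
    return vec

v = encode_event(["behavior_02"], 1)  # one behavior present, file-operation anomaly
```

With real behavior names substituted for the placeholders, `v` is exactly the 1 × 17 event behavior vector described in the text.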
Preferably, the step (2) comprises the following steps:
(2.1) Use the training set as model input and compute class prototypes with the prototypical network $P_\theta$, where $\theta$ are learnable network parameters. Specifically, for a small-sample task $T = (S, Q)$, where $S$ is the support set and $Q$ is the query set, the prototype of class $k$ is computed as

$$c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} P_\theta(x_i)$$

where $c_k$ is the prototype, in feature space, of the class whose sample label is $k$; $S_k$ is the subset of the support set $S$ whose label is $k$; $|S_k|$ is the size of the set $S_k$; $x_i$ is the feature vector of a sample; and $y_i$ is the label of the corresponding sample.

For a new sample $x_q$ from the query set $Q$, the normalized classification score of each class $k$ is obtained by distance discrimination:

$$p(y = k \mid x_q) = \sigma\big(-d(P_\theta(x_q), c_k)\big)$$

where $\sigma$ denotes the softmax function and $d(\cdot,\cdot)$ a distance measure. The classification loss function $L_{cls}$ is specified as

$$L_{cls} = -\log p(y = y_q \mid x_q)$$

where $x_q$ is the feature vector of a sample and $y_q$ the label of the corresponding sample;
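A minimal sketch of the prototype computation and distance-softmax scoring in step (2.1). The raw feature vectors are used directly as embeddings here; in the method itself they would first pass through the prototypical network's embedding function.

```python
import numpy as np

def class_prototypes(support_x, support_y):
    """c_k = mean of the support-set feature vectors with label k."""
    return {int(k): support_x[support_y == k].mean(axis=0)
            for k in np.unique(support_y)}

def classification_scores(query, prototypes):
    """Softmax over negative squared Euclidean distances to each prototype."""
    ks = sorted(prototypes)
    neg_d = np.array([-np.sum((query - prototypes[k]) ** 2) for k in ks])
    e = np.exp(neg_d - neg_d.max())  # numerically stable softmax
    return dict(zip(ks, e / e.sum()))

support_x = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]])
support_y = np.array([0, 0, 1, 1])
protos = class_prototypes(support_x, support_y)
scores = classification_scores(np.array([0.0, 0.5]), protos)  # near class 0
```

The classification loss for a labeled query is then simply `-np.log(scores[y_q])`.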
(2.2) Build the self-supervised feature representation network $F_\phi$. For input data $x_q$ from the query set $Q$, generate a transformed view $\hat{x}_q$ by random augmentation, forming a training sample pair, and compute the objective function $L_{self}$ of the self-supervised feature representation network as

$$L_{self} = \big\| F_\phi(x_q) - F_\phi(\hat{x}_q) \big\|^2;$$
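A toy sketch of the step-(2.2) objective: a sample and a randomly augmented view of it are embedded by the same network and pulled together. The Gaussian-noise augmentation and the squared-distance form are illustrative assumptions, since the patent does not specify its augmentation.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(x, noise_scale=0.1):
    """Random-perturbation stand-in for the unspecified augmentation."""
    return x + noise_scale * rng.standard_normal(x.shape)

def self_supervised_loss(f_phi, x):
    """Squared distance between embeddings of a sample and its augmented view."""
    z1, z2 = f_phi(x), f_phi(augment(x))
    return float(np.sum((z1 - z2) ** 2))

# Identity embedding on a 17-dim event vector, for illustration only.
loss = self_supervised_loss(lambda x: x, np.zeros(17))
```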
(2.3) Construct the characterization consistency correction function $L_{con}$, which corrects the representations of the prototypical network $P_\theta$ and the self-supervised feature representation network $F_\phi$ against each other:

$$L_{con} = \sum_{x \in Q} \big\| P_\theta(x) - F_\phi(x) \big\|^2$$

where $\theta$ and $\phi$ are both learnable network parameters;
(2.4) For the prototypical network $P_\theta$, fuse the classification loss function with the characterization consistency correction function; the final prototypical-network training function is computed as

$$L_P = L_{cls} + \lambda L_{con}$$

where $\lambda$ is a weight variable;
(2.5) For the self-supervised feature representation network $F_\phi$, fuse the network's objective function with the characterization consistency correction function; the final self-supervised training function is computed as

$$L_F = L_{self} + \mu L_{con}$$

where $\mu$ is a weight variable;
(2.6) During model training, design and use an interactive iterative updating method for the prototypical network and the self-supervised feature representation network, training $P_\theta$ and $F_\phi$ and using the final $F_\phi$ as the embedding network.
Preferably, the interactive iterative updating method comprises: first initialize the prototypical network $P_\theta$ and the self-supervised feature representation network $F_\phi$ respectively; with the parameters $\phi$ of the self-supervised network held fixed, obtain the characterization consistency correction function $L_{con}$ and take one optimization step on the parameters $\theta$ using the prototypical-network training function $L_P$; then, with the updated parameters $\theta$, obtain a new characterization consistency correction function $L_{con}$ and take one optimization step using the self-supervised training function $L_F$ to obtain updated parameters $\phi$; repeat these interactive iterative updating steps until the training functions converge.
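The alternating update can be sketched as a skeleton loop. The step callables and the convergence tolerance are illustrative assumptions; in the method itself each step would be a gradient update on $L_P$ or $L_F$.

```python
def alternating_train(step_theta, step_phi, max_iters=1000, tol=1e-8):
    """Alternate one step on theta (phi fixed) and one on phi (theta fixed)."""
    prev = float("inf")
    for i in range(max_iters):
        loss_p = step_theta()        # update theta with phi fixed
        loss_f = step_phi()          # update phi with the new theta fixed
        total = loss_p + loss_f
        if abs(prev - total) < tol:  # training functions have converged
            return total, i + 1
        prev = total
    return prev, max_iters

# Stand-in steps whose losses simply decay, to exercise the loop.
state = {"p": 8.0, "f": 4.0}
def step_p():
    state["p"] *= 0.5
    return state["p"]
def step_f():
    state["f"] *= 0.5
    return state["f"]

total, iters = alternating_train(step_p, step_f)
```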
Preferably, step (3) comprises feeding the preprocessed test-set data into the model, extracting sample features with the trained self-supervised feature representation network $F_\phi$, computing the mean of the support-set samples of each class as that class's prototype, then computing the similarity between the test sample and each class prototype with the small-sample log anomaly decision function, and finally taking the class with the highest similarity as the detection result.
Preferably, the small-sample log anomaly decision function is

$$\hat{y} = \arg\max_k \; \sigma\big(-d(F_\phi(x), c_k)\big).$$
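Since the softmax is monotone in the negative distance, the decision function reduces to picking the nearest prototype. A minimal sketch (identity embedding for illustration; the method uses the trained network):

```python
import numpy as np

def predict(x, prototypes):
    """Return the label of the nearest class prototype (highest similarity)."""
    return min(prototypes, key=lambda k: np.sum((x - prototypes[k]) ** 2))

prototypes = {0: np.array([0.0, 0.0]), 1: np.array([1.0, 1.0])}
label = predict(np.array([0.9, 1.1]), prototypes)  # closest to prototype 1
```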
the invention also provides a computer readable storage medium, a computer program is stored in the computer readable storage medium, and when the computer program is executed by a processor, the method for detecting the small sample log information abnormity based on the characterization consistency proofreading is realized.
Beneficial effects: compared with the prior art, the invention has the following notable advantages. It proposes self-supervised learning with characterization consistency correction, using the supervision signal of the prototypical network to guide the training of the self-supervised feature representation network so that the two modules cooperate. The consistency correction exploits the supervision inherent in the labeled data, improves the learned feature manifold, reduces representation bias, mines more useful semantic information, and integrates it into a uniform distribution, further enriching and extending the original representation method. The interactive iterative updating method drives the objective functions further toward convergence, is better suited to model training under small-sample conditions, improves the classification performance of the anomaly detection model in that setting, effectively detects anomalous behavior in log files, and protects mobile-phone application security.
Drawings
FIG. 1 is a flow chart of the operation of the present invention;
FIG. 2 is a schematic flow chart of the model iterative training phase of the present invention;
FIG. 3 is a comparison of classification discrimination accuracy between the method of the present invention and the prior art.
Detailed Description
The technical scheme of the invention is further explained below with reference to the drawings.
Given a log record file, the task is to decide whether the system exhibits anomalous behavior: an efficient anomaly detection model can be trained on a training set and then used to monitor user records in real time and raise early warnings. As shown in FIG. 1, the small-sample log information anomaly detection method based on characterization consistency correction comprises the following stages: data preprocessing, iterative model training, prediction, and response.
(1) A data preprocessing stage:
analyzing the log information to obtain structured log data, extracting event features, classifying and sorting the extracted features, serializing the features, and converting the features into numerical vector data. The method specifically comprises the following steps:
the log data set D is composed of a plurality of records, each record is composed of log data content and a label, and in the data preprocessing stage, key fields including event behavior description string _ id and a security label are extracted from the log data. In this embodiment, the following information can be extracted for one log record:
{
"start_time":"2020-08-16T20:55:00",
"end_time":"2020-08-16T20:57:00",
"size":2741,
"Processes":{
"pid":3500,
"name":[python]\\python.exe",
"events":{
"time":"2020-08-16T20:55:00",
"event_id":2233,
"ignored":false,
"string-id":"File:Permissions:|temp|\\000c34576f5c",
"action":"Permissions",
"target":"[temp]\\000c34576f5c",
"abstraction":""
}
}
"label": 1
}
The event behavior description string_id is "File:Permissions:|temp|\\000c34576f5c", from which the event type "File" and the specific event operation behavior "Permissions" are obtained: an illegal permission-change operation on a file. The security label is 1, indicating that this log record contains anomalous behavior against the file system.
In this embodiment, the event vector matrix is 0010000000000000, where bit 3 is 1, indicating that a "Permissions" event type is present. Concatenating this vector matrix with the security label forms an event behavior vector of size 1 × 17: 00100000000000001. All data records are preprocessed as above; each yields a feature vector of size 1 × 17, and the data are randomly split into a training set and a test set.
(2) Model iterative training stage: optimize the self-supervised feature representation network, as shown in FIG. 2.
(2.1) Use the training set as model input and compute class prototypes with the prototypical network $P_\theta$ (i.e., the prototype network), where $\theta$ are learnable network parameters. Specifically: using a task-based episodic training strategy, for each episode randomly draw $N$ classes from the training set and $K$ samples from each of them to form the support set $S$; then draw part of the remaining samples of those $N$ classes as the query set $Q$. The resulting classification problem is called an $N$-way $K$-shot small-sample task. One small-sample classification task $T = (S, Q)$ is performed per episode, where $S$ is the support set and $Q$ is the query set. The prototype of class $k$ is computed as

$$c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} P_\theta(x_i)$$

where $c_k$ is the prototype, in feature space, of the class whose sample label is $k$; $S_k$ is the subset of the support set $S$ whose label is $k$; $|S_k|$ is the size of the set $S_k$; $x_i$ is the feature vector of a sample; and $y_i$ is the label of the corresponding sample.

For a new sample $x_q$ from the query set $Q$, the normalized classification score of each class $k$ is obtained with the following distance discrimination:

$$p(y = k \mid x_q) = \sigma\big(-d(P_\theta(x_q), c_k)\big)$$

where $\sigma$ denotes the softmax function and $d(\cdot,\cdot)$ a distance measure. The classification loss function $L_{cls}$ is specified as

$$L_{cls} = -\log p(y = y_q \mid x_q)$$

where $x_q$ is the feature vector of a sample and $y_q$ the label of the corresponding sample;
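The episodic sampling described above can be sketched as follows; function and parameter names are illustrative.

```python
import random

def sample_episode(data_by_class, n_way, k_shot, n_query, seed=0):
    """Draw an N-way K-shot episode: K support and n_query query samples
    per class, from n_way randomly chosen classes."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(data_by_class[c], k_shot + n_query)
        support += [(x, c) for x in picked[:k_shot]]
        query += [(x, c) for x in picked[k_shot:]]
    return support, query

# Toy data: 4 classes of 20 samples each; draw a 2-way 5-shot episode.
data = {c: list(range(c * 100, c * 100 + 20)) for c in range(4)}
support, query = sample_episode(data, n_way=2, k_shot=5, n_query=3)
```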
(2.2) Build the self-supervised feature representation network $F_\phi$. For input data $x_q$ from the query set $Q$, generate a transformed view $\hat{x}_q$ by random augmentation, forming a training sample pair, and compute the objective function $L_{self}$ of the self-supervised feature representation network as

$$L_{self} = \big\| F_\phi(x_q) - F_\phi(\hat{x}_q) \big\|^2;$$
(2.3) Construct the characterization consistency correction function $L_{con}$, which corrects the representations of the prototypical network $P_\theta$ and the self-supervised feature representation network $F_\phi$ against each other:

$$L_{con} = \sum_{x \in Q} \big\| P_\theta(x) - F_\phi(x) \big\|^2$$

where $\theta$ and $\phi$ are both learnable network parameters;
(2.4) For the prototypical network $P_\theta$, fuse the classification loss function with the characterization consistency correction function; the final prototypical-network training function is computed as

$$L_P = L_{cls} + \lambda L_{con}$$

where $\lambda$ is a weight variable;
(2.5) For the self-supervised feature representation network $F_\phi$, fuse the network's objective function with the characterization consistency correction function; the final self-supervised training function is computed as

$$L_F = L_{self} + \mu L_{con}$$

where $\mu$ is a weight variable;
(2.6) During model training, design and use an interactive iterative updating method for the prototypical network and the self-supervised feature representation network to train $P_\theta$ and $F_\phi$. Specifically: first initialize the prototypical network $P_\theta$ and the self-supervised feature representation network $F_\phi$ respectively; with the parameters $\phi$ of the self-supervised network held fixed, obtain the characterization consistency correction function $L_{con}$ and take one optimization step on the parameters $\theta$ using the prototypical-network training function $L_P$; then, with the updated parameters $\theta$, obtain a new characterization consistency correction function $L_{con}$ and take one optimization step using the self-supervised training function $L_F$ to obtain updated parameters $\phi$; repeat these interactive iterative updating steps until the training functions converge, and use the finally trained self-supervised feature representation network $F_\phi$ as the embedding network.
(3) Prediction stage: feed the preprocessed test-set data into the model, extract sample features with the self-supervised feature representation network $F_\phi$, compute the mean of the support-set samples of each class as that class's prototype, then compute the similarity between the test sample and each class prototype with the small-sample log anomaly decision function

$$\hat{y} = \arg\max_k \; \sigma\big(-d(F_\phi(x), c_k)\big)$$

and finally take the class with the highest similarity as the detection result. For the vector matrix 0010000000000000, the nearest class prototype is that of label = 1, so the predicted label of this record is 1: there is anomalous behavior against file operations.
(4) Response stage: act on the prediction result. Having found that the event contains anomalous behavior against the file system, the system issues a timely warning and reports the record of the anomalous behavior so that administrators can further investigate the error.
The small-sample log information anomaly detection method based on characterization consistency correction was verified by simulation. The training and testing procedures were implemented in Python, and the method was compared with small-sample learning methods such as the prototypical network, the matching network, and the relation network; FIG. 3 shows the comparison under a 5-way 5-shot task. ProtoNet denotes the prototypical network, MatchingNet the matching network, RelationNet the relation network, MAML the model-agnostic meta-learning algorithm, and RAS the present small-sample learning method based on characterization consistency correction. All runs were performed on a standard server with an Intel Core i7-8700 CPU at 3.20 GHz, 32 GB RAM, and an NVIDIA TITAN RTX, using a neural network with ReLU activations and an Adam optimizer with an initial learning rate of 0.01, decayed stepwise during training. As can be seen from FIG. 3, the classification accuracy of the method is more than 5% higher than that of the other methods, demonstrating that it is better suited to the special demands of small-sample learning and improves the classification performance of the anomaly detection model under small-sample conditions.

Claims (9)

1. A small sample log information anomaly detection method based on characterization consistency correction, characterized by comprising the following steps:
(1) preprocessing data: parsing the log information, extracting event characteristics and classifying them;
(2) iteratively training a self-learning feature representation network: dividing the preprocessed data into a training set and a test set; on the training set, first adopting a task-based episode training strategy in which each episode executes a small sample classification task, and using an original prototypical network to learn a feature extractor from the small sample classification task; then constructing a characterization consistency correction module, computing the characterization consistency correction function between the original prototypical network and the self-learning feature representation network, training the two networks respectively with this function, continuously updating the parameters of both networks by an interactive iterative updating method, and finally taking the trained self-learning feature representation network as the embedded network;
(3) using the trained self-learning feature representation network to compute the similarity between the test set data and each category, and taking the category with the highest similarity as the final detection result.
2. The small sample log information anomaly detection method based on characterization consistency correction according to claim 1, wherein the event characteristics in step (1) comprise an event behavior description string_id and a security label, the event behavior description string_id covering a File operation, a Process operation and a Registry operation.
3. The small sample log information anomaly detection method based on characterization consistency correction according to claim 2, wherein the event behavior description string_id is represented by binary numbers: each binary bit is the flag value of an event operation behavior attribute executed by a program, 0 indicating that the event type is absent and 1 indicating that it is present, so as to form a vector matrix; the security label is the marking value of normal or abnormal event behavior: a security label of 0 indicates no abnormal behavior, 1 indicates an abnormal file operation, 2 indicates an abnormal process operation, and 3 indicates an abnormal registry operation; the vector matrix is spliced with the security label to form the event behavior vector.
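A minimal sketch of the encoding described in claims 2 and 3 (the concrete bit positions assigned to each event type are hypothetical — the claims fix only the 0/1 flag convention, the label values, and the splicing of label onto vector):

```python
# Hypothetical mapping from event-type names to bit positions in a 16-bit
# behavior vector; the patent does not specify concrete positions.
EVENT_BITS = {"File": 2, "Process": 7, "Registry": 12}
VECTOR_LEN = 16

def encode_event(observed_types, security_label):
    """Build the event behavior vector: binary flags spliced with the label.

    security_label: 0 = normal, 1 = file anomaly, 2 = process anomaly,
    3 = registry anomaly (the marking values defined in claim 3).
    """
    bits = [0] * VECTOR_LEN
    for t in observed_types:
        bits[EVENT_BITS[t]] = 1       # 1 = this event type is present
    return bits + [security_label]    # splice the security label onto the flags

vec = encode_event(["File"], 1)
print("".join(map(str, vec[:VECTOR_LEN])), "label:", vec[-1])
# → 0010000000000000 label: 1
```

With the assumed bit mapping this reproduces the vector matrix 0010000000000000 used in the worked example of the description.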
4. The small sample log information anomaly detection method based on characterization consistency correction according to claim 1, wherein step (2) comprises the following steps:
(2.1) using the training set as model input, computing class prototypes with the original prototypical network $f_\phi$, where $\phi$ is a learnable network parameter; for a small sample task $T = (S, Q)$, where $S$ is the support set and $Q$ is the query set, the prototype of category $k$ is computed as:
$$c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\phi(x_i)$$
wherein $c_k$ denotes the class prototype of label $k$ in the feature space, $S_k$ denotes the subset of the support set $S$ whose label is $k$, $|S_k|$ denotes the size of the data set $S_k$, $x_i$ denotes the feature vector of a sample, and $y_i$ denotes the label of the corresponding sample;
for a new sample $x$ from the query set $Q$, the normalized classification score of each category $k$ is obtained by distance discrimination:
$$p(y = k \mid x) = \sigma\bigl(-d(f_\phi(x), c_k)\bigr) = \frac{\exp\bigl(-d(f_\phi(x), c_k)\bigr)}{\sum_{k'} \exp\bigl(-d(f_\phi(x), c_{k'})\bigr)}$$
wherein $\sigma$ denotes the softmax function and $d(\cdot,\cdot)$ the distance function; the classification loss function $L_{cls}$ is specified as:
$$L_{cls} = -\log p(y \mid x)$$
wherein $x$ denotes the feature vector of the sample and $y$ the label of the corresponding sample;
(2.2) building the self-learning feature representation network $g_\theta$: for the feature vector $x$ of a sample from the query set $Q$, generating a transformation $\tilde{x}$ by random augmentation to form a training sample pair, and computing the objective function $L_{self}$ of the self-learning feature representation network over the pair $(x, \tilde{x})$;
(2.3) constructing the characterization consistency correction function $L_{con}$, which corrects the consistency between the representations produced by the original prototypical network $f_\phi$ and the self-learning feature representation network $g_\theta$, wherein $\phi$ and $\theta$ are both learnable network parameters;
(2.4) for the original prototypical network $f_\phi$, fusing the classification loss function and the characterization consistency correction function to obtain the final prototypical network training function:
$$L_{proto} = L_{cls} + \lambda_1 L_{con}$$
wherein $\lambda_1$ is a weight variable;
(2.5) for the self-learning feature representation network $g_\theta$, fusing the objective function of the self-learning feature representation network and the characterization consistency correction function to obtain the final self-learning feature representation network training function:
$$L_{total} = L_{self} + \lambda_2 L_{con}$$
wherein $\lambda_2$ is a weight variable;
(2.6) training the model with the characterization consistency correction function, using the interactive iterative updating method between the original prototypical network and the self-learning feature representation network, and taking the finally trained self-learning feature representation network $g_\theta$ as the embedded network.
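The prototype and classification-score computations of step (2.1) can be sketched as follows (a simplified illustration with Euclidean distance on a toy 2-d feature space; the feature vectors here are stand-ins for the outputs of the trained embedding network, which is not reproduced):

```python
import numpy as np

def class_prototypes(support_feats, support_labels):
    """c_k: mean of the support-set feature vectors carrying label k."""
    return {k: support_feats[support_labels == k].mean(axis=0)
            for k in np.unique(support_labels)}

def classification_scores(x, prototypes):
    """Softmax over negative Euclidean distances to each class prototype."""
    labels = sorted(prototypes)
    neg_d = np.array([-np.linalg.norm(x - prototypes[k]) for k in labels])
    exp = np.exp(neg_d - neg_d.max())   # shift for numerical stability
    return dict(zip(labels, exp / exp.sum()))

# Toy support set: two classes, two shots each.
feats = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
labels = np.array([0, 0, 1, 1])
protos = class_prototypes(feats, labels)        # {0: [0, 0.5], 1: [5, 5.5]}
scores = classification_scores(np.array([0.2, 0.4]), protos)
print(max(scores, key=scores.get))  # → 0 (query nearest to class-0 prototype)
```

The classification loss of step (2.1) would then be the negative log of the score assigned to the true label.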
5. The small sample log information anomaly detection method based on characterization consistency correction according to claim 4, wherein the interactive iterative updating method in step (2.6) comprises the following steps:
first, respectively initializing the original prototypical network $f_\phi$ and the self-learning feature representation network $g_\theta$; then fixing the parameter $\theta$ of the self-learning feature representation network, computing the characterization consistency correction function $L_{con}$, and performing one optimization step on the parameter $\phi$ with the prototypical network training function $L_{proto}$; with the optimized and updated $\phi$, recomputing the characterization consistency correction function $L_{con}$ and performing one optimization step on the parameter $\theta$ with the self-learning training function $L_{total}$ to obtain the optimized and updated parameter $\theta$; and repeating the iterative updating steps until the training functions converge.
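The alternating update of claim 5 follows a standard block-coordinate pattern, sketched here with scalar stand-in "networks" (the quadratic surrogate losses below are hypothetical; only the alternation scheme — fix one parameter, step the other, recompute the shared consistency term — mirrors the claim):

```python
# Block-coordinate sketch of the interactive iterative update: alternately
# optimize the prototypical-network parameter phi and the self-learning
# parameter theta, coupled through a shared consistency term (phi - theta)^2.

def step(param, grad, lr=0.1):
    """One gradient-descent step."""
    return param - lr * grad

phi, theta = 5.0, -3.0
for _ in range(200):
    # Fix theta; one step on phi for L_proto = (phi - 1)^2 + (phi - theta)^2.
    grad_phi = 2 * (phi - 1) + 2 * (phi - theta)
    phi = step(phi, grad_phi)
    # Fix the updated phi; one step on theta for
    # L_total = (theta - 1)^2 + (phi - theta)^2.
    grad_theta = 2 * (theta - 1) - 2 * (phi - theta)
    theta = step(theta, grad_theta)

print(round(phi, 3), round(theta, 3))  # → 1.0 1.0 (both converge)
```

The consistency term pulls the two parameters toward agreement, so the alternation converges to the shared optimum; in the patented method the same role is played by the characterization consistency correction function between the two networks' representations.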
6. The small sample log information anomaly detection method based on characterization consistency correction according to claim 1, wherein step (3) comprises: using the preprocessed test set data as model input, extracting sample features with the trained self-learning feature representation network $g_\theta$, computing the average value of the support set samples of each class as the prototype of that class, then computing the similarity between the test sample and each class prototype through the small sample log information anomaly determination function, and taking the class with the highest similarity as the classification result.
7. The small sample log information anomaly detection method based on characterization consistency correction according to claim 6, wherein the small sample log information anomaly determination function is:
$$p(y = k \mid x) = \frac{\exp\bigl(-d(g_\theta(x), c_k)\bigr)}{\sum_{k'} \exp\bigl(-d(g_\theta(x), c_{k'})\bigr)}$$
8. The small sample log information anomaly detection method based on characterization consistency correction according to claim 1, further comprising step (4): performing early warning and response processing according to the detection result.
9. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when executed by a processor, carries out the method steps of any one of claims 1-8.
CN202210876386.7A 2022-07-25 2022-07-25 Small sample log information anomaly detection method based on characterization consistency correction Active CN114936615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210876386.7A CN114936615B (en) 2022-07-25 2022-07-25 Small sample log information anomaly detection method based on characterization consistency correction


Publications (2)

Publication Number Publication Date
CN114936615A true CN114936615A (en) 2022-08-23
CN114936615B CN114936615B (en) 2022-10-14

Family

ID=82868605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210876386.7A Active CN114936615B (en) 2022-07-25 2022-07-25 Small sample log information anomaly detection method based on characterization consistency correction

Country Status (1)

Country Link
CN (1) CN114936615B (en)

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050160340A1 (en) * 2004-01-02 2005-07-21 Naoki Abe Resource-light method and apparatus for outlier detection
US20080294580A1 (en) * 2007-05-24 2008-11-27 Paul Adams Neuromorphic Device for Proofreading Connection Adjustments in Hardware Artificial Neural Networks
CN109062774A (en) * 2018-06-21 2018-12-21 平安科技(深圳)有限公司 Log processing method, device and storage medium, server
CN109961089A (en) * 2019-02-26 2019-07-02 中山大学 Small sample and zero sample image classification method based on metric learning and meta learning
US20200034694A1 (en) * 2018-07-25 2020-01-30 Element Ai Inc. Multiple task transfer learning
CN111273870A (en) * 2020-01-20 2020-06-12 深圳奥思数据科技有限公司 Method, equipment and storage medium for iterative migration of mass data between cloud storage systems
CN112069921A (en) * 2020-08-18 2020-12-11 浙江大学 Small sample visual target identification method based on self-supervision knowledge migration
CN112529878A (en) * 2020-12-15 2021-03-19 西安交通大学 Multi-view semi-supervised lymph node classification method, system and equipment
CN112764997A (en) * 2021-01-28 2021-05-07 北京字节跳动网络技术有限公司 Log storage method and device, computer equipment and storage medium
CN113128613A (en) * 2021-04-29 2021-07-16 南京大学 Semi-supervised anomaly detection method based on transfer learning
CN113391900A (en) * 2021-06-18 2021-09-14 长春吉星印务有限责任公司 Abnormal event processing method and system in discrete production environment
CN113450300A (en) * 2020-03-24 2021-09-28 北京基石生命科技有限公司 Machine learning-based primary tumor cell picture identification method and system
CN113610139A (en) * 2021-08-02 2021-11-05 大连理工大学 Multi-view-angle intensified image clustering method
CN113705699A (en) * 2021-08-31 2021-11-26 平安科技(深圳)有限公司 Sample abnormity detection method, device, equipment and medium based on machine learning
CN113723387A (en) * 2021-07-08 2021-11-30 常州工学院 Chinese ancient book non-standard font recognition system based on deep learning
CN113963165A (en) * 2021-09-18 2022-01-21 中国科学院信息工程研究所 Small sample image classification method and system based on self-supervision learning
CN114092747A (en) * 2021-11-30 2022-02-25 南通大学 Small sample image classification method based on depth element metric model mutual learning
CN114169442A (en) * 2021-12-08 2022-03-11 中国电子科技集团公司第五十四研究所 Remote sensing image small sample scene classification method based on double prototype network
CN114299326A (en) * 2021-12-07 2022-04-08 浙江大学 Small sample classification method based on conversion network and self-supervision


Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
JIN WANG et al.: "LogEvent2vec: LogEvent-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things", Sensors *
SHUMIN DENG et al.: "Meta-Learning with Dynamic-Memory-Based Prototypical Network for Few-Shot Event Detection", WSDM '20: Proceedings of the 13th International Conference on Web Search and Data Mining *
TONG WEI et al.: "Robust Long-Tailed Learning under Label Noise", arXiv:2108.11569v1 *
TANG Jiwei: "Research and Implementation of a Log Analysis Tool Based on Long Short-Term Memory Networks", China Master's Theses Full-text Database, Information Science and Technology *
WANG Ke: "Research on Event Detection and Evolution Methods for Social Media", China Master's Theses Full-text Database, Information Science and Technology *
SHAO Weizhi et al.: "Semi-supervised Learning Algorithm Based on Consistency Regularization and Entropy Minimization", Journal of Zhengzhou University (Natural Science Edition) *
ZHENG Qianhuizhi: "Research and Implementation of a Log Overhead Optimization Method Based on Anomaly Detection Models", China Master's Theses Full-text Database, Information Science and Technology *

Also Published As

Publication number Publication date
CN114936615B (en) 2022-10-14

Similar Documents

Publication Publication Date Title
Zhao et al. A malware detection method of code texture visualization based on an improved faster RCNN combining transfer learning
CN113590698B (en) Artificial intelligence technology-based data asset classification modeling and hierarchical protection method
CN111143838B (en) Database user abnormal behavior detection method
CN113011889B (en) Account anomaly identification method, system, device, equipment and medium
CN113742733B (en) Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type
CN111047173B (en) Community credibility evaluation method based on improved D-S evidence theory
CN110704616B (en) Equipment alarm work order identification method and device
CN113595998A (en) Bi-LSTM-based power grid information system vulnerability attack detection method and device
CN111126820A (en) Electricity stealing prevention method and system
CN110909542A (en) Intelligent semantic series-parallel analysis method and system
CN113656805A (en) Event map automatic construction method and system for multi-source vulnerability information
CN108229170A (en) Utilize big data and the software analysis method and device of neural network
CN109543038B (en) Emotion analysis method applied to text data
CN112148997A (en) Multi-modal confrontation model training method and device for disaster event detection
CN110765285A (en) Multimedia information content control method and system based on visual characteristics
CN114936615B (en) Small sample log information anomaly detection method based on characterization consistency correction
CN111611774A (en) Operation and maintenance operation instruction security analysis method, system and storage medium
CN116226769A (en) Short video abnormal behavior recognition method based on user behavior sequence
CN116541755A (en) Financial behavior pattern analysis and prediction method based on time sequence diagram representation learning
CN115794798A (en) Market supervision informationized standard management and dynamic maintenance system and method
CN115618297A (en) Method and device for identifying abnormal enterprise
CN108647497A (en) A kind of API key automatic recognition systems of feature based extraction
CN114610882A (en) Abnormal equipment code detection method and system based on electric power short text classification
CN113326371A (en) Event extraction method fusing pre-training language model and anti-noise interference remote monitoring information
CN113657443A (en) Online Internet of things equipment identification method based on SOINN network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant