CN110334508A - Host sequence intrusion detection method - Google Patents

Info

Publication number
CN110334508A
Authority
CN
China
Prior art keywords
sequence
characteristic
vector
order
commands
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910596409.7A
Other languages
Chinese (zh)
Other versions
CN110334508B (en
Inventor
卢逸君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Security Test And Appraisal Center Guangdong Province
Original Assignee
Information Security Test And Appraisal Center Guangdong Province
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Security Test And Appraisal Center Guangdong Province
Priority to CN201910596409.7A
Publication of CN110334508A
Application granted
Publication of CN110334508B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Complex Calculations (AREA)
  • Image Analysis (AREA)

Abstract

A host sequence intrusion detection method, comprising the following steps: S1, extracting m feature commands from each training sequence and each test sequence; S2, constructing a feature command set from the feature commands extracted from the training sequences and the test sequences; S3, calculating the distribution of the training sequences over the feature-dimension space of the feature command set; S4, mapping the m feature commands extracted from each test sequence to a new vector in the feature command set, forming a distribution vector in the feature-dimension space of the feature command set; S5, for the vector formed from each test sequence, finding the k most similar training command sequences in the distribution, and taking the type that occurs most often among the types of those k training command sequences as the predicted type of the test sequence. The invention provides a host sequence intrusion detection method that is low in cost, simple to implement, and good in overall performance.

Description

Host sequence intrusion detection method
Technical field
The invention belongs to the field of computer network intrusion detection, and in particular relates to a host sequence intrusion detection method.
Background
Host-based computer network intrusion detection mainly detects abnormal user behavior through host system call sequences (host sequences for short). In host sequence intrusion detection, the host sequence under inspection is the sequence of low-level operating system commands that a user invokes through the command line or through programs. A "sequence" denotes the series of commands collected, and a "command" denotes a single command within it.
Existing intrusion detection methods based on host system call sequences fall mainly into the following four classes:
1. Sequential features
Modeling sequential features with N-grams is the more mainstream technique. The method was proposed in 1996: system calls are treated as words and call sequences as phrases. With k as the sequence length, the window size is k+1, and as the window slides, a database records the set of sequences that follow each word. The drawbacks are a high false alarm rate and the need to build a sufficiently large feature database. Experiments on the ADFA dataset show that the TPR of this method can exceed 90% with fairly high efficiency, but the false alarm rate reaches 30%, and lowering it requires a sufficiently large number of training sequences.
2. Features based on document frequency statistics
Frequency-based methods include bag of words, TF-IDF, and HMM. Taking term frequency-inverse document frequency (TF-IDF) as representative, the method vectorizes system calls: each sequence is first scored with TF-IDF, and the scored sequences are then classified with algorithms such as SVM or KNN. The drawback of frequency-based feature command extraction is that features are extracted purely on a probabilistic basis rather than on semantic features, so important features may be lost. Experiments on the ADFA dataset show that when TF-IDF extracts a small number of feature commands from each sequence, the features cluster poorly, and computing the IDF consumes considerable computing resources.
3. Word vector and sentence vector embedding models
These methods ignore word frequency and instead extract features from the similarity between words, measured as distance in a high-dimensional space. A shallow neural network is trained on the sequences, each command or each string of sub-commands in the training set is mapped into a vector space of a specified dimension, and dimensionality reduction is then applied. The drawback of this class of algorithms is the large computational cost, whereas the application scenarios of host intrusion detection call for more lightweight algorithms.
4. Neural networks
Ghosh et al. applied artificial neural networks (ANN) to misuse detection and anomaly detection; the ROC curve showed a TPR of 77.3% and an FPR of 2.2%. Han and Cho introduced evolutionary neural networks (ENN) in their study, with a 2:1 ratio of normal records to attack data in the training sequences. Experiments on the DARPA99 dataset showed a false alarm rate of only 0.0011% and a detection rate of 100% with ENN, with an ENN training time of about 1 hour. Methods such as ENN are prone to overfitting, and the TPR of the ANN method is not satisfactory.
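As an illustration of the first class of methods above, the following is a minimal sketch of a sliding-window sequence database in the style of STIDE. It is a simplified contiguous-window variant, not the exact scheme of the cited work; the function names ngram_database and mismatch_rate and the sample call numbers are assumptions made for this example.

```python
def ngram_database(train_sequences, window):
    """Collect every length-`window` subsequence seen in normal training data."""
    db = set()
    for seq in train_sequences:
        for i in range(len(seq) - window + 1):
            db.add(tuple(seq[i:i + window]))
    return db


def mismatch_rate(seq, db, window):
    """Fraction of windows in `seq` that never occurred in the training data."""
    windows = [tuple(seq[i:i + window]) for i in range(len(seq) - window + 1)]
    if not windows:
        return 0.0
    return sum(w not in db for w in windows) / len(windows)


# Hypothetical "normal" traces; a real database needs far more training data.
normal = [[6, 6, 63, 6, 42], [6, 63, 6, 42, 120]]
db = ngram_database(normal, window=3)
print(mismatch_rate([6, 6, 63, 6], db, window=3))
print(mismatch_rate([1, 2, 5, 2], db, window=3))
```

A sequence is flagged when its mismatch rate exceeds a chosen threshold. Any window absent from the training data counts as a mismatch, which is one way to see why a small training set drives the false alarm rate up.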
Summary of the invention
The main object of the invention is to overcome the deficiencies of the prior art and to provide a host sequence intrusion detection method that is low in cost, simple to implement, and good in overall performance.
To achieve this object, the invention adopts the following technical solution:
A host sequence intrusion detection method, comprising the following steps:
S1, extracting m feature commands from each training sequence and each test sequence;
S2, constructing a feature command set from the feature commands extracted from the training sequences and the test sequences;
S3, calculating the distribution of the training sequences over the feature-dimension space of the feature command set;
S4, mapping the m feature commands extracted from each test sequence to a new vector in the dimension space of the feature command set, forming a distribution vector in the feature-dimension space of the feature command set;
S5, for the vector formed from each test sequence, finding the k most similar feature command sequences in the distribution, and taking the type that occurs most often among the types of those k feature command sequences as the predicted type of the test sequence.
Further:
The method further comprises the following steps:
S6, judging whether m is greater than a set threshold, and if not, updating m to m+1 and repeating steps S1-S5;
S7, evaluating the classification performance of the type predictions under the different values of m, and taking the predicted type under the m value with the best classification performance as the final predicted type.
In step S7, TPR and FPR are used as the metrics for evaluating classification performance.
In step S1, extracting m feature commands from a sequence comprises:
converting the sequence into a weighted directed graph G = (V, E), where V is the set of nodes representing the commands in the sequence, V = {V_1, V_2, …, V_i, …, V_n | 1 ≤ i ≤ n}, and E is the set of edges with weights w, representing the context relations between commands;
calculating the score of command V_i according to the following formula:
where, for commands V_i and V_j, w_ij is the context weight between any two commands V_i and V_j in the sequence, i.e. the number of times the command following V_i is V_j; In(V_i) is the set of commands whose following command is V_i; Out(V_j) is the set of commands that follow V_j; d is the damping coefficient, with value range (0, 1); and WS(V_j) is the score of command V_j;
when calculating the score of each command, a preset initial value is assigned to all commands and the calculation is repeated recursively over multiple iterations until convergence;
after the WS values of all commands in the sequence have been calculated in this way, the m commands with the largest WS values are extracted.
Preferably, d is 0.85.
In step S5, the k vectors in the distribution that are nearest to the vector are computed, and the mode of their class labels, taken by majority vote, is used as the predicted type of the test sequence.
Step S5 comprises:
S51, calculating the distance between every vector in the distribution and the test vector;
S52, sorting by distance, for example in ascending order;
S53, selecting the k vectors with the smallest distance to the test vector;
S54, determining how often each class occurs among the k vectors;
S55, returning the most frequent class as the predicted class of the test vector.
In step S1, the initial value of m is greater than or equal to 1.
A computer-readable storage medium storing a computer program, the computer program being executable by a processor to implement the method.
The invention has the following beneficial effects:
The invention proposes an intrusion detection method for host-based intrusion detection systems (HIDS). It addresses the sensitivity of host intrusion detection systems to algorithm time cost and to intrusion variants. Compared with the prior art, it is a low-cost, easily implemented anomaly detection method with good overall performance, and it achieves better feature command extraction and detection results.
The advantages of embodiments of the method include:
1. The m feature commands extracted from each sequence by this method carry a degree of semantic meaning, so representative commands are extracted effectively and key attack behavior is highlighted.
2. The command scores calculated by this method do not depend on the training sequences. Compared with the traditional STIDE algorithm, FPR performance improves when training samples are few.
3. Compared with the traditional TF-IDF method, this method needs fewer resources to extract feature commands when m is small, because computing the IDF in TF-IDF has higher complexity. When m is small, assuming N training sequences of average length P, so that the total length of all sequences is P*N, TF-IDF takes time f(P*N) whereas this method takes time f(N).
4. The extracted feature commands carry a degree of semantic meaning, so key commands are extracted effectively without regard to how those commands occur in the training sequences. The method can therefore cope with new attack types and with attack sequences that have been deliberately altered.
5. The method improves detection performance when samples are sparse and when the classes are imbalanced.
6. Compared with the TF-IDF extraction method, it improves feature command extraction efficiency; compared with the STIDE method, it improves false alarm rate performance.
Brief description of the drawings
Fig. 1 is a flowchart of the host sequence intrusion detection method according to an embodiment of the invention.
Fig. 2 shows the result of converting a sequence into a directed graph in an embodiment of the invention.
Fig. 3 compares the detection results of the embodiment of the invention and the STIDE method on the ADFA dataset.
Fig. 4 compares the number of features extracted by the embodiment of the invention and the TF-IDF method on the ADFA dataset.
Fig. 5 compares the detection time of the embodiment of the invention and the TF-IDF method on the ADFA dataset.
Detailed description
Embodiments of the invention are described in detail below. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its applications.
Referring to Fig. 1, in one embodiment, a host sequence intrusion detection method comprises the following steps:
S1, extracting m feature commands from each training sequence and each test sequence;
S2, constructing a feature command set from the feature commands extracted from the training sequences and the test sequences;
S3, calculating the distribution of the training sequences over the feature-dimension space of the feature command set;
S4, mapping the m feature commands extracted from each test sequence to a new vector in the dimensions of the feature command set, forming a distribution vector in the feature-dimension space of the feature command set;
S5, for the vector formed from each test sequence, finding the k most similar feature command sequences in the distribution, and taking the type that occurs most often among the types of those k feature command sequences as the predicted type of the test sequence.
In a preferred embodiment, the method further comprises the following steps:
S6, judging whether m is greater than a set threshold, and if not, updating m to m+1 and repeating steps S1-S5;
S7, evaluating the classification performance of the type predictions under the different values of m, and taking the predicted type under the m value with the best classification performance as the final predicted type.
In a preferred embodiment, in step S7, TPR and FPR are used as the metrics for evaluating classification performance.
In a preferred embodiment, in step S1, extracting m feature commands from a sequence comprises:
converting the sequence into a weighted directed graph G = (V, E), where V is the set of nodes representing the commands in the sequence, V = {V_1, V_2, …, V_i, …, V_n | 1 ≤ i ≤ n}, and E is the set of edges with weights w, representing the context relations between commands;
calculating the score of command V_i according to the following formula:
where, for commands V_i and V_j, w_ij is the context weight between any two commands V_i and V_j in the sequence, i.e. the number of times the command following V_i is V_j; In(V_i) is the set of commands whose following command is V_i; Out(V_j) is the set of commands that follow V_j; d is the damping coefficient, with value range (0, 1); and WS(V_j) is the score of command V_j;
when calculating the score of each command, a preset initial value is assigned to all commands and the calculation is repeated recursively over multiple iterations until convergence;
after the WS values of all commands in the sequence have been calculated in this way, the m commands with the largest WS values are extracted.
The m feature commands that this method extracts from each sequence carry a degree of semantic meaning, so representative commands are extracted effectively and key attack behavior is highlighted. Because the extracted feature commands carry semantic meaning, key attack commands can be extracted without regard to how those commands occur in the training data, which makes it possible to cope with new attack types and with attack sequences that have been deliberately altered. In this method, command scores do not depend on the training sequences, so compared with the traditional STIDE algorithm, FPR performance improves when training samples are few. The method improves detection performance when samples are sparse and when the classes are imbalanced. Compared with the TF-IDF extraction method, it improves feature command extraction efficiency; compared with the STIDE method, it improves false alarm rate performance.
In a preferred embodiment, d is 0.85.
In a preferred embodiment, in step S5, the k vectors in the distribution that are nearest to the vector are computed, and the mode of their class labels, taken by majority vote, is used as the predicted type of the test sequence.
In a preferred embodiment, step S5 comprises:
S51, calculating the distance between every vector in the distribution and the test vector;
S52, sorting by distance, for example in ascending order;
S53, selecting the k vectors with the smallest distance to the test vector;
S54, determining how often each class occurs among the k vectors;
S55, returning the most frequent class as the predicted class of the test vector.
In one embodiment, in step S1, the initial value of m is greater than or equal to 1.
The features and advantages of specific embodiments of the invention are further described below with reference to the drawings.
In host sequence intrusion detection, the host sequence under inspection is the sequence of low-level operating system commands that a user invokes through the command line or through programs. A "sequence" denotes a series of commands, and a "command" denotes a single command within it. Each sequence corresponds to one classification result, namely "normal" or "abnormal".
The method of the invention can be divided into two major steps: the first step extracts the feature commands of a sequence, and the second step performs classification and detection.
Step 1: feature command extraction. Feature commands are extracted from the system call sequences that make up a host sequence.
In this step, the host sequence is converted into a weighted directed graph G = (V, E). V represents the commands, which become the node set; E represents the context relations between commands, which become the edges of the graph. The context weight between any two commands V_i and V_j is w_ij, the number of times the command following V_i is V_j. For a given command V_i, In(V_i) is the set of commands that point to it, and Out(V_i) is the set of commands that V_i points to.
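The graph construction just described can be sketched as follows. This is a minimal illustration, assuming that an edge is created for every pair of adjacent commands and that self-loops (a command followed by itself) are kept; the text does not say how self-loops are treated. The helper name build_graph and the short sample sequence are hypothetical.

```python
from collections import Counter


def build_graph(sequence):
    """Return {(a, b): w_ab}, where w_ab counts how often b directly follows a."""
    return Counter(zip(sequence, sequence[1:]))


seq = [6, 6, 63, 6, 42, 120, 6]
edges = build_graph(seq)

# In(6): commands that point to 6. Out(6): commands that 6 points to.
in_set = {a for (a, b) in edges if b == 6}
out_set = {b for (a, b) in edges if a == 6}
print(dict(edges), in_set, out_set)
```

Storing the graph as a counter keyed by (predecessor, successor) pairs keeps both the edge set E and the weights w_ij in one structure.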
The score of command V_i is then calculated according to the following formula:
where d is the damping coefficient, with value range (0, 1), representing the probability that a given command points to any other command, usually taken as 0.85, and WS(V_j) is the score of command V_j.
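The formula itself is not reproduced in this text. The symbol definitions (w_ij, In(V_i), Out(V_j), the damping coefficient d and the score WS) match the weighted TextRank score, so, as an assumption rather than a quotation of the patent, it can be written as:

```latex
WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)
```

Under this reading, each predecessor V_j passes on its own score in proportion to the share of its outgoing weight that goes to V_i, which is why the scores depend on one another and must be iterated.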
Feature command extraction for each sequence comprises: calculating the WS values of all commands in the sequence by the above method and taking the m commands with the largest WS values.
When the score of each command is calculated with this method, a specific initial value is assigned to all commands and the calculation is repeated recursively until convergence. Because the WS scores of the commands depend on one another, multiple iterations are needed before they converge.
The method embodies the following idea: a command that appears after many other commands is more important, and a command that immediately follows a command with a high WS score has its own WS score raised accordingly.
For example, system calls are recorded in serialized form, and a certain system call sequence is represented by the following system call numbers:
6 6 63 6 42 120 6 195 120 6 6 114 114 1 1 2 5 2 2 5 2 2 5 2 1 1 1 1 1 1 1 1 2 5 2 1 1 1 1 1 2 5 2 1 1 1 1 2 5 2 2 5 2 2 5 2 2 5 2 2 5 2 1 1 2 5 2
This is an example of feature command extraction on a single sequence (the string of numbers is the sequence being processed, and each number denotes one kind of command). The commands of the sequence are converted by the above method, so that the commands and their context relations form a directed graph (see Fig. 2, which shows the result of converting the attack sequence into a directed graph). The WS formula is then used to calculate the score of each command. Once the scores of all commands in a sequence have been calculated, the m highest-scoring commands are selected as the features of that sequence, which serves as dimensionality reduction. With feature command quantity m=4, the feature commands finally extracted are 1,252,6,120. These four commands are selected because the formula above gives them the highest scores, with no need to consider how often the features occur in the training sequences. As Fig. 2 shows, the selected commands sit at key positions in the directed graph and are structurally representative. m may take other values; 4 is used here only as an example.
The core code for feature command extraction is implemented as follows:
Input: the sequence to be processed (sequence), the number of feature commands m
Output: the feature commands sorted in descending order of WS value
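The code listing is not included in this text. The following is a minimal sketch of what such an extraction routine could look like, assuming the WS score is the weighted TextRank score. The initial value 1.0, the convergence tolerance, the iteration cap and the tie-breaking rule are all assumptions, and the function is not the patent's own implementation.

```python
from collections import Counter, defaultdict


def extract(sequence, m, d=0.85, tol=1e-6, max_iter=200):
    """Return the m commands with the highest WS score, in descending order."""
    weights = Counter(zip(sequence, sequence[1:]))  # w[(a, b)]: times b follows a
    out_sum = defaultdict(float)                    # total outgoing weight of a
    incoming = defaultdict(list)                    # In(v) as (source, weight) pairs
    for (a, b), w in weights.items():
        out_sum[a] += w
        incoming[b].append((a, w))

    commands = sorted(set(sequence))
    ws = {v: 1.0 for v in commands}                 # assumed initial value
    for _ in range(max_iter):
        new = {
            v: (1 - d) + d * sum(w / out_sum[a] * ws[a] for a, w in incoming[v])
            for v in commands
        }
        converged = max(abs(new[v] - ws[v]) for v in commands) < tol
        ws = new
        if converged:
            break

    # Highest score first; equal scores are ordered by command number (assumption).
    ranked = sorted(commands, key=lambda v: (-ws[v], v))
    return ranked[:m]


print(extract([1, 2, 1, 3, 1, 4, 1], m=2))
```

Because the graph is built from one sequence at a time, the scores never consult the training set, which is the property the description relies on when comparing against STIDE and TF-IDF.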
Step 2: classification and detection
1. Construct the feature command set S from the feature commands that the first step extracted from the training sequences and the test sequences;
2. Calculate the distribution distri_train of the training sequences over the feature dimensions of the feature command set S;
3. For the m feature commands extracted from each test sequence, form a distribution vector in the feature-dimension space of the feature command set S;
4. Compute the k vectors in distri_train nearest to that vector, take the mode of their class labels by majority vote as its class, and use the classification result as the verdict on the test sequence.
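Items 1 to 3 above can be sketched as follows. The text does not define the exact form of the distribution vector, so this sketch assumes a binary indicator vector over S, with a 1 wherever a command of S is among the sequence's extracted feature commands. The helper names build_feature_set and to_vector and the sample feature lists are hypothetical.

```python
def build_feature_set(feature_lists):
    """Union of all extracted feature commands, in a fixed (sorted) order."""
    return sorted({cmd for feats in feature_lists for cmd in feats})


def to_vector(features, feature_set):
    """Indicator vector over S: 1 if the command is one of the sequence's features."""
    chosen = set(features)
    return [1 if cmd in chosen else 0 for cmd in feature_set]


# Hypothetical feature commands already extracted in the first step (m = 3).
train_feats = [[1, 2, 6], [6, 120, 5]]
test_feats = [[2, 5, 6]]

S = build_feature_set(train_feats + test_feats)
distri_train = [to_vector(f, S) for f in train_feats]
test_vecs = [to_vector(f, S) for f in test_feats]
print(S, distri_train, test_vecs)
```

Fixing the order of S once means every training and test vector uses the same coordinate for the same command, which the distance computation in item 4 depends on.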
The mode is the value that occurs most often. In step 4, the k vectors nearest to the target under test are taken, and the most frequent label among those k vectors is chosen by vote. The method classifies by measuring the distance between feature values. The idea is that if most of the k most similar samples of a given sample (i.e. its nearest neighbors in feature space) belong to a certain class, then that sample also belongs to that class.
Specific steps may include:
1. Calculate the distance between every vector in distri_train and the test vector;
2. Sort by increasing distance;
3. Select the k vectors with the smallest distance to the test vector;
4. Determine how often each class occurs among those k vectors;
5. Return the most frequent class among those k vectors as the predicted class of the test vector.
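The five steps above can be sketched as a plain k-nearest-neighbor vote. The text does not name a distance metric, so Euclidean distance is an assumption here, as are the function name knn_predict and the sample vectors and labels.

```python
from collections import Counter


def knn_predict(train_vectors, train_labels, test_vector, k):
    # Step 1: Euclidean distance (assumed metric) to every training vector.
    dists = [
        sum((a - b) ** 2 for a, b in zip(vec, test_vector)) ** 0.5
        for vec in train_vectors
    ]
    # Steps 2-3: sort by increasing distance and keep the k nearest.
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    # Steps 4-5: count the labels of the k nearest vectors and return the mode.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


vectors = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
labels = ["normal", "normal", "abnormal"]
print(knn_predict(vectors, labels, [1, 1, 1], k=3))
```

With an even k, two classes can tie; this sketch then returns whichever tied class was counted first, so an odd k or an explicit tie rule may be preferable in practice.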
In a specific embodiment, the method comprises the following steps:
1. Data processing and initialization. Shuffle the data of the different classes and randomly take 90% for training; initialize m=1.
2. For all training sequences and test sequences, use the feature command extraction algorithm extract() to construct m feature commands from each. (The feature command extraction algorithm comprises the whole of the first step: converting the commands of a sequence into a directed graph, calculating the score of each command with the WS formula, and taking the m highest-scoring commands.)
3. Construct the feature command set S from all the feature commands.
4. Calculate the distribution distri_train of the training sequences over the feature command set S.
5. Map the m feature commands extracted from each test sequence to a new vector V in the feature command set S, forming a distribution vector in the feature-dimension space of S.
6. Determine the sequence type by KNN classification: for the vector V formed from each test sequence, find the k nearest feature command sequences in distri_train; the most frequent type among the types of those k sequences is the predicted type of the test sequence.
7. For the current value of m, evaluate the TPR and FPR of the classification results.
TPR and FPR are the metrics of classification performance. In binary classification, TPR and FPR measure a model's classification results more accurately. TPR is the true positive rate: the proportion of all actual positive samples that are correctly classified as positive. FPR is the false positive rate: the proportion of all actual negative samples that are wrongly classified as positive. The closer TPR is to 1 and the closer FPR is to 0, the better the result.
8. When m > the preset upper limit (the minimum number of distinct commands over all sequences), exit; otherwise increment m by 1 and repeat from step 2.
9. Determine the final value of m according to which value of m gives the best TPR and FPR.
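Steps 7 to 9 can be sketched as follows. The text does not say how TPR and FPR are combined into a single ranking, so this sketch assumes TPR minus FPR as the selection criterion. The names tpr_fpr, m_upper_limit and select_m, the evaluate callback, and the choice of "abnormal" as the positive class are likewise assumptions.

```python
def tpr_fpr(y_true, y_pred, positive="abnormal"):
    """TPR = TP / (TP + FN), FPR = FP / (FP + TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def m_upper_limit(sequences):
    """Step 8: the minimum number of distinct commands over all sequences."""
    return min(len(set(seq)) for seq in sequences)


def select_m(evaluate, m_max, m_start=1):
    """Try m = m_start..m_max and keep the value with the best TPR - FPR."""
    best_m, best_score, results = None, float("-inf"), {}
    for m in range(m_start, m_max + 1):
        tpr, fpr = evaluate(m)      # caller runs steps 2-7 for this m
        results[m] = (tpr, fpr)
        score = tpr - fpr           # assumed selection criterion
        if score > best_score:
            best_m, best_score = m, score
    return best_m, results
```

Here evaluate(m) stands for one full pass of extraction, vectorization and KNN classification at a given m, returning the resulting (TPR, FPR) pair. Other criteria, such as the highest TPR subject to an FPR ceiling, would fit the description equally well.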
Experimental results
The experiments were carried out on the ADFA-LD dataset. The ADFA dataset is a set of data for host-level intrusion detection systems released publicly by the Australian Defence Force Academy, and it is widely used for testing intrusion detection products. In the dataset, the various system calls have already been converted into features, and the attack types are labeled.
The method was validated on the ADFA-LD dataset, with 90% taken as training sequences and the remaining 10% as test sequences. Tests on the same data from the ADFA-LD dataset show that the method has a clear advantage over traditional methods in terms of effectiveness and processing efficiency. In this experiment, the best features were generally obtained with m=5 and k=1, at which point TPR reached 92% and FPR was about 1.9%.
In terms of effectiveness:
Compared with traditional STIDE (TPR about 90%, FPR about 30%), this method raises the detection rate and reduces the false alarm rate. Fig. 3 compares the detection results of this method and the STIDE method on the ADFA dataset. The results show that the detection rate of this method is better than that of STIDE and that its false alarm rate is far lower than that of traditional STIDE.
In terms of processing efficiency:
When m takes the optimal value 5, this method is more than 3 times as efficient as the TF-IDF method. Although the two perform similarly, this method needs fewer features than TF-IDF and so has a lower feature dimension. As Fig. 4 shows, when m=5 the detection performance of the two methods is the same and is the best, but this method uses 89 feature commands in total whereas TF-IDF uses 139. Moreover, this method avoids the overhead of computing the IDF, so its overall time is greatly reduced compared with TF-IDF. As Fig. 5 shows, when m=5 the two methods perform the same and at their best, but this method takes only 30% of the time TF-IDF takes.
The above is a further detailed description of the invention in connection with specific/preferred embodiments, and the specific implementation of the invention shall not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may, without departing from the inventive concept, make substitutions or modifications to the described embodiments, and all such substitutions or variants shall be regarded as falling within the scope of protection of the invention.

Claims (9)

1. a kind of host sequence intrusion detection method, which comprises the following steps:
S1, to training sequence and cycle tests, extract m characteristic commands;
S2, the characteristic commands construction feature command set by the training sequence and cycle tests extraction is used;
S3, distribution of the training sequence on characteristic commands collection characteristic dimension space is calculated;
S4, the m characteristic commands extracted to every cycle tests, are mapped as one in the characteristic commands collection characteristic dimension space A new vector is formed in the distribution vector of the characteristic commands collection characteristic dimension spatially;
S5, the vector formed to every cycle tests, find k most like therewith characteristic commands sequence in the distribution, The most type of frequency of occurrence in the corresponding type of the k characteristic commands sequence is determined as to the differentiation class of the cycle tests Type.
2. The host sequence intrusion detection method according to claim 1, characterized by further comprising the following steps:
S6, judging whether m is greater than a set threshold, and if not, updating the value of m to m+1 and repeating steps S1 to S5;
S7, evaluating the classification effect of the type discrimination results obtained under the different values of m, and taking the discriminated type obtained under the value of m with the best classification effect as the final discriminated type.
3. The host sequence intrusion detection method according to claim 2, characterized in that, in step S7, the classification effect is evaluated using TPR and FPR as the evaluation indexes.
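A minimal Python sketch of the evaluation in steps S6 and S7 follows. TPR and FPR are computed in the standard way. The claims do not say how the two indexes are combined into a single "best" result, so the selection rule below (maximize TPR minus FPR) is an assumption, as are the function names and the label `"intrusion"`.

```python
def tpr_fpr(y_true, y_pred, positive="intrusion"):
    """Return (TPR, FPR), treating `positive` as the intrusion class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def select_best_m(predictions_by_m, y_true, positive="intrusion"):
    """Pick the m whose predictions maximize TPR - FPR (assumed criterion)."""
    best_m, best_score = None, float("-inf")
    for m in sorted(predictions_by_m):
        tpr, fpr = tpr_fpr(y_true, predictions_by_m[m], positive)
        score = tpr - fpr
        if score > best_score:
            best_m, best_score = m, score
    return best_m


# Hypothetical labels and per-m predictions for four test sequences.
y_true = ["intrusion", "intrusion", "normal", "normal"]
predictions_by_m = {
    1: ["intrusion", "normal", "intrusion", "normal"],
    2: ["intrusion", "intrusion", "normal", "normal"],
}
best = select_best_m(predictions_by_m, y_true)
```

Here `predictions_by_m` stands for the results of running steps S1 to S5 once per value of m, up to the set threshold.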
4. The host sequence intrusion detection method according to any one of claims 1 to 3, characterized in that, in step S1, extracting m feature commands from a sequence comprises:
converting the sequence into a directed weighted graph G = (V, E), where V is the set of command nodes in the sequence, V = {V_1, V_2, ..., V_i, ..., V_n | 1 ≤ i ≤ n}, and E is the set of edges in the graph with weight w, representing the context relation between commands;
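calculating the score of command V_i according to the following formula:
$$WS(V_i) = (1 - d) + d \cdot \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)$$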
where, for commands V_i and V_j, w_ij is the context weight between any two commands V_i and V_j in the sequence, indicating the number of times the command following V_i is V_j; In(V_i) is the set of commands whose subsequent command is V_i; Out(V_j) is the set of commands that follow V_j; d is the damping coefficient, with value range (0, 1); and WS(V_j) is the score of command V_j;
wherein, when calculating the score of each command, a preset initial value is assigned to all commands, and the scores are computed recursively over multiple iterations until convergence;
after the WS values of all commands in the sequence have been calculated by the above method, the m commands with the largest WS values are extracted.
5. The host sequence intrusion detection method according to claim 4, characterized in that d is 0.85.
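The scoring in claims 4 and 5 matches the weighted TextRank recursion, and a minimal Python sketch is given below under that reading. The initial value of 1.0, the tolerance `tol`, the iteration cap `max_iter`, alphabetical tie-breaking, and the function name `top_m_commands` are assumptions not taken from the claims.

```python
from collections import defaultdict


def top_m_commands(sequence, m, d=0.85, tol=1e-6, max_iter=100):
    """Return (the m highest-scoring commands, all WS scores) for one sequence."""
    commands = sorted(set(sequence))
    if not commands:
        return [], {}

    # w[a][b] = number of times command a is immediately followed by command b.
    w = defaultdict(lambda: defaultdict(int))
    for a, b in zip(sequence, sequence[1:]):
        w[a][b] += 1

    # Total outgoing weight of each command, and In(V_i) for each command.
    out_sum = {v: sum(w[v].values()) for v in commands}
    preds = defaultdict(list)
    for a, successors in w.items():
        for b in successors:
            preds[b].append(a)

    ws = {v: 1.0 for v in commands}  # assumed preset initial value
    for _ in range(max_iter):
        new_ws = {}
        for vi in commands:
            incoming = sum(w[vj][vi] / out_sum[vj] * ws[vj] for vj in preds[vi])
            new_ws[vi] = (1 - d) + d * incoming
        delta = max(abs(new_ws[v] - ws[v]) for v in commands)
        ws = new_ws
        if delta < tol:  # treat a small change as convergence
            break

    ranked = sorted(commands, key=lambda v: (-ws[v], v))
    return ranked[:m], ws


# Hypothetical command sequence.
top, scores = top_m_commands(["ls", "cd", "ls", "cd", "ls", "vi"], m=2)
```

In this example `ls` is followed twice by `cd` and once by `vi`, so `cd` receives two thirds of the score passed on by `ls` and `vi` one third.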
6. The host sequence intrusion detection method according to any one of claims 1 to 5, characterized in that, in step S5, the k vectors in the distribution closest to the vector are calculated, and the mode of their class labels is taken according to the voting principle as the discriminated type of the test sequence.
7. The host sequence intrusion detection method according to claim 6, characterized in that step S5 comprises:
S51, calculating the distance between each vector in the distribution and the test vector;
S52, sorting the vectors by distance;
S53, selecting the k vectors with the smallest distance to the test vector;
S54, determining the frequency of occurrence of each class among the k vectors;
S55, returning the class with the highest frequency of occurrence as the predicted class of the test vector.
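Steps S51 to S55 describe a k-nearest-neighbor majority vote, sketched minimally in Python below. The claims do not name a distance metric, so Euclidean distance is an assumption, and the function name `knn_predict` and the example data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, test_vector, k):
    """Classify test_vector by majority vote among its k nearest training vectors."""
    # S51: distance from every training vector to the test vector (Euclidean, assumed).
    distances = [
        (math.dist(vec, test_vector), label)
        for vec, label in zip(train_vectors, train_labels)
    ]
    # S52: sort by increasing distance.
    distances.sort(key=lambda pair: pair[0])
    # S53: keep the k closest vectors.
    nearest = distances[:k]
    # S54: count how often each class occurs among them.
    counts = Counter(label for _, label in nearest)
    # S55: return the most frequent class.
    return counts.most_common(1)[0][0]


# Hypothetical training data and test vector.
train_vectors = [[0, 0], [0, 1], [5, 5], [6, 5]]
train_labels = ["normal", "normal", "intrusion", "intrusion"]
predicted = knn_predict(train_vectors, train_labels, [0.2, 0.1], k=3)
```

An odd k avoids ties between two classes. With an even k, `Counter.most_common` resolves a tie in favor of the class encountered first among the nearest neighbors.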
8. The host sequence intrusion detection method according to any one of claims 1 to 7, characterized in that, in step S1, the initial value of m is greater than or equal to 1.
9. A computer-readable storage medium, characterized in that it stores a computer program executable by a processor to implement the method according to any one of claims 1 to 8.
CN201910596409.7A 2019-07-03 2019-07-03 Host sequence intrusion detection method Active CN110334508B (en)

Publications (2)

Publication Number Publication Date
CN110334508A true CN110334508A (en) 2019-10-15
CN110334508B CN110334508B (en) 2021-01-05

Family

ID=68144025

Country Status (1)

Country Link
CN (1) CN110334508B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143840A (en) * 2019-12-31 2020-05-12 上海观安信息技术股份有限公司 Method and system for identifying abnormity of host operation instruction
CN111563234A (en) * 2020-04-23 2020-08-21 华南理工大学 Feature extraction method of system call data in host anomaly detection
CN113225331A (en) * 2021-04-30 2021-08-06 中国科学技术大学 Method, system and device for detecting host intrusion safety based on graph neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101976313A (en) * 2010-09-19 2011-02-16 四川大学 Frequent subgraph mining based abnormal intrusion detection method
CN102521534A (en) * 2011-12-03 2012-06-27 南京大学 Intrusion detection method based on crude entropy property reduction
US20130191477A1 (en) * 2012-01-20 2013-07-25 Electronics And Telecommunications Research Institute Mapping system, network, and method for adaptation of id/loc separation to datacenter for cloud computing
CN107392015A (en) * 2017-07-06 2017-11-24 长沙学院 A kind of intrusion detection method based on semi-supervised learning
CN109547496A (en) * 2019-01-16 2019-03-29 西安工业大学 A kind of host malicious behavioral value method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
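Lin Guoyuan (林果园): "Research on Anomaly Detection Technology Based on Host Behavior", China Doctoral Dissertations Full-text Database, Information Science and Technology *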

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143840A (en) * 2019-12-31 2020-05-12 上海观安信息技术股份有限公司 Method and system for identifying abnormity of host operation instruction
CN111143840B (en) * 2019-12-31 2022-01-25 上海观安信息技术股份有限公司 Method and system for identifying abnormity of host operation instruction
CN111563234A (en) * 2020-04-23 2020-08-21 华南理工大学 Feature extraction method of system call data in host anomaly detection
CN113225331A (en) * 2021-04-30 2021-08-06 中国科学技术大学 Method, system and device for detecting host intrusion safety based on graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant