CN101739337B - Method for analyzing characteristic of software vulnerability sequence based on cluster - Google Patents


Info

Publication number
CN101739337B
Authority
CN
China
Prior art keywords
sequence
software
software vulnerability
vulnerability sequence
vulnerability
Legal status
Expired - Fee Related
Application number
CN2009102418919A
Other languages
Chinese (zh)
Other versions
CN101739337A (en)
Inventor
胡昌振
任家东
王崑声
吴迪
李立宁
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Application filed by Beijing Institute of Technology BIT
Priority to CN2009102418919A
Publication of CN101739337A
Application granted
Publication of CN101739337B
Expired - Fee Related

Landscapes

  • Stored Programmes (AREA)

Abstract

The invention discloses a clustering-based method for analyzing the characteristics of software vulnerability sequences, belonging to the technical field of information security. The method comprises the following steps: first, a software vulnerability sequence database is built from existing software vulnerability sequences; second, the sequences in the database are analyzed with a clustering technique to produce a number of software vulnerability sequence clusters; third, the cluster most similar to a suspected vulnerability sequence of the software under test is found by similarity computation; fourth, the suspected vulnerability sequence is compared, using a sequence alignment method, with every software vulnerability sequence in that most similar cluster, so as to find the sequence in the cluster that is most similar to it; finally, the vulnerability information recorded in the database for that most similar sequence is output as a security report. The method improves the efficiency of software vulnerability sequence analysis.

Description

A clustering-based method for analyzing the characteristics of software vulnerability sequences
Technical field
The present invention relates to a clustering-based method for analyzing the characteristics of software vulnerability sequences and belongs to the technical field of information security.
Background art
With the steady rise in the number of hacker attacks and the unchecked spread of worms on the Internet, information security has gradually become a focus of attention. A key problem in information security is the software security vulnerabilities present in computer systems: a malicious attacker can exploit them to escalate privileges, access unauthorized resources, or even destroy sensitive data. The widespread use of computer software brings people ever more convenience and increasingly shapes daily life, but software contains large numbers of errors and vulnerabilities that conceal enormous risks. The fundamental way to prevent system attacks and intrusions is to find, analyze, and study software defects before they are exploited.
The operation of any system or software can be assumed to take place within a security domain defined by a security policy. Any operation inside this domain is safe and controlled; once an operation leaves the domain or violates the security policy, the behavior of the system or software becomes uncontrolled and unknown. A vulnerability is the trigger point at which the system switches from the security domain to a non-secure domain. Vulnerabilities are static and passive, and they can be triggered. In general, a vulnerability record includes the vulnerability name, severity level, cause, impact, description, affected systems, unaffected systems, solution, exploitation mode, and exploitation method; the vulnerability source, publication date, and reference information are also taken into account.
Existing methods for analyzing the characteristics of software vulnerability sequences include Petri-net analysis, fault tree analysis (FTA), rough-set analysis, and neural-network analysis. However, the sequences in a software vulnerability sequence database carry a very large number of characteristics, which makes characteristic analysis of a suspected vulnerability sequence of the software under test very difficult. Because traditional methods do not cluster the sequences in the database, they analyze the characteristics of such suspected sequences with low efficiency.
The important prior art used by the present invention is the K-means algorithm and the Needleman-Wunsch algorithm.
The K-means algorithm is a classic partition-based clustering algorithm proposed by J. B. MacQueen in 1967. It divides a data set into K partitions, each of which represents one cluster, and it has had a profound influence on scientific and industrial applications.
The K-means algorithm proceeds as follows:
1. Arbitrarily select K data sequences as the center sequences of the initial clusters;
2. Based on the center sequences of the clusters, compute the similarity between each center sequence and the sequences in sequence database D, and assign each sequence to the most similar cluster;
3. Update the center sequence of each cluster and repeat step 2 until the clustering result no longer changes, yielding the final cluster structure.
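The three steps above can be illustrated with a minimal Python sketch of classic K-means. It is a sketch under stated assumptions, not the patent's variant: it uses numeric points and Euclidean distance (the method described later instead uses binary sequence-pair vectors and a set similarity), it takes the first K points as the "arbitrary" initial centers, and it caps the number of iterations. The function name `kmeans` and its parameters are illustrative only.

```python
import math


def kmeans(points, k, max_iter=100):
    """Classic K-means on numeric tuples (illustrative); returns (labels, centers)."""
    # Step 1: take the first k points as the initial cluster centers (an assumption).
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Step 3: stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Move each center to the mean of its members; an empty cluster keeps its center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


labels, centers = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels)  # [0, 0, 1, 1]
```

Taking the first K points keeps the sketch deterministic; any other initial choice is equally valid under step 1.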
The Needleman-Wunsch algorithm, proposed by Needleman and Wunsch in 1970, is a classic algorithm for the global alignment of two sequences. It is built on dynamic programming and compares the similarity of two sequences over their full length, providing a practical procedure for global sequence alignment.
The Needleman-Wunsch algorithm proceeds as follows:
1. First, build the score matrix;
2. Using the score matrix, compute the optimal values for the prefix subsequences of sequences S and T;
3. Then compute the optimal values of progressively longer subsequences until the optimal value for S and T themselves is obtained;
4. Next, use the score matrix from each stage to trace back and construct the optimal alignment;
5. Finally, obtain the two optimally aligned sequences.
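A compact Python sketch of these five steps follows. The patent does not state a scoring scheme, so the match, mismatch, and gap scores (+1, -1, -1) are assumptions; the gap symbol "_" follows the alignments shown in the embodiment. The function name `needleman_wunsch` is illustrative.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment of s and t; returns (score, aligned_s, aligned_t)."""
    m, n = len(s), len(t)

    def pair_score(a, b):
        return match if a == b else mismatch

    # Step 1: (m+1) x (n+1) score matrix; row/column 0 align a prefix against gaps only.
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap
    # Steps 2-3: fill optimal prefix scores from short prefixes up to the full sequences.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]),
                F[i - 1][j] + gap,
                F[i][j - 1] + gap,
            )
    # Step 4: trace back from F[m][n] to recover one optimal alignment.
    a, b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("_")
            i -= 1
        else:
            a.append("_")
            b.append(t[j - 1])
            j -= 1
    # Step 5: return the score and the two optimally aligned sequences.
    return F[m][n], "".join(reversed(a)), "".join(reversed(b))


score, top, bottom = needleman_wunsch("ABCBABB", "CABCBCCB")
print(score)  # 2
print(top)
print(bottom)
```

Under these assumed scores the embodiment's sequences S = ABCBABB and S5 = CABCBCCB score 2; when several alignments tie, the traceback order (diagonal, then up, then left) decides which one is reported.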
Summary of the invention
The objective of the invention is to provide a clustering-based method for analyzing the characteristics of software vulnerability sequences. The invention first builds a software vulnerability sequence database from existing software vulnerability sequences; it then analyzes the sequences in the database with a clustering technique to produce a number of software vulnerability sequence clusters; it finds, by similarity computation, the cluster closest to the suspected vulnerability sequence of the software under test; it then uses a sequence alignment method to compare the suspected vulnerability sequence with every software vulnerability sequence in that closest cluster, and thereby finds the sequence in the cluster most similar to it; finally, it outputs the vulnerability information recorded in the database for that most similar sequence as a security report.
The objective of the invention is achieved through the following technical solution.
First, the relevant concepts are defined:
[Definition 1] Software vulnerability sequence: an ordered sequence of program operations that can give rise to a vulnerability.
[Definition 2] Suspected vulnerability sequence of the software under test: a sequence that, after static analysis of a program, is suspected to be a software vulnerability sequence and whose vulnerability cause needs to be analyzed.
[Definition 3] Item set: the set of minimal elements that make up software vulnerability sequences. A minimal element of a software vulnerability sequence (also called an item) is a symbol that formally expresses a vulnerability characteristic. Formally, $L = \{l_1, l_2, \dots, l_m\}$, where $L$ is the item set and $l_1, l_2, \dots, l_m$ are the minimal elements that make up software vulnerability sequences.
[Definition 4] Sequence pair: choose any two items from the item set; if the order of the two chosen items is consistent with their order of precedence in the item set, the combination of the two items is called a sequence pair. Formally, $e_k = l_i l_j\ (i < j)$.
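Read together with Table 2 of the embodiment, the sequence pairs of a sequence are any two of its items kept in their order of occurrence (positions i < j), with repeated items such as AA allowed. A one-function Python sketch under that reading, assuming each item is a single character; the name `sequence_pairs` is hypothetical.

```python
def sequence_pairs(seq):
    """All ordered item pairs seq[i] + seq[j] with i < j (positions in seq)."""
    return {seq[i] + seq[j] for i in range(len(seq)) for j in range(i + 1, len(seq))}


print(sorted(sequence_pairs("ABA")))  # ['AA', 'AB', 'BA']
```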
The specific steps of the clustering-based method for analyzing the characteristics of software vulnerability sequences according to the invention are as follows:
Step 1: build a software vulnerability sequence database from existing software vulnerability sequence data.
Collect the software vulnerability sequence data detected by existing vulnerability detection tools and build the software vulnerability sequence database from it.
The software vulnerability sequence database, denoted D, contains the program execution processes recorded when known software vulnerabilities were triggered. D is a set of tuples <SID, S, INF, CID>, where SID is the serial number of the software vulnerability sequence, S is the software vulnerability sequence, INF is the vulnerability information associated with the sequence, and CID is the cluster label of the sequence. CID is a field set up for cluster analysis: different cluster labels indicate that vulnerability sequences belong to different classes.
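One way to hold the <SID, S, INF, CID> tuples in memory is a small dataclass. This is a hypothetical sketch: the class name `VulnRecord`, its field names, and the single-character item encoding are assumptions; the rows are the sequences and vulnerability information of Table 1, with CID left empty until clustering assigns it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VulnRecord:
    """One tuple <SID, S, INF, CID> of the vulnerability sequence database D."""
    sid: int                   # serial number of the sequence
    s: str                     # the vulnerability sequence (one character per item)
    inf: str                   # associated vulnerability information
    cid: Optional[str] = None  # cluster label, filled in by the clustering step


# Sequences and vulnerability information from Table 1 of the embodiment.
D = [
    VulnRecord(1, "ACBAA", "Timing error"),
    VulnRecord(2, "ABA", "Installation parameter error"),
    VulnRecord(3, "BCACCB", "Boundary condition error"),
    VulnRecord(4, "CBCBCC", "Access validation error"),
    VulnRecord(5, "CABCBCCB", "Source validation error"),
    VulnRecord(6, "BACBB", "Condition validation error"),
    VulnRecord(7, "ABAAC", "Access policy error"),
    VulnRecord(8, "BCBA", "Input validation error"),
    VulnRecord(9, "ACABAA", "Exceptional condition handling error"),
    VulnRecord(10, "BBCBA", "Inter-module interaction error"),
]

by_sid = {r.sid: r for r in D}  # lookup used when INF is reported in Step 6
print(by_sid[5].inf)  # Source validation error
```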
Step 2: classify the software vulnerability sequences in the database using a clustering technique.
The specific steps are as follows:
The 1st step: preprocess the software vulnerability sequences in the database built in Step 1. The specific operations are:
1. From the item set L of database D, obtain the set of sequence pairs contained in the database, $E = \{e_1, e_2, \dots, e_k\}$, where $e_k = l_i l_j\ (i < j)$;
2. Describe each software vulnerability sequence by a vector of the same length as the sequence-pair set E: if the sequence contains sequence pair $e_k$, the corresponding vector component is 1; otherwise it is 0;
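The two preprocessing operations can be sketched in Python as follows. The sketch assumes, as the 9-dimensional vectors of Table 2 in the embodiment indicate, that E contains every ordered pair over the item set in lexicographic order (AA, AB, ..., CC for L = {A, B, C}); the helper names `pair_set`, `sequence_pairs`, and `to_vector` are hypothetical.

```python
from itertools import product


def pair_set(items):
    """E: every ordered pair over the item set, in lexicographic order."""
    return [a + b for a, b in product(sorted(items), repeat=2)]


def sequence_pairs(seq):
    """Ordered pairs seq[i] + seq[j] (i < j) that occur in seq."""
    return {seq[i] + seq[j] for i in range(len(seq)) for j in range(i + 1, len(seq))}


def to_vector(seq, E):
    """Binary vector: component k is 1 iff seq contains sequence pair E[k]."""
    present = sequence_pairs(seq)
    return [1 if e in present else 0 for e in E]


E = pair_set("ABC")
print(E)                    # ['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
print(to_vector("ABA", E))  # [1, 1, 0, 1, 0, 0, 0, 0, 0]
```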
The 2nd step: cluster the software vulnerability sequences obtained from the preprocessing in the 1st step. Using the K-means algorithm, the specific operations are as follows:
1. Define the cluster structure: each software vulnerability sequence cluster $C_i$ is represented by a triple $C_i = \{X_i, (count_{i1}, count_{i2}, \dots, count_{in}), S_i\}$, where $i$ is a positive integer with $1 \le i \le K$ that numbers the clusters, and K is the manually specified number of clusters; $X_i = \{x_{i1}, x_{i2}, \dots, x_{in}\}$ is the center sequence of the cluster, an n-dimensional vector, where n is a positive integer equal to the number of sequence pairs in the set E; $count_{ik}$ is the number of sequences in the cluster that contain sequence pair $e_k$; and $S_i$ is the set of software vulnerability sequences contained in the cluster.
2. Arbitrarily select K software vulnerability sequences as the center sequences of the initial clusters;
3. Based on the center sequences of the clusters, compute the similarity between each center sequence and the software vulnerability sequences in the database, and assign each sequence to the most similar cluster;
4. Apply the center-sequence computation method described below to obtain new center sequences, update the center sequence of each cluster, and repeat 3. until the clustering result no longer changes, yielding the final cluster structure.
The center sequence of a cluster is computed as follows: for each sequence pair $e_j$, define a minimum support threshold $\theta_j\ (1 \le j \le n)$. If the average support of $e_j$ within cluster $C_i$, that is, the fraction of the cluster's sequences that contain $e_j$, is greater than or equal to $\theta_j$, then $e_j$ is regarded as a characteristic attribute of the sequences in $C_i$ and $x_{ij}$ is set to 1; otherwise $e_j$ is regarded as a non-characteristic attribute of the sequences in $C_i$ and $x_{ij}$ is set to 0, as shown in formula (1).
$$x_{ij} = \begin{cases} 1, & \dfrac{count_{ij}}{|C_i|} \ge \theta_j \\[6pt] 0, & \dfrac{count_{ij}}{|C_i|} < \theta_j \end{cases} \qquad (1)$$
where $|C_i|$ denotes the number of sequences in $C_i$.
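Formula (1) translates directly into a few lines of Python. In this sketch the cluster members are given as the binary vectors produced by preprocessing; the function name `update_center` and the uniform threshold of 0.5 in the usage line are assumptions (the method allows a separate $\theta_j$ for each pair).

```python
def update_center(member_vectors, thetas):
    """Formula (1): x_ij = 1 iff count_ij / |C_i| >= theta_j, else 0."""
    if not member_vectors:
        raise ValueError("a cluster needs at least one member to compute its center")
    size = len(member_vectors)                     # |C_i|
    center = []
    for j, theta in enumerate(thetas):
        count = sum(v[j] for v in member_vectors)  # count_ij: members containing e_j
        center.append(1 if count / size >= theta else 0)
    return center


p2 = [1, 1, 0, 1, 0, 0, 0, 0, 0]  # S2 = ABA (Table 2)
p7 = [1, 1, 1, 1, 0, 1, 0, 0, 0]  # S7 = ABAAC (Table 2)
print(update_center([p2, p7], [0.5] * 9))  # [1, 1, 1, 1, 0, 1, 0, 0, 0]
```

With two members and a threshold of 0.5, a pair present in either member already reaches the required support, so the center here coincides with p7.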
In 3., the similarity between a center sequence and a software vulnerability sequence in the database is computed with formula (2):
$$\mathrm{Sim}(p_1, p_2) = \frac{|p_1.s \cap p_2.s|}{|p_1.s \cup p_2.s|} \qquad (2)$$
where $p_1.s$ denotes the set of sequence pairs supported by software vulnerability sequence $p_1$, and $p_2.s$ denotes the set of sequence pairs supported by $p_2$; formula (2) is thus the Jaccard coefficient of the two sequence-pair sets.
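On the binary vectors from preprocessing, formula (2) counts the components set in both vectors and divides by the components set in either. A minimal Python sketch; the name `sim` and the choice to return 0 when both sets are empty are assumptions, and exact fractions are used so the results match the embodiment's values such as 4/8.

```python
from fractions import Fraction


def sim(p1, p2):
    """Formula (2): |p1.s ∩ p2.s| / |p1.s ∪ p2.s| on binary sequence-pair vectors."""
    inter = sum(1 for a, b in zip(p1, p2) if a and b)
    union = sum(1 for a, b in zip(p1, p2) if a or b)
    return Fraction(inter, union) if union else Fraction(0)


p1 = [1, 1, 1, 1, 0, 0, 1, 1, 0]  # S1 = ACBAA (Table 2)
p3 = [0, 1, 1, 1, 1, 1, 1, 1, 1]  # S3 = BCACCB
p4 = [0, 0, 0, 0, 1, 1, 0, 1, 1]  # S4 = CBCBCC
print(sim(p1, p4), sim(p3, p4))  # 1/9 1/2
```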
Step 3: extract the suspected vulnerability sequence of the software under test from its source program;
Step 4: find the software vulnerability sequence cluster most similar to the suspected vulnerability sequence of the software under test.
Compute the similarity between the suspected vulnerability sequence of the software under test and the center sequence of each cluster, and find the most similar cluster $C_j\ (1 \le j \le K)$. If the suspected vulnerability sequence does not belong to any cluster, it represents a safe execution process of the software under test and the operation ends; if it belongs to cluster $C_j$, go to Step 5.
The similarity between the suspected vulnerability sequence and the center sequence of each cluster is computed in the same way as the similarity between a cluster center sequence and the software vulnerability sequences in the database, i.e., with formula (2).
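Step 4 can be sketched in Python as below. Two assumptions are made: "does not belong to any cluster" is read as the best similarity falling below the 0.5 threshold used in the embodiment, and the centers are given as actual sequences (as in the embodiment, where they are S4, S2, S3), so formula (2) is applied to their sequence-pair sets. All function names are hypothetical.

```python
from fractions import Fraction


def sequence_pairs(seq):
    """Ordered pairs seq[i] + seq[j] (i < j) that occur in seq."""
    return {seq[i] + seq[j] for i in range(len(seq)) for j in range(i + 1, len(seq))}


def sim(a, b):
    """Formula (2) on two sequence-pair sets (0 if both are empty)."""
    union = a | b
    return Fraction(len(a & b), len(union)) if union else Fraction(0)


def nearest_cluster(suspect, centers, threshold=Fraction(1, 2)):
    """Return (cluster_id, similarity); cluster_id is None below the threshold."""
    p = sequence_pairs(suspect)
    best_id, best_sim = None, Fraction(-1)
    for cid, center in centers.items():
        s = sim(p, sequence_pairs(center))
        if s > best_sim:
            best_id, best_sim = cid, s
    if best_sim < threshold:
        return None, best_sim  # treated as a safe execution process: stop here
    return best_id, best_sim


centers = {"C1": "CBCBCC", "C2": "ABA", "C3": "BCACCB"}  # S4, S2, S3 in the embodiment
print(nearest_cluster("ABCBABB", centers))  # ('C3', Fraction(7, 9))
```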
Step 5: use a sequence alignment technique to align the suspected vulnerability sequence of the software under test with each software vulnerability sequence in cluster $C_j$, and obtain the software vulnerability sequence $S_j$ most similar to it.
Step 6: output, as a security report, the vulnerability information INF recorded in the database for the software vulnerability sequence $S_j$ most similar to the suspected vulnerability sequence of the software under test.
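Steps 5 and 6 amount to ranking the members of $C_j$ by a global alignment score and reporting the INF of the winner. The Python sketch below assumes Needleman-Wunsch scores of +1 for a match and -1 for a mismatch or gap (the patent gives no scores), keeps only the score matrix (no traceback), and uses the members of C3 from Table 1 as sample data; `nw_score` and `security_report` are hypothetical names.

```python
def nw_score(s, t, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score, filled row by row (no traceback)."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def security_report(suspect, cluster, database):
    """Steps 5-6: best-aligned sequence in the cluster, its score, and its INF."""
    best_sid = max(cluster, key=lambda sid: nw_score(suspect, database[sid][0]))
    seq, inf = database[best_sid]
    return best_sid, nw_score(suspect, seq), inf


database = {  # SID -> (S, INF) for the members of cluster C3 (Table 1)
    1: ("ACBAA", "Timing error"),
    3: ("BCACCB", "Boundary condition error"),
    5: ("CABCBCCB", "Source validation error"),
    6: ("BACBB", "Condition validation error"),
    8: ("BCBA", "Input validation error"),
    10: ("BBCBA", "Inter-module interaction error"),
}
print(security_report("ABCBABB", [1, 3, 5, 6, 8, 10], database))
```

With these assumed scores, S5 scores 2 and every other member of C3 scores at most 1, which agrees with the embodiment's conclusion; other scoring schemes could rank the members differently.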
Beneficial effects
By clustering the software vulnerability sequences in the database, the method measures the similarity of the suspected vulnerability sequence of the software under test only against the K center sequences of the resulting clusters. This yields the most similar cluster, narrowing the characteristic range of the suspected sequence and directly excluding the characteristics of the sequences in the other K-1 clusters. A sequence alignment method then determines which software vulnerability sequence in that cluster is most similar to the suspected sequence, and thereby finally determines its vulnerability information. The invention improves the efficiency of software vulnerability sequence analysis, and its advantage becomes more pronounced on large-scale vulnerability sequence data sets.
Description of drawings
Fig. 1 is a flow diagram of the clustering-based method for analyzing the characteristics of software vulnerability sequences according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the Needleman-Wunsch computation for S and S3 in the embodiment of the invention;
Fig. 3 is a schematic diagram of the Needleman-Wunsch computation for S and S5 in the embodiment of the invention.
Embodiment
The technical solution of the invention is described in detail below with reference to an embodiment.
The flow of the method is shown in Fig. 1, and the specific steps are as follows:
Step 1: build the software vulnerability sequence database from existing software vulnerability sequence data, as shown in Table 1.
Table 1 Software vulnerability sequence database
SID | Si | INF | CID
1 | S1 = ACBAA | Timing error | C1
2 | S2 = ABA | Installation parameter error | C2
3 | S3 = BCACCB | Boundary condition error | C3
4 | S4 = CBCBCC | Access validation error | C3
5 | S5 = CABCBCCB | Source validation error | C3
6 | S6 = BACBB | Condition validation error | C3
7 | S7 = ABAAC | Access policy error | C2
8 | S8 = BCBA | Input validation error | C3
9 | S9 = ACABAA | Exceptional condition handling error | C3
10 | S10 = BBCBA | Inter-module interaction error | C1
Step 2: classify the software vulnerability sequences in the database using a clustering technique.
The 1st step: preprocess the software vulnerability sequences in the database;
After preprocessing, the 10 software vulnerability sequences stored in the database are formalized as 9-dimensional vectors, as shown in Table 2. The item set is L = {A, B, C}, and the vector components correspond, in order, to the sequence pairs AA, AB, AC, BA, BB, BC, CA, CB, CC.
Table 2 Preprocessed software vulnerability sequences
SID | Si | Sequence-pair set | 9-dimensional vector
1 | ACBAA | {AC, AB, AA, CB, CA, BA} | {1,1,1,1,0,0,1,1,0}
2 | ABA | {AB, AA, BA} | {1,1,0,1,0,0,0,0,0}
3 | BCACCB | {BC, BA, BB, CA, CC, CB, AC, AB} | {0,1,1,1,1,1,1,1,1}
4 | CBCBCC | {CB, CC, BC, BB} | {0,0,0,0,1,1,0,1,1}
5 | CABCBCCB | {CA, CB, CC, AB, AC, BB, BC} | {0,1,1,0,1,1,1,1,1}
6 | BACBB | {BA, BB, BC, AC, AB, CB} | {0,1,1,1,1,1,0,1,0}
7 | ABAAC | {AB, AA, AC, BA, BC} | {1,1,1,1,0,1,0,0,0}
8 | BCBA | {BA, BC, BB, CB, CA} | {0,0,0,1,1,1,1,1,0}
9 | ACABAA | {AC, AA, AB, CA, CB, BA} | {1,1,1,1,0,0,1,1,0}
10 | BBCBA | {BB, BC, BA, CB, CA} | {0,0,0,1,1,1,1,1,0}
The 2nd step: cluster the software vulnerability sequences obtained from the preprocessing in the 1st step.
The K-means algorithm is used to cluster the software vulnerability sequences, with K = 3. The first 3 software vulnerability sequences are initially taken as the center sequences of the 3 clusters C1, C2, and C3. The similarity between each of the remaining 7 sequences and the center sequences of C1, C2, and C3 is then computed and the cluster center sequences are updated; the similarity threshold is set to 0.5.
The software vulnerability sequences in the database are divided into 3 clusters, C1, C2, and C3: S4 and S9 are assigned to C1; S2 and S7 are assigned to C2; S1, S3, S5, S6, S8, and S10 are assigned to C3; and the center sequences of C1, C2, and C3 are S4, S2, and S3 respectively.
Step 3: extract the suspected vulnerability sequence of the software under test from its source program;
Step 4: compute the similarity between the suspected vulnerability sequence S of the software under test and the center sequence of each cluster.
The suspected vulnerability sequence of the software under test is S = ABCBABB.
The first clustering round of Step 2 is computed as follows. The vector representation of S4 is p4 = {0,0,0,0,1,1,0,1,1}. By the definition of sequence similarity, the similarities of S4 to S1, S2, and S3 are Sim(p1, p4) = 1/9, Sim(p2, p4) = 0/7, and Sim(p3, p4) = 4/8. The largest of the three is Sim(p3, p4) = 4/8 ≥ 0.5, so S4 is assigned to cluster C3, whose center is S3.
The vector representation of S5 is p5 = {0,1,1,0,1,1,1,1,1}. The similarities of S5 to S1, S2, and S3 are Sim(p1, p5) = 4/9, Sim(p2, p5) = 1/9, and Sim(p3, p5) = 7/8. The largest of the three is Sim(p3, p5) = 7/8 > 0.5, so S5 is assigned to cluster C3, whose center is S3.
The vector representation of S6 is p6 = {0,1,1,1,1,1,0,1,0}. The similarities of S6 to S1, S2, and S3 are Sim(p1, p6) = 4/8, Sim(p2, p6) = 2/7, and Sim(p3, p6) = 6/8. The largest of the three is Sim(p3, p6) = 6/8 > 0.5, so S6 is assigned to cluster C3, whose center is S3.
Likewise, Sim(p2, p7) = 3/5 > 0.5, so S7 is assigned to cluster C2, whose center sequence is S2; Sim(p3, p8) = 5/8 > 0.5, so S8 is assigned to cluster C3, whose center sequence is S3; Sim(p1, p9) = 1 > 0.5, so S9 is assigned to cluster C1, whose center sequence is S1; and Sim(p3, p10) = 5/8 > 0.5, so S10 is assigned to cluster C3, whose center sequence is S3.
Therefore, after the first round of similarity computation, S1 and S9 are assigned to cluster C1; S2 and S7 are assigned to cluster C2; and S3, S4, S5, S6, S8, and S10 are assigned to cluster C3. A second round of computation follows.
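The first-round assignment above can be reproduced with a short Python sketch. It applies formula (2) to the sequence-pair sets of Table 1's sequences (equivalent to using the vectors of Table 2), keeps S1, S2, S3 as the initial centers of C1, C2, C3, and assigns S4 to S10 to the most similar center; every best similarity in this data is at least 0.5, so the threshold never rejects a sequence here. Variable and function names are hypothetical.

```python
from fractions import Fraction


def sequence_pairs(seq):
    """Ordered pairs seq[i] + seq[j] (i < j) that occur in seq."""
    return {seq[i] + seq[j] for i in range(len(seq)) for j in range(i + 1, len(seq))}


def sim(a, b):
    """Formula (2) on two sequence-pair sets."""
    union = a | b
    return Fraction(len(a & b), len(union)) if union else Fraction(0)


S = ["ACBAA", "ABA", "BCACCB", "CBCBCC", "CABCBCCB",
     "BACBB", "ABAAC", "BCBA", "ACABAA", "BBCBA"]  # S1..S10 from Table 1
pairs = [sequence_pairs(s) for s in S]
centers = {"C1": 0, "C2": 1, "C3": 2}  # initial centers S1, S2, S3 (0-based indices)

clusters = {cid: [idx + 1] for cid, idx in centers.items()}
for k in range(3, len(S)):  # S4..S10
    scores = {cid: sim(pairs[k], pairs[idx]) for cid, idx in centers.items()}
    best = max(scores, key=scores.get)
    clusters[best].append(k + 1)
    print(f"S{k + 1} -> {best}, Sim = {scores[best]}")
print(clusters)  # {'C1': [1, 9], 'C2': [2, 7], 'C3': [3, 4, 5, 6, 8, 10]}
```

The same helper also gives Sim(p1, p3) = 5/9, the value compared against Sim(p3, p4) = 4/8 in the center update that follows.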
Following the idea of K-means, namely that similarity should be high within each of the K clusters and low between clusters, the center sequences of the K clusters are further updated. Because Sim(p1, p3) = 5/9 > 0.5 and Sim(p3, p4) = 4/8 < 5/9, the center sequence of cluster C1 is updated to S4, S1 is assigned to cluster C3, and the center sequence of C3 must be chosen again.
Among S3, S4, S5, S6, S8, and S10, the center sequence of cluster C3 is re-selected by computing pairwise similarities; the result is S3, so it remains the center sequence after the update.
Therefore, after the sequences in the vulnerability sequence database are clustered with the K-means algorithm, the 10 software vulnerability sequences are divided into 3 clusters: S4 and S9 are assigned to C1; S2 and S7 are assigned to C2; and S1, S3, S5, S6, S8, and S10 are assigned to C3. The center sequences of C1, C2, and C3 are S4, S2, and S3 respectively.
Given the suspected vulnerability sequence S = ABCBABB of the software under test, its similarity to the center sequences S4, S2, and S3 of clusters C1, C2, and C3 is computed below to determine which cluster S is most similar to.
The known software vulnerability sequences are S4 = CBCBCC, S2 = ABA, and S3 = BCACCB, with p4 = {0,0,0,0,1,1,0,1,1}, p2 = {1,1,0,1,0,0,0,0,0}, and p3 = {0,1,1,1,1,1,1,1,1}. First, S = ABCBABB is converted into a 9-dimensional vector: its sequence-pair set is {AA, AB, AC, BB, BA, BC, CA, CB}, so p = {1,1,1,1,1,1,1,1,0}.
The similarity computation gives Sim(p, p4) = 3/9, Sim(p, p2) = 3/8, and Sim(p, p3) = 7/9. The largest of the three is Sim(p, p3) = 7/9 > 0.5, which shows that S is most similar to cluster C3.
Step 5: use the Needleman-Wunsch algorithm to align the suspected vulnerability sequence S of the software under test with each software vulnerability sequence in cluster $C_j$, and obtain the software vulnerability sequence $S_j$ most similar to it.
The Needleman-Wunsch algorithm proceeds as follows:
The 1st step: first build the $(m+1) \times (n+1)$ score matrix, where m and n are the lengths of the two sequences being aligned;
The 2nd step: using the score matrix, compute the optimal values for the prefix subsequences of the suspected vulnerability sequence S and a software vulnerability sequence $S_j$ in the most similar cluster $C_j$;
The 3rd step: then compute the optimal values of progressively longer subsequences until the optimal value for S and $S_j$ is obtained;
The 4th step: next, use the score matrix from each stage to trace back and construct the optimal alignment;
The 5th step: finally, obtain the software vulnerability sequence optimally aligned with the suspected vulnerability sequence S of the software under test.
The Needleman-Wunsch algorithm is used to align S with the software vulnerability sequences S1, S3, S5, S6, S8, and S10 in cluster C3, where S1 = ACBAA, S3 = BCACCB, S5 = CABCBCCB, S6 = BACBB, S8 = BCBA, and S10 = BBCBA, to obtain the sequence most similar to S. The Needleman-Wunsch computation for S and S3 is shown in Fig. 2; that for S and S5 is shown in Fig. 3.
Following the Needleman-Wunsch computations for S with S3 and S5 in Figs. 2 and 3, S is matched against S3, S1, S5, S6, S8, and S10 in cluster C3. Inserting "_" denotes the insertion or deletion of a character, which yields the arrangement of maximum similarity between the sequences.
Optimal alignment of S1 and S:
S1: A_CBA_?_
S:  ABCBABB
Optimal alignment of S3 and S:
S3: _BC_?_B_
S:  ABCBABB
Optimal alignment of S and S5:
S:  _ABCB_?_B
S5: CABCBCCB
Optimal alignment of S6 and S:
S6: _B_?_?_BB
S:  ABCBABB
Optimal alignment of S8 and S:
S8: _BCBA_?_
S:  ABCBABB
Optimal alignment of S10 and S:
S10: _B_B_B_
S:   ABCBABB
Therefore, the suspected vulnerability sequence S of the software under test is most similar to S5 in cluster C3.
Step 6: output, as a security report, the vulnerability information INF recorded in the database for the software vulnerability sequence $S_j$ most similar to the suspected vulnerability sequence of the software under test.
Because S is most similar to S5 in cluster C3, the vulnerability information INF of S5 is extracted from the software vulnerability sequence database in Table 1. The vulnerability characteristic of S can therefore be judged to be a source validation error, which is output in the form of a security report.

Claims (4)

1. A method for extracting operation-sequence features of software vulnerabilities, characterized by the following steps:
Step 1: create a software vulnerability sequence database from existing software vulnerability sequence data; the database contains the vulnerability information associated with each software vulnerability sequence;
Step 2: classify the software vulnerability sequences in the database using a clustering technique, as follows:
The 1st step: preprocess the software vulnerability sequences in the database created in Step 1, as follows:
1. From the item set $L$ of the software vulnerability sequence database $D$, obtain the set of sequence pairs contained in the database, $E = \{e_1, e_2, \ldots, e_k\}$, where $e_k = l_i l_j$ and $i < j$;
The item set is the set of minimal elements that make up software vulnerability sequences; each minimal element is a symbol that formally expresses a vulnerability feature. Formally, $L = \{l_1, l_2, \ldots, l_m\}$, where $L$ is the item set and $l_1, l_2, \ldots, l_m$ are the minimal elements;
Any two items chosen from the item set whose order agrees with their order in the item set are called a sequence pair;
2. Describe each software vulnerability sequence by a vector of the same length as the sequence-pair set $E$: if the sequence contains sequence pair $e_k$, the corresponding component is 1; otherwise it is 0;
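A minimal sketch of this preprocessing, assuming single-character items and reading "a sequence contains pair $e_k = l_i l_j$" as "$l_i$ occurs somewhere before $l_j$", which the claim does not spell out. The item set ["A", "B", "C"], the two-character string encoding of pairs, and the function names are hypothetical.

```python
from itertools import combinations


def sequence_pairs(items: list) -> list:
    """Set E: every pair l_i l_j with i < j, keeping the item-set order."""
    return ["".join(p) for p in combinations(items, 2)]


def contains_pair(seq: str, pair: str) -> bool:
    """ASSUMPTION: seq contains pair 'ab' if item a occurs anywhere before item b."""
    first = seq.find(pair[0])  # earliest a leaves the most room for a later b
    return first != -1 and pair[1] in seq[first + 1:]


def to_vector(seq: str, pairs: list) -> list:
    """0/1 vector as long as E: 1 where the sequence contains the pair."""
    return [1 if contains_pair(seq, p) else 0 for p in pairs]


items = ["A", "B", "C"]    # hypothetical item set L
E = sequence_pairs(items)  # pairs AB, AC, BC

if __name__ == "__main__":
    for seq in ["ACBAA", "BCBA", "CABCBCCB"]:
        print(seq, to_vector(seq, E))
```

Because only pairs with $i < j$ are formed, an occurrence such as B before A has no component in the vector; the sketch follows the claim's definition of a sequence pair literally.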
The 2nd step: cluster the software vulnerability sequences preprocessed in the 1st step using a clustering technique;
Step 3: extract the suspected vulnerability sequence of the software under test from its source program;
Step 4: find the most similar software vulnerability sequence cluster: compute the similarity between the suspected vulnerability sequence of the software under test and the center sequence of each cluster, and find the cluster $C_j$ ($1 \le j \le K$) most similar to it. If the suspected vulnerability sequence does not belong to any cluster, the running process it represents is considered safe and the operation ends; if it belongs to cluster $C_j$, go to Step 5;
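Step 4 can be sketched as a nearest-center search using the similarity of Formula (2) in claim 3, computed over sets of sequence pairs. The claim does not say how "belongs to no cluster" is decided, so the sketch assumes a hypothetical cutoff `min_sim`; the pair encoding as two-character strings, the sample data and the function names are likewise illustrative.

```python
from typing import Optional


def jaccard(p1: set, p2: set) -> float:
    """Formula (2): |p1 & p2| / |p1 | p2| over sets of sequence pairs."""
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 0.0


def find_cluster(suspect: set, centers: list, min_sim: float) -> Optional[tuple]:
    """Return (cluster index, similarity) of the most similar center, or None
    when no center reaches the hypothetical cutoff min_sim (read here as
    'belongs to no cluster', i.e. the running process is judged safe)."""
    if not centers:
        return None
    sims = [jaccard(suspect, c) for c in centers]
    best = max(range(len(sims)), key=sims.__getitem__)
    return (best, sims[best]) if sims[best] >= min_sim else None


# Hypothetical data: supported pair sets of the suspect and of two cluster centers.
suspect = {"AB", "BC"}
centers = [{"AB", "AC"}, {"BC"}]

if __name__ == "__main__":
    print(find_cluster(suspect, centers, min_sim=0.4))  # (1, 0.5)
    print(find_cluster(suspect, centers, min_sim=0.6))  # None
```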
Step 5: use a sequence alignment technique to align the suspected vulnerability sequence of the software under test with each software vulnerability sequence in cluster $C_j$, obtaining the most similar sequence $S_j$;
Step 6: output, as a security report, the vulnerability information INF recorded in the software vulnerability sequence database for the sequence $S_j$ most similar to the suspected vulnerability sequence of the software under test.
2. The method for extracting operation-sequence features of software vulnerabilities according to claim 1, characterized in that the 2nd step of Step 2 clusters the sequences preprocessed in the 1st step with the K-means algorithm, as follows:
1. Define the cluster structure: each software vulnerability sequence cluster $C_i$ is represented by a triple $C_i = \{X_i, (count_{i1}, count_{i2}, \ldots, count_{in}), S_i\}$, where $i$ ($1 \le i \le K$) is the cluster number and $K$ is the manually specified number of clusters; $X_i = \{x_{i1}, x_{i2}, \ldots, x_{in}\}$ is the center sequence of the cluster, an $n$-dimensional vector, where $n$ is the number of sequence pairs in the pair set $E$; $count_{ik}$ is the number of sequences in the cluster that contain sequence pair $e_k$; and $S_i$ is the set of software vulnerability sequences in the cluster;
2. Arbitrarily select $K$ software vulnerability sequences as the initial center sequences of the clusters;
3. Based on the current center sequences, compute the similarity between each center sequence and each software vulnerability sequence in the database, and assign each sequence to the most similar cluster;
4. Apply the center-sequence acquisition method described below to obtain new center sequences and update the center of each cluster; repeat substep 3. until the clustering result no longer changes, which gives the final cluster structures;
The center sequence of cluster $C_i$ is obtained as follows: for each sequence pair $e_j$ ($1 \le j \le n$), define a minimum support threshold $\theta_j$. If the average support of $e_j$ within $C_i$ is greater than or equal to $\theta_j$, $e_j$ is considered a characteristic attribute of the sequences in $C_i$ and $x_{ij}$ is set to 1; otherwise $e_j$ is a non-characteristic attribute and $x_{ij}$ is set to 0, as shown in Formula (1):
$$x_{ij} = \begin{cases} 1, & \dfrac{count_{ij}}{|C_i|} \ge \theta_j \\[4pt] 0, & \dfrac{count_{ij}}{|C_i|} < \theta_j \end{cases} \qquad (1)$$
where $|C_i|$ denotes the number of sequences in $C_i$.
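The K-means loop of claim 2 can be sketched as follows. Assumptions: sequences are already the 0/1 pair vectors from the preprocessing step, similarity is Formula (2) applied to those vectors, the initial centers are drawn with `random.sample`, and a cluster that becomes empty keeps its previous center. `max_iter`, `seed` and all function names are hypothetical.

```python
import random


def jaccard_vec(u: list, v: list) -> float:
    """Formula (2) on 0/1 vectors: shared 1s over positions where either is 1."""
    inter = sum(1 for a, b in zip(u, v) if a and b)
    union = sum(1 for a, b in zip(u, v) if a or b)
    return inter / union if union else 0.0


def center_of(members: list, theta: list) -> list:
    """Formula (1): x_ij = 1 iff count_ij / |C_i| >= theta_j."""
    size = len(members)
    counts = [sum(vec[j] for vec in members) for j in range(len(theta))]
    return [1 if counts[j] / size >= theta[j] else 0 for j in range(len(theta))]


def kmeans_sequences(vectors: list, k: int, theta: list, max_iter: int = 100, seed: int = 0):
    """Cluster 0/1 pair vectors; returns (labels, centers)."""
    rng = random.Random(seed)
    # Substep 2: pick K sequences arbitrarily as the initial centers.
    centers = [list(vectors[i]) for i in rng.sample(range(len(vectors)), k)]

    def nearest(v: list) -> int:
        return max(range(k), key=lambda c: jaccard_vec(v, centers[c]))

    labels = None
    for _ in range(max_iter):
        # Substep 3: assign every sequence to its most similar center.
        new_labels = [nearest(v) for v in vectors]
        if new_labels == labels:  # stop once assignments no longer change
            break
        labels = new_labels
        # Substep 4: recompute each center with Formula (1).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old center
                centers[c] = center_of(members, theta)
    return labels, centers


if __name__ == "__main__":
    vectors = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    print(kmeans_sequences(vectors, k=2, theta=[0.5] * 4))
```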
3. The method for extracting operation-sequence features of software vulnerabilities according to claim 2, characterized in that both the similarity between a cluster center sequence and a software vulnerability sequence in the database (substep 3. of the 2nd step of Step 2) and the similarity between the suspected vulnerability sequence of the software under test and each cluster center sequence (Step 4) are obtained by Formula (2):
$$\mathrm{Sim}(p_1, p_2) = \frac{|p_1.s \cap p_2.s|}{|p_1.s \cup p_2.s|} \qquad (2)$$
where $p_1.s$ is the set of sequence pairs supported by software vulnerability sequence $p_1$, and $p_2.s$ is the set of sequence pairs supported by $p_2$.
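As a small worked instance of Formula (2), take a hypothetical item set $L = \{A, B, C, D\}$ and two sequences whose supported pair sets are $p_1.s = \{AB, AC, BC\}$ and $p_2.s = \{AB, BC, BD, CD\}$ (illustrative values, not taken from the patent):

$$\mathrm{Sim}(p_1, p_2) = \frac{|\{AB, BC\}|}{|\{AB, AC, BC, BD, CD\}|} = \frac{2}{5} = 0.4$$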
4. The method for extracting operation-sequence features of software vulnerabilities according to claim 1 or claim 2, characterized in that the sequence alignment technique in Step 5 is the Needleman-Wunsch algorithm, implemented as follows:
The 1st step: build an $(m+1) \times (n+1)$ score matrix, where $m$ and $n$ are the lengths of the two sequences being aligned;
The 2nd step: using the score matrix, compute the optimal values for prefix subsequences of the suspected vulnerability sequence $S$ of the software under test and of the sequence $S_{jm}$ in the most similar cluster $C_j$;
The 3rd step: compute the optimal values of progressively longer subsequences until the optimal value for $S$ and $S_{jm}$ is obtained;
The 4th step: construct the optimal alignment from the score matrices of each stage;
The 5th step: finally obtain the software vulnerability sequence that is optimally aligned with the suspected vulnerability sequence $S$ of the software under test.
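The five steps above can be sketched as a textbook Needleman-Wunsch implementation with traceback. The scoring scheme (match +1, mismatch -1, gap -1) is a hypothetical choice, as is the tie-breaking order in the traceback (diagonal first), so other equally optimal alignments may exist; "_" marks a gap as in the worked example.

```python
MATCH, MISMATCH, GAP = 1, -1, -1  # hypothetical scoring scheme


def sub_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def needleman_wunsch(s: str, t: str) -> tuple:
    """Return (optimal score, gapped s, gapped t) with '_' marking gaps."""
    m, n = len(s), len(t)
    # 1st step: (m+1) x (n+1) score matrix; borders are all-gap prefixes.
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * GAP
    for j in range(1, n + 1):
        F[0][j] = j * GAP
    # 2nd and 3rd steps: optimal values for ever longer prefixes.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub_score(s[i - 1], t[j - 1]),
                          F[i - 1][j] + GAP,
                          F[i][j - 1] + GAP)
    # 4th step: trace back from F[m][n], preferring the diagonal on ties.
    a, b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub_score(s[i - 1], t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + GAP:
            a.append(s[i - 1])
            b.append("_")
            i -= 1
        else:
            a.append("_")
            b.append(t[j - 1])
            j -= 1
    # 5th step: reverse the traced columns into the final alignment.
    return F[m][n], "".join(reversed(a)), "".join(reversed(b))


if __name__ == "__main__":
    score, aligned_s, aligned_t = needleman_wunsch("ABCBABB", "CABCBCCB")
    print(score)
    print(aligned_s)
    print(aligned_t)
```

For example, `needleman_wunsch("ABCBABB", "CABCBCCB")` returns a score of 2 together with one optimal pair of gapped strings.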

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009102418919A CN101739337B (en) 2009-12-14 2009-12-14 Method for analyzing characteristic of software vulnerability sequence based on cluster

Publications (2)

Publication Number Publication Date
CN101739337A CN101739337A (en) 2010-06-16
CN101739337B true CN101739337B (en) 2012-06-20

Family

ID=42462851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009102418919A Expired - Fee Related CN101739337B (en) 2009-12-14 2009-12-14 Method for analyzing characteristic of software vulnerability sequence based on cluster

Country Status (1)

Country Link
CN (1) CN101739337B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996292B (en) * 2010-12-10 2012-05-23 北京理工大学 Method for analyzing safety property of software based on sequence clustering
CN102446254B (en) * 2011-12-30 2014-08-06 中国信息安全测评中心 Similar loophole inquiry method based on text mining
CN104778403A (en) * 2014-01-15 2015-07-15 江苏南大苏富特科技股份有限公司 Sequence pattern software security vulnerability-based detection method
CN105279075A (en) * 2014-06-10 2016-01-27 江苏博智软件科技有限公司 Detection method based on sequence mode software safety hole
US10289626B2 (en) 2014-12-30 2019-05-14 Teradata Us, Inc. Distributed sequential pattern mining (SPM) using static task distribution strategy
CN105718801A (en) * 2016-01-26 2016-06-29 国家信息技术安全研究中心 Loophole clustering method based on programming mode and mode matching
CN107229563B (en) * 2016-03-25 2020-07-10 中国科学院信息工程研究所 Cross-architecture binary program vulnerability function association method
CN105956417A (en) * 2016-05-04 2016-09-21 西安电子科技大学 Similar base sequence query method based on editing distance in cloud environment
RU2634173C1 (en) * 2016-06-24 2017-10-24 Акционерное общество "Лаборатория Касперского" System and detecting method of remote administration application
CN107679403B (en) * 2017-10-11 2021-02-12 北京理工大学 Lesso software variety detection method based on sequence comparison algorithm
CN114785574B (en) * 2022-04-07 2023-09-29 国网浙江省电力有限公司宁波供电公司 AI-assisted remote vulnerability accurate verification method
CN115422556B (en) * 2022-11-07 2023-03-24 中国科学技术大学 Vulnerability exploitation probability prediction method, system, equipment and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
US20080178261A1 (en) * 2007-01-19 2008-07-24 Hiroshi Yao Information processing apparatus
CN101241532A (en) * 2008-02-15 2008-08-13 北京邮电大学 Source code facing buffer overflow detection method based on inequalities solving

Also Published As

Publication number Publication date
CN101739337A (en) 2010-06-16

Similar Documents

Publication Publication Date Title
CN101739337B (en) Method for analyzing characteristic of software vulnerability sequence based on cluster
McGovern et al. Identifying predictive multi-dimensional time series motifs: an application to severe weather prediction
Mandhare et al. A comparative study of cluster based outlier detection, distance based outlier detection and density based outlier detection techniques
CN106845717B (en) Energy efficiency evaluation method based on multi-model fusion strategy
Arbin et al. Comparative analysis between k-means and k-medoids for statistical clustering
CN108805142A (en) A kind of crime high-risk personnel analysis method and system
CN111047173B (en) Community credibility evaluation method based on improved D-S evidence theory
CN105389486A (en) Authentication method based on mouse behavior
CN111984702A (en) Method, device, equipment and storage medium for analyzing spatial evolution of village and town settlement
CN104966106A (en) Biological age step-by-step predication method based on support vector machine
CN105046323A (en) Regularization-based RBF network multi-label classification method
CN106570537A (en) Random forest model selection method based on confusion matrix
Qin et al. Evaluation of goaf stability based on transfer learning theory of artificial intelligence
CN108931815A (en) A kind of hierarchical identification method of lithology
CN110287237A (en) One kind analyzing efficient corporations&#39; data digging method based on social network structure
Maervoet et al. Outlier detection in relational data: A case study in geographical information systems
Zhao et al. Anyscan: An efficient anytime framework with active learning for large-scale network clustering
CN111008673A (en) Method for collecting and extracting malignant data chain in power distribution network information physical system
Kopp et al. Interpreting and clustering outliers with sapling random forests
Amiri et al. Illegal miner detection based on pattern mining: A practical approach
Gao et al. Estimation of Distribution Algorithms for Knapsack Problem.
Gou et al. Effective and Efficient Community Search with Graph Embeddings
CN103886007A (en) Mutual constraint based fuzzy data classification method
CN112270336A (en) GA-BP working condition identification method and system
Zhao et al. Study on evaluation model of rate grade of soil corrosion for pipeline

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120620

Termination date: 20141214

EXPY Termination of patent right or utility model