CN101739337B

CN101739337B - Method for analyzing characteristic of software vulnerability sequence based on cluster

Info

Publication number: CN101739337B
Application number: CN2009102418919A
Authority: CN
Inventors: 胡昌振; 任家东; 王崑声; 吴迪; 李立宁
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2009-12-14
Filing date: 2009-12-14
Publication date: 2012-06-20
Anticipated expiration: 2029-12-14
Also published as: CN101739337A

Abstract

The invention discloses a cluster-based method for analyzing the characteristics of software vulnerability sequences, belonging to the technical field of information security. The method comprises the following steps: first, a software vulnerability sequence database is built from existing software vulnerability sequences; second, the software vulnerability sequences in the database are analyzed with a clustering technique to produce a plurality of software vulnerability sequence clusters; third, the cluster most similar to a suspected vulnerability sequence of the software under test is found through similarity computation; fourth, the suspected vulnerability sequence is compared with every software vulnerability sequence in that most similar cluster by a sequence alignment method, yielding the software vulnerability sequence in the cluster that is most similar to the suspected vulnerability sequence; finally, the vulnerability information that the database stores for that most similar sequence is output as a security report. The method improves the efficiency of software vulnerability sequence analysis.

Description

A cluster-based method for analyzing the characteristics of software vulnerability sequences

Technical field

The present invention relates to a cluster-based method for analyzing the characteristics of software vulnerability sequences, and belongs to the field of information security technology.

Background technology

With the continuous rise in the number of hacker attacks and the unchecked spread of worms on the Internet, information security has gradually become a focus of public attention. A key problem in information security is the software security vulnerabilities present in computer systems: a malicious attacker can exploit them to escalate privileges, access unauthorized resources, or even destroy sensitive data. The widespread use of computer software brings people ever more convenience and increasingly affects daily life, yet computer software contains a large number of errors and vulnerabilities that conceal enormous risk. The fundamental way to prevent system attacks and intrusions is to discover, analyze, and study software defects before they are exploited.

The operation of any system or software presupposes a security domain, which is specified by a security policy. Any operation within this domain is safe and controllable; once an operation leaves the domain or violates the security policy, the behavior of the system or software becomes uncontrollable and unknown. A vulnerability is the trigger point at which operation switches from the secure domain to the non-secure domain. Vulnerabilities are static, passive, and triggerable. In general, a piece of vulnerability information includes the vulnerability name, severity level, cause, impact, description, affected systems, unaffected systems, solution, exploitation type, and exploitation method; the vulnerability source, release date, and reference information are also considered.

Existing methods for analyzing the characteristics of software vulnerability sequences include Petri net analysis, fault tree analysis (FTA), rough set theory analysis, and neural network analysis. However, the number of software vulnerability sequence characteristics in a software vulnerability sequence database is very large, which makes characteristic analysis of a suspected vulnerability sequence of the software under test very difficult. Traditional methods do not perform cluster analysis on the software vulnerability sequences in the database, so their efficiency in analyzing the characteristics of a suspected vulnerability sequence is low.

The main prior art used by the present invention is the K-means algorithm and the Needleman-Wunsch algorithm.

The K-means algorithm is a classic partition-based clustering algorithm proposed by J. B. MacQueen in 1967. It uses a K-partition strategy to partition a data set, and each resulting partition represents one cluster. The algorithm has had a profound influence on applications in science and industry.

The K-means algorithm is carried out in the following steps:

1. Arbitrarily select K data sequences as the center sequences of the initial clusters;

2. Compute the similarity between each center sequence and each sequence in the sequence database D, and assign each sequence to the most similar cluster;

3. Update the center sequence of each cluster and repeat step 2 until the clustering result no longer changes, yielding the final clusters.

The Needleman-Wunsch algorithm, proposed by Needleman and Wunsch in 1970, is a classic algorithm for the global alignment of two sequences. It is based on dynamic programming and compares the similarity of the sequences over their full length, providing a feasible scheme for global sequence alignment.

The Needleman-Wunsch algorithm is carried out in the following steps:

1. First, construct the score matrix;

2. Using the constructed score matrix, solve for the optimal values of the prefix subsequences of sequences S and T;

3. Then solve for the optimal values of progressively larger subsequences, until the optimal value for sequences S and T is obtained;

4. Next, use the score matrix entries from each stage to construct the optimal alignment;

5. Finally, obtain the two optimally aligned sequences.
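Steps 1 to 3 above (building the score matrix and filling it from small prefixes to the full sequences) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `needleman_wunsch_score` and the scoring values (`match=1`, `mismatch=-1`, `gap=-1`) are assumptions, since the text does not state a scoring scheme.

```python
def needleman_wunsch_score(s, t, match=1, mismatch=-1, gap=-1):
    """Fill the (len(s)+1) x (len(t)+1) score matrix and return it.

    The scoring parameters are illustrative assumptions.
    """
    m, n = len(s), len(t)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    # First row and column: aligning a prefix against the empty sequence.
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap
    # Each cell holds the best score for the prefixes s[:i] and t[:j].
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = score[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)
    return score


matrix = needleman_wunsch_score("ABA", "ABA")
print(matrix[-1][-1])  # optimal global score for the two full sequences
```

The bottom-right cell holds the optimal value for the full sequences; steps 4 and 5 (reconstructing the alignment) would trace back through this matrix.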

Summary of the invention

The object of the invention is to provide a cluster-based method for analyzing the characteristics of software vulnerability sequences. The invention first uses existing software vulnerability sequences to create a software vulnerability sequence database; it then analyzes the software vulnerability sequences in the database with a clustering technique, producing a plurality of software vulnerability sequence clusters; through similarity computation it finds the cluster closest to the suspected vulnerability sequence of the software under test; it then uses a sequence alignment method to compare the suspected vulnerability sequence with all software vulnerability sequences in that closest cluster, finding the software vulnerability sequence in the cluster that is most similar to the suspected vulnerability sequence; finally, the vulnerability information that the database stores for this most similar software vulnerability sequence is output as a security report.

The object of the invention is achieved through the following technical solution.

First, the relevant concepts are defined:

[Definition 1] Software vulnerability sequence: an ordered sequence of program operations that can cause a vulnerability.

[Definition 2] Suspected vulnerability sequence of the software under test: a sequence that, after static analysis of the program, is suspected of being a software vulnerability sequence and whose vulnerability cause needs to be analyzed.

[Definition 3] Item set: the set of the smallest elements that make up software vulnerability sequences. A smallest element of a software vulnerability sequence (also called an item) is a symbol that formally expresses a vulnerability characteristic. Formally: L = {l₁, l₂, ..., l_m}, where L is the item set and l₁, l₂, ..., l_m are the smallest elements that make up software vulnerability sequences.

[Definition 4] Sequence pair: choose any two items from the item set such that their order in the pair is consistent with their order in the item set; the combination of the two items is called a sequence pair. Formally: e_k = l_i l_j (i < j).

The specific operation steps of the cluster-based method for analyzing the characteristics of software vulnerability sequences are as follows:

Step 1: Create the software vulnerability sequence database from existing software vulnerability sequence data.

Collect the software vulnerability sequence data detected by existing vulnerability detection tools and create the software vulnerability sequence database.

The software vulnerability sequence database is denoted D and contains the operation sequences recorded when known software vulnerabilities occurred. D is a set of tuples <SID, S, INF, CID>, where SID is the serial number of the software vulnerability sequence, S is the software vulnerability sequence, INF is the vulnerability information associated with the sequence, and CID is the cluster number of the sequence. CID is a field provided for cluster analysis, so that different cluster numbers indicate that vulnerability sequences belong to different classes.
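As a rough illustration of the tuple <SID, S, INF, CID>, one record of D could be modeled as below. The class name `VulnRecord`, the field names, and the helper `lookup_info` are hypothetical; the text specifies only the four fields. The two sample rows follow Table 1 of the embodiment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VulnRecord:
    """One tuple <SID, S, INF, CID> of the database D (names are illustrative)."""
    sid: int                   # SID: serial number of the sequence
    sequence: str              # S: the software vulnerability sequence
    info: str                  # INF: associated vulnerability information
    cid: Optional[int] = None  # CID: cluster number, filled in after clustering


def lookup_info(db: List[VulnRecord], sid: int) -> str:
    """Return the INF field of the record with the given SID."""
    for record in db:
        if record.sid == sid:
            return record.info
    raise KeyError(sid)


database = [
    VulnRecord(1, "ACBAA", "Timing error"),
    VulnRecord(2, "ABA", "Installation parameter error"),
]
print(lookup_info(database, 1))
```

Leaving `cid` empty until clustering has run mirrors the text's description of CID as a field that exists for the cluster analysis.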

Step 2: Use a clustering technique to classify the software vulnerability sequences in the software vulnerability sequence database.

The specific operation steps are as follows:

1st step: preprocess the software vulnerability sequences in the database created in step 1, as follows:

1. From the item set L of the software vulnerability sequence database D, obtain the set of sequence pairs contained in the database, E = {e₁, e₂, ..., e_k}, e_k = l_i l_j (i < j);

2. Describe each software vulnerability sequence by a vector of the same length as the sequence pair set E: if the software vulnerability sequence contains the sequence pair e_k, the corresponding component is 1; otherwise it is 0.
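The preprocessing can be sketched as follows. One assumption is made loudly here: judging from Table 2 of the embodiment, a sequence "contains" a pair when the first item occurs somewhere before the second item in the sequence, and the vector components follow the pair order AA, AB, AC, BA, BB, BC, CA, CB, CC for the item set {A, B, C}. The function names are illustrative.

```python
from itertools import product


def sequence_pairs(seq):
    """Set of ordered pairs 'ab' where item a occurs before item b in seq.

    This reading of 'contains a sequence pair' is inferred from Table 2.
    """
    pairs = set()
    for i, a in enumerate(seq):
        for b in seq[i + 1:]:
            pairs.add(a + b)
    return pairs


def to_vector(seq, items="ABC"):
    """Binary vector over all ordered pairs of items (AA, AB, ..., CC)."""
    all_pairs = [a + b for a, b in product(items, repeat=2)]
    present = sequence_pairs(seq)
    return [1 if pair in present else 0 for pair in all_pairs]


print(sorted(sequence_pairs("ABA")))
print(to_vector("ACBAA"))
```

Under this assumption the sketch reproduces the Table 2 rows for S₁ = ACBAA and S₂ = ABA.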

2nd step: use a clustering technique to cluster the software vulnerability sequences obtained from the preprocessing of the 1st step. The specific operations with the K-means algorithm are as follows:

1. Define the structure of a software vulnerability sequence cluster: each cluster C_i is represented by a triple, C_i = {X_i, (count_i1, count_i2, ..., count_in), S_i}, where i is a positive integer, 1 ≤ i ≤ K, numbering the clusters, and K is the manually specified number of clusters; X_i = {x_i1, x_i2, ..., x_in} is the center sequence of the cluster, an n-dimensional vector, where n is a positive integer equal to the number of sequence pairs in the sequence pair set; count_ik is the number of sequences in the cluster that contain the sequence pair e_k; and S_i is the set of software vulnerability sequences contained in the cluster.

2. Arbitrarily select K software vulnerability sequences as the center sequences of the initial clusters;

3. Compute the similarity between each cluster's center sequence and each software vulnerability sequence in the database, and assign each software vulnerability sequence to the most similar cluster;

4. Apply the method for obtaining a cluster's center sequence to compute the updated center sequence of each cluster, then repeat sub-step 3 until the clustering result no longer changes, yielding the final software vulnerability sequence clusters.

The center sequence of a software vulnerability sequence cluster is obtained as follows: for each sequence pair e_j, define a minimum support threshold θ_j (1 ≤ j ≤ n). If the average support of e_j within cluster C_i is greater than or equal to θ_j, then e_j is regarded as a characteristic attribute of the software vulnerability sequences in C_i, and x_ij takes the value 1; otherwise e_j is regarded as a non-characteristic attribute of the software vulnerability sequences in C_i, and x_ij takes the value 0, as shown in formula 1.

x_{ij} = \begin{cases} 1, & \dfrac{\mathrm{count}_{ij}}{|C_i|} \ge \theta_j \\ 0, & \dfrac{\mathrm{count}_{ij}}{|C_i|} < \theta_j \end{cases} \qquad (1)

where |C_i| denotes the number of sequences in C_i.
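Formula 1 can be sketched as a small function. The name `center_sequence` and the threshold values used in the example (0.5 and 0.6 for every pair) are assumptions for illustration; the text leaves the choice of θ_j open. The two input vectors are the Table 2 rows for S₂ and S₇.

```python
def center_sequence(cluster_vectors, thresholds):
    """Apply formula 1: x_ij = 1 if count_ij / |C_i| >= theta_j, else 0."""
    size = len(cluster_vectors)
    if size == 0:
        raise ValueError("cluster must contain at least one sequence")
    n = len(thresholds)
    # count_ij: number of sequences in the cluster that contain pair e_j.
    counts = [sum(vec[j] for vec in cluster_vectors) for j in range(n)]
    return [1 if counts[j] / size >= thresholds[j] else 0 for j in range(n)]


cluster = [
    [1, 1, 0, 1, 0, 0, 0, 0, 0],  # S2 = ABA
    [1, 1, 1, 1, 0, 1, 0, 0, 0],  # S7 = ABAAC
]
print(center_sequence(cluster, [0.5] * 9))
print(center_sequence(cluster, [0.6] * 9))
```

Raising the threshold keeps only the pairs that most members share, so the center becomes sparser.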

The similarity in sub-step 3 between a center sequence and a software vulnerability sequence in the database is computed by formula 2:

\mathrm{Sim}(p_1, p_2) = \frac{|p_1.s \cap p_2.s|}{|p_1.s \cup p_2.s|} \qquad (2)

where p₁.s denotes the set of sequence pairs supported by software vulnerability sequence p₁, and p₂.s denotes the set of sequence pairs supported by software vulnerability sequence p₂.
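Formula 2 is the ratio of shared sequence pairs to all sequence pairs of the two sequences. A minimal sketch, using exact fractions so the values can be compared with those in the embodiment; the function name `similarity` and the choice to return 0 for two empty sets are assumptions.

```python
from fractions import Fraction


def similarity(pairs1, pairs2):
    """Formula 2: |p1.s ∩ p2.s| / |p1.s ∪ p2.s| on sets of sequence pairs."""
    union = pairs1 | pairs2
    if not union:
        return Fraction(0)  # assumption: two empty pair sets have similarity 0
    return Fraction(len(pairs1 & pairs2), len(union))


# Sequence pair sets of S1, S3 and S4 as listed in Table 2 of the embodiment.
p1 = {"AC", "AB", "AA", "CB", "CA", "BA"}
p3 = {"BC", "BA", "BB", "CA", "CC", "CB", "AC", "AB"}
p4 = {"CB", "CC", "BC", "BB"}

print(similarity(p1, p4))  # one shared pair out of nine
print(similarity(p3, p4))  # four shared pairs out of eight
```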

Step 3: Extract the suspected vulnerability sequence of the software under test from its source program.

Step 4: Find the software vulnerability sequence cluster most similar to the suspected vulnerability sequence of the software under test.

Compute the similarity between the suspected vulnerability sequence and the center sequence of each cluster, and find the most similar cluster C_j (1 ≤ j ≤ K). If the suspected vulnerability sequence does not belong to any cluster, it is a safe operation sequence and the procedure ends; if it belongs to cluster C_j, go to step 5.

The similarity between the suspected vulnerability sequence and each cluster's center sequence is computed in the same way as the similarity between a center sequence and a software vulnerability sequence in the database.
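Step 4 can be sketched as a search over the K center sequences. The text does not say how "does not belong to any cluster" is decided; the sketch assumes it means that the best similarity falls below a threshold, and uses 0.5 because that is the similarity threshold set in the embodiment. The function name and the dictionary layout are illustrative; the data are the suspected sequence S = ABCBABB and the final centers S₄, S₂, S₃ from the embodiment.

```python
def most_similar_cluster(query_pairs, centers, threshold=0.5):
    """Return (cluster id, similarity), or (None, similarity) if below threshold.

    centers maps a cluster id to the sequence pair set of its center sequence.
    The threshold rule is an assumption, not stated in the method steps.
    """
    best_id, best_sim = None, 0.0
    for cid, center_pairs in centers.items():
        union = query_pairs | center_pairs
        sim = len(query_pairs & center_pairs) / len(union) if union else 0.0
        if sim > best_sim:
            best_id, best_sim = cid, sim
    if best_sim < threshold:
        return None, best_sim  # treated as a safe operation sequence
    return best_id, best_sim


centers = {
    "C1": {"CB", "CC", "BC", "BB"},                          # S4 = CBCBCC
    "C2": {"AB", "AA", "BA"},                                # S2 = ABA
    "C3": {"BC", "BA", "BB", "CA", "CC", "CB", "AC", "AB"},  # S3 = BCACCB
}
query = {"AA", "AB", "AC", "BB", "BA", "BC", "CA", "CB"}     # S = ABCBABB
print(most_similar_cluster(query, centers))
```

Only K similarity computations are needed here, which is the efficiency argument of the method: the sequences in the other clusters are never touched.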

Step 5: Use a sequence alignment technique to align the suspected vulnerability sequence of the software under test with each software vulnerability sequence in cluster C_j, obtaining the most similar software vulnerability sequence S_j.

Step 6: Output as a security report the vulnerability information INF that the database stores for S_j, the software vulnerability sequence most similar to the suspected vulnerability sequence of the software under test.

Beneficial effects

By clustering the software vulnerability sequences in the database, the method needs to compare the suspected vulnerability sequence of the software under test only with the K cluster center sequences to obtain the most similar cluster. This fixes the range of characteristics for the suspected vulnerability sequence and directly excludes the characteristics of the sequences in the other (K-1) clusters. A sequence alignment method then determines which software vulnerability sequence in the most similar cluster is closest to the suspected vulnerability sequence, which finally determines the vulnerability information of the suspected sequence. The invention improves the efficiency of software vulnerability sequence analysis, and its advantage is more pronounced on large vulnerability sequence data sets.

Description of the drawings

Fig. 1 is a flow diagram of the cluster-based method for analyzing the characteristics of software vulnerability sequences in the embodiment of the invention;

Fig. 2 is a schematic diagram of the Needleman-Wunsch computation for S and S₃ in the embodiment of the invention;

Fig. 3 is a schematic diagram of the Needleman-Wunsch computation for S and S₅ in the embodiment of the invention.

Embodiment

The technical solution of the invention is described in detail below with reference to an embodiment.

The flow of the cluster-based method for analyzing the characteristics of software vulnerability sequences is shown in Fig. 1. The specific operation steps are as follows:

Step 1: Create the software vulnerability sequence database from existing software vulnerability sequence data, as shown in Table 1.

Table 1. Software vulnerability sequence database

SID	S_i	INF	CID
1	S₁ = ACBAA	Timing error	C₁
2	S₂ = ABA	Installation parameter error	C₂
3	S₃ = BCACCB	Boundary condition error	C₃
4	S₄ = CBCBCC	Access check error	C₃
5	S₅ = CABCBCCB	Source check error	C₃
6	S₆ = BACBB	Condition check error	C₃
7	S₇ = ABAAC	Access policy error	C₂
8	S₈ = BCBA	Input validation error	C₃
9	S₉ = ACABAA	Unexpected condition handling error	C₃
10	S₁₀ = BBCBA	Inter-module interaction error	C₁

1st step: preprocess the software vulnerability sequences in the software vulnerability sequence database.

After preprocessing, the 10 software vulnerability sequences stored in the database are formalized as 9-dimensional vectors, as shown in Table 2. The vector components correspond to the sequence pairs AA, AB, AC, BA, BB, BC, CA, CB, CC, in that order.

Table 2. Preprocessed software vulnerability sequences

SID	S_i	Sequence pair set	9-dimensional vector
1	ACBAA	{AC, AB, AA, CB, CA, BA}	{1,1,1,1,0,0,1,1,0}
2	ABA	{AB, AA, BA}	{1,1,0,1,0,0,0,0,0}
3	BCACCB	{BC, BA, BB, CA, CC, CB, AC, AB}	{0,1,1,1,1,1,1,1,1}
4	CBCBCC	{CB, CC, BC, BB}	{0,0,0,0,1,1,0,1,1}
5	CABCBCCB	{CA, CB, CC, AB, AC, BB, BC}	{0,1,1,0,1,1,1,1,1}
6	BACBB	{BA, BB, BC, AC, AB, CB}	{0,1,1,1,1,1,0,1,0}
7	ABAAC	{AB, AA, AC, BA, BC}	{1,1,1,1,0,1,0,0,0}
8	BCBA	{BA, BC, BB, CB, CA}	{0,0,0,1,1,1,1,1,0}
9	ACABAA	{AC, AA, AB, CA, CB, BA}	{1,1,1,1,0,0,1,1,0}
10	BBCBA	{BB, BC, BA, CB, CA}	{0,0,0,1,1,1,1,1,0}

2nd step: use a clustering technique to cluster the software vulnerability sequences obtained from the preprocessing of the 1st step.

The K-means algorithm is used to cluster the software vulnerability sequences with K = 3. The first 3 software vulnerability sequences are initially taken as the center sequences of the 3 clusters C₁, C₂, C₃. The similarity between each of the remaining 7 software vulnerability sequences and the 3 center sequences is computed, and the center sequences are then updated; the similarity threshold is set to 0.5.

The software vulnerability sequences in the database are divided into 3 clusters C₁, C₂, C₃: S₄ and S₉ are assigned to C₁; S₂ and S₇ are assigned to C₂; S₁, S₃, S₅, S₆, S₈, S₁₀ are assigned to C₃. The center sequences of C₁, C₂, C₃ are S₄, S₂, S₃, respectively.

Step 4: Compute the similarity between the suspected vulnerability sequence S of the software under test and the center sequence of each software vulnerability sequence cluster.

The suspected vulnerability sequence of the software under test is S = ABCBABB.

The clustering result stated above is obtained as follows. The vector representation of software vulnerability sequence S₄ is p₄ = {0,0,0,0,1,1,0,1,1}. By the definition of sequence similarity, the similarities of S₄ to S₁, S₂, S₃ are Sim(p₁, p₄) = 1/9, Sim(p₂, p₄) = 0/7, Sim(p₃, p₄) = 4/8. The largest of the three is Sim(p₃, p₄) = 4/8 ≥ 0.5, so S₄ is assigned to cluster C₃, whose center is S₃.

The vector representation of software vulnerability sequence S₅ is p₅ = {0,1,1,0,1,1,1,1,1}. The similarities of S₅ to S₁, S₂, S₃ are Sim(p₁, p₅) = 4/9, Sim(p₂, p₅) = 1/9, Sim(p₃, p₅) = 7/8. The largest of the three is Sim(p₃, p₅) = 7/8 > 0.5, so S₅ is assigned to cluster C₃, whose center is S₃.

The vector representation of software vulnerability sequence S₆ is p₆ = {0,1,1,1,1,1,0,1,0}. The similarities of S₆ to S₁, S₂, S₃ are Sim(p₁, p₆) = 4/8, Sim(p₂, p₆) = 2/7, Sim(p₃, p₆) = 6/8. The largest of the three is Sim(p₃, p₆) = 6/8 > 0.5, so S₆ is assigned to cluster C₃, whose center is S₃.

Similarly, Sim(p₂, p₇) = 3/5 > 0.5, so S₇ is assigned to cluster C₂, whose center sequence is S₂; Sim(p₃, p₈) = 5/8 > 0.5, so S₈ is assigned to cluster C₃, whose center sequence is S₃; Sim(p₁, p₉) = 1 > 0.5, so S₉ is assigned to cluster C₁, whose center sequence is S₁; Sim(p₃, p₁₀) = 5/8 > 0.5, so S₁₀ is assigned to cluster C₃, whose center sequence is S₃.

Therefore, after the first round of similarity computation, cluster C₁ contains S₁ and S₉; cluster C₂ contains S₂ and S₇; cluster C₃ contains S₃, S₄, S₅, S₆, S₈, S₁₀. The second round of computation follows.
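The first-round assignment can be checked with a short script that works directly on the 9-dimensional vectors of Table 2. This is only a verification sketch: the names `vectors`, `jaccard`, and `clusters` are illustrative, and it covers the first assignment pass only, not the subsequent center updates.

```python
# 9-dimensional vectors from Table 2, keyed by SID.
vectors = {
    1: [1, 1, 1, 1, 0, 0, 1, 1, 0],
    2: [1, 1, 0, 1, 0, 0, 0, 0, 0],
    3: [0, 1, 1, 1, 1, 1, 1, 1, 1],
    4: [0, 0, 0, 0, 1, 1, 0, 1, 1],
    5: [0, 1, 1, 0, 1, 1, 1, 1, 1],
    6: [0, 1, 1, 1, 1, 1, 0, 1, 0],
    7: [1, 1, 1, 1, 0, 1, 0, 0, 0],
    8: [0, 0, 0, 1, 1, 1, 1, 1, 0],
    9: [1, 1, 1, 1, 0, 0, 1, 1, 0],
    10: [0, 0, 0, 1, 1, 1, 1, 1, 0],
}


def jaccard(u, v):
    """Formula 2 on binary vectors: shared ones over positions set in either."""
    inter = sum(a & b for a, b in zip(u, v))
    union = sum(a | b for a, b in zip(u, v))
    return inter / union if union else 0.0


# Initial centers: the first three sequences, as in the embodiment.
centers = {"C1": 1, "C2": 2, "C3": 3}
clusters = {cid: [sid] for cid, sid in centers.items()}
for sid in range(4, 11):
    best = max(centers, key=lambda cid: jaccard(vectors[sid], vectors[centers[cid]]))
    clusters[best].append(sid)

print(clusters)
```

The closest call is S₇, whose similarity to S₂ (3/5) only narrowly exceeds its similarity to S₁ (4/7).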

Following the idea of the K-means algorithm, which keeps similarity high within each of the K clusters and low between clusters, the center sequences of the K clusters are updated further. Because Sim(p₁, p₃) = 5/9 > 0.5 and Sim(p₃, p₄) = 4/8 < 5/9, the center sequence of cluster C₁ is updated to S₄, S₁ is assigned to cluster C₃, and the center sequence of cluster C₃ must be chosen again.

Among the software vulnerability sequences S₃, S₄, S₅, S₆, S₈, S₁₀, the pairwise similarities are computed to obtain the new center sequence of cluster C₃; in the end, S₃ remains the center sequence after the update.

Therefore, after the K-means algorithm has clustered the sequences in the vulnerability sequence database, the 10 software vulnerability sequences are divided into 3 clusters: S₄ and S₉ are assigned to cluster C₁; S₂ and S₇ are assigned to cluster C₂; S₁, S₃, S₅, S₆, S₈, S₁₀ are assigned to cluster C₃. The center sequences of the 3 clusters C₁, C₂, C₃ are S₄, S₂, S₃, respectively.

Suppose the suspected vulnerability sequence of the software under test is S = ABCBABB. The similarity between S and each of the center sequences S₄, S₂, S₃ of clusters C₁, C₂, C₃ is computed to determine which cluster S is most similar to.

Given S₄ = CBCBCC, S₂ = ABA, S₃ = BCACCB, with p₄ = {0,0,0,0,1,1,0,1,1}, p₂ = {1,1,0,1,0,0,0,0,0}, p₃ = {0,1,1,1,1,1,1,1,1}, first convert the suspected vulnerability sequence S = ABCBABB into a 9-dimensional vector: its sequence pair set is {AA, AB, AC, BB, BA, BC, CA, CB}, so p = {1,1,1,1,1,1,1,1,0}.

Similarity computation gives Sim(p, p₄) = 3/9, Sim(p, p₂) = 3/8, Sim(p, p₃) = 7/9. The largest of the three, Sim(p, p₃) = 7/9 > 0.5, shows that the suspected vulnerability sequence S is most similar to cluster C₃.

Step 5: Use the Needleman-Wunsch algorithm to align the suspected vulnerability sequence S of the software under test with each software vulnerability sequence in cluster C_j, obtaining the most similar software vulnerability sequence S_j.

The Needleman-Wunsch algorithm is carried out in the following steps:

1st step: first construct the (m+1) × (n+1) score matrix, where m and n are the lengths of the two sequences being aligned;

2nd step: using the constructed score matrix, solve for the optimal values of the prefix subsequences of the suspected vulnerability sequence S and of a software vulnerability sequence S_j in the most similar cluster C_j;

3rd step: then solve for the optimal values of progressively larger subsequences, until the optimal value for sequences S and S_j is obtained;

4th step: next, use the score matrix entries from each stage to construct the optimal alignment;

5th step: finally, obtain the software vulnerability sequence that aligns optimally with the suspected vulnerability sequence S of the software under test.

The Needleman-Wunsch algorithm is used to align the suspected vulnerability sequence S of the software under test with the software vulnerability sequences S₁, S₃, S₅, S₆, S₈, S₁₀ in cluster C₃, to obtain the most similar software vulnerability sequence, where S₁ = ACBAA, S₃ = BCACCB, S₅ = CABCBCCB, S₆ = BACBB, S₈ = BCBA, S₁₀ = BBCBA. The Needleman-Wunsch computation for S and S₃ is shown in Fig. 2; the computation for S and S₅ is shown in Fig. 3.

Following the Needleman-Wunsch computations for S with S₃ and S₅ shown in Fig. 2 and Fig. 3, the suspected vulnerability sequence S is matched against the software vulnerability sequences S₃, S₁, S₅, S₆, S₈, S₁₀ in cluster C₃. Inserting "_" to denote an inserted or deleted character gives the maximum-similarity arrangement between the sequences.

The optimal alignment of S₁ with S is:

S₁: A_CBA__

S: ABCBABB

The optimal alignment of S₃ with S is:

S₃: _BC__B_

S: ABCBABB

The optimal alignment of S with S₅ is:

S: _ABCB__B

S₅: CABCBCCB

The optimal alignment of S₆ with S is:

S₆: _B___BB

S: ABCBABB

The optimal alignment of S₈ with S is:

S₈: _BCBA__

S: ABCBABB

The optimal alignment of S₁₀ with S is:

S₁₀: _B_B_B_

S: ABCBABB

Therefore, the suspected vulnerability sequence S of the software under test is most similar to software vulnerability sequence S₅ in cluster C₃.
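The ranking of the cluster members by alignment can be sketched with a Needleman-Wunsch implementation that includes the traceback. The patent does not state its scoring scheme, so the parameters here are an assumption: match = 1, mismatch = 0, gap = 0, under which the score counts matched positions. With that choice S₅ comes out on top, in agreement with the conclusion above; the gapped strings printed by this sketch may differ from the alignments listed in the text, which show only matched characters.

```python
def align(s, t, match=1, mismatch=0, gap=0):
    """Global alignment of s and t; returns (score, aligned_s, aligned_t).

    Scoring values are illustrative assumptions, not taken from the patent.
    """
    m, n = len(s), len(t)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right cell; "_" marks a gap.
    a, b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("_"); i -= 1
        else:
            a.append("_"); b.append(t[j - 1]); j -= 1
    return score[m][n], "".join(reversed(a)), "".join(reversed(b))


S = "ABCBABB"
cluster_c3 = {"S1": "ACBAA", "S3": "BCACCB", "S5": "CABCBCCB",
              "S6": "BACBB", "S8": "BCBA", "S10": "BBCBA"}
best = max(cluster_c3, key=lambda name: align(S, cluster_c3[name])[0])
print(best, align(S, cluster_c3[best]))
```

A raw match count favors longer sequences, so a practical system would probably use negative gap and mismatch penalties or normalize the score; that choice is left open here.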

Because the suspected vulnerability sequence S of the software under test is most similar to software vulnerability sequence S₅ in cluster C₃, the vulnerability information INF corresponding to S₅ is extracted from the software vulnerability sequence database in Table 1, which is a source check error. It can therefore be concluded that the vulnerability characteristic of the suspected vulnerability sequence S is a source check error, and this is output in the form of a security report.

Claims

1. the method for distilling of an operation sequence of software vulnerability characteristics, it is characterized in that: the concrete operations step is following:

Step 1, utilize existing software vulnerability sequence data, create the software vulnerability sequence database; The software vulnerability sequence database comprises the relevant vulnerability information of software vulnerability sequence;

Step 2, employing clustering technique are classified to the software vulnerability sequence in the software vulnerability sequence database; The concrete operations step is following:

1. the item collection L that comprises according to the software vulnerability sequence database D obtains the right set E={e of sequence that comprises in the software vulnerability sequence database ₁, e ₂..., e _k, e _k=l _il _j, i＜j wherein;

The item collection is the set of forming the least member of software vulnerability sequence; The least member of software vulnerability sequence is the symbol of formalization statement leak characteristic; Formalization representation is: L={l ₁, l ₂..., l _m, wherein, L is a collection, l ₁, l ₂..., l _mFor forming the least member of software vulnerability sequence;

Concentrate from item and to choose two items arbitrarily, and two items choosing put in order and sequencing that it is concentrated at item is consistent, then claim these two items to be combined as sequence right;

The 2nd step: the software vulnerability sequence that obtains after adopting clustering technique to the 1st step pre-service carries out cluster;

Step 4: find the software vulnerability sequence clustering cluster most similar to the doubtful vulnerability sequence of the software under test: compute the similarity between the doubtful vulnerability sequence of the software under test and the center sequence of each software vulnerability sequence clustering cluster, and find the software vulnerability sequence clustering cluster C_j most similar to it, 1≤j≤K; if the doubtful vulnerability sequence does not belong to any software vulnerability sequence clustering cluster, this indicates that the doubtful vulnerability sequence of the software under test is a safe running process, and the operation ends; if the doubtful vulnerability sequence belongs to software vulnerability sequence clustering cluster C_j, go to step 5;
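The cluster lookup of step 4 can be sketched as follows. This is only an illustration under stated assumptions: the function name `most_similar_cluster`, the pluggable `similarity` argument, and in particular the `min_similarity` threshold used to decide that a sequence "does not belong to any cluster" are hypothetical, since the text above does not state the membership criterion.

```python
def most_similar_cluster(suspect, centers, similarity, min_similarity=0.0):
    """Return (index, score) of the most similar cluster center.

    The index is None when no center scores above `min_similarity`
    (an assumed criterion for "belongs to no cluster").
    """
    best_index, best_score = None, min_similarity
    for index, center in enumerate(centers):
        score = similarity(suspect, center)
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score

# Usage with a simple set-overlap similarity (illustrative data only).
overlap = lambda a, b: len(a & b) / len(a | b) if (a | b) else 0.0
centers = [{1, 2, 3}, {4}]
print(most_similar_cluster({1, 2}, centers, overlap))  # index 0 is the best match
print(most_similar_cluster({9}, centers, overlap))     # (None, 0.0): treated as safe
```

A `None` index corresponds to ending the operation; any other index corresponds to proceeding to step 5 with that cluster.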

2. The method for extracting operation sequences of software vulnerability characteristics as claimed in claim 1, characterized in that: in the 2nd step of step 2, the specific operation steps for clustering, with the K-means algorithm, the software vulnerability sequences obtained from the preprocessing of the 1st step are as follows:

① Define the software vulnerability sequence clustering cluster structure: each software vulnerability sequence clustering cluster structure C_i is represented by a triple: C_i = {X_i, (count_i1, count_i2, ..., count_in), S_i}; where i is a positive integer, 1≤i≤K, and is the number of the software vulnerability sequence clustering cluster, and K is the manually defined number of clustering clusters; X_i = {x_i1, x_i2, ..., x_in} is the center sequence of the software vulnerability sequence clustering cluster and is an n-dimensional vector, where n is a positive integer whose value is the number of sequence pairs in the sequence-pair set; count_ik is the number of sequences in the software vulnerability sequence clustering cluster that contain sequence pair e_k; S_i is the set of software vulnerability sequences contained in the software vulnerability sequence clustering cluster;

④ Apply the method for obtaining the center sequence of a software vulnerability sequence clustering cluster to obtain the new, updated center sequences, update the center sequence of each software vulnerability sequence clustering cluster, and repeat ③ until the clustering result no longer changes, obtaining the final software vulnerability sequence clustering cluster structures;

The method for obtaining the center sequence of said software vulnerability sequence clustering cluster is: for each sequence pair e_j, 1≤j≤n, define a minimum support threshold θ_j; if the average support of sequence pair e_j within software vulnerability sequence clustering cluster C_i is greater than or equal to θ_j, sequence pair e_j is regarded as a characteristic attribute of the software vulnerability sequences in C_i, and x_ij takes the value 1; otherwise, e_j is regarded as a non-characteristic attribute of the software vulnerability sequences in C_i, and x_ij takes the value 0;

As shown in formula (1):

x_{ij} = \begin{cases} 1, & \dfrac{\mathrm{count}_{ij}}{|C_i|} \geq \theta_j \\ 0, & \dfrac{\mathrm{count}_{ij}}{|C_i|} < \theta_j \end{cases} \qquad (1)

Where |C_i| denotes the number of sequences in C_i.
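Formula (1) can be illustrated with a short sketch that thresholds the average support of each sequence pair. The function name `center_sequence` and the sample counts and thresholds are hypothetical values chosen only for illustration.

```python
def center_sequence(counts, cluster_size, thresholds):
    """Compute the binary center sequence of one cluster per formula (1).

    counts[j]     : number of sequences in the cluster containing pair e_j
    cluster_size  : |C_i|, the number of sequences in the cluster
    thresholds[j] : minimum support threshold theta_j for pair e_j
    """
    if cluster_size <= 0:
        raise ValueError("cluster must contain at least one sequence")
    return [1 if count / cluster_size >= theta else 0
            for count, theta in zip(counts, thresholds)]

# A cluster of 4 sequences; pairs e_1..e_3 occur in 3, 1 and 2 of them.
print(center_sequence([3, 1, 2], 4, [0.5, 0.5, 0.5]))  # [1, 0, 1]
```

Here the supports are 0.75, 0.25 and 0.5, so only the first and third sequence pairs count as characteristic attributes.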

3. The method for extracting operation sequences of software vulnerability characteristics as claimed in claim 2, characterized in that: the similarity between the center sequence of a software vulnerability sequence clustering cluster and a software vulnerability sequence in the software vulnerability sequence database, described in ③ of the 2nd step of step 2, and the similarity between the doubtful vulnerability sequence of the software under test and the center sequence of each software vulnerability sequence clustering cluster, described in step 4, are both obtained by formula (2):

Sim(p_1, p_2) = \frac{|p_1.s \cap p_2.s|}{|p_1.s \cup p_2.s|} \qquad (2)

Where p₁.s denotes the set of sequence pairs supported by software vulnerability sequence p₁, and p₂.s denotes the set of sequence pairs supported by software vulnerability sequence p₂.
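Formula (2) is the ratio of shared to total supported sequence pairs, which can be sketched as below. Two points are assumptions not stated in the text above: a sequence is taken to "support" a pair (a, b) when a occurs somewhere before b in the sequence, and the similarity of two empty sets is defined as 0. The names `supported_pairs` and `sim` are hypothetical.

```python
def supported_pairs(sequence, pair_set):
    """Pairs (a, b) from pair_set where a occurs before b in the sequence (assumed meaning of support)."""
    found = set()
    for i, a in enumerate(sequence):
        for b in sequence[i + 1:]:
            if (a, b) in pair_set:
                found.add((a, b))
    return found

def sim(pairs1, pairs2):
    """Formula (2): size of the intersection divided by size of the union."""
    union = pairs1 | pairs2
    return len(pairs1 & pairs2) / len(union) if union else 0.0

# Illustrative data: all sequence pairs over the item set {a, b, c}.
E = {("a", "b"), ("a", "c"), ("b", "c")}
p1 = supported_pairs(["a", "b", "c"], E)   # supports all three pairs
p2 = supported_pairs(["a", "c"], E)        # supports only ('a', 'c')
print(sim(p1, p2))                         # one shared pair out of three
```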

4. The method for extracting operation sequences of software vulnerability characteristics as claimed in claim 1 or 2, characterized in that: the sequence alignment technique described in step 5 is the Needleman-Wunsch algorithm; the implementation steps are as follows:

The 1st step: construct an (m+1) × (n+1) score matrix;

The 2nd step: according to the constructed score matrix, solve for the optimal value of the prefix subsequences of the doubtful vulnerability sequence S of the software under test and of software vulnerability sequence S_jm in the software vulnerability sequence clustering cluster C_j most similar to it;

The 3rd step: then solve for the optimal values of progressively larger subsequences, until the optimal value for sequences S and S_jm is obtained;
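The score-matrix steps above can be sketched as a standard Needleman-Wunsch global alignment score. The scoring parameters (match = 1, mismatch = -1, gap = -1) and the function name `needleman_wunsch_score` are assumptions made for illustration; the scoring scheme is not given in the text above.

```python
def needleman_wunsch_score(s, t, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences s and t."""
    m, n = len(s), len(t)
    # (m+1) x (n+1) score matrix; row 0 and column 0 align a prefix with gaps only.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap
    # Fill from short prefixes to longer ones; each cell holds the
    # optimal value for the prefixes s[:i] and t[:j].
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diagonal = score[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[m][n]

print(needleman_wunsch_score("abc", "abc"))  # 3
print(needleman_wunsch_score("abc", "abd"))  # 1
```

Under this sketch, the candidate S_jm with the highest score against S would be taken as the most similar sequence in the cluster.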