CN117834309A

CN117834309A - Vulnerability assessment method based on contrast graph clustering and reinforcement learning

Info

Publication number: CN117834309A
Application number: CN202410251919.1A
Authority: CN
Inventors: 车洵; 谢华平; 孙捷
Original assignee: Nanjing Zhongzhiwei Information Technology Co ltd
Current assignee: Nanjing Zhongzhiwei Information Technology Co ltd
Priority date: 2024-03-06
Filing date: 2024-03-06
Publication date: 2024-04-05
Anticipated expiration: 2044-03-06
Also published as: CN117834309B

Abstract

The invention discloses a vulnerability assessment method based on contrast graph clustering and reinforcement learning, comprising the following steps: inputting a network security vulnerability data set into a multi-dimensional conditional variational autoencoder to learn a common feature representation and a specific feature representation, and inputting a network environment parameter sample into a text encoder to generate a feature representation of the network environment; clustering based on a similarity measurement to generate a cluster graph; weighting the common feature representation using a dynamic sample weighting strategy; having an agent use the difference between the vulnerability and the vulnerability-environment sample pairs as the reward function for training the initial evaluation score, with an evaluation module computing the initial evaluation score of the vulnerability; and inputting the initial score of the vulnerability and the actual network environment into a decision module comprising a memory bank to generate the final vulnerability evaluation score. The method can accurately predict the severity of a vulnerability according to the actual network environment, and improves both the accuracy of predicting the likelihood that a vulnerability will be exploited and the accuracy of high-risk vulnerability assessment.

Description

Vulnerability assessment method based on contrast graph clustering and reinforcement learning

Technical Field

The invention relates to the technical field of network security, in particular to a vulnerability assessment method based on contrast graph clustering and reinforcement learning.

Background

With the continued innovation and development of internet technology, network security problems are becoming more serious: network attacks are growing in scale and becoming more organized, and attack methods keep changing and are increasingly diversified and structured. In this context, vulnerabilities have become a significant problem in network security. A vulnerability is a weakness or error in a system that an attacker can exploit to carry out attacks and intrusions. Existing vulnerability early-warning mechanisms lag behind events: the process from discovering and repairing a vulnerability to notifying users often takes a long time.

The Common Vulnerability Scoring System (CVSS) has become a widely adopted open framework for assessing the severity of security vulnerabilities in software systems and applications. CVSS was developed by the Forum of Incident Response and Security Teams (FIRST) to provide a standardized quantitative method for assessing the potential impact of vulnerabilities on affected systems, enabling organizations to prioritize their security response and remediation work more effectively. The CVSS score is calculated from a combination of metric values and yields a numerical score ranging from 0.0 to 10.0, with larger values indicating greater severity. Specifically, the CVSS score is calculated from a set of vulnerability characteristics that fall into three main categories:

Basic index feature set (the CVSS base metric group): these metrics describe the inherent properties of the vulnerability, including its exploitability and its potential impact on the affected system if the vulnerability is successfully exploited. The basic index features are attack vector, attack complexity, privileges required, user interaction, confidentiality impact, integrity impact, availability impact, and scope.

Time index feature set (the CVSS temporal metric group): these metrics reflect the current state of the vulnerability as it changes over time, considering factors such as the availability of exploit code or patches, the maturity of remediation work, and the confidence in the vulnerability analysis. The time index features are exploit code maturity, remediation level, and report confidence.

Environmental index feature set (the CVSS environmental metric group): these metrics take into account the specific environment in which the vulnerable system is located, including factors such as the importance of the affected system to the organization, potential collateral damage, and the security requirements of the organization. The environmental index features are the modified base metrics, the confidentiality requirement, the integrity requirement, and the availability requirement.
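By way of non-limiting illustration, the following Python sketch shows how a score in the range 0.0 to 10.0 can be computed from the eight basic index features. It assumes CVSS version 3.1, which the present text does not specify; the metric weights and rounding rule are taken from the public CVSS v3.1 specification, and the names `base_score`, `roundup`, and `severity` are illustrative only.

```python
import math

# Metric weights from the public CVSS v3.1 specification (an assumption:
# the patent text does not state which CVSS version it builds on).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}   # attack vector
AC = {"L": 0.77, "H": 0.44}                        # attack complexity
UI = {"N": 0.85, "R": 0.62}                        # user interaction
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}             # C/I/A impact
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}   # privileges required, scope unchanged
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}      # privileges required, scope changed


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer method of CVSS v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope, c, i, a) -> float:
    """Compute the base score from the eight base metrics (scope is 'U' or 'C')."""
    changed = scope == "C"
    pr_weight = (PR_CHANGED if changed else PR_UNCHANGED)[pr]
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10.0))


def severity(score: float) -> str:
    """Map a score to the qualitative rating scale of CVSS v3.x."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"
```

For example, `base_score("N", "L", "N", "N", "U", "H", "H", "H")` evaluates the vector AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H. The sketch covers only the base group; the time and environmental groups adjust this score further.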

Although CVSS is widely used to evaluate vulnerability risk and to generate scores reflecting severity, the CVSS framework only provides severity and impact scores for individual vulnerabilities and cannot currently evaluate the vulnerability risk of an entire system according to its actual network environment. An improved vulnerability risk assessment algorithm built on CVSS is therefore provided, which can assess the severity of a vulnerability under uncertain network environment parameters, so as to help security managers measure the vulnerability risk of the whole system and improve the efficiency of network security management.

Disclosure of Invention

In order to achieve the above object, the inventors provide a vulnerability assessment method based on contrast graph clustering and reinforcement learning, comprising the following steps:

S1, inputting a public network security vulnerability data set into a multi-dimensional conditional variational autoencoder, which learns a common feature representation and a specific feature representation of the various indexes of a vulnerability in a latent feature space, and inputting a network environment parameter sample into a text encoder to generate a feature representation of the network environment;

S2, clustering the specific feature representations of the various indexes of the vulnerability and the feature representation of the network environment based on a similarity measurement to generate a cluster graph;

S3, under the guidance of the high-confidence information of the cluster graph, using network-environment-aware contrastive learning to generate a dynamic sample weighting strategy for the network environment, and weighting the common feature representations of the various indexes of the vulnerability;

S4, an agent uses the difference between the vulnerability identified by the client and the vulnerability-environment sample pairs as the reward function for training the initial evaluation score, and computes the initial evaluation score of the vulnerability using a lightweight evaluation module;

S5, inputting the initial score of the vulnerability and the actual network environment into a decision module comprising a memory bank to generate the final vulnerability evaluation score.

As a preferred mode of the present invention, the step S1 further includes the steps of:

S101, the given public network security vulnerability data set comprises a plurality of vulnerability types; the vulnerabilities of each type are input separately into the multi-dimensional conditional variational autoencoder, which comprises a basic index feature variational autoencoder, a time index feature variational autoencoder, and an environmental index feature variational autoencoder, with the expression:

[expression not reproduced in the source text]

wherein the symbols of the expression denote, in order: the mathematical expectation; the basic index features; the time index features; the environmental index features; the type description of the vulnerability; the posterior distribution of the basic index features given the vulnerability type description; the posterior distribution of the time index features given the vulnerability type description; the posterior distribution of the environmental index features given the vulnerability type description; the posterior distribution of all vulnerability features given the various index features; the logarithmic function; the prior distribution of the basic index features given the time index and environmental index features; the prior distribution of the time index features given the basic index and environmental index features; the prior distribution of the environmental index features given the basic index and time index features; and the relative entropy between the posterior distribution and the prior distribution; JSD denotes the degree of difference between the posterior distribution and the prior distribution;

S102, the multi-dimensional conditional variational autoencoder learns a common feature representation and a specific feature representation of the various indexes of a vulnerability in the latent feature space, with the expression:

[expression not reproduced in the source text]

wherein the two mapping layers, each composed of a 3 × 3 convolution, are the common feature mapping layer and the specific feature mapping layer respectively, and the remaining symbols denote, in order: the common feature representation of the basic index features in the latent space; the specific feature representation of the basic index features in the latent space; the common feature representation of the time index features in the latent space; the specific feature representation of the time index features in the latent space; the common feature representation of the environmental index features in the latent space; and the specific feature representation of the environmental index features in the latent space;

the network environment parameter sample is input to the text encoder to generate the feature representation of the network environment, with the expression:

[expression not reproduced in the source text]

wherein the output is the feature representation of the network environment, and the text encoder is composed of a plurality of self-attention modules.
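By way of non-limiting illustration, the two divergence terms named in step S101 can be sketched in Python as follows. The exact loss expression is not reproduced in this text, so the sketch shows only the building blocks: the relative entropy in closed form for diagonal Gaussian posterior and prior distributions (a common choice for variational autoencoders, assumed here), and JSD read as the Jensen-Shannon divergence between two discrete distributions (also an assumption). All function names are illustrative.

```python
import math


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(q || p) for two diagonal Gaussians given per-dimension means and variances."""
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return total


def kl_discrete(p, q):
    """KL(p || q) for discrete distributions; q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by log(2)."""
    mixture = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_discrete(p, mixture) + 0.5 * kl_discrete(q, mixture)
```

Unlike the relative entropy, the Jensen-Shannon divergence is symmetric and remains finite even when the two distributions do not overlap, which is one plausible reason to use both terms together.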

As a preferred mode of the present invention, the step S2 further includes the steps of:

S201, a similarity measurement is performed between the specific feature representations of the various indexes of the vulnerability and the feature representation of the network environment, with the following similarity measurement functions:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the similarity between the basic index features and the network environment features; any two of the basic index features; the mathematical expectation of any two of the basic index features with respect to the network environment features; two hyper-parameters used to adjust the weights of the sample feature attributes; the exponential function with the natural constant e as its base; and the cosine similarity;

[expression not reproduced in the source text]

wherein the symbols denote, in order: the similarity between the time index features and the network environment features; any two of the time index features; the mathematical expectation of any two of the time index features with respect to the network environment features; and two hyper-parameters used to adjust the weights of the time sample feature attributes;

[expression not reproduced in the source text]

wherein the symbols denote, in order: the similarity between the environmental index features and the network environment features; any two of the environmental index features; the mathematical expectation of any two of the environmental index features with respect to the network environment features; and two hyper-parameters used to adjust the weights of the environmental sample feature attributes;

S202, according to the similarity measurement results, a structure encoder and an attribute encoder, both composed of graph convolutional neural networks, are applied to the K features with the most similar results to generate the cluster graphs, with the expression:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the cluster graph representing the differences between the vulnerability basic index feature and network environment feature sample pairs; the cluster graph representing the differences between the vulnerability time index feature and network environment feature sample pairs; the cluster graph representing the vulnerability environmental index feature and network environment feature sample pairs; and the operator used to find the K features with the most similar results.
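By way of non-limiting illustration, the following Python sketch combines the ingredients named in steps S201 and S202: cosine similarity, the natural exponential function, and the selection of the K most similar features. Because the similarity expressions are not reproduced in this text, the scoring function `exp(cos / tau)` and the single temperature `tau` standing in for the hyper-parameters are assumptions, as are all function names. The graph convolutional encoders are omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(feature, env, tau=0.5):
    """Hypothetical score: exponentiated, temperature-scaled cosine similarity."""
    return math.exp(cosine(feature, env) / tau)


def top_k_edges(features, env, k):
    """Return (feature index, score) for the k features most similar to env.

    Each returned entry is one edge of a feature-environment cluster graph.
    """
    scored = sorted(
        ((similarity(f, env), i) for i, f in enumerate(features)),
        reverse=True,
    )
    return [(i, s) for s, i in scored[:k]]
```

The same routine would be run once per index class (basic, time, environmental) to obtain the three cluster graphs described above.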

As a preferred mode of the present invention, the step S3 further includes the steps of:

S301, under the guidance of the high-confidence information of the cluster graph, network-environment-aware contrastive learning is used to generate a dynamic sample weighting strategy for the network environment, with the expression:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the dynamic weights of the basic feature-network environment sample pairs; the dynamic weights of the time feature-network environment sample pairs; the dynamic weights of the environmental feature-network environment sample pairs; and the coordinate indexes;

S302, the common feature representations of the various indexes of the vulnerability are weighted by the corresponding weights, with the expression:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the graph structure of the weighted basic feature-network environment sample pairs; the graph structure of the weighted time feature-network environment sample pairs; and the graph structure of the weighted environmental feature-network environment sample pairs; GCN denotes a graph convolutional neural network.
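By way of non-limiting illustration, one common way to derive sample weights from contrastive learning is to take, for each positive pair, its softmax probability against all candidate pairs. The Python sketch below follows that pattern; it is an assumption, not the weighting expression of step S301, which is not reproduced in this text. The `temperature` parameter and all function names are hypothetical, and the graph convolution of step S302 is replaced by a plain scalar multiplication.

```python
import math


def softmax(values):
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_weights(sim_matrix, temperature=0.1):
    """sim_matrix[i][j] is the similarity of vulnerability feature i and environment j.

    The matrix must be square: pair (i, i) is treated as the positive sample and
    every other entry in row i as a negative. The weight of pair i is the softmax
    probability of its positive, so highly correlated pairs receive larger weights.
    """
    return [
        softmax([s / temperature for s in row])[i]
        for i, row in enumerate(sim_matrix)
    ]


def contrastive_loss(sim_matrix, temperature=0.1):
    """Mean negative log-probability of the positives (InfoNCE-style loss)."""
    weights = dynamic_weights(sim_matrix, temperature)
    return -sum(math.log(w) for w in weights) / len(weights)


def apply_weights(common_features, weights):
    """Scale each common feature vector by the weight of its sample pair."""
    return [[w * x for x in feat] for feat, w in zip(common_features, weights)]
```

Under this sketch a pair whose positive similarity clearly exceeds its negatives receives a weight close to 1, while an ambiguous pair is down-weighted.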

As a preferred mode of the present invention, step S4 further includes the steps of:

S401, the agent in reinforcement learning uses the differences between the vulnerability identified by the client, together with its local environment, and the various vulnerability feature-network environment sample pairs as the reward function for training the initial evaluation score, with the expression:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the vulnerability identified by the client and its local environment; the text encoder; two coordinate indexes; and the squared Euclidean distance;

S402, the agent computes the initial vulnerability evaluation score S using a lightweight evaluation module, which takes the vulnerability identified by the client and its local environment as input, with the expression:

[expression not reproduced in the source text]

wherein the first symbol denotes the normalized exponential function (softmax), GMM denotes a Gaussian mixture model, and PCA denotes principal component analysis.
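By way of non-limiting illustration, the following Python sketch shows a reward built from the squared Euclidean distance and a score obtained through the normalized exponential function. The sign convention (a smaller distance gives a larger reward), the averaging over sample pairs, and the expected-score readout over hypothetical severity levels are all assumptions, since the expressions of steps S401 and S402 are not reproduced in this text. The PCA and Gaussian mixture stages of the evaluation module are omitted.

```python
import math


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reward(client_embedding, pair_embeddings):
    """Negative mean squared distance from the encoded client input to the sample pairs."""
    distances = [squared_euclidean(client_embedding, p) for p in pair_embeddings]
    return -sum(distances) / len(distances)


def softmax(values):
    """Numerically stable normalized exponential function."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def initial_score(logits, level_scores):
    """Hypothetical readout: softmax-weighted average of candidate severity scores."""
    probabilities = softmax(logits)
    return sum(p * s for p, s in zip(probabilities, level_scores))
```

Here `client_embedding` stands for the text-encoder output for the client-identified vulnerability and its local environment, and `pair_embeddings` for the weighted vulnerability feature-network environment sample pairs.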

As a preferred mode of the present invention, step S5 further includes the steps of:

S501, the initial score of the vulnerability and the actual network environment are input into a decision module comprising a memory bank to generate the final vulnerability evaluation score, with the expression:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the operation of finding, in the memory bank, the evaluation records whose actual network environment is similar to the current one; the text encoder, composed of a plurality of self-attention modules, which takes the initial score of the vulnerability, the actual network environment, and the history records in the memory bank as input; and the mathematical expectation of the initial vulnerability score compared with the history records in the memory bank under the actual network environment; MLP denotes a multi-layer perceptron.
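By way of non-limiting illustration, the retrieval part of the decision module can be sketched in Python as a nearest-neighbour lookup in the memory bank. The record layout, the use of cosine similarity for the lookup, and the rule that adds the mean historical correction to the initial score are all assumptions that stand in for the text encoder and multi-layer perceptron of step S501, whose expression is not reproduced in this text.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    env: List[float]   # embedding of the network environment of a past evaluation
    delta: float       # final score minus initial score for that evaluation


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(memory, env, k):
    """Return the k records whose environment is most similar to env."""
    return sorted(memory, key=lambda r: cosine(r.env, env), reverse=True)[:k]


def final_score(initial, env, memory, k=2):
    """Adjust the initial score by the mean correction of similar past records."""
    nearest = retrieve(memory, env, k)
    if not nearest:
        return initial
    correction = sum(r.delta for r in nearest) / len(nearest)
    return min(10.0, max(0.0, initial + correction))
```

The result is clamped to the 0.0 to 10.0 range used by CVSS scores; an empty memory bank leaves the initial score unchanged.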

As a preferred mode of the present invention, the method further comprises the step of: S6, storing the difference between the initial evaluation and the final evaluation, together with the vulnerability feature distribution, in the memory bank as self-feedback information.

As a preferred mode of the present invention, step S6 further includes the steps of:

S601, the difference between the initial evaluation score S and the final evaluation score, together with the distribution of the vulnerability features identified by the client, is stored in the memory bank as self-feedback information, with the expression:

[expression not reproduced in the source text]

S602, redundant information in the memory bank is removed by a gating unit module and a variable graph convolution; the gating unit module first takes the memory bank as input and generates a convolution stride strategy, with the expression:

[expression not reproduced in the source text]

wherein the strategy is a one-dimensional vector in which different indexes represent different convolution strides and the value at each index indicates the effectiveness of using that stride; one function of the expression constrains the weight of each convolution stride to lie between 0 and 1; tanh denotes the hyperbolic tangent function; the remaining symbols denote the convolution function, the batch normalization layer, and the convolution parameters; and GAP denotes the global average pooling layer;

S603, the index of the maximum effectiveness value in the convolution stride strategy is taken as the convolution stride, and the size of the memory bank is changed by controlling the convolution stride, with the expression:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the compressed memory bank; the convolution stride of the network; the operator that finds the index of the maximum effectiveness value; and the size of the memory bank; GCN denotes the graph convolutional neural network.
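By way of non-limiting illustration, the stride selection of steps S602 and S603 can be sketched in Python as follows. The mapping `0.5 * (tanh(x) + 1)` used to keep each effectiveness value between 0 and 1, the convention that index i corresponds to stride i + 1, and the use of strided window averaging in place of the variable graph convolution are all assumptions; the gating network that would produce `logits` is not shown.

```python
import math


def stride_strategy(logits):
    """Map raw gate outputs to effectiveness values in the range 0 to 1."""
    return [0.5 * (math.tanh(x) + 1.0) for x in logits]


def choose_stride(strategy):
    """Pick the stride whose effectiveness value is largest (index i means stride i + 1)."""
    best_index = max(range(len(strategy)), key=strategy.__getitem__)
    return best_index + 1


def compress(memory, stride):
    """Average each window of `stride` consecutive records into one record."""
    compressed = []
    for start in range(0, len(memory), stride):
        window = memory[start:start + stride]
        dim = len(window[0])
        compressed.append(
            [sum(record[d] for record in window) / len(window) for d in range(dim)]
        )
    return compressed
```

With a stride of 2, a memory bank of five records shrinks to three; a stride of 1 leaves it unchanged, so the gate can also decide that no compression is needed.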

Compared with the prior art, the beneficial effects achieved by the technical scheme are as follows:

(1) The prior art attends only to recognizing vulnerability features and ignores the influence of the actual network environment; in addition, existing methods ignore important structural information when recognizing the features of network security vulnerabilities, which makes the selected features less representative. The method therefore provides a new contrast graph clustering method: it first builds the connection between vulnerability features and network environment parameters by introducing a comprehensive similarity measurement, and then provides a dynamic weighting strategy so that the features of security vulnerabilities are more discriminative;

(2) Existing reinforcement-learning-based vulnerability detection methods prioritize feature sampling according to a reward function, but in complex real network environments such algorithms often produce suboptimal results. The method therefore provides a new reinforcement learning method for vulnerability assessment: an agent first lets the contrast-graph-clustered features interact with the identified vulnerability to generate an initial vulnerability evaluation score, and a decision module then generates the final vulnerability evaluation score from the initial score and the actual network environment. In addition, the difference between the initial and final results is stored in a memory bank as self-feedback information to provide valuable feedback for future evaluations. This iterative process of self-feedback and persistent memory lets the agent use the feedback signals to quickly improve its decision-making in a variety of network environments.

Drawings

FIG. 1 is a flow chart of a method according to an embodiment.

FIG. 2 is a diagram of a reinforcement learning-based vulnerability assessment framework in accordance with an embodiment.

Detailed Description

In order to describe the technical content, constructional features, achieved objects and effects of the technical solution in detail, the following description is made in connection with the specific embodiments in conjunction with the accompanying drawings.

This embodiment addresses, for public network security vulnerability data, how to accurately predict the severity of a vulnerability according to the actual network environment, and how to improve both the accuracy of predicting the likelihood of exploitation and the accuracy of high-risk vulnerability assessment. As shown in FIG. 1 and FIG. 2, the embodiment provides a vulnerability assessment method based on contrast graph clustering and reinforcement learning, which includes the following steps:

In the implementation process of this embodiment, as shown in fig. 1, step S1 further includes the steps of:

S101, the given public network security vulnerability data set comprises a plurality of vulnerability types; the vulnerabilities of each type are input separately into the multi-dimensional conditional variational autoencoder, which comprises a basic index feature variational autoencoder, a time index feature variational autoencoder, and an environmental index feature variational autoencoder, with the expressions:

[expressions not reproduced in the source text]

wherein the symbols of the expressions denote, in order: the mathematical expectation; the basic index features; the time index features; the environmental index features; the type description of the vulnerability; the posterior distribution of the basic index features given the vulnerability type description; the posterior distribution of the time index features given the vulnerability type description; the posterior distribution of the environmental index features given the vulnerability type description; the posterior distribution of all vulnerability features given the various index features; the logarithmic function; the prior distribution of the basic index features given the time index and environmental index features; the prior distribution of the time index features given the basic index and environmental index features; the prior distribution of the environmental index features given the basic index and time index features; and the relative entropy between the posterior distribution and the prior distribution; JSD denotes the degree of difference between the posterior distribution and the prior distribution;

[expression not reproduced in the source text]

wherein the two mapping layers, each composed of a 3 × 3 convolution, are the common feature mapping layer and the specific feature mapping layer respectively, and the remaining symbols denote, in order: the common feature representation of the basic index features in the latent space; the specific feature representation of the basic index features in the latent space; the common feature representation of the time index features in the latent space; the specific feature representation of the time index features in the latent space; the common feature representation of the environmental index features in the latent space; and the specific feature representation of the environmental index features in the latent space;

[expression not reproduced in the source text]

In this embodiment, step S2 further includes the steps of:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the similarity between the basic index features and the network environment features; any two of the basic index features; the mathematical expectation of any two of the basic index features with respect to the network environment features; two hyper-parameters used to adjust the weights of the sample feature attributes; the exponential function with the natural constant e as its base; and the cosine similarity;

[expression not reproduced in the source text]

wherein the symbols denote, in order: the similarity between the time index features and the network environment features; any two of the time index features; the mathematical expectation of any two of the time index features with respect to the network environment features; and two hyper-parameters used to adjust the weights of the time sample feature attributes;

[expression not reproduced in the source text]

S202, according to the similarity measurement results, a structure encoder and an attribute encoder, both composed of graph convolutional neural networks, are applied to the K features with the closest results to generate the cluster graphs, with the expression:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the cluster graph representing the relationship between the vulnerability basic index feature and network environment feature sample pairs, in which each relationship represents a basic feature-network environment sample pair; the cluster graph representing the relationship between the vulnerability time index feature and network environment feature sample pairs, in which each relationship represents a time feature-network environment sample pair; the cluster graph representing the relationship between the vulnerability environmental index feature and network environment feature sample pairs, in which each relationship represents an environmental feature-network environment sample pair; and the operator used to find the K features with the closest results.

In this embodiment, step S3 further includes the steps of:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the dynamic weights of the basic feature-network environment sample pairs; the dynamic weights of the time feature-network environment sample pairs; the dynamic weights of the environmental feature-network environment sample pairs; and the coordinate indexes. This step takes the corresponding feature-network environment sample pairs as positive samples and all remaining pairs as negative samples, and generates the dynamic sample weighting strategy of the network environment by contrastive learning.

S302, the common feature representations of the various indexes of the vulnerability are weighted by the corresponding weights from step S301, which raises the weight of highly correlated vulnerability-environment sample pairs and makes the features of security vulnerabilities more discriminative, with the expression:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the graph structure of the weighted basic feature-network environment sample pairs; the graph structure of the weighted time feature-network environment sample pairs; and the graph structure of the weighted environmental feature-network environment sample pairs; GCN denotes a graph convolutional network module composed of a plurality of graph convolutional neural networks.

As shown in fig. 2, in the present embodiment, step S4 further includes the steps of:

S401, the agent in reinforcement learning uses the differences between the vulnerability identified by the client, together with its local environment, and the various vulnerability feature-network environment sample pairs of step S302 as the reward function for training the initial evaluation score, with the expression:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the vulnerability identified by the client and its local environment; the text encoder; two coordinate indexes; and the squared Euclidean distance;

[expression not reproduced in the source text]

wherein the first symbol denotes the normalized exponential function (softmax), GMM denotes a Gaussian mixture model, and PCA denotes principal component analysis; the evaluation module finds the optimal initial vulnerability evaluation score under the guidance of the reward function of step S401.

In this embodiment, step S5 further includes the steps of:

[expression not reproduced in the source text]

The decision module is based on the Q iteration algorithm of conventional reinforcement learning, wherein the symbols denote, in order: the operation of finding, in the memory bank, the evaluation records whose actual network environment is similar to the current one; the text encoder, composed of a plurality of self-attention modules, which takes the initial score of the vulnerability, the actual network environment, and the history records in the memory bank as input; and the mathematical expectation of the initial vulnerability score compared with the history records in the memory bank under the actual network environment; MLP denotes a multi-layer perceptron.

In some embodiments, the method further comprises the step of: S6, storing the difference between the initial evaluation and the final evaluation, together with the vulnerability feature distribution, in the memory bank as self-feedback information. Specifically:

[expression not reproduced in the source text]

S602, the records in the memory bank provide valuable feedback information for future evaluations, but as the number of vulnerabilities grows the memory bank risks memory overflow; redundant information in the memory bank is therefore removed by a gating unit module and a variable graph convolution. First, the gating unit module takes the memory bank as input and generates a convolution stride strategy, with the expression:

[expression not reproduced in the source text]

S603, the index of the maximum effectiveness value in the convolution stride strategy is taken as the convolution stride, and the size of the memory bank is changed by controlling the convolution stride, with the expression:

[expression not reproduced in the source text]

To verify the effectiveness of the invention, experiments were conducted on several vulnerability assessment data sets: the Malware Training Sets data set is a vulnerability detection data set for primary malware analysis; the EMBER data set is used to train machine learning models that statically detect malicious Windows portable executable files; the Malicious URLs data set contains malicious URL instances from large webmail providers that supply 6000-7500 spam and phishing URL instances per day; the MAWILab data set is a network traffic anomaly detection data set consisting of label sets for the traffic anomalies in the MAWI archive; and the Aposemat IoT-23 data set is a network traffic data set from Internet of Things (IoT) devices. The results of the invention on these data sets are shown in Table 1.

Table 1: experimental results of the invention in different vulnerability assessment data sets

As can be seen from Table 1, the invention performs best on the EMBER data set, mainly because EMBER contains more samples of attack methods, and this ample supply of training samples allows the model to be trained sufficiently.

In addition, to verify the effectiveness of the reinforcement learning adopted by the invention, the invention is compared with the following evaluation decision algorithms: the linear discriminant analysis algorithm (LDA), the Word Mover's Distance algorithm (WMD), the combined linear discriminant analysis and Word Mover's Distance algorithm (WMD-LDA), the locality-sensitive hashing algorithm (Simhash), and the Euclidean distance algorithm. The experimental results are shown in Table 2.

Table 2: Experimental results of the invention compared with different evaluation decision algorithms

Table 2 shows that the vulnerability assessment method based on contrast graph clustering reinforcement learning of the invention has better performance than other algorithms in accuracy and F1 value.
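By way of non-limiting illustration, the two metrics named above are conventionally computed as in the following Python sketch for binary labels. The values of Tables 1 and 2 are not reproduced in this text, and the text does not state whether the F1 value is binary or averaged over classes, so the binary form and the function names are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * tp / (2 * tp + fp + fn)
```

For labels `[1, 1, 0, 0, 1]` and predictions `[1, 0, 0, 1, 1]` there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3.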

It should be noted that, although the foregoing embodiments have been described herein, the scope of the present invention is not limited thereby. Therefore, based on the innovative concepts of the present invention, alterations and modifications to the embodiments described herein, or equivalent structures or equivalent flow transformations made by the present description and drawings, apply the above technical solution, directly or indirectly, to other relevant technical fields, all of which are included in the scope of the invention.

Claims

1. The vulnerability assessment method based on contrast graph clustering and reinforcement learning is characterized by comprising the following steps of:

2. The vulnerability assessment method based on contrast graph clustering and reinforcement learning of claim 1, wherein step S1 further comprises the steps of:

[expressions not reproduced in the source text]

wherein the symbols of the expressions denote, in order: the mathematical expectation; the basic index features; the time index features; the environmental index features; the type description of the vulnerability; the posterior distribution of the basic index features given the vulnerability type description; the posterior distribution of the time index features given the vulnerability type description; the posterior distribution of the environmental index features given the vulnerability type description; the posterior distribution of all vulnerability features given the various index features; the logarithmic function; the prior distribution of the basic index features given the time index and environmental index features; the prior distribution of the time index features given the basic index and environmental index features; the prior distribution of the environmental index features given the basic index and time index features; and the relative entropy between the posterior distribution and the prior distribution; JSD denotes the degree of difference between the posterior distribution and the prior distribution;

[expression not reproduced in the source text]

wherein the two mapping layers, each composed of a 3 × 3 convolution, are the common feature mapping layer and the specific feature mapping layer respectively, and the remaining symbols denote, in order: the common feature representation of the basic index features in the latent space; the specific feature representation of the basic index features in the latent space; the common feature representation of the time index features in the latent space; the specific feature representation of the time index features in the latent space; the common feature representation of the environmental index features in the latent space; and the specific feature representation of the environmental index features in the latent space;

[expression not reproduced in the source text]

3. The vulnerability assessment method based on contrast graph clustering and reinforcement learning of claim 2, wherein step S2 further comprises the steps of:

[expression not reproduced in the source text]

[expressions not reproduced in the source text]

4. The vulnerability assessment method based on contrast graph clustering and reinforcement learning of claim 3, wherein step S3 further comprises the steps of:

[expression not reproduced in the source text]

5. The vulnerability assessment method based on contrast graph clustering and reinforcement learning of claim 4, wherein step S4 further comprises the steps of:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the vulnerability identified by the client and its local environment; the text encoder; two coordinate indexes; and the squared Euclidean distance;

[expression not reproduced in the source text]

6. The vulnerability assessment method based on contrast graph clustering and reinforcement learning of claim 5, wherein step S5 further comprises the steps of:

[expression not reproduced in the source text]

wherein the symbols denote, in order: the operation of finding, in the memory bank, the evaluation records whose actual network environment is similar to the current one; the text encoder, composed of a plurality of self-attention modules, which takes the initial score of the vulnerability, the actual network environment, and the history records in the memory bank as input; and the mathematical expectation of the initial vulnerability score compared with the history records in the memory bank under the actual network environment; MLP denotes a multi-layer perceptron.

7. The contrast graph clustering and reinforcement learning based vulnerability assessment method of claim 6, further comprising the steps of: and S6, storing the difference between the initial evaluation and the final evaluation and the vulnerability characteristic distribution as self-feedback information in a memory bank.

8. The vulnerability assessment method based on contrast graph clustering and reinforcement learning of claim 7, wherein step S6 further comprises the steps of:

[expression not reproduced in the source text]

[expression not reproduced in the source text]