Disclosure of Invention
The invention aims to overcome the defects in the background technology and realize accurate evaluation of the data leakage risk of the information system.
In order to achieve the above purpose, a data leakage risk assessment method is adopted, which comprises the following steps:
acquiring data leakage historical information of an information system, processing the data leakage historical information, and constructing a data leakage information feature word set;
matching the characteristic words in the data leakage information characteristic word set with the sensitive information in the confidential sensitive information base item by item, and constructing a characteristic word matching set by using the characteristic words successfully matched with the sensitive information;
and processing the feature words in the feature word matching set by using an analytic hierarchy process and a fuzzy mathematical process to obtain a data leakage risk value of the information system.
Further, the acquiring data leakage history information of the information system, processing the data leakage history information, and constructing a data leakage information feature word set includes:
extracting text information of the data leakage historical information;
performing word segmentation processing on the text information by using a statistical word segmentation method to obtain a word segmentation list;
and performing stop word filtering on the words in the word list, and constructing the data leakage information characteristic word set by using the words after the stop words are deleted.
Matching the feature words in the data leakage information feature word set with the sensitive information in the confidential sensitive information base item by item, and constructing a feature word matching set from the feature words successfully matched with the sensitive information, wherein the matching process comprises: comparing each data leakage feature word with the sensitive words in the sensitive information base one by one; if the feature word is identical to a sensitive word, the match succeeds and the feature word is recorded in the feature word set library; if the feature word does not match the current sensitive word, it is not recorded, and the next sensitive word is selected from the sensitive information base and the comparison is repeated until all the sensitive words in the sensitive information base have been compared.
Constructing the data leakage information feature word set library comprises: extracting the text of the data leakage history information; performing word segmentation on the text by a statistical word segmentation method to obtain a word list; and filtering stop words from the word list, comparing the words remaining after stop word deletion with the sensitive words in the sensitive information base one by one to obtain the feature word set, and constructing the data leakage information feature word set library.
Processing the feature words in the feature word matching set by using an analytic hierarchy process and a fuzzy mathematical process to obtain a data leakage risk value of an information system, wherein the data leakage risk value comprises the following steps:
constructing hierarchical structures of the importance degrees of the data leakage risk factors of different levels;
judging the relative importance of the data leakage risk factors of different levels by adopting a scaling method according to the hierarchical structure, and constructing a judgment matrix;
calculating a sorting weight vector of a judgment matrix by using the analytic hierarchy process;
constructing a risk element set according to the data leakage risk elements, and constructing a risk evaluation set for each data leakage risk element according to a fuzzy mathematical method;
calculating a membership matrix according to the risk element set and the risk evaluation set, and calculating a ranking weight vector of the membership matrix;
and synthesizing the sequencing weight vector of the judgment matrix and the sequencing weight vector of the membership degree matrix by using the fuzzy mathematical method to obtain a data leakage risk value of the information system.
Constructing hierarchical structure of importance of data leakage risk factors of different levels, comprising:
and setting the influence factors of the importance of the data leakage risk factors, including the data leakage occurrence probability, the data leakage influence degree and the information system equipment importance, and constructing a hierarchical structure for describing the importance of the data leakage risk factors of different levels of the information system.
According to the hierarchical structure, judging the relative importance of the data leakage risk factors of different levels by adopting a scaling method, and constructing a judgment matrix, wherein the judgment matrix comprises the following steps:
setting the relative importance of the data leakage occurrence probability as I_p, the relative importance of the data leakage influence degree as I_F, and the relative importance of the information system equipment as I_D, as follows:
wherein F_h represents a data leakage event of file level h, and D_g represents a data leakage event of leaking device level g;
constructing a judgment matrix B according to the ratio of the relative importance of the data leakage occurrence probability, the data leakage influence degree and the information system equipment importance:
wherein the element b_ij of the judgment matrix B represents the ratio of the data leakage occurrence probability of the i-th element to the data leakage occurrence probability of the j-th element.
Calculating the sequencing weight vector of the judgment matrix by using the analytic hierarchy process, wherein the method comprises the following steps:
calculating an eigenvector M from the judgment matrix:
M = (m_1, m_2, …, m_i, …, m_n)
wherein m_i is computed from b_i1, b_i2, …, b_in, the elements of the i-th row of the judgment matrix, and n is the order of the judgment matrix;
normalizing the eigenvector M to obtain the ranking weight vector of the judgment matrix, W = (W_1, W_2, …, W_n), wherein W_i is the normalized value of m_i;
constructing a risk element set according to the data leakage risk elements, and constructing a risk evaluation set for each data leakage risk element according to a fuzzy mathematical method, wherein the risk evaluation set comprises the following steps:
constructing the risk element set U = {u_1, u_2, …, u_k, …, u_K};
setting the risk evaluation set of the data leakage influence degree and the data leakage occurrence probability to be the equipment set of the information system, E = {E_1, E_2, …, E_t, …, E_T}.
Calculating a membership matrix according to the risk element set and the risk evaluation set, and calculating a ranking weight vector of the membership matrix, wherein the ranking weight vector comprises the following steps:
establishing a fuzzy mapping function of the risk element set and the risk evaluation set:
f:U→F(E)
wherein F(E) is the totality of fuzzy sets on the risk evaluation set E, satisfying u_k → f(u_k) = (p_k1, p_k2, …, p_kK) ∈ F(E); the mapping f represents the membership degree of the risk element u_k with respect to the evaluation criteria in the risk evaluation set, and the membership degrees of the risk elements in the risk evaluation set form the membership vectors P_l = (p_l1, p_l2, …, p_lm), l = 1, 2, …, m;
constructing a membership matrix P according to the membership degree of the risk element set U to the risk evaluation set E:
wherein the element p_km of the membership matrix represents the probability that the k-th risk element belongs to the m-th evaluation factor;
assigning weights to the evaluation factors in the risk evaluation set, setting the weight distribution set as A = (a_1, a_2, …, a_k, …, a_K), where a_k represents the weight of the k-th evaluation factor relative to the other evaluation factors, and carrying out the fuzzy transformation operation:
wherein V is the relative weight of the data-level risk factors under each criterion of the equipment level, and the ranking weight vector is obtained after normalization:
W_vb = (W_vb1, W_vb2, …, W_vbk, …, W_vbK)
wherein W_vb represents the ranking weight vector of the b-th system-level criterion, and W_vbk represents the weight of the k-th risk factor relative to the other risk factors under the system-level criterion.
Further, the synthesizing the ranking weight vector of the judgment matrix and the ranking weight vector of the membership matrix by using the fuzzy mathematical method to obtain the data leakage risk value of the information system includes:
transposing the ranking weight vector W of the judgment matrix to obtain:
W′ = W^T;
calculating a data leakage risk value of each device of the information system:
r = {r_1, r_2, …, r_z, …, r_Z}
r_z = W_vb W′;
wherein W_vb is the ranking weight vector of the membership matrix;
and calculating the total risk value R of the information system by adopting a weighted average method as follows:
Compared with the prior art, the invention has the following technical effects: the feature words in the data leakage information feature word set are matched item by item against the sensitive information in the confidential sensitive information base, where the sensitive information base consists of information defined as confidential, and the forms in which feature words are matched against sensitive information include regular expressions, dictionaries, scripts and file types. The matched feature words are then processed by the analytic hierarchy process and the fuzzy mathematical method to calculate a data leakage risk value of the information system, which yields a quantitative and accurate evaluation of the data leakage risk of the information system.
Detailed Description
To further illustrate the features of the present invention, refer to the following detailed description of the invention and the accompanying drawings. The drawings are for reference and illustration purposes only and are not intended to limit the scope of the present disclosure.
As shown in fig. 1, the present embodiment discloses a data leakage risk assessment method, including the following steps S1 to S3:
S1, acquiring data leakage history information of the information system, processing the data leakage history information, and constructing a data leakage information feature word set;
S2, matching the feature words in the data leakage information feature word set with the sensitive information in the confidential sensitive information base item by item, and constructing a feature word matching set by using the feature words successfully matched with the sensitive information;
and S3, processing the feature words in the feature word matching set by using an analytic hierarchy process and a fuzzy mathematical method to obtain a data leakage risk value of the information system.
As a further preferred technical solution, step S1 constructs the data leakage information feature word set through data preprocessing, namely text extraction, word segmentation and stop word filtering, and then stores the processed data at the corresponding position in a database. It comprises the following steps S11 to S13:
S11, extracting text information of the data leakage history information, specifically:
extracting the text content of information system data leakage information in different formats, and removing format marks such as hyperlinks, stop words, punctuation marks, spaces and special characters.
S12, performing word segmentation processing on the text information according to the special noun dictionary dic to obtain a word segmentation list, which specifically comprises the following steps:
The extracted text information is segmented by a statistical word segmentation algorithm to obtain a number of independent entries. Because of the particularity and diversity of Chinese, terms specific to information system data leakage, namely the special noun dictionary dic, are stored in advance in the word segmentation library used by the statistical segmenter, so that information system data leakage content is segmented in a targeted manner.
And S13, performing stop word filtering on the words in the word list, and constructing the data leakage information characteristic word set by using the words after the stop words are deleted.
It should be noted that the stop words in this embodiment are words that occur particularly frequently and carry no special meaning in a sentence, such as "yes", "you", "i" and "he". They are removed by using the statistical word segmentation algorithm, and the remaining words form the data leakage information feature word set.
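Steps S11 to S13 can be illustrated with a minimal Python sketch. It is not the embodiment's implementation: the stop word list, the contents of the domain dictionary, and the function names are all hypothetical, and a simple dictionary-aware regular-expression tokenizer stands in for the statistical word segmentation algorithm, which the source does not specify in detail.

```python
import re

# Hypothetical stop word list; the source gives only "yes", "you", "i", "he" as examples.
STOP_WORDS = {"yes", "you", "i", "he", "the", "a", "to"}
# Hypothetical entries standing in for the special noun dictionary dic.
DOMAIN_TERMS = ["customer database", "access key"]


def extract_text(raw):
    """S11: remove hyperlinks, punctuation and special characters, collapse spaces."""
    text = re.sub(r"https?://\S+", " ", raw)
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def segment(text, domain_terms=DOMAIN_TERMS):
    """S12: split into entries, keeping multi-word dictionary terms intact.

    Assumption: this simple tokenizer replaces the statistical segmenter.
    """
    # Longer terms first so they win over their own prefixes.
    terms = sorted(domain_terms, key=len, reverse=True)
    parts = [re.escape(t) for t in terms] + [r"\w+"]
    return re.findall("|".join(parts), text)


def build_feature_word_set(raw):
    """S13: drop stop words; the remaining words form the feature word set."""
    return {w for w in segment(extract_text(raw)) if w not in STOP_WORDS}


if __name__ == "__main__":
    sample = "He sent the customer database to https://example.com/x, you know!"
    print(sorted(build_feature_word_set(sample)))
```

For Chinese text, the `segment` function would be replaced by a real statistical segmenter loaded with the domain dictionary; the surrounding steps would stay the same.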
As a further preferable technical solution, the sensitive information in the confidential sensitive information base in step S2 describes the characteristics of the sensitive confidential information in the information system data leakage information, and the numerical values are convenient for calculation, specifically:
1) Importing a confidential sensitive information base, which comprises dictionaries, regular expressions, scripts and file types, and whose data items include the confidential content, time, names, behaviors and the like found in information system data leakage information.
It should be noted that the sensitive information base is designed according to a professional term definition dictionary database, and can be refined and extended for a specific information system. The sensitive information base allows the data leakage of the information system to be evaluated better and improves the accuracy of the data leakage risk evaluation.
2) Performing matching calculation on the information system data leakage feature words obtained through data preprocessing: if a data leakage feature word is identical to an item in the sensitive information base, the match is counted, and the feature words successfully matched with the sensitive information are stored in the feature word matching set.
The matching process is as follows: each data leakage feature word is compared with the sensitive words in the sensitive information base one by one; if the feature word is identical to a sensitive word, the match succeeds and the feature word is recorded in the feature word set library; if the feature word does not match the current sensitive word, it is not recorded, and the next sensitive word is selected from the sensitive information base and the comparison is repeated until all the sensitive words in the sensitive information base have been compared.
Constructing the data leakage information feature word set library comprises: extracting the text of the data leakage history information; performing word segmentation on the text by a statistical word segmentation method to obtain a word list; and filtering stop words from the word list, comparing the words remaining after stop word deletion with the sensitive words in the sensitive information base one by one to obtain the feature word set, and constructing the data leakage information feature word set library.
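The item-by-item matching of step S2 can be sketched as follows. The sensitive terms and regular expressions below are hypothetical placeholders for the confidential sensitive information base, and only the dictionary and regular-expression matching forms are shown; script and file-type matching are reduced to a single file-extension pattern for illustration.

```python
import re

# Hypothetical contents of the confidential sensitive information base.
SENSITIVE_TERMS = {"customer database", "password", "contract"}
SENSITIVE_PATTERNS = [
    re.compile(r"\d{6,}"),                  # long digit strings, e.g. account numbers
    re.compile(r".+\.(docx|xlsx|pdf)"),     # file-type rule expressed as a pattern
]


def is_sensitive(word):
    """Return True if the word equals a dictionary term or fully matches a pattern."""
    if word in SENSITIVE_TERMS:
        return True
    return any(p.fullmatch(word) for p in SENSITIVE_PATTERNS)


def build_matching_set(feature_words):
    """Keep only the feature words that match the sensitive information base."""
    return {w for w in feature_words if is_sensitive(w)}


if __name__ == "__main__":
    words = {"password", "weather", "report.pdf", "12345678", "123"}
    print(sorted(build_matching_set(words)))
```

A set lookup replaces the explicit word-by-word loop described in the text; the result is the same, since a feature word is recorded exactly when it is identical to some sensitive word.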
As a more preferable embodiment, as shown in fig. 2, the step S3: the method for processing the feature words in the feature word matching set by using the analytic hierarchy process and the fuzzy mathematic process to obtain the data leakage risk value of the information system comprises the following steps of S31 to S36:
S31, constructing hierarchical structures of the importance degrees of the data leakage risk factors of different levels, wherein the hierarchical structures describe the importance degrees of the data leakage risk factors of different levels of the information system in a hierarchical manner and relate to factors such as the "data leakage occurrence probability", the "data leakage influence degree" and the "information system equipment importance" of the information system, as shown in FIG. 3.
S32, judging the relative importance of the data leakage risk factors of different levels by adopting a scaling method according to the hierarchical structure, and constructing a judgment matrix;
S33, calculating a ranking weight vector of the judgment matrix by using the analytic hierarchy process;
S34, constructing a risk element set according to the data leakage risk elements, and constructing a risk evaluation set for each data leakage risk element according to a fuzzy mathematical method;
S35, calculating a membership matrix according to the risk element set and the risk evaluation set, and calculating a ranking weight vector of the membership matrix;
and S36, synthesizing the sequencing weight vector of the judgment matrix and the sequencing weight vector of the membership degree matrix by using the fuzzy mathematical method to obtain the data leakage risk value of the information system.
As a more preferable embodiment, in step S32: according to the hierarchical structure, judging the relative importance of the data leakage risk factors of different levels by adopting a scaling method, and constructing a judgment matrix, which specifically comprises S321-S322:
S321, setting the relative importance of the data leakage occurrence probability as I_p, the relative importance of the data leakage influence degree as I_F, and the relative importance of the information system equipment as I_D, as follows:
wherein F_h represents a data leakage event of file level h, the file levels being divided into 5 levels, namely public, internal, secret, confidential and top secret, assigned the values 1, 3, 5, 7 and 9 respectively; D_g represents a data leakage event of leaking device level g, the leaking device levels likewise being divided into 5 levels, namely public, internal, secret, confidential and top secret, assigned the values 1, 3, 5, 7 and 9 respectively;
S322, judging the relative importance of the data leakage risk factors of different levels of the information system by the nine-point scale method or the five-point scale method of AHP, and constructing a judgment matrix B according to the ratio of the relative importance of the data leakage occurrence probability, the data leakage influence degree and the information system equipment importance:
wherein the element b_ij of the judgment matrix B represents the ratio of the data leakage occurrence probability of the i-th element to the data leakage occurrence probability of the j-th element.
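As a rough illustration of S322, a judgment matrix can be built from importance values by taking pairwise ratios, so that b_ij = I_i / I_j. This is only a sketch: the expressions for I_p, I_F and I_D are not reproduced in this text, so the three input values below are hypothetical, and the function name is an assumption.

```python
def judgment_matrix(importance):
    """Build a pairwise-ratio judgment matrix B with b_ij = I_i / I_j.

    `importance` is a list of positive importance values, one per factor.
    """
    n = len(importance)
    return [[importance[i] / importance[j] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical importance values for occurrence probability, influence
    # degree and equipment importance (not taken from the source).
    I_p, I_F, I_D = 5.0, 7.0, 3.0
    B = judgment_matrix([I_p, I_F, I_D])
    for row in B:
        print([round(x, 3) for x in row])
```

A matrix built this way is reciprocal (b_ij · b_ji = 1) with ones on the diagonal, which is the shape the analytic hierarchy process expects of a judgment matrix.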
As a more preferable embodiment, in step S33: calculating the sequencing weight vector of the judgment matrix by using the analytic hierarchy process, which specifically comprises the following steps:
calculating an eigenvector M from the judgment matrix:
M = (m_1, m_2, …, m_i, …, m_n)
wherein m_i is computed from b_i1, b_i2, …, b_in, the elements of the i-th row of the judgment matrix, and n is the order of the judgment matrix;
normalizing the eigenvector M to obtain the ranking weight vector of the judgment matrix, W = (W_1, W_2, …, W_n), wherein W_i is the normalized value of m_i.
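The calculation in S33 can be sketched in Python under one explicit assumption: the expression for m_i is not reproduced in this text, so the sketch uses the root (geometric mean) method that is common in the analytic hierarchy process, m_i = (b_i1 · b_i2 · … · b_in)^(1/n), followed by normalization so that the weights sum to 1. The function name is hypothetical.

```python
import math


def ranking_weights(B):
    """Approximate the AHP ranking weight vector of judgment matrix B.

    Assumption: m_i is the geometric mean of row i (root method);
    W_i = m_i / sum(m).
    """
    n = len(B)
    m = [math.prod(row) ** (1.0 / n) for row in B]
    total = sum(m)
    return [mi / total for mi in m]


if __name__ == "__main__":
    # A perfectly consistent 3x3 example matrix (illustrative values only).
    B = [
        [1.0, 2.0, 4.0],
        [0.5, 1.0, 2.0],
        [0.25, 0.5, 1.0],
    ]
    print([round(w, 4) for w in ranking_weights(B)])
```

For the example matrix the row products are 8, 1 and 0.125, whose cube roots are 2, 1 and 0.5, so the weights are 4/7, 2/7 and 1/7.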
As a further preferred technical solution, in this embodiment a consistency check is further performed on the judgment matrix, as follows:
obtaining the maximum eigenvalue λ_max of the judgment matrix:
performing the consistency check:
when C.I. < 0.1, the consistency of the judgment matrix is established, the weights contain no logical errors, and the judgment matrix can be used for subsequent calculation.
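The consistency check can be sketched as below. The expressions for λ_max and C.I. are not reproduced in this text, so the sketch assumes the usual analytic hierarchy process forms: λ_max is estimated as the average of (BW)_i / W_i, and C.I. = (λ_max − n) / (n − 1). The function names are hypothetical; the 0.1 threshold is the one stated above.

```python
def consistency_index(B, W):
    """Return (lambda_max, C.I.) for judgment matrix B and weight vector W.

    Assumptions: lambda_max ~ mean_i((B W)_i / W_i) and
    C.I. = (lambda_max - n) / (n - 1), with n >= 2.
    """
    n = len(B)
    BW = [sum(B[i][j] * W[j] for j in range(n)) for i in range(n)]
    lam = sum(BW[i] / W[i] for i in range(n)) / n
    return lam, (lam - n) / (n - 1)


def is_consistent(B, W, threshold=0.1):
    """Apply the C.I. < 0.1 acceptance rule."""
    _, ci = consistency_index(B, W)
    return ci < threshold


if __name__ == "__main__":
    B = [
        [1.0, 2.0, 4.0],
        [0.5, 1.0, 2.0],
        [0.25, 0.5, 1.0],
    ]
    W = [4.0 / 7.0, 2.0 / 7.0, 1.0 / 7.0]
    print(consistency_index(B, W), is_consistent(B, W))
```

For a perfectly consistent matrix such as the example, λ_max equals the order n and C.I. is 0; larger values indicate that the pairwise judgments contradict one another.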
As a more preferable embodiment, in step S34: constructing a risk element set according to the data leakage risk elements, and constructing a risk evaluation set for each data leakage risk element according to a fuzzy mathematical method, wherein the method specifically comprises the following steps:
constructing the risk element set U = {u_1, u_2, …, u_k, …, u_K};
setting the risk evaluation set of the data leakage influence degree and the data leakage occurrence probability to be the equipment set of the information system, E = {E_1, E_2, …, E_t, …, E_T}.
It should be noted that, in conventional fuzzy evaluation, a risk evaluation set is constructed for each risk factor, and experts evaluate each risk factor against the criterion of the layer above in order to measure its importance. However, the data leakage influence degree and the data leakage occurrence probability of an information system depend on the equipment of that system. Therefore, in view of the characteristics of information system data leakage, the general fuzzy evaluation method is improved: the equipment set of the information system, E = {E_1, E_2, …, E_t, …, E_T}, is used as the risk evaluation set for the data leakage influence degree and the data leakage occurrence probability of the information system.
As a more preferable embodiment, in step S35: calculating a membership matrix according to the risk element set and the risk evaluation set, and calculating a ranking weight vector of the membership matrix, wherein the method comprises the following steps S351 to S353:
S351, establishing a fuzzy mapping function of the risk element set and the risk evaluation set:
f:U→F(E)
wherein F(E) is the totality of fuzzy sets on the risk evaluation set E, satisfying u_k → f(u_k) = (p_k1, p_k2, …, p_kK) ∈ F(E); the mapping f represents the membership degree of the risk element u_k with respect to the evaluation criteria in the risk evaluation set, and the membership degrees of the risk elements in the risk evaluation set form the membership vectors P_l = (p_l1, p_l2, …, p_lm), l = 1, 2, …, m;
S352, constructing a membership matrix P according to the membership degree of the risk element set U to the risk evaluation set E:
wherein the element p_km of the membership matrix represents the probability that the k-th risk element belongs to the m-th evaluation factor.
S353, because the magnitude of the evaluation factors in the evaluation set strongly influences the importance of the risk factors, weights are assigned to the evaluation factors in the risk evaluation set; the weight distribution set is set as A = (a_1, a_2, …, a_k, …, a_K), where a_k represents the weight of the k-th evaluation factor relative to the other evaluation factors, and the fuzzy transformation operation is carried out:
wherein v_b represents the relative weight of data leakage under the b-th criterion, and V is the relative weight of the data-level risk factors under each criterion of the equipment level; normalizing gives the ranking weight vector of the membership matrix:
W_vb = (W_vb1, W_vb2, …, W_vbk, …, W_vbK)
wherein W_vb represents the ranking weight vector of the b-th system-level criterion, and W_vbk represents the weight of the k-th risk factor relative to the other risk factors under the system-level criterion.
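The fuzzy transformation and normalization of S353 can be sketched as follows. The transformation formula itself is not reproduced in this text, so the sketch makes two assumptions: the weighted-sum operator is used (rather than, for example, a max-min operator), and each risk element's value is v_k = Σ_m a_m · p_km, where row k of P holds the membership degrees of risk element u_k. The function name and the numbers are hypothetical.

```python
def fuzzy_ranking_weights(P, A):
    """Combine membership matrix P with evaluation-factor weights A.

    Assumption: weighted-sum fuzzy operator, v_k = sum_m a_m * p_km,
    followed by normalization so that the resulting weights sum to 1.
    """
    V = [sum(a * p for a, p in zip(A, row)) for row in P]
    total = sum(V)
    return [v / total for v in V]


if __name__ == "__main__":
    # Three risk elements, two evaluation factors (illustrative values only).
    P = [
        [0.5, 0.5],
        [1.0, 0.0],
        [0.0, 1.0],
    ]
    A = [0.75, 0.25]
    print([round(w, 4) for w in fuzzy_ranking_weights(P, A)])
```

With these inputs V = (0.5, 0.75, 0.25), and normalization gives the ranking weight vector (1/3, 1/2, 1/6).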
As a more preferable embodiment, in step S36: synthesizing the sequencing weight vector of the judgment matrix and the sequencing weight vector of the membership matrix by using the fuzzy mathematical method to obtain a data leakage risk value of the information system, and the method comprises the following steps:
transposing the ranking weight vector W of the judgment matrix to obtain:
W′ = W^T = (W_1, W_2, …, W_n)^T;
calculating a data leakage risk value of each device of the information system:
r = {r_1, r_2, …, r_z, …, r_Z}
r_z = (W_v1z, W_v2z, …, W_vbz) W′;
wherein z denotes the z-th device, and W_vb is the ranking weight vector of the membership matrix.
Because the equipment of the information system is in the same network, the importance of all the equipment of the information system is the same, and the total risk value R of the information system is calculated by adopting a weighted average method as follows:
As a further preferred technical solution, after the risk value R of the information system is calculated, the data leakage risk level of the information system is classified as low risk, medium risk or high risk according to the data leakage risk value, and the corresponding risk values are shown in Table 1:
TABLE 1 information system data leakage risk level and risk value corresponding relation
It should be noted that, in this embodiment, the matching degree between the data leakage information of the information system and the confidential sensitive information base is processed, an analytic hierarchy process is used to find the ranking weight vector of the judgment matrix, the ranking weight vector of the risk elements and the membership matrix of the judgment set is found, and then a fuzzy mathematical method is used to synthesize the ranking weight vector of the judgment matrix and the ranking weight vector of the membership matrix, so as to obtain the data leakage risk value of the information system.
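The synthesis step can be sketched in Python as follows. Each device's value follows r_z = (W_v1z, W_v2z, …, W_vbz) W′ as stated above. The formula for the total risk value R is not reproduced in this text; because all devices are stated to have the same importance, the sketch assumes the weighted average reduces to the arithmetic mean of the r_z. All names and numbers are hypothetical, and the risk-level thresholds of Table 1 are not available here, so no level mapping is shown.

```python
def device_risk_values(W_v, W):
    """Compute r_z = sum_b W_v[b][z] * W[b] for every device z.

    W_v[b] is the ranking weight vector under criterion b (one entry per
    device); W is the ranking weight vector of the judgment matrix.
    """
    n_devices = len(W_v[0])
    return [
        sum(W_v[b][z] * W[b] for b in range(len(W)))
        for z in range(n_devices)
    ]


def total_risk(r):
    """Assumption: equal device importance, so R is the mean of the r_z."""
    return sum(r) / len(r)


if __name__ == "__main__":
    # Three criteria (rows) by three devices (columns); illustrative values.
    W_v = [
        [0.2, 0.5, 0.3],
        [0.6, 0.3, 0.1],
        [0.1, 0.1, 0.8],
    ]
    W = [0.5, 0.3, 0.2]
    r = device_risk_values(W_v, W)
    print([round(x, 4) for x in r], round(total_risk(r), 4))
```

When every W_vb and W are normalized to sum to 1, as in this example, the device values r_z also sum to 1, so they are most naturally read as each device's relative share of the data leakage risk.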
The technical effects of the present invention are as follows:
(1) The analytic hierarchy process, the fuzzy mathematical method and the probability calculation method are applied together, which makes the data leakage risk assessment method more operable and easier to calculate, effectively reduces the influence of subjective factors on the assessment, and makes the assessment more objective.
(2) Based on the basic judgment that information system data leakage arises mainly from leakage of data in use, data in transmission and data in storage, the terminal data, network data and file data are layered by the analytic hierarchy process, and the evaluation factors, weights, membership functions and evaluation matrices are determined accordingly, so that the resulting evaluation method for information system data leakage risk is more targeted.
(3) The method comprises the steps of dividing elements related to data leakage risk assessment into three levels of a data level, an equipment level and a system level by utilizing the thought of an analytic hierarchy process, carrying out quantitative analysis and calculation by utilizing a fuzzy mathematical method and a probability calculation method on the basis, and converting a multi-target comprehensive evaluation problem of a network information system into a hierarchical weight decision and fuzzy mathematical membership problem for analysis and calculation.
This embodiment rests mainly on correctly grasping the essence of the analytic hierarchy process, the fuzzy mathematical method and the probability calculation method in light of the characteristics and precise classification of information system data leakage, and on applying these methods flexibly to the evaluation of information system data leakage risk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.