CN118349895B - Vulnerability sample library construction method, vulnerability identification device, vulnerability sample library construction equipment and vulnerability sample library medium - Google Patents

Publication number
CN118349895B
Authority
CN
China
Prior art keywords
feature vector
vulnerability
information
group
vector matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410780908.2A
Other languages
Chinese (zh)
Other versions
CN118349895A (en)
Inventor
李永刚
王利斌
林亮成
尹琴
潘善民
许斐
王延
陈晓雪
谷五勋
郑杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Siji Testing Technology Beijing Co ltd
State Grid Siji Network Security Beijing Co ltd
Original Assignee
State Grid Siji Testing Technology Beijing Co ltd
State Grid Siji Network Security Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Siji Testing Technology Beijing Co ltd, State Grid Siji Network Security Beijing Co ltd
Priority to CN202410780908.2A
Publication of CN118349895A
Application granted
Publication of CN118349895B
Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a vulnerability sample library construction method, a vulnerability identification method, an apparatus, a device, and a medium. One embodiment of the method comprises: determining a pre-stored information security vulnerability data set as a historical information security vulnerability data set; for each historical information security vulnerability data, performing the following steps: determining its feature information group as a feature information group to be vectorized; vectorizing each piece of feature information to be vectorized to obtain a historical feature vector group; and constructing a historical feature vector matrix; determining the constructed historical feature vector matrices as a historical feature vector matrix group; classifying each historical feature vector matrix to obtain a vulnerability category information group set; and constructing a vulnerability sample library. This embodiment reduces the amount of computation required for vulnerability sample identification and improves the accuracy of vulnerability classification.

Description

Vulnerability sample library construction method, vulnerability identification device, vulnerability sample library construction equipment and vulnerability sample library medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a vulnerability sample library construction method, a vulnerability identification method, an apparatus, a device, and a medium.
Background
Vulnerability sample identification is a technique for classifying and identifying vulnerability data. At present, vulnerability data is generally classified and identified as follows: function similarity detection compares function code across different programs to find code fragments that are similar or functionally identical, and thereby determines the vulnerability and the category corresponding to the vulnerability data.
However, identifying vulnerability samples in this way often runs into the following technical problems:
First, comparing function code across different programs is computationally expensive and imprecise in practice, so the accuracy of vulnerability sample identification is low.
Second, the vulnerability and its category are determined solely by finding code fragments that are similar or functionally identical to the vulnerable code to be identified. In practice, however, some vulnerabilities have similar or functionally identical code fragments yet belong to different categories, so the accuracy of vulnerability classification is low.
Disclosure of Invention
This section is intended to introduce, in simplified form, concepts that are further described below in the Detailed Description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a vulnerability sample library construction method for information security vulnerability identification, an information security vulnerability identification method, an apparatus, an electronic device, and a computer readable medium to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a vulnerability sample library construction method for information security vulnerability identification, the method comprising: determining a pre-stored information security vulnerability data set as a historical information security vulnerability data set, wherein each historical information security vulnerability data in the historical information security vulnerability data set comprises vulnerability information and a feature information group corresponding to the vulnerability information; for each historical information security vulnerability data in the historical information security vulnerability data set, performing the following steps: determining the feature information group included in the historical information security vulnerability data as a feature information group to be vectorized; vectorizing each piece of feature information to be vectorized in the feature information group to be vectorized to obtain a historical feature vector group; constructing a historical feature vector matrix based on the historical feature vector group; determining the constructed historical feature vector matrices as a historical feature vector matrix group; classifying each historical feature vector matrix in the historical feature vector matrix group to obtain a vulnerability category information group set; and constructing a vulnerability sample library based on the vulnerability category information group set, wherein the vulnerability sample library is used to match acquired real-time vulnerability feature information so as to identify the vulnerability sample corresponding to the vulnerability feature information.
In a second aspect, some embodiments of the present disclosure provide an information security vulnerability identification method, the method comprising: acquiring current information security vulnerability data, wherein the current information security vulnerability data comprises vulnerability information and a feature information group corresponding to the vulnerability information; constructing a current feature vector matrix based on the feature information group; and matching the current feature vector matrix against a pre-constructed vulnerability sample library to obtain the vulnerability sample corresponding to the current feature vector matrix, wherein the vulnerability sample library is constructed by the method described in any implementation of the first aspect.
In a third aspect, some embodiments of the present disclosure provide a vulnerability sample library construction apparatus for information security vulnerability identification, the apparatus comprising: a first determining unit configured to determine a pre-stored information security vulnerability data set as a historical information security vulnerability data set, wherein each historical information security vulnerability data in the historical information security vulnerability data set comprises vulnerability information and a feature information group corresponding to the vulnerability information; an execution unit configured to perform, for each historical information security vulnerability data in the historical information security vulnerability data set, the following steps: determining the feature information group included in the historical information security vulnerability data as a feature information group to be vectorized; vectorizing each piece of feature information to be vectorized in the feature information group to be vectorized to obtain a historical feature vector group; constructing a historical feature vector matrix based on the historical feature vector group; a second determining unit configured to determine the constructed historical feature vector matrices as a historical feature vector matrix group; a classification unit configured to classify each historical feature vector matrix in the historical feature vector matrix group to obtain a vulnerability category information group set; and a construction unit configured to construct a vulnerability sample library based on the vulnerability category information group set, wherein the vulnerability sample library is used to match acquired real-time vulnerability feature information so as to identify the vulnerability sample corresponding to the vulnerability feature information.
In a fourth aspect, some embodiments of the present disclosure provide an information security vulnerability identification apparatus, the apparatus comprising: an acquisition unit configured to acquire current information security vulnerability data, wherein the current information security vulnerability data comprises vulnerability information and a feature information group corresponding to the vulnerability information; a construction unit configured to construct a current feature vector matrix based on the feature information group; and a matching unit configured to match the current feature vector matrix against a pre-constructed vulnerability sample library to obtain the vulnerability sample corresponding to the current feature vector matrix, wherein the vulnerability sample library is constructed by the method described in any implementation of the first aspect.
In a fifth aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; and a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first or second aspects above.
In a sixth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements the method described in any of the implementations of the first or second aspects above.
The above embodiments of the present disclosure have the following advantages: the vulnerability sample library obtained by the vulnerability sample library construction method of some embodiments of the present disclosure improves the accuracy of information security vulnerability identification. Specifically, the accuracy of vulnerability sample identification is low because comparing function code across different programs is computationally expensive and imprecise in practice. Based on this, the vulnerability sample library construction method of some embodiments of the present disclosure first determines a pre-stored information security vulnerability data set as a historical information security vulnerability data set, where each historical information security vulnerability data in the set includes vulnerability information and a feature information group corresponding to the vulnerability information. Thus, the information security vulnerability data to be processed can be obtained. Then, for each historical information security vulnerability data in the set, the following steps are performed. First, the feature information group included in the historical information security vulnerability data is determined as a feature information group to be vectorized. Thus, the feature information group included in the information security vulnerability data can be obtained. Next, each piece of feature information to be vectorized in the feature information group to be vectorized is vectorized to obtain a historical feature vector group. Thus, the feature information can be converted into historical feature vectors. Then, a historical feature vector matrix is constructed based on the historical feature vector group. Thus, a historical feature vector matrix composed of the historical feature vector group can be obtained. After that, the constructed historical feature vector matrices are determined as a historical feature vector matrix group. Thus, the historical feature vector matrix corresponding to each piece of historical information security vulnerability data can be obtained. Next, each historical feature vector matrix in the historical feature vector matrix group is classified to obtain a vulnerability category information group set. Thus, the vulnerability category information corresponding to each piece of historical information security vulnerability data can be obtained. Finally, a vulnerability sample library is constructed based on the vulnerability category information group set, where the vulnerability sample library is used to match acquired real-time vulnerability feature information so as to identify the vulnerability sample corresponding to that information. Thus, a vulnerability sample library for information security vulnerability identification can be constructed. Because vulnerability feature information can be classified and identified using the vulnerability sample library constructed from the historical information security vulnerability data set, the heavy computation of comparing function code across different programs is avoided. And because the vulnerability sample library is built from the historical information security vulnerability data set, it is closely tied to that data, so newly acquired vulnerability feature information is identified more accurately.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flow diagram of some embodiments of a vulnerability sample library construction method for information security vulnerability identification according to the present disclosure;
FIG. 2 is a flow chart of some embodiments of an information security vulnerability identification method according to the present disclosure;
FIG. 3 is a schematic structural diagram of some embodiments of a vulnerability sample library construction apparatus for information security vulnerability identification according to the present disclosure;
FIG. 4 is a schematic structural diagram of some embodiments of an information security vulnerability identification apparatus according to the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that the modifiers "a", "one", and "a plurality of" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates a flow 100 of some embodiments of a vulnerability sample library construction method for information security vulnerability identification according to the present disclosure. The method comprises the following steps:
Step 101, determining a pre-stored information security vulnerability data set as a historical information security vulnerability data set.
In some embodiments, an execution body (e.g., a computing device) of the vulnerability sample library construction method may determine a pre-stored information security vulnerability data set as a historical information security vulnerability data set. The information security vulnerability data set may be data in a national database used for information security vulnerability management and control. The historical information security vulnerability data may be data characterizing an information security vulnerability. For example, the historical information security vulnerability data may be, but is not limited to, financial, power, energy, or telecom information security vulnerability data. Each historical information security vulnerability data in the historical information security vulnerability data set may include vulnerability information and a feature information group corresponding to the vulnerability information. The vulnerability information may be the description of the historical information security vulnerability data in a national vulnerability database. The feature information group may be a group of feature information of the data in a national vulnerability database. Each piece of feature information in the feature information group corresponds to a feature tag type. The feature tag types may include, but are not limited to, system tags and environment tags. A system tag may be a tag indicating that the feature information relates to the "system". An environment tag may be a tag indicating that the feature information relates to the "environment". Specifically, when the feature tag type is a system tag, the feature information is feature information corresponding to the system. When the feature tag type is an environment tag, the feature information is feature information corresponding to the system's programming language environment. The execution body may be a server that identifies or processes information security vulnerabilities.
Step 102, for each historical information security vulnerability data in the historical information security vulnerability data set, performing the following steps:
Step 1021, determining the feature information group included in the historical information security vulnerability data as the feature information group to be vectorized.
In some embodiments, the execution body may determine the feature information group included in the historical information security vulnerability data as the feature information group to be vectorized.
Step 1022, vectorizing each piece of feature information to be vectorized in the feature information group to be vectorized to obtain a historical feature vector group.
In some embodiments, the execution body may vectorize each piece of feature information to be vectorized in the feature information group to be vectorized to obtain a historical feature vector group. In practice, the execution body may vectorize each piece of feature information to be vectorized using a preset encoding mode to obtain a historical feature vector. The preset encoding mode may be, but is not limited to: one-hot encoding, label encoding, min-max normalization, or Z-score normalization. The historical feature vector may be numerical data suitable for processing by a machine learning model.
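As an illustration of the encoding modes named above, the following is a minimal sketch in plain Python. The function names, the `SYSTEMS` category list, and the sample values are hypothetical and are not taken from the patent; the patent does not specify how any of these encodings is implemented.

```python
import math

def one_hot(value, categories):
    """One-hot encode `value` against an ordered list of known categories."""
    return [1.0 if value == c else 0.0 for c in categories]

def label_encode(value, categories):
    """Label-encode `value` as its index in the ordered category list."""
    return float(categories.index(value))

def min_max(values):
    """Rescale numeric values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Standardize values to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical categorical values for a "system" feature.
SYSTEMS = ["linux", "windows", "macos"]

print(one_hot("windows", SYSTEMS))
print(label_encode("macos", SYSTEMS))
print(min_max([2.0, 4.0, 10.0]))
print(z_score([1.0, 2.0, 3.0]))
```

One-hot and label encoding suit categorical feature information, while min-max and Z-score normalization suit numeric feature information; which one applies to a given feature is a design choice the text leaves open.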
Step 1023, constructing a historical feature vector matrix based on the historical feature vector group.
In some embodiments, the execution body may construct a historical feature vector matrix based on the historical feature vector group. The historical feature vector matrix may be a matrix representing the vulnerability information corresponding to the historical feature vector group. In practice, the execution body may assemble the historical feature vectors in the historical feature vector group into a historical feature vector matrix.
In some optional implementations of some embodiments, the execution body may construct the historical feature vector matrix based on the historical feature vector group by:
First step, classifying the feature information group to be vectorized by feature tag type to obtain feature tag information groups, wherein the feature tag information in each feature tag information group corresponds to the same feature tag type. The feature tag types may include, but are not limited to, system tags and environment tags. A system tag may be a tag indicating that the feature information relates to the system. An environment tag may be a tag indicating that the feature information relates to the environment. In practice, the execution body may place the pieces of feature tag information in the feature information group to be vectorized that share the same feature tag type into the same feature tag information group, thereby obtaining the feature tag information groups.
Second step, determining the obtained feature tag information groups as a feature tag information group set.
Third step, for each feature tag information group in the feature tag information group set, performing the following steps:
First sub-step, determining the historical feature vector corresponding to each piece of feature tag information in the feature tag information group as a feature vector to be added, to obtain a feature vector group to be added.
Second sub-step, adding the feature vector group to be added to a feature vector matrix to update the feature vector matrix. The feature vector matrix may initially be an empty matrix. The number of rows of the feature vector matrix may be the number of feature tag information groups in the feature tag information group set. The number of columns of the feature vector matrix may be the number of pieces of feature tag information in the largest feature tag information group in the feature tag information group set. In practice, the execution body may add the feature vectors to be added in the feature vector group to be added to the rows of the feature vector matrix in order, from top to bottom. The feature vectors in the feature vector matrix may initially be null values.
Fourth step, determining the updated feature vector matrix as the historical feature vector matrix.
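The four steps above can be sketched as follows. This is one possible reading, not the patented implementation: it assumes each piece of feature information has already been encoded to a single number, that each feature tag information group fills one row, and that unfilled cells stay `None`. The name `build_feature_matrix` and the sample tags are hypothetical.

```python
def build_feature_matrix(features):
    """Build a matrix with one row per feature tag type.

    `features` is a list of (tag_type, encoded_value) pairs for a single
    vulnerability record. Rows shorter than the longest group keep None
    in their unfilled cells.
    """
    # First and second steps: group the encoded features by tag type
    # (dicts preserve insertion order in Python 3.7+).
    groups = {}
    for tag_type, value in features:
        groups.setdefault(tag_type, []).append(value)

    # Rows = number of groups, columns = size of the largest group.
    n_cols = max((len(g) for g in groups.values()), default=0)
    matrix = [[None] * n_cols for _ in groups]  # starts out empty

    # Third step: add each group's vectors to the matrix, top to bottom.
    for row, values in enumerate(groups.values()):
        for col, value in enumerate(values):
            matrix[row][col] = value

    # Fourth step: the updated matrix is the historical feature vector matrix.
    return matrix

record = [("system", 1.0), ("environment", 0.5), ("system", 2.0), ("system", 3.0)]
print(build_feature_matrix(record))
```

Before distances or norms are computed on such matrices, the `None` cells would need a numeric fill value (for example 0.0); the text does not say which, so that choice is left to the implementer.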
Step 103, determining the constructed historical feature vector matrices as a historical feature vector matrix group.
In some embodiments, the execution body may determine the constructed historical feature vector matrices as a historical feature vector matrix group.
Step 104, classifying each historical feature vector matrix in the historical feature vector matrix group to obtain a vulnerability category information group set.
In some embodiments, the execution body may classify each historical feature vector matrix in the historical feature vector matrix group to obtain a vulnerability category information group set. Each vulnerability category information group in the vulnerability category information group set may include one or more historical feature vector matrices. In practice, the execution body may first classify the historical feature vector matrices to obtain the vulnerability category information groups, and then determine the obtained vulnerability category information groups as the vulnerability category information group set.
In some optional implementations of some embodiments, the execution body may classify each historical feature vector matrix in the historical feature vector matrix group to obtain the vulnerability category information group set by:
First step, based on the historical feature vector matrix group, performing the following first loop step:
First sub-step, performing random dimension reduction on each historical feature vector matrix in the historical feature vector matrix group to obtain a dimension-reduction feature vector matrix group. Each dimension-reduction feature vector matrix in the dimension-reduction feature vector matrix group may be a matrix obtained by randomly reducing the dimensions of a historical feature vector matrix, and may include selection information. The selection information may be a string indicating whether the dimension-reduction feature vector matrix has been selected, and may initially be an empty string. Specifically, the random dimension reduction may use, but is not limited to: principal component analysis, linear discriminant analysis, or t-distributed stochastic neighbor embedding (t-SNE).
Second sub-step, based on the dimension-reduction feature vector matrix group, performing the following second loop step:
Step one, determining a dimension-reduction feature vector matrix in the dimension-reduction feature vector matrix group that meets a preset selection condition as the dimension-reduction feature vector matrix to be compared. The preset selection condition may be that the dimension-reduction feature vector matrix is any matrix in the dimension-reduction feature vector matrix group whose selection information is an empty string.
Step two, determining preset selection information as the selection information. The preset selection information may be "selected".
Step three, adding the selection information to the dimension-reduction feature vector matrix to be compared to update the dimension-reduction feature vector matrix to be compared.
Step four, determining a preset number of dimension-reduction feature vector matrices in the dimension-reduction feature vector matrix group that meet a preset comparison condition as a feature vector matrix group to be compared. The preset comparison condition may be that the dimension-reduction feature vector matrix is a matrix in the dimension-reduction feature vector matrix group other than the dimension-reduction feature vector matrix to be compared. The preset number may be the number of dimension-reduction feature vector matrices in the dimension-reduction feature vector matrix group minus 1.
Step five, for each feature vector matrix to be compared in the feature vector matrix group to be compared, determining the distance between that feature vector matrix to be compared and the dimension-reduction feature vector matrix to be compared as a comparison distance.
Step six, sorting the determined comparison distances to obtain a comparison distance sequence. The comparison distance sequence may be a sequence ordered by the size of the comparison distances. In practice, the execution body may sort the comparison distances in ascending order.
Step seven, taking the comparison distances in the comparison distance sequence that meet a preset sorting condition as a comparison distance group. The preset sorting condition may be that the comparison distance is smaller than a preset distance threshold. The preset distance threshold may be a preset value; its specific setting is not limited here.
Step eight, determining a category matrix group based on the comparison distance group. The category matrix group may be a group formed by historical feature vector matrices.
Step nine, in response to determining that the dimension-reduction feature vector matrix group does not meet a preset adding condition, performing the second loop step again. The preset adding condition may be that no dimension-reduction feature vector matrix in the dimension-reduction feature vector matrix group lacks selection information.
Step ten, in response to determining that the dimension-reduction feature vector matrix group meets the preset adding condition, determining the determined category matrix groups as a category information group set.
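The second loop step (steps one through ten) can be sketched as below. Several points are assumptions rather than statements from the patent: the distance is taken to be the Frobenius norm of the difference between two equally shaped matrices, each category group is taken to contain the chosen matrix plus its neighbours under the threshold, the loop is taken to repeat until every matrix has been marked "selected", and groups are reported as lists of indices. The names `frobenius_distance` and `group_by_distance` are hypothetical.

```python
import math

def frobenius_distance(a, b):
    """Frobenius norm of (a - b) for two equally shaped matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

def group_by_distance(matrices, threshold):
    """Run the second loop once; return one category group per matrix."""
    selection = [""] * len(matrices)   # empty string = not yet selected
    category_groups = []
    while "" in selection:             # repeat while unselected matrices remain
        i = selection.index("")        # step one: pick an unselected matrix
        selection[i] = "selected"      # steps two and three: mark it
        # Steps four and five: distances to the other n - 1 matrices.
        distances = [(frobenius_distance(matrices[i], matrices[j]), j)
                     for j in range(len(matrices)) if j != i]
        distances.sort()               # step six: ascending order
        # Step seven: keep only distances under the threshold.
        near = [j for d, j in distances if d < threshold]
        # Step eight (assumption): group = chosen matrix + near neighbours.
        category_groups.append([i] + near)
    return category_groups             # step ten: the category group set

reduced = [[[0.0, 0.0]], [[0.0, 1.0]], [[10.0, 10.0]]]
print(group_by_distance(reduced, threshold=2.0))
```

Because every matrix takes a turn as the matrix to be compared, neighbouring matrices appear in each other's groups; whether such overlapping groups are later merged is not stated in this part of the text.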
Third sub-step, in response to determining that the category matrix groups in the category information group set do not satisfy a preset number condition, clearing the category information group set and executing the first loop step again. The preset number condition may be that the number of historical feature vector matrices in each category matrix group is smaller than a preset number threshold. The preset number threshold may be a preset value; its specific setting is not limited here.
Fourth sub-step, in response to determining that each category matrix group in the category information group set satisfies the preset number condition, performing difference processing on each category matrix group in the category information group set to obtain a category difference value group. Each category difference value in the category difference value group may be a numerical value representing the degree of difference among the historical feature vector matrices in the corresponding category matrix group. In practice, first, the execution body may execute the following steps for each category matrix group in the category information group set: first, the execution body may take the sum of the historical feature vector matrices in the category matrix group as a historical sum matrix. Then, the execution body may take the ratio of the historical sum matrix to the number of historical feature vector matrices in the category matrix group as a historical average matrix. Then, the execution body may take the difference between each historical feature vector matrix and the historical average matrix as a historical difference matrix. Next, the execution body may take the Frobenius norm of each historical difference matrix as a difference norm. Finally, the execution body may take the sum of the difference norms as the category difference value. Second, the execution body may determine the obtained category difference values as the category difference value group.
Fifth sub-step, in response to determining that the category difference values in the category difference value group do not satisfy a preset threshold condition, executing the first loop step again. The preset threshold condition may be that the category difference value is smaller than a preset difference threshold. The preset difference threshold may be a preset value; its specific setting is not limited here.
Sixth sub-step, in response to determining that each category difference value in the category difference value group satisfies the preset threshold condition, determining the category information group set as a vulnerability category information group set. Each vulnerability category information group in the vulnerability category information group set may be one of the category matrix groups.
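The category difference value described in the fourth sub-step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each historical feature vector matrix is a rectangular list of lists of floats with identical shape, and the function name `class_difference_value` is hypothetical.

```python
import math


def class_difference_value(category_matrices):
    """Sum of the Frobenius norms of each matrix's deviation from the group mean.

    Assumption: every matrix is a list of rows of floats with the same shape.
    """
    n = len(category_matrices)
    rows = len(category_matrices[0])
    cols = len(category_matrices[0][0])

    # Historical average matrix: element-wise sum divided by the matrix count.
    mean = [
        [sum(m[i][j] for m in category_matrices) / n for j in range(cols)]
        for i in range(rows)
    ]

    total = 0.0
    for m in category_matrices:
        # Frobenius norm of the historical difference matrix (m - mean).
        squared = sum(
            (m[i][j] - mean[i][j]) ** 2 for i in range(rows) for j in range(cols)
        )
        total += math.sqrt(squared)
    return total
```

A group could then be checked against the preset threshold condition with a comparison such as `class_difference_value(group) < difference_threshold`, where `difference_threshold` stands for the preset difference threshold.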
In some optional implementations of some embodiments, the execution body may determine the category matrix group based on the comparison distance group by:
First step, for each comparison distance in the comparison distance group, executing the following steps:
First sub-step, determining preset selection information as selection information.
Second sub-step, adding the selection information to the feature vector matrix to be compared that corresponds to the comparison distance, so as to update that feature vector matrix to be compared.
Second step, combining each updated feature vector matrix to be compared with the dimension-reduction feature vector matrix to be compared that corresponds to the comparison distance group, to obtain the category matrix group. In practice, the execution body may combine the historical feature vector matrix corresponding to the dimension-reduction feature vector matrix to be compared with the historical feature vector matrices corresponding to the updated feature vector matrices to be compared, and take the resulting historical feature vector matrices as the category matrix group.
In some optional implementations of some embodiments, the execution body may determine the distance between the dimension-reduction feature vector matrix to be compared and the feature vector matrix to be compared as a comparison distance by:
First step, determining the feature vectors at each pair of corresponding positions in the dimension-reduction feature vector matrix to be compared and the feature vector matrix to be compared as a first feature vector and a second feature vector, respectively. In practice, the execution body may extract the feature vectors at corresponding positions of the two matrices as the first feature vector and the second feature vector. For example, the dimension-reduction feature vector matrix to be compared may be $A = (a_{ij})_{m \times n}$ and the feature vector matrix to be compared may be $B = (b_{ij})_{m \times n}$. Here, the dimension-reduction feature vector matrix to be compared and the feature vector matrix to be compared may be historical feature vector matrices formed in step 1023. $a_{11}$ is the vector in row 1, column 1 of $A$, and $b_{11}$ is the vector in row 1, column 1 of $B$. $a_{1n}$ is the vector in row 1, column $n$ of $A$, and $b_{1n}$ is the vector in row 1, column $n$ of $B$. $a_{m1}$ is the vector in row $m$, column 1 of $A$, and $b_{m1}$ is the vector in row $m$, column 1 of $B$. Since $a_{11}$ corresponds to $b_{11}$, the first feature vector may be $a_{11}$ and the second feature vector may be $b_{11}$; since $a_{1n}$ corresponds to $b_{1n}$, the first feature vector may be $a_{1n}$ and the second feature vector may be $b_{1n}$.
Second step, executing the following steps for the first feature vector and the second feature vector at each pair of corresponding positions:
First sub-step, determining the difference between the first feature vector and the second feature vector as a difference feature vector.
Second sub-step, generating a square value based on the difference feature vector. In practice, the execution body may determine the square of the difference feature vector as the square value.
Third sub-step, generating a standard value based on the square value. In practice, the execution body may determine the square root of the square value as the standard value.
Third step, determining the sum of the obtained standard values as the comparison distance.
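The three steps above, together with the earlier sorting and threshold steps, can be sketched as follows. This is an illustrative sketch only: it assumes each matrix is a list of rows whose entries are feature vectors (lists of floats), it reads "the square of the difference feature vector" as the sum of its squared components, and the names `comparison_distance` and `neighbours_within` are hypothetical.

```python
import math


def comparison_distance(matrix_a, matrix_b):
    """Sum, over all positions, of the Euclidean norm of the vector difference."""
    total = 0.0
    for row_a, row_b in zip(matrix_a, matrix_b):
        for first_vec, second_vec in zip(row_a, row_b):
            # Difference feature vector.
            diff = [x - y for x, y in zip(first_vec, second_vec)]
            # Square value (assumed: sum of squared components).
            square_value = sum(d * d for d in diff)
            # Standard value: square root of the square value.
            total += math.sqrt(square_value)
    return total


def neighbours_within(target_index, matrices, distance_threshold):
    """Indices of the other matrices closer than the threshold, nearest first."""
    distances = [
        (comparison_distance(matrices[target_index], other), idx)
        for idx, other in enumerate(matrices)
        if idx != target_index  # compare against every other matrix in the group
    ]
    distances.sort()  # ascending comparison distance sequence
    return [idx for dist, idx in distances if dist < distance_threshold]
```

With this reading, the comparison distance is a sum of per-position Euclidean distances, so two matrices that differ at a single position contribute only that position's distance.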
Step 105, constructing a vulnerability sample library based on the vulnerability category information group set.
In some embodiments, the executing entity may construct the vulnerability sample library based on the vulnerability category information set. The vulnerability sample library is used for carrying out matching processing on the acquired real-time vulnerability characteristic information so as to identify vulnerability samples corresponding to the vulnerability characteristic information.
In some optional implementations of some embodiments, the executing entity may construct the vulnerability sample library based on the vulnerability category information set by:
First step, for each vulnerability category information group in the vulnerability category information group set, executing the following steps:
First sub-step, generating a vulnerability average matrix corresponding to the vulnerability category information group based on the vulnerability category information group. In practice, first, the execution body may input the elements of each dimension-reduction feature vector matrix in the category matrix group corresponding to the vulnerability category information group into a first preset formula to obtain the first vector for each position. The first preset formula may be $\bar{a}_{ij} = \frac{1}{n}\sum_{k=1}^{n} a^{(k)}_{ij}$, where the first vector $\bar{a}_{ij}$ is the vector in row $i$, column $j$ of the vulnerability average matrix, $a^{(k)}_{ij}$ is the vector in row $i$, column $j$ of the $k$-th dimension-reduction feature vector matrix, and $n$ is the number of dimension-reduction feature vector matrices contained in the category matrix group. Then, the execution body may arrange the first vectors by row and column to form the vulnerability average matrix.
Second sub-step, generating a vulnerability maximum matrix corresponding to the vulnerability category information group based on the vulnerability category information group. In practice, first, the execution body may input the elements of each dimension-reduction feature vector matrix in the category matrix group corresponding to the vulnerability category information group into a second preset formula to obtain the second vector for each position. The second preset formula may be $a^{\max}_{ij} = \max_{1 \le k \le n} a^{(k)}_{ij}$, where the second vector $a^{\max}_{ij}$ is the vector in row $i$, column $j$ of the vulnerability maximum matrix. Then, the execution body may arrange the second vectors by row and column to form the vulnerability maximum matrix.
Third sub-step, generating a vulnerability minimum matrix corresponding to the vulnerability category information group based on the vulnerability category information group. In practice, first, the execution body may input the elements of each dimension-reduction feature vector matrix in the category matrix group corresponding to the vulnerability category information group into a third preset formula to obtain the third vector for each position. The third preset formula may be $a^{\min}_{ij} = \min_{1 \le k \le n} a^{(k)}_{ij}$, where the third vector $a^{\min}_{ij}$ is the vector in row $i$, column $j$ of the vulnerability minimum matrix. Then, the execution body may arrange the third vectors by row and column to form the vulnerability minimum matrix.
Fourth sub-step, generating a vulnerability fluctuation value corresponding to the vulnerability category information group based on the vulnerability minimum matrix and the vulnerability maximum matrix. The vulnerability fluctuation value may be a matrix representing the degree of fluctuation of the vulnerability category information group. In practice, first, the execution body may input each element of the vulnerability minimum matrix and each element of the vulnerability maximum matrix into a fourth preset formula to obtain the fourth vector for each position, where the fourth vector is the vector in row $i$, column $j$ of the vulnerability fluctuation value. Then, the execution body may arrange the fourth vectors by row and column to form the vulnerability fluctuation value.
Fifth sub-step, generating a commonality feature corresponding to the vulnerability category information group based on the vulnerability average matrix and the vulnerability fluctuation value. The commonality feature may be a matrix representing the features common to the vulnerability category information group. In practice, first, the execution body may input each element of the vulnerability average matrix and of the vulnerability fluctuation value into a fifth preset formula to obtain the fifth vector for each position. The quantities appearing in the fifth preset formula are the vectors in row $i$, column $j$ of the vulnerability minimum matrix, of the vulnerability average matrix, and of the vulnerability fluctuation value. Then, the execution body may arrange the fifth vectors by row and column to form the commonality feature.
Sixth sub-step, combining the vulnerability average matrix, the vulnerability maximum matrix, the vulnerability minimum matrix, the vulnerability fluctuation value and the commonality feature to obtain vulnerability sample information.
Seventh sub-step, determining the historical information security vulnerability data corresponding to each piece of vulnerability category information in the vulnerability category information group as vulnerability sample data, to obtain a vulnerability sample data group. In practice, first, the execution body may extract, from the vulnerability category information group, the historical information security vulnerability data corresponding to each piece of vulnerability category information. Then, the execution body may take the obtained pieces of historical information security vulnerability data as the vulnerability sample data group.
Eighth sub-step, determining the vulnerability sample information and the vulnerability sample data group as a vulnerability sample.
Second step, constructing the vulnerability sample library based on the determined vulnerability samples. In practice, the execution body may store each obtained vulnerability sample in a database, thereby obtaining the vulnerability sample library.
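The per-category statistics and the final library can be sketched as follows. This is a hedged illustration under several assumptions: matrix entries are treated as plain floats, the maximum and minimum are taken element-wise across the matrices of a category, an in-memory list stands in for the database, and the names `build_vulnerability_sample` and `build_sample_library` are hypothetical. The vulnerability fluctuation value and the commonality feature are left out because their preset formulas are not reproduced in this text.

```python
def build_vulnerability_sample(category_matrices, source_records):
    """Build one vulnerability sample from the matrices of a single category.

    Assumption: every matrix is a list of rows of floats with the same shape.
    """
    n = len(category_matrices)
    rows = len(category_matrices[0])
    cols = len(category_matrices[0][0])

    def combine(reducer):
        # Apply the reducer to the values found at each (row, column) position.
        return [
            [reducer([m[i][j] for m in category_matrices]) for j in range(cols)]
            for i in range(rows)
        ]

    sample_info = {
        "average": combine(lambda values: sum(values) / n),
        "maximum": combine(max),  # assumed element-wise maximum
        "minimum": combine(min),  # assumed element-wise minimum
    }
    # Vulnerability sample = sample information plus the underlying data records.
    return {"sample_info": sample_info, "sample_data": list(source_records)}


def build_sample_library(category_groups):
    """Build the library from (matrices, records) pairs, one pair per category."""
    return [
        build_vulnerability_sample(matrices, records)
        for matrices, records in category_groups
    ]
```

Keeping the original records next to the summary matrices mirrors the eighth sub-step, where a sample pairs the vulnerability sample information with the vulnerability sample data group.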
The vulnerability sample library construction scheme above, as an inventive point of the embodiments of the present disclosure, addresses the following technical problem: "The categories corresponding to vulnerabilities and vulnerability data are determined only by identifying code fragments that are similar, or have the same function, between the original vulnerability code and the vulnerability code to be identified; in practice, although some vulnerabilities share such code fragments, their classifications differ, so the accuracy of vulnerability classification is low." If this factor is addressed, the accuracy of vulnerability classification can be improved. To achieve this effect, the present disclosure builds the vulnerability sample library as follows. First, for each vulnerability category information group in the vulnerability category information group set, the following steps are performed: a vulnerability average matrix corresponding to the vulnerability category information group is generated based on the vulnerability category information group. Then, a vulnerability maximum matrix corresponding to the vulnerability category information group is generated based on the vulnerability category information group. Then, a vulnerability minimum matrix corresponding to the vulnerability category information group is generated based on the vulnerability category information group. Next, a vulnerability fluctuation value corresponding to the vulnerability category information group is generated based on the vulnerability minimum matrix and the vulnerability maximum matrix. Then, a commonality feature corresponding to the vulnerability category information group is generated based on the vulnerability average matrix and the vulnerability fluctuation value. Then, the vulnerability average matrix, the vulnerability maximum matrix, the vulnerability minimum matrix, the vulnerability fluctuation value and the commonality feature are combined to obtain vulnerability sample information. Thus, the vulnerability sample information corresponding to the vulnerability category information group can be obtained. Next, the historical information security vulnerability data corresponding to each piece of vulnerability category information in the vulnerability category information group is determined as vulnerability sample data, to obtain a vulnerability sample data group. Then, the vulnerability sample information and the vulnerability sample data group are determined as a vulnerability sample. Thus, the vulnerability sample information and the vulnerability sample data group can be used as a vulnerability sample. Finally, a vulnerability sample library is constructed based on the determined vulnerability samples. Thus, the obtained vulnerability samples can be assembled into a vulnerability sample library. Because the vulnerability sample library is constructed from the historical information security vulnerability data set, the categories corresponding to vulnerabilities and vulnerability data are determined without relying on code fragments that are similar or have the same function. The resulting vulnerability sample library is therefore more reliable, and the accuracy of vulnerability classification can be improved.
The above embodiments of the present disclosure have the following advantages: the vulnerability sample library obtained by the vulnerability sample library construction method for information security vulnerability identification of some embodiments of the present disclosure improves the accuracy of information security vulnerability identification. Specifically, the reason for the low accuracy of vulnerability sample identification is that comparing function code across different programs is computationally expensive and inaccurate in practical applications. Based on this, the vulnerability sample library construction method for information security vulnerability identification of some embodiments of the present disclosure first determines a pre-stored information security vulnerability data set as a historical information security vulnerability data set, where each piece of historical information security vulnerability data in the set includes vulnerability information and a feature information group corresponding to the vulnerability information. Thus, the information security vulnerability data that needs to be processed can be obtained. Then, for each piece of historical information security vulnerability data in the historical information security vulnerability data set, the following steps are performed: first, the feature information group included in the historical information security vulnerability data is determined as a feature information group to be vectorized. Thus, the feature information group included in the information security vulnerability data can be obtained. Then, each piece of feature information to be vectorized in the feature information group to be vectorized is vectorized to obtain a historical feature vector group. Thus, the feature information can be converted into historical feature vectors. Then, a historical feature vector matrix is constructed based on the historical feature vector group. Thus, a historical feature vector matrix composed of the historical feature vector group can be obtained. Next, the constructed historical feature vector matrices are determined as a historical feature vector matrix group. Thus, the historical feature vector matrix corresponding to each piece of historical information security vulnerability data can be obtained. Next, each historical feature vector matrix in the historical feature vector matrix group is classified to obtain a vulnerability category information group set. Thus, the vulnerability category information corresponding to each piece of historical information security vulnerability data can be obtained. Finally, a vulnerability sample library is constructed based on the vulnerability category information group set, where the vulnerability sample library is used to match acquired real-time vulnerability feature information so as to identify the vulnerability sample corresponding to that vulnerability feature information. Thus, a vulnerability sample library for information security vulnerability identification can be constructed. Because vulnerability feature information can be classified and identified using the vulnerability sample library constructed from the historical information security vulnerability data set, the large computational cost of comparing function code across different programs is avoided. Moreover, because the vulnerability sample library is constructed from the historical information security vulnerability data set, it is closely tied to that data set, so newly acquired vulnerability feature information is identified more accurately.
With continued reference to fig. 2, a flow 200 of some embodiments of an information security vulnerability identification method according to the present disclosure is shown. The information security vulnerability identification method comprises the following steps:
Step 201, obtaining current information security vulnerability data.
In some embodiments, an execution body (e.g., a computing device) of the information security vulnerability identification method may obtain current information security vulnerability data. The current information security vulnerability data may be data provided for a target user. The current information security vulnerability data may include vulnerability information and a feature information set corresponding to the vulnerability information.
Step 202, constructing a current feature vector matrix based on the feature information set.
In some embodiments, the executing entity may construct the current feature vector matrix based on the feature information set. The manner of constructing the current feature vector matrix may refer to the specific implementation of step 1023 in fig. 1, which is not described herein.
Step 203, performing matching processing on the current feature vector matrix based on a pre-constructed vulnerability sample library to obtain a vulnerability sample corresponding to the current feature vector matrix.
In some embodiments, the execution body may perform matching processing on the current feature vector matrix based on a pre-constructed vulnerability sample library, so as to obtain a vulnerability sample corresponding to the current feature vector matrix. Wherein the vulnerability sample library is constructed through the steps in the corresponding embodiments in fig. 1. In practice, first, the execution body may generate, as a similarity comparison result, a similarity between the current feature vector matrix and vulnerability sample information included in each vulnerability sample in the vulnerability sample library. Specifically, the similarity may be cosine similarity. Then, the execution body may select, from the similarity comparison results, a vulnerability sample corresponding to the similarity comparison result with the highest similarity as a vulnerability sample corresponding to the current feature vector matrix.
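The matching step can be sketched as follows. This is a minimal sketch, not the patented implementation: the text does not say which part of the vulnerability sample information enters the cosine similarity, so the example assumes each sample exposes a single representative matrix under a hypothetical `"average"` key, flattens both matrices, and uses the hypothetical helper names `cosine_similarity` and `match_sample`.

```python
import math


def flatten(matrix):
    """Flatten a list-of-rows matrix of floats into a single list."""
    return [value for row in matrix for value in row]


def cosine_similarity(matrix_a, matrix_b):
    """Cosine similarity between two equally shaped matrices, flattened."""
    a, b = flatten(matrix_a), flatten(matrix_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero inputs
    return dot / (norm_a * norm_b)


def match_sample(current_matrix, sample_library):
    """Return the sample whose representative matrix is most similar."""
    return max(
        sample_library,
        key=lambda sample: cosine_similarity(current_matrix, sample["average"]),
    )
```

Because cosine similarity ignores magnitude, this choice compares only the direction of the flattened feature matrices; a different similarity measure would slot into `match_sample` without other changes.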
The above embodiments of the present disclosure have the following advantages: the information security vulnerability identification method of some embodiments of the present disclosure obtains the vulnerability sample corresponding to an information security vulnerability and thereby improves the accuracy of information security vulnerability identification. Specifically, the reason for the low accuracy of vulnerability sample identification is that comparing function code across different programs is computationally expensive and inaccurate in practical applications. Based on this, in the information security vulnerability identification method of some embodiments of the present disclosure, first, current information security vulnerability data is obtained, where the current information security vulnerability data includes vulnerability information and a feature information set corresponding to the vulnerability information. Thus, the information security vulnerability data can be obtained. Then, a current feature vector matrix is constructed based on the feature information set. Thus, the current feature vector matrix corresponding to the current information security vulnerability data can be obtained. Then, matching processing is performed on the current feature vector matrix based on a pre-constructed vulnerability sample library to obtain the vulnerability sample corresponding to the current feature vector matrix, where the vulnerability sample library is constructed by the method described in any implementation of the first aspect. Thus, the vulnerability sample corresponding to the current information security vulnerability data can be obtained. Because information security vulnerabilities are classified and identified through a pre-constructed vulnerability sample library, the accuracy of vulnerability sample identification is improved.
With further reference to fig. 3, as an implementation of the method shown in the foregoing figures, the present disclosure provides some embodiments of a vulnerability sample library construction apparatus for information security vulnerability identification, where the apparatus embodiments correspond to those shown in fig. 1, and the apparatus is particularly applicable to various electronic devices.
As shown in fig. 3, a vulnerability sample library construction apparatus 300 for information security vulnerability identification of some embodiments includes: a first determining unit 301, an execution unit 302, a second determining unit 303, a classifying unit 304, and a construction unit 305. The first determining unit 301 is configured to determine a pre-stored information security vulnerability data set as a historical information security vulnerability data set, where each piece of historical information security vulnerability data in the historical information security vulnerability data set includes vulnerability information and a feature information group corresponding to the vulnerability information; the execution unit 302 is configured to perform the following steps for each piece of historical information security vulnerability data in the historical information security vulnerability data set: determining the feature information group included in the historical information security vulnerability data as a feature information group to be vectorized; vectorizing each piece of feature information to be vectorized in the feature information group to be vectorized to obtain a historical feature vector group; constructing a historical feature vector matrix based on the historical feature vector group; the second determining unit 303 is configured to determine the constructed historical feature vector matrices as a historical feature vector matrix group; the classifying unit 304 is configured to classify each historical feature vector matrix in the historical feature vector matrix group to obtain a vulnerability category information group set; the construction unit 305 is configured to construct a vulnerability sample library based on the vulnerability category information group set, where the vulnerability sample library is used to match acquired real-time vulnerability feature information so as to identify the vulnerability sample corresponding to the vulnerability feature information.
It will be appreciated that the elements described in the apparatus 300 correspond to the various steps in the method described with reference to fig. 1. Thus, the operations, features and resulting benefits described above with respect to the method are equally applicable to the apparatus 300 and the units contained therein, and are not described in detail herein.
With further reference to fig. 4, as an implementation of the method shown in the foregoing figures, the present disclosure provides some embodiments of an information security vulnerability identification apparatus, which correspond to the method embodiments shown in fig. 2, and which are particularly applicable to various electronic devices.
As shown in fig. 4, the information security vulnerability identification apparatus 400 of some embodiments includes: an acquisition unit 401, a construction unit 402, and a matching unit 403. The acquisition unit 401 is configured to obtain current information security vulnerability data, where the current information security vulnerability data includes vulnerability information and a feature information set corresponding to the vulnerability information; the construction unit 402 is configured to construct a current feature vector matrix based on the feature information set; the matching unit 403 is configured to perform matching processing on the current feature vector matrix based on a pre-constructed vulnerability sample library, so as to obtain the vulnerability sample corresponding to the current feature vector matrix, where the vulnerability sample library is constructed through the steps in the embodiments corresponding to fig. 1.
It will be appreciated that the elements described in the apparatus 400 correspond to the various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting benefits described above with respect to the method are equally applicable to the apparatus 400 and the units contained therein, and are not described in detail herein.
Referring now to fig. 5, a schematic diagram of an electronic device 500 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 5 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 5, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 508 including, for example, magnetic tape, hard disk, etc.; and communication means 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 shows an electronic device 500 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 5 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communications device 509, or from the storage device 508, or from the ROM 502. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing device 501.
It should be noted that, the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The computer readable medium may be contained in the electronic device, or may exist separately without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine a pre-stored information security vulnerability data set as a historical information security vulnerability data set, wherein each historical information security vulnerability data in the historical information security vulnerability data set comprises vulnerability information and a feature information group corresponding to the vulnerability information; for each historical information security vulnerability data in the historical information security vulnerability data set, perform the following steps: determine the feature information group included in the historical information security vulnerability data as a feature information group to be vectorized; vectorize each piece of feature information to be vectorized in the feature information group to be vectorized to obtain a historical feature vector group; and construct a historical feature vector matrix based on the historical feature vector group; determine the constructed historical feature vector matrices as a historical feature vector matrix group; classify each historical feature vector matrix in the historical feature vector matrix group to obtain a vulnerability category information group set; and construct a vulnerability sample library based on the vulnerability category information group set, wherein the vulnerability sample library is used to match acquired real-time vulnerability feature information so as to identify the vulnerability sample corresponding to that vulnerability feature information.
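The vectorization and matrix-construction steps listed above can be illustrated with a minimal Python sketch. The disclosure does not specify how feature information is vectorized, so the hashing scheme, the vector width `VECTOR_DIM`, the record layout, and all function names below are assumptions made only for illustration.

```python
import hashlib

import numpy as np

VECTOR_DIM = 8  # assumed vector width; the source does not fix one


def vectorize(feature: str, dim: int = VECTOR_DIM) -> np.ndarray:
    """Map one piece of feature information to a fixed-length vector.

    Hypothetical scheme: the first `dim` bytes of a SHA-256 digest, scaled to [0, 1].
    """
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    return np.frombuffer(digest[:dim], dtype=np.uint8).astype(np.float64) / 255.0


def build_feature_matrix(feature_group: list[str]) -> np.ndarray:
    """Stack the vectors of one feature information group, one row per feature."""
    return np.vstack([vectorize(f) for f in feature_group])


def build_matrix_group(dataset: list[dict]) -> list[np.ndarray]:
    """Build one historical feature vector matrix per historical record."""
    return [build_feature_matrix(record["features"]) for record in dataset]


# Illustrative records only; they are not taken from the disclosure.
historical = [
    {"vulnerability": "SQL injection",
     "features": ["unsanitized input", "string-built query"]},
    {"vulnerability": "cross-site scripting",
     "features": ["reflected parameter", "missing output encoding"]},
]
matrix_group = build_matrix_group(historical)
```

Each entry of `matrix_group` plays the role of one historical feature vector matrix; the classification step would then operate on this group.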
Alternatively, the one or more programs cause the electronic device to: obtain current information security vulnerability data, wherein the current information security vulnerability data comprises vulnerability information and a feature information group corresponding to the vulnerability information; construct a current feature vector matrix based on the feature information group; and match the current feature vector matrix against a pre-constructed vulnerability sample library to obtain the vulnerability sample corresponding to the current feature vector matrix, wherein the vulnerability sample library is constructed through the steps in the embodiments corresponding to FIG. 1.
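The matching step can be sketched as a nearest-sample lookup. This is only one plausible reading: the library layout (a mapping from category label to stored matrices), the use of the Frobenius norm as the matching distance, and the name `match_sample` are all assumptions, not details given in the disclosure.

```python
import numpy as np


def match_sample(current: np.ndarray,
                 library: dict[str, list[np.ndarray]]) -> tuple[str, float]:
    """Return the category label and distance of the stored sample closest to `current`."""
    best_label, best_dist = None, float("inf")
    for label, samples in library.items():
        for sample in samples:
            # Frobenius norm of the element-wise difference (assumed metric).
            dist = float(np.linalg.norm(current - sample))
            if dist < best_dist:
                best_label, best_dist = label, dist
    if best_label is None:
        raise ValueError("the sample library is empty")
    return best_label, best_dist


# Hypothetical library with two categories of 2x3 feature matrices.
library = {
    "category_a": [np.zeros((2, 3)), np.full((2, 3), 0.1)],
    "category_b": [np.ones((2, 3))],
}
current = np.full((2, 3), 0.9)
label, dist = match_sample(current, library)
print(label)  # category_b
```

A linear scan is enough for a sketch; a large sample library would more likely use an index structure, which the source does not discuss.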
Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described, for example, as: a processor comprising a first determination unit, an execution unit, a second determination unit, a classification unit, and a construction unit. In some cases the names of these units do not limit the units themselves; for example, the first determination unit may also be described as "a unit that determines a pre-stored information security vulnerability data set as a historical information security vulnerability data set".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The foregoing description covers only the preferred embodiments of the present disclosure and explains the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by substituting the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (9)

1. A vulnerability sample library construction method for information security vulnerability identification, comprising:
Determining a pre-stored information security vulnerability data set as a historical information security vulnerability data set, wherein each historical information security vulnerability data in the historical information security vulnerability data set comprises vulnerability information and a feature information group corresponding to the vulnerability information;
For each historical information security vulnerability data in the set of historical information security vulnerability data, performing the steps of:
Determining a characteristic information group included in the historical information security vulnerability data as a characteristic information group to be vectorized;
carrying out vectorization processing on each piece of feature information to be vectorized in the feature information group to be vectorized to obtain a historical feature vector group;
constructing a historical feature vector matrix based on the historical feature vector group;
determining each constructed historical feature vector matrix as a historical feature vector matrix group;
Classifying each historical feature vector matrix in the historical feature vector matrix group to obtain a vulnerability category information group set, wherein the classifying comprises:
based on the historical feature vector matrix group, performing the following first loop step:
performing random dimension reduction processing on each historical feature vector matrix in the historical feature vector matrix group to obtain a dimension-reduction feature vector matrix group;
based on the dimension-reduction feature vector matrix group, performing the following second loop step:
determining a dimension-reduction feature vector matrix in the dimension-reduction feature vector matrix group that meets a preset selection condition as a dimension-reduction feature vector matrix to be compared;
determining preset selection information as selection information;
adding the selection information to the dimension-reduction feature vector matrix to be compared, so as to update the dimension-reduction feature vector matrix to be compared;
determining a preset number of dimension-reduction feature vector matrices in the dimension-reduction feature vector matrix group that meet a preset comparison condition as a feature vector matrix group to be compared;
for each feature vector matrix to be compared in the feature vector matrix group to be compared, determining the distance between the feature vector matrix to be compared and the dimension-reduction feature vector matrix to be compared as a comparison distance;
sorting the determined comparison distances to obtain a comparison distance sequence;
taking the comparison distances in the comparison distance sequence that meet a preset ordering condition as a comparison distance group;
determining a category matrix group based on the comparison distance group;
in response to determining that each dimension-reduction feature vector matrix in the dimension-reduction feature vector matrix group does not meet a preset addition condition, performing the second loop step again;
in response to determining that each dimension-reduction feature vector matrix in the dimension-reduction feature vector matrix group meets the preset addition condition, determining the determined category matrix groups as a category information group;
in response to determining that each category matrix group in the category information group does not meet a preset quantity condition, emptying the category information group and performing the first loop step again;
in response to determining that each category matrix group in the category information group meets the preset quantity condition, performing difference processing on each category matrix group in the category information group to obtain a category difference value group;
in response to determining that each category difference value in the category difference value group does not meet a preset threshold condition, performing the first loop step again;
in response to determining that each category difference value in the category difference value group meets the preset threshold condition, determining the category information group as the vulnerability category information group set;
And constructing a vulnerability sample library based on the vulnerability category information group set, wherein the vulnerability sample library is used to match acquired real-time vulnerability feature information so as to identify the vulnerability sample corresponding to that vulnerability feature information.
2. The method of claim 1, wherein the constructing a historical feature vector matrix based on the set of historical feature vectors comprises:
classifying the feature information groups to be vectorized according to each feature tag type to obtain each feature tag information group, wherein feature tag information in each feature tag information group corresponds to the same feature tag type;
determining each obtained characteristic tag information group as a characteristic tag information group set;
The following steps are performed for each feature tag information group in the feature tag information group set:
determining a history feature vector corresponding to each feature tag information in the feature tag information group as a feature vector to be added to obtain a feature vector group to be added;
Adding the feature vector group to be added to a feature vector matrix to update the feature vector matrix;
the updated feature vector matrix is determined as a historical feature vector matrix.
3. The method of claim 1, wherein the determining a category matrix group based on the comparison distance group comprises:
For each comparison distance in the comparison distance group, performing the steps of:
Determining the preset selection information as selection information;
Adding the selection information to a feature vector matrix to be compared corresponding to the comparison distance so as to update the feature vector matrix to be compared;
and combining the updated feature vector matrixes to be compared with the feature vectors to be compared, which correspond to the comparison distance group, to obtain a category matrix group.
4. The method of claim 1, wherein the determining, for each feature vector matrix to be compared in the feature vector matrix group to be compared, the distance between the feature vector matrix to be compared and the dimension-reduction feature vector matrix to be compared as a comparison distance comprises:
For each pair of corresponding positions in the feature vector matrix to be compared and the dimension-reduction feature vector matrix to be compared, determining the feature vectors at that position in the two matrices as a first feature vector and a second feature vector, respectively;
the following steps are performed for the first feature vector and the second feature vector corresponding to each of the respective corresponding positions:
determining a difference value between the first feature vector and the second feature vector as a difference feature vector;
generating a square value based on the difference feature vector;
Generating a standard value based on the square value;
the sum of the obtained individual standard values is determined as the comparison distance.
5. An information security vulnerability identification method, comprising:
Obtaining current information security vulnerability data, wherein the current information security vulnerability data comprises vulnerability information and a characteristic information group corresponding to the vulnerability information;
Constructing a current feature vector matrix based on the feature information set;
Based on a pre-constructed vulnerability sample library, carrying out matching processing on the current feature vector matrix to obtain a vulnerability sample corresponding to the current feature vector matrix, wherein the vulnerability sample library is constructed by the method according to one of claims 1-4.
6. A vulnerability sample library construction apparatus for information security vulnerability identification, comprising:
A first determining unit configured to determine a pre-stored information security vulnerability data set as a history information security vulnerability data set, wherein each history information security vulnerability data in the history information security vulnerability data set includes vulnerability information and a feature information set corresponding to the vulnerability information;
An execution unit configured to, for each historical information security vulnerability data in the set of historical information security vulnerability data, perform the steps of: determining a characteristic information group included in the historical information security vulnerability data as a characteristic information group to be vectorized; carrying out vectorization processing on each piece of feature information to be vectorized in the feature information group to be vectorized to obtain a historical feature vector group; constructing a historical feature vector matrix based on the historical feature vector group;
A second determining unit configured to determine each of the constructed history feature vector matrices as a history feature vector matrix group;
A classification unit configured to classify each historical feature vector matrix in the historical feature vector matrix group to obtain a vulnerability category information group set, wherein the classifying comprises:
based on the historical feature vector matrix group, performing the following first loop step:
performing random dimension reduction processing on each historical feature vector matrix in the historical feature vector matrix group to obtain a dimension-reduction feature vector matrix group;
based on the dimension-reduction feature vector matrix group, performing the following second loop step:
determining a dimension-reduction feature vector matrix in the dimension-reduction feature vector matrix group that meets a preset selection condition as a dimension-reduction feature vector matrix to be compared;
determining preset selection information as selection information;
adding the selection information to the dimension-reduction feature vector matrix to be compared, so as to update the dimension-reduction feature vector matrix to be compared;
determining a preset number of dimension-reduction feature vector matrices in the dimension-reduction feature vector matrix group that meet a preset comparison condition as a feature vector matrix group to be compared;
for each feature vector matrix to be compared in the feature vector matrix group to be compared, determining the distance between the feature vector matrix to be compared and the dimension-reduction feature vector matrix to be compared as a comparison distance;
sorting the determined comparison distances to obtain a comparison distance sequence;
taking the comparison distances in the comparison distance sequence that meet a preset ordering condition as a comparison distance group;
determining a category matrix group based on the comparison distance group;
in response to determining that each dimension-reduction feature vector matrix in the dimension-reduction feature vector matrix group does not meet a preset addition condition, performing the second loop step again;
in response to determining that each dimension-reduction feature vector matrix in the dimension-reduction feature vector matrix group meets the preset addition condition, determining the determined category matrix groups as a category information group;
in response to determining that each category matrix group in the category information group does not meet a preset quantity condition, emptying the category information group and performing the first loop step again;
in response to determining that each category matrix group in the category information group meets the preset quantity condition, performing difference processing on each category matrix group in the category information group to obtain a category difference value group;
in response to determining that each category difference value in the category difference value group does not meet a preset threshold condition, performing the first loop step again;
in response to determining that each category difference value in the category difference value group meets the preset threshold condition, determining the category information group as the vulnerability category information group set;
A construction unit configured to construct a vulnerability sample library based on the vulnerability category information group set, wherein the vulnerability sample library is used to match acquired real-time vulnerability feature information so as to identify the vulnerability sample corresponding to that vulnerability feature information.
7. An information security vulnerability identification apparatus, comprising:
An acquisition unit configured to acquire current information security vulnerability data, wherein the current information security vulnerability data comprises vulnerability information and a feature information group corresponding to the vulnerability information;
A construction unit configured to construct a current feature vector matrix based on the feature information group;
A matching unit configured to match the current feature vector matrix against a pre-constructed vulnerability sample library to obtain the vulnerability sample corresponding to the current feature vector matrix, wherein the vulnerability sample library is constructed by the method according to any one of claims 1 to 4.
8. An electronic device, comprising:
One or more processors;
A storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 4 or the method of claim 5.
9. A computer readable medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the method of any one of claims 1 to 4 or the method of claim 5.
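The comparison distance of claim 4 and the sorting and selection steps of claim 1 can be sketched in Python as follows. Several points are assumptions, because the claims leave them open: the "square value" is taken to be the squared norm of the difference vector, the "standard value" its square root, the "preset ordering condition" is read as keeping the `keep` smallest distances, and the function names are hypothetical.

```python
import numpy as np


def comparison_distance(candidate: np.ndarray, reference: np.ndarray) -> float:
    """Sum, over row positions, of the root of the squared difference of paired vectors."""
    diff = candidate - reference           # difference feature vector at each position
    squares = np.sum(diff ** 2, axis=1)    # square value per position (assumed: squared norm)
    standard = np.sqrt(squares)            # standard value per position (assumed: square root)
    return float(np.sum(standard))


def nearest_group(reference: np.ndarray,
                  candidates: list[np.ndarray],
                  keep: int) -> list[int]:
    """Sort candidates by comparison distance and return the indices of the `keep` closest."""
    distances = [comparison_distance(c, reference) for c in candidates]
    order = np.argsort(distances)[:keep]
    return [int(i) for i in order]


reference = np.array([[0.0, 0.0], [0.0, 0.0]])
candidates = [
    np.array([[3.0, 4.0], [0.0, 0.0]]),   # distance 5.0
    np.array([[1.0, 0.0], [0.0, 1.0]]),   # distance 2.0
    np.array([[6.0, 8.0], [3.0, 4.0]]),   # distance 15.0
]
print(comparison_distance(candidates[0], reference))  # 5.0
print(nearest_group(reference, candidates, keep=2))   # [1, 0]
```

Under these assumptions the distance is a sum of per-row Euclidean norms, so it differs from the Frobenius norm of the whole difference matrix whenever more than one row is non-zero.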
CN202410780908.2A 2024-06-18 2024-06-18 Vulnerability sample library construction method, vulnerability identification device, vulnerability sample library construction equipment and vulnerability sample library medium Active CN118349895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410780908.2A CN118349895B (en) 2024-06-18 2024-06-18 Vulnerability sample library construction method, vulnerability identification device, vulnerability sample library construction equipment and vulnerability sample library medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410780908.2A CN118349895B (en) 2024-06-18 2024-06-18 Vulnerability sample library construction method, vulnerability identification device, vulnerability sample library construction equipment and vulnerability sample library medium

Publications (2)

Publication Number Publication Date
CN118349895A CN118349895A (en) 2024-07-16
CN118349895B true CN118349895B (en) 2024-09-13

Family

ID=91821115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410780908.2A Active CN118349895B (en) 2024-06-18 2024-06-18 Vulnerability sample library construction method, vulnerability identification device, vulnerability sample library construction equipment and vulnerability sample library medium

Country Status (1)

Country Link
CN (1) CN118349895B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388551A (en) * 2017-08-07 2019-02-26 北京京东尚科信息技术有限公司 Method for predicting the probability that code contains a vulnerability, vulnerability detection method, and related apparatus
CN117034159A (en) * 2022-06-14 2023-11-10 腾讯科技(深圳)有限公司 Abnormal data identification method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118134581A (en) * 2022-12-02 2024-06-04 北京京东尚科信息技术有限公司 Article identification information generation method, apparatus, device, medium and program product
CN117056940B (en) * 2023-10-12 2024-01-16 中关村科学城城市大脑股份有限公司 Method, device, electronic equipment and medium for repairing loopholes of server system

Also Published As

Publication number Publication date
CN118349895A (en) 2024-07-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant