WO2024021874A1 - Vulnerability analysis method, apparatus, device, and computer-readable storage medium - Google Patents

Vulnerability analysis method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2024021874A1
Authority
WO
WIPO (PCT)
Prior art keywords
software
packages
vulnerability
software packages
information
Application number
PCT/CN2023/098487
Other languages
English (en)
French (fr)
Inventor
李琳
梁广泰
单丙杰
朱留川
Original Assignee
华为云计算技术有限公司 (Huawei Cloud Computing Technologies Co., Ltd.)
Priority claimed from CN202211055558.0A (external priority; published as CN117521069A)
Application filed by 华为云计算技术有限公司 (Huawei Cloud Computing Technologies Co., Ltd.)
Publication of WO2024021874A1 (in Chinese)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities

Definitions

  • This application relates to the field of computer technology, and in particular to a vulnerability analysis method, apparatus, device, and computer-readable storage medium.
  • Vulnerabilities refer to security flaws in a computer system.
  • The existence of vulnerabilities affects the security of the computer system. Therefore, discovered vulnerabilities need to be repaired to reduce the risks they cause.
  • In a related technology, a classification model that can determine the correspondence between vulnerabilities and software packages is trained, and the classification model is then used to determine the software packages affected by a vulnerability to be analyzed.
  • However, the trained classification model is only suitable for determining the software affected by the vulnerabilities included in the training set. It generalizes poorly, and the accuracy of its analysis is low.
  • This application provides a vulnerability analysis method, apparatus, device, and computer-readable storage medium to solve the problems in the related technologies.
  • The technical solutions are as follows:
  • A first aspect provides a vulnerability analysis method.
  • The method includes: the analysis device obtains the software environment information of the vulnerability, where the software environment information describes the software affected by the vulnerability; the analysis device searches based on the software environment information to obtain n candidate software packages, where n is an integer greater than or equal to 1; and the analysis device performs homologous extension on the n candidate software packages to obtain multiple recommended software packages, where the number of recommended software packages is greater than n, and the recommended software packages are used to assist the annotation object in determining the impact scope of the vulnerability.
  • Candidate software packages are obtained by searching based on the software environment information. Therefore, even when the corresponding vulnerability is not included in the training set, which the related technology cannot handle, the candidate software packages that will be affected can still be found, so the method is highly versatile. Moreover, after the candidate software packages are determined, homologous extension is performed on them to obtain a larger number of recommended software packages, which improves the recall rate of vulnerability analysis.
  • Obtaining the software environment information of the vulnerability includes: obtaining the vulnerability description information of the vulnerability; identifying the software entity in the vulnerability description information; and extracting the context of the software entity from the vulnerability description information and obtaining the software environment information based on the extracted context.
  • Searching based on the software environment information to obtain n candidate software packages includes: searching, based on the software environment information, a database that includes description information of multiple software packages to obtain m initial software packages related to the vulnerability, where m is an integer greater than n; and selecting, based on the software environment information, n initial software packages from the m initial software packages as the n candidate software packages.
  • The initial software packages obtained by the search are filtered so that the candidate software packages obtained by screening are more accurate.
  • Selecting n initial software packages from the m initial software packages based on the software environment information includes: sorting the m initial software packages based on the software description information of each initial software package and the software environment information to obtain a sorting result, where the position of any initial software package in the sorting result indicates the correlation between that initial software package and the vulnerability; and selecting n initial software packages from the m initial software packages according to the sorting result.
  • The initial software packages are screened according to their correlation with the vulnerability.
  • The candidate software packages obtained through screening are all highly correlated with the vulnerability and will be affected by it, which improves the accuracy of the search.
  • The method further includes: obtaining a software package screening model. Selecting n initial software packages from the m initial software packages based on the software environment information then includes: calling the software package screening model to select, based on the software environment information, n initial software packages from the m initial software packages.
  • Screening the initial software packages with the software package screening model makes the operation more convenient.
  • Performing homologous extension on the n candidate software packages to obtain multiple recommended software packages includes: clustering multiple software packages based on the description information of the software packages in the database to obtain at least one target software cluster, where a target software cluster is a software cluster that includes a candidate software package and also includes software packages homologous to that candidate software package; and using the candidate software packages in the at least one target software cluster, together with the homologous software packages that meet the conditions, as the recommended software packages.
  • If a vulnerability affects a candidate software package, software packages homologous to that candidate software package may also be affected by the vulnerability.
  • Homologous extension therefore further expands the range of recommended software packages affected by the vulnerability, ensuring the recall rate of vulnerability analysis.
  • Clustering multiple software packages based on the description information of the software packages in the database to obtain at least one target software cluster includes: obtaining a description vector of each of the multiple software packages based on the description information of the software packages in the database; clustering the multiple software packages based on their description vectors to obtain multiple initial software clusters; calculating the code similarity between the software packages included in each of the multiple initial software clusters; screening the software packages included in each initial software cluster based on the code similarity between them to obtain candidate software clusters; and using the candidate software clusters that include candidate software packages as the target software clusters.
  • The initial software clusters are further screened to ensure that software packages with high code similarity are located in the same candidate software cluster. Software packages with high code similarity to the candidate software packages are then identified, which improves the accuracy of the homologous extension that yields the recommended software packages affected by the vulnerability.
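The clustering and screening steps above can be illustrated with a minimal, stdlib-only Python sketch. The application does not specify the description vectors, the clustering algorithm, the code-similarity measure, or the conditions a homologous package must meet, so all of these are assumptions here. The sketch uses bag-of-words vectors compared by cosine similarity, greedy leader clustering, Jaccard similarity over hypothetical code-token sets, and fixed thresholds of 0.5. Function names such as `homologous_expansion` are hypothetical.

```python
import math
from collections import Counter


def description_vector(text: str) -> Counter:
    """Bag-of-words description vector (assumed; the text does not name an encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity, used here as a stand-in code-similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def initial_clusters(descriptions: dict, threshold: float = 0.5) -> list:
    """Greedy leader clustering: a package joins the first cluster whose leader
    (first member) is similar enough; otherwise it starts a new cluster."""
    vectors = {name: description_vector(text) for name, text in descriptions.items()}
    clusters = []
    for name in descriptions:
        for cluster in clusters:
            if cosine(vectors[cluster[0]], vectors[name]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def screen_by_code(cluster: list, code_tokens: dict, threshold: float = 0.5) -> list:
    """Keep the leader plus the members whose code is similar enough to the leader's."""
    leader = cluster[0]
    return [p for p in cluster
            if p == leader or jaccard(code_tokens[leader], code_tokens[p]) >= threshold]


def homologous_expansion(candidates: list, descriptions: dict, code_tokens: dict) -> list:
    """Return the candidates plus every package that shares a screened cluster
    (a 'target software cluster') with at least one candidate."""
    recommended = set(candidates)
    for cluster in initial_clusters(descriptions):
        screened = screen_by_code(cluster, code_tokens)
        if any(c in screened for c in candidates):
            recommended.update(screened)
    return sorted(recommended)


# Hypothetical toy data: a package, a fork of it, and an unrelated package.
descriptions = {
    "org.example:json-core": "fast json parser library",
    "org.fork:json-core": "fast json parser library fork",
    "org.other:xml-core": "xml document builder",
}
code_tokens = {
    "org.example:json-core": {"parse", "token", "lexer", "buffer"},
    "org.fork:json-core": {"parse", "token", "lexer", "stream"},
    "org.other:xml-core": {"node", "element"},
}
print(homologous_expansion(["org.example:json-core"], descriptions, code_tokens))
# ['org.example:json-core', 'org.fork:json-core']
```

Leader clustering is order-dependent. It is used only because it keeps the sketch short. A real system would more plausibly use learned embeddings and an established clustering algorithm.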
  • The method further includes: determining, from the multiple recommended software packages, a target software package that matches the annotation object; and determining the software version in the target software package that is affected by the vulnerability.
  • Further determining the affected software version of the target software package makes the determined impact scope of the vulnerability more detailed and accurate.
  • Determining a target software package that matches the annotation object from the multiple recommended software packages includes: sending information about the multiple recommended software packages to a terminal, where the terminal displays that information and returns information about the target software package that matches the annotation object; and receiving the information about the target software package sent by the terminal.
  • The recommended software package that matches the annotation object is determined as the target software package through interaction, which provides a good interactive experience.
  • Determining the software version affected by the vulnerability in the target software package includes: obtaining, based on the information about the target software package, the version information of the target software package from an information library that includes version information of multiple software packages; and sending the version information of the target software package to the terminal and receiving the software version returned by the terminal.
  • A vulnerability analysis apparatus is provided.
  • The apparatus is applied to an analysis device.
  • The apparatus includes: an acquisition module, configured to obtain the software environment information of the vulnerability, where the software environment information describes the software affected by the vulnerability; a search module, configured to search based on the software environment information to obtain n candidate software packages, where n is an integer greater than or equal to 1; and an extension module, configured to perform homologous extension on the n candidate software packages to obtain multiple recommended software packages, where the number of recommended software packages is greater than n, and the recommended software packages are used to assist the annotation object in determining the impact scope of the vulnerability.
  • The acquisition module is configured to: obtain the vulnerability description information of the vulnerability; identify the software entity in the vulnerability description information; and extract the context of the software entity from the vulnerability description information and obtain the software environment information based on the extracted context.
  • The search module is configured to: search, based on the software environment information, a database that includes description information of multiple software packages to obtain m initial software packages related to the vulnerability, where m is an integer greater than n; and select, based on the software environment information, n initial software packages from the m initial software packages as the n candidate software packages.
  • The search module is configured to: sort the m initial software packages based on the software description information of each initial software package and the software environment information to obtain a sorting result, where the position of any initial software package in the sorting result indicates the correlation between that initial software package and the vulnerability; and select n initial software packages from the m initial software packages according to the sorting result.
  • The acquisition module is further configured to obtain a software package screening model. The search module is configured to call the software package screening model to select, based on the software environment information, n initial software packages from the m initial software packages.
  • The extension module is configured to: cluster multiple software packages based on the description information of the software packages in the database to obtain at least one target software cluster, where a target software cluster is a software cluster that includes a candidate software package and also includes software packages homologous to that candidate software package; and use the candidate software packages in the at least one target software cluster, together with the homologous software packages that meet the conditions, as the recommended software packages.
  • The extension module is configured to: obtain a description vector of each of the multiple software packages based on the description information of the software packages in the database; cluster the multiple software packages based on their description vectors to obtain multiple initial software clusters; calculate the code similarity between the software packages included in each of the multiple initial software clusters; screen the software packages included in each initial software cluster based on the code similarity between them to obtain candidate software clusters; and use the candidate software clusters that include candidate software packages as the target software clusters.
  • The apparatus further includes a determination module. It is configured to determine, from the multiple recommended software packages, a target software package that matches the annotation object, and to determine the software version in the target software package that is affected by the vulnerability.
  • The determination module is configured to send information about the multiple recommended software packages to the terminal, where the terminal displays that information and returns information about the target software package that matches the annotation object. It also receives the information about the target software package sent by the terminal.
  • The determination module is configured to obtain, based on the information about the target software package, the version information of the target software package from an information library that includes version information of multiple software packages. It then sends the version information of the target software package to the terminal and receives the software version returned by the terminal.
  • A computing device cluster is provided, including at least one computing device, where each computing device includes a processor and a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the vulnerability analysis method of any implementation of the first aspect.
  • A computer-readable storage medium is provided, including computer program instructions.
  • When the computer program instructions are executed by a computing device cluster, the computing device cluster performs the vulnerability analysis method of any implementation of the first aspect.
  • A computer program (product) containing instructions is provided.
  • When the instructions are run by a computing device cluster, the computing device cluster is caused to perform the vulnerability analysis method of any implementation of the first aspect.
  • A communication device is provided, including a transceiver, a memory, and a processor.
  • The transceiver, the memory, and the processor communicate with each other through an internal connection path.
  • The memory is configured to store instructions.
  • The processor is configured to execute the instructions stored in the memory to control the transceiver to receive and send signals.
  • When the processor executes the instructions stored in the memory, the processor is caused to perform the method in the first aspect or any possible implementation of the first aspect.
  • There are one or more processors and one or more memories.
  • The memory may be integrated with the processor, or provided separately from the processor.
  • The memory can be a non-transitory memory, such as a read-only memory (ROM), which can be integrated on the same chip as the processor or arranged on a different chip. This application does not limit the type of the memory or the arrangement of the memory and the processor.
  • A chip is provided, including a processor configured to call and run instructions stored in a memory, so that a communication device equipped with the chip performs the methods in the above aspects.
  • Another chip is provided, including an input interface, an output interface, a processor, and a memory.
  • The input interface, the output interface, the processor, and the memory are connected through an internal connection path.
  • The processor is configured to execute the code in the memory, and when the code is executed, the processor performs the methods in the above aspects.
  • Figure 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • Figure 2 is a flow chart of a vulnerability analysis method provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of a page provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of a search for candidate software packages provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of another page provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of another page provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of another page provided by an embodiment of the present application.
  • Figure 8 is a flow chart of another vulnerability analysis method provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of a homology extension process provided by an embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a vulnerability analysis apparatus provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • Figure 12 is a schematic connection diagram of a computing device provided by an embodiment of the present application.
  • Vulnerabilities refer to security flaws in computer systems. According to their life cycle, vulnerabilities can be divided into 0-day vulnerabilities, 1-day vulnerabilities, and n-day vulnerabilities. A 0-day vulnerability has no corresponding repair patch or mitigation measure; that is, it is a vulnerability discovered for the first time. A 1-day vulnerability has a corresponding repair patch or mitigation measure, but most users have not yet applied it. An n-day vulnerability has a corresponding repair patch or mitigation measure, and most users have already applied it. Because vulnerabilities affect the security of the computer system, discovered vulnerabilities need to be repaired to reduce the risks they cause. Embodiments of this application provide a vulnerability analysis method for determining the software packages affected by a vulnerability, so as to assist the repair object in repairing those software packages.
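As a small illustration of the life-cycle definitions above, the following Python sketch maps two observable facts to the three categories: whether a repair patch or mitigation exists, and what fraction of users has applied it. The application says only "most users", so reading that as "more than half" (`majority=0.5`) is an assumption. The names `Lifecycle` and `classify_lifecycle` are also hypothetical.

```python
from enum import Enum


class Lifecycle(Enum):
    ZERO_DAY = "0-day"  # no repair patch or mitigation exists yet
    ONE_DAY = "1-day"   # a fix exists, but most users have not applied it
    N_DAY = "n-day"     # a fix exists and most users have applied it


def classify_lifecycle(fix_available: bool, applied_fraction: float,
                       majority: float = 0.5) -> Lifecycle:
    """Classify a vulnerability; `majority` is an assumed reading of 'most users'."""
    if not fix_available:
        return Lifecycle.ZERO_DAY
    if applied_fraction > majority:
        return Lifecycle.N_DAY
    return Lifecycle.ONE_DAY


print(classify_lifecycle(True, 0.2).value)  # 1-day
```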
  • FIG. 1 is a schematic diagram of the implementation environment of the vulnerability analysis method provided by an embodiment of the present application.
  • The implementation environment is a computing device cluster 10.
  • The computing device cluster 10 includes at least one analysis device, and the analysis devices can communicate with each other through a wired or wireless network.
  • The vulnerability analysis method can be executed independently by one analysis device, or interactively by multiple analysis devices included in the computing device cluster 10.
  • The embodiment of the present application does not limit the number of analysis devices included in the computing device cluster 10.
  • FIG. 1 shows only two analysis devices as an example.
  • An analysis device included in the computing device cluster 10 may be a server, such as a central server, an edge server, or a local server in a local data center.
  • The server can be a physical server or a cloud server that provides cloud computing services.
  • The analysis devices included in the computing device cluster 10 may also be terminal devices such as desktop computers, laptop computers, or smartphones.
  • An embodiment of the present application provides a vulnerability analysis method.
  • The vulnerability analysis method can be applied to the implementation environment shown in Figure 1.
  • The method can be executed by the analysis device.
  • The flow chart of the method is shown in Figure 2 and includes S201 to S203.
  • S201: The analysis device obtains the software environment information of the vulnerability.
  • The software environment information describes the software affected by the vulnerability.
  • The embodiments of this application do not limit how the vulnerability to be analyzed is determined.
  • One way is to periodically detect the computer system and use the vulnerabilities discovered through the periodic detection as vulnerabilities to be analyzed.
  • Alternatively, the vulnerabilities included in the security information regularly released by the provider of the computer system, or by the providers of the software involved in the computer system, may be used as vulnerabilities to be analyzed.
  • The analysis device needs to select one vulnerability from the multiple vulnerabilities as the vulnerability to be analyzed.
  • The analysis device can select it randomly or based on the discovery time of the vulnerabilities.
  • Alternatively, the vulnerability that matches the annotation object can be determined as the vulnerability to be analyzed.
  • The annotation object refers to the object that indicates the impact scope of the vulnerability. The vulnerability matching the annotation object may be a vulnerability selected by the annotation object, a vulnerability assigned to the annotation object for annotation, and so on.
  • The analysis device sends vulnerability information to the terminal, and the terminal displays it so that the annotation object can select one of the vulnerabilities, based on that information, as the vulnerability to be analyzed.
  • As shown in Figure 3, the vulnerability information can include the vulnerability name, the vulnerability description information, and the release date of the vulnerability.
  • The vulnerability information can also include other content.
  • The analysis device determines the vulnerability that matches the annotation object as the vulnerability to be analyzed.
  • The first page shown in Figure 3 includes an annotation control 301 corresponding to each vulnerability.
  • The embodiment of the present application does not limit how the annotation control and the subsequent controls are triggered; they can be triggered by clicking or by voice.
  • Click triggering can be implemented with a mouse and keyboard connected to the terminal or, when the display screen of the terminal supports touch, by tapping the screen.
  • The analysis device can obtain the software environment information of the vulnerability to be analyzed.
  • The acquisition process includes but is not limited to: obtaining the vulnerability description information of the vulnerability; identifying the software entity in the vulnerability description information; and extracting the context of the software entity from the vulnerability description information and obtaining the software environment information based on the extracted context.
  • The analysis device obtains the vulnerability description information from the Internet through a web crawler. Take the case where the vulnerabilities analyzed in this application are 1-day and n-day vulnerabilities as an example. These two types of vulnerabilities are not discovered for the first time and already have repair patches or mitigation measures, so the Internet contains description information for them, and the analysis device can obtain it directly through a web crawler. The analysis device can also receive vulnerability description information sent by other network devices. For example, if the other network device is a terminal, the terminal obtains the vulnerability description information of each vulnerability through a web crawler. After the annotation object selects the vulnerability to be analyzed, the terminal can send the vulnerability identifier and the vulnerability description information to the analysis device, and the analysis device thereby obtains the vulnerability description information.
  • The vulnerability description information is a character string composed of multiple characters that describe the vulnerability, such as the vulnerability description information shown in Figure 3.
  • The characters in the vulnerability description information include software entities.
  • A software entity is a text fragment that identifies a software package, including the coordinates of the software package.
  • The coordinates of a software package are preset fields used to import the software package. Take project development with the Maven Central repository as an example. Developing a project requires importing multiple software packages. The coordinates of the packages to be imported can therefore be written in the project's configuration file (the POM file, pom.xml, in Maven). Maven then automatically searches the Internet for each package based on its coordinates and downloads it locally, which completes the import.
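A minimal Python sketch of the coordinate idea follows. It parses a coordinate written in Maven's common `groupId:artifactId:version` form and renders the `<dependency>` element that would go into `pom.xml`. It handles only the two- and three-part forms; Maven also accepts packaging and classifier fields. The names `MavenCoordinate`, `parse_coordinate`, and `to_pom_dependency` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MavenCoordinate:
    group_id: str
    artifact_id: str
    version: Optional[str] = None  # may be omitted, e.g. when managed elsewhere


def parse_coordinate(text: str) -> MavenCoordinate:
    """Parse 'groupId:artifactId' or 'groupId:artifactId:version'."""
    parts = text.strip().split(":")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"unsupported coordinate: {text!r}")
    return MavenCoordinate(*parts)


def to_pom_dependency(coord: MavenCoordinate) -> str:
    """Render the <dependency> element that Maven reads from pom.xml."""
    lines = [
        "<dependency>",
        f"  <groupId>{coord.group_id}</groupId>",
        f"  <artifactId>{coord.artifact_id}</artifactId>",
    ]
    if coord.version:
        lines.append(f"  <version>{coord.version}</version>")
    lines.append("</dependency>")
    return "\n".join(lines)


coord = parse_coordinate("org.apache.logging.log4j:log4j-core:2.14.1")
print(to_pom_dependency(coord))
```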
  • The analysis device identifies the position of the software entity in the vulnerability description information and extracts the software entity from it. Because software entities identify software packages, the software entities in the vulnerability description information identify the software packages affected by the vulnerability. The identification of software entities can be implemented with a pre-trained software entity extraction model: for example, the vulnerability description information is input into the software entity extraction model, which outputs the software entity. The software entity extraction model can be trained with deep learning methods.
  • The training set used in this training process can be obtained from an open-source database, with reference software entities extracted from the vulnerability description information through manual annotation.
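The application identifies software entities with a trained deep-learning model whose details are not disclosed. As a rough, clearly swapped-in stand-in, the Python sketch below uses a regular expression to find fragments shaped like Maven coordinates (a dotted group, a colon, an artifact, and an optional version) and reports their character positions. It will miss entities written as plain product names, which is exactly the gap a learned extraction model is meant to close. The pattern and the name `find_software_entities` are assumptions.

```python
import re

# One coordinate segment, such as "org", "log4j-core" or "14".
_SEG = r"[A-Za-z0-9_-]+"
_DOTTED = rf"{_SEG}(?:\.{_SEG})*"
# Assumed pattern: dotted groupId, ':' artifactId, optional ':' version.
COORDINATE_RE = re.compile(rf"(?<![\w.]){_SEG}(?:\.{_SEG})+:{_DOTTED}(?::{_DOTTED})?")


def find_software_entities(description: str) -> list:
    """Return (start, end, text) for each coordinate-like fragment in the description."""
    return [(m.start(), m.end(), m.group()) for m in COORDINATE_RE.finditer(description)]


text = "A flaw in org.apache.logging.log4j:log4j-core:2.14.1 allows remote code execution."
print(find_software_entities(text))
```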
  • The analysis device further extracts the context of the software entity from the vulnerability description information, and uses the software entity together with its context as the software environment information.
  • The context of the software entity refers to a first number of words before and after the software entity.
  • The first number can be any positive integer set based on experience. For example, if the first number is 150, extracting the context of the software entity means extracting the 150 words before and the 150 words after the software entity, and using the extracted 300 words as its context. The software entity does not exist independently in the vulnerability description information, so the entity and its context constrain or influence each other.
  • Including the context also gives the search more terms, and the more words the search parameters contain, the more comprehensive the search results are.
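The context window can be sketched in Python as follows. The sketch assumes that the context is counted in whitespace-separated words and that only the first occurrence of the entity matters; the application does not say how words are tokenized. With the default `k=150`, up to 150 words on each side (300 in total) are kept around the entity, matching the example above. The name `software_environment` is hypothetical.

```python
def software_environment(description: str, entity: str, k: int = 150) -> str:
    """Return the entity plus up to k words before and k words after its first occurrence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    start = description.find(entity)
    if start < 0:
        raise ValueError("entity not found in description")
    before = description[:start].split()[-k:]          # last k words before the entity
    after = description[start + len(entity):].split()[:k]  # first k words after it
    return " ".join(before + [entity] + after)


desc = "one two three ENTITY four five six"
print(software_environment(desc, "ENTITY", k=2))  # two three ENTITY four five
```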
  • S202: The analysis device searches based on the software environment information to obtain n candidate software packages, where n is an integer greater than or equal to 1.
  • The process of searching for candidate software packages includes: searching, based on the software environment information, a database that includes description information of multiple software packages to obtain m initial software packages related to the vulnerability, where m is an integer greater than n; and selecting, based on the software environment information, n initial software packages from the m initial software packages as the n candidate software packages.
  • The embodiments of this application do not limit the method used to search the database.
  • The database can be searched with the term frequency-inverse document frequency (TF-IDF) algorithm or with other text search methods.
  • The number m of initial software packages obtained by the search can be set based on experience; for example, m is set to 512.
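For illustration, a stdlib-only Python sketch of TF-IDF retrieval of the top m packages is shown below. Several details are assumptions made for the sketch, not details from the application: the tokenizer (a lowercased whitespace split), the smoothed IDF formula `log((1 + N) / (1 + df)) + 1`, and the rule that packages sharing no term with the query are dropped. `tfidf_search` is a hypothetical name.

```python
import math
from collections import Counter


def tfidf_search(query: str, docs: dict, m: int = 512) -> list:
    """Rank package descriptions against the query by TF-IDF and return the top m names."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many descriptions each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    query_terms = set(query.lower().split())
    scores = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        length = len(tokens) or 1
        score = sum(
            (tf[t] / length) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t in query_terms if tf[t]
        )
        if score > 0:  # drop packages sharing no term with the query
            scores[name] = score
    # Highest score first; ties broken by name for a deterministic order.
    return sorted(scores, key=lambda name: (-scores[name], name))[:m]


docs = {"pkg-a": "json parser for java", "pkg-b": "xml parser", "pkg-c": "image codec"}
print(tfidf_search("java json parser deserialization", docs, m=2))  # ['pkg-a', 'pkg-b']
```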
  • Figure 4 is a schematic diagram of a search for candidate software packages provided by the embodiment of the present application.
  • The software environment information extracted from the vulnerability description information is C1 in Figure 4. The entities and contexts extracted from the description information of each software package in the database are Di in Figure 4, where i is a positive integer identifying different entities and contexts, and the maximum value of i is the total number of software packages in the database.
  • The process of extracting entities and contexts from the description information in the database is similar to the process of extracting the software environment information in S201, and is not described again here.
  • The entities and contexts matching C1 are D1, D2, and Dk, where k is a positive integer.
  • The combination C1D1 means that D1 and C1 are successfully matched.
  • The other combinations have similar meanings and are not described one by one.
  • After the analysis device obtains the m initial software packages, it screens them.
  • The screening process includes: sorting the m initial software packages based on the software description information of each initial software package and the software environment information to obtain a sorting result, where the position of any initial software package in the sorting result indicates the correlation between that initial software package and the vulnerability; and selecting n initial software packages from the m initial software packages according to the sorting result.
  • determining the initial software packages from the database relies on how often the same characters occur in the entities and contexts and in the software environment information, whereas sorting the initial software packages determines the correlation between each initial software package and the vulnerability from the content that the characters refer to in the software description information and the software environment information, and then arranges the packages in order of correlation.
  • the sorting result C1D2, C1D1, C1Dk is shown in Figure 4, and the top n initial software packages in the sorting result are used as candidate software packages.
  • the above sorting operation is a deeper, finer-grained ranking than the initial search.
  • as a result, the correlation between each initial software package and the vulnerability is more accurate, and the candidate software packages determined from this more accurate correlation are also more accurate.
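The sorting and top-n selection can be sketched as follows. The text leaves the scoring to a finer ranking or a trained screening model; here a Jaccard word-overlap score, `jaccard_scorer`, is swapped in purely as a hypothetical placeholder. `select_candidates` accepts any scoring callable, so a trained screening model could be plugged in instead.

```python
from typing import Callable


def jaccard_scorer(description: str, environment: str) -> float:
    """Hypothetical stand-in for the screening model: Jaccard overlap of lowercase word sets."""
    a, b = set(description.lower().split()), set(environment.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_candidates(
    initial: dict[str, str],
    environment: str,
    n: int,
    scorer: Callable[[str, str], float] = jaccard_scorer,
) -> list[str]:
    """Sort the m initial packages by estimated correlation with the vulnerability; keep the top n."""
    if n < 1:
        raise ValueError("n must be an integer >= 1")
    # Higher score means higher correlation; ties broken by name for determinism.
    ranked = sorted(initial, key=lambda name: (-scorer(initial[name], environment), name))
    return ranked[:n]
```

Keeping the scorer pluggable separates the ranking logic from whatever model estimates the correlation.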
  • screening initial software packages can be implemented through deep learning technology.
  • the analysis device obtains a software package screening model trained through deep learning technology, and calls the software package screening model to select n of the m initial software packages based on the software environment information.
  • n is a positive integer set based on experience, for example, n is set to 5 based on experience.
  • the software package screening model can be trained based on the training set.
  • the training method is similar to the training method of the software entity extraction model involved in S201, and will not be described again here.
  • in this way, candidate software packages are determined automatically. Because multiple initial software packages are first retrieved by search before the software package screening model is applied, the initial software packages affected by a vulnerability can be found in the database by search even when the training set contains no correspondence for that vulnerability. This makes the identified candidate software packages both accurate and broadly applicable.
  • the analysis device performs homologous expansion on the n candidate software packages to obtain multiple recommended software packages.
  • the number of multiple recommended software packages is greater than n.
  • the recommended software packages are used to assist the annotation object in determining the impact scope of the vulnerability.
  • after the analysis device determines the candidate software packages that the vulnerability affects, it also performs same-origin expansion on them to improve the recall rate of vulnerability analysis.
  • the same-origin expansion process includes: clustering multiple software packages based on the description information of the software packages in the database to obtain at least one target software cluster.
  • the target software cluster is a software cluster that includes candidate software packages.
  • the target software cluster also includes software packages that have the same origin as the candidate software package; the candidate software packages in the at least one target software cluster, together with the same-origin software packages that meet the conditions, are used as recommended software packages.
  • the process of obtaining the target software cluster includes: obtaining a description vector for each of the multiple software packages based on the description information of the software packages in the database; clustering the multiple software packages based on their description vectors to obtain multiple initial software clusters; calculating the code similarity between the software packages included in each initial software cluster; screening the software packages included in each initial software cluster based on that code similarity to obtain candidate software clusters; and using the candidate software clusters that include candidate software packages as target software clusters.
  • the description information of the software package is vectorized through sentence embedding to obtain a description vector.
  • the step of vectorizing the description information can be implemented based on the pre-trained sentence embedding model.
  • a clustering algorithm groups the multiple software packages by degree of similarity, placing highly similar software packages in the same initial software cluster. This application does not limit the clustering algorithm used; examples include mean shift clustering, K-Means clustering and graph community detection.
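The vectorize-then-cluster step can be sketched as below, assuming two simplifications that are not from this application: a bag-of-words count vector replaces the pretrained sentence-embedding model, and connected components of a cosine-similarity graph (built with union-find) replace mean shift, K-Means or community detection. The threshold value and all function names are hypothetical.

```python
import math
from collections import Counter


def description_vector(description: str) -> Counter:
    # Stand-in for a pretrained sentence-embedding model: a bag-of-words count vector.
    return Counter(description.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def initial_clusters(descriptions: dict[str, str], threshold: float = 0.5) -> list[set[str]]:
    """Cluster packages as connected components of a graph whose edges join similar descriptions."""
    names = sorted(descriptions)
    vectors = {name: description_vector(descriptions[name]) for name in names}
    parent = {name: name for name in names}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two components

    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)
```

Because the text notes that this clustering can run before vulnerability analysis starts, its output could be computed once and cached.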
  • the above-mentioned clustering operation of software packages in the database may be performed before the analysis device starts vulnerability analysis, and the above-mentioned clustering operation may be performed by the analysis device or by other network devices.
  • the analysis device calculates the code similarity between software packages included in any initial software cluster.
  • the source code of each software package is stored in the database. The code similarity between software packages is calculated from the proportion of overlapping characters in their source code, and software packages whose code similarity is lower than a similarity threshold are filtered out to obtain a candidate software cluster.
  • alternatively, the multiple software packages are sorted by code similarity, and the top software packages, up to a second number, are used as the software packages included in the candidate software cluster.
  • both the similarity threshold and the second number can be set based on experience.
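The code-similarity screening of one initial software cluster might look like this. Python's `difflib.SequenceMatcher.ratio()` returns 2*M/T, where M is the number of matching characters and T is the combined length of both strings, which is one way to measure the proportion of overlapping characters. Measuring each package against the candidate (seed) package, and the default threshold of 0.8, are assumptions; the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def code_similarity(src_a: str, src_b: str) -> float:
    # ratio() = 2*M/T: matching characters relative to the combined length of both sources.
    return SequenceMatcher(None, src_a, src_b).ratio()


def candidate_cluster(
    cluster: set[str],
    sources: dict[str, str],
    seed: str,
    threshold: float = 0.8,
    top_k: Optional[int] = None,
) -> list[str]:
    """Screen one initial software cluster, keeping packages whose source resembles the seed's."""
    scored = [
        (code_similarity(sources[seed], sources[name]), name)
        for name in cluster
        if name != seed
    ]
    # Filter out packages below the similarity threshold (pass threshold=0.0 to disable).
    kept = [(score, name) for score, name in scored if score >= threshold]
    # Most similar first; ties broken by name for determinism.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    if top_k is not None:
        kept = kept[:top_k]  # keep at most the "second number" of packages
    return [seed] + [name for _, name in kept]
```

The threshold and `top_k` correspond to the two alternatives in the text; either can be used on its own.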
  • because the software packages included in a candidate software cluster are all highly related, they can be called homologous software packages of the candidate software package. Therefore, when a candidate software package is included in a candidate software cluster, the other software packages in that cluster are highly related to it, and if the candidate software package is affected by the vulnerability, those other software packages may also be affected. Taking the case where software entities are coordinates as an example, the coordinates of a software package may change while its source code does not. Candidate software clusters therefore group together software packages that have different coordinates but highly similar or identical source code.
  • the candidate software cluster including the candidate software package can be used as the target software cluster, and then the candidate software package can be homologously extended according to the software packages included in the target software cluster.
  • the analysis device uses the candidate software packages and the software packages in the target software cluster that meet the conditions as recommended software packages.
  • the embodiments of the present application do not limit the conditions for determining recommended software packages. The condition may be that the similarity between a software package and the candidate software package is higher than a first threshold set based on experience, that the coordinates of the software package and those of the candidate software package are consecutive, or simply that the software package is located in the target software cluster.
  • the analysis device can also obtain historical data of vulnerabilities and determine software packages that meet the conditions based on the historical data of vulnerabilities.
  • for example, the historical data indicates the software in which the vulnerability was first discovered, and software packages with the same origin as that software are determined as software packages that meet the conditions.
  • the software packages that meet the conditions may be all software packages in the target software cluster or only some of them, which is not limited in the embodiments of the present application.
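Selecting the recommended software packages from the target software clusters could be sketched as follows. The condition is passed in as a callable because the text deliberately leaves it open (a similarity above a first threshold, consecutive coordinates, historical data, or mere cluster membership). The function names and the example numbers in the comments are hypothetical.

```python
from typing import Callable, Iterable


def target_clusters(candidate_clusters: Iterable[set[str]], candidates: list[str]) -> list[set[str]]:
    """Target software clusters: candidate software clusters containing at least one candidate package."""
    wanted = set(candidates)
    return [cluster for cluster in candidate_clusters if cluster & wanted]


def recommend(
    candidates: list[str],
    clusters: list[set[str]],
    meets_condition: Callable[[str], bool],
) -> list[str]:
    """Candidates first, then same-origin packages from the target clusters that meet the condition."""
    recommended = list(dict.fromkeys(candidates))  # keep order, drop duplicates
    seen = set(recommended)
    for cluster in target_clusters(clusters, candidates):
        for name in sorted(cluster - seen):
            if meets_condition(name):
                recommended.append(name)
                seen.add(name)
    return recommended


# Example condition with made-up similarities and a first threshold of 0.9:
#   similarity = {"g2:lib": 0.95, "g4:lib": 0.6}
#   recommend(["g1:lib"], clusters, lambda p: similarity.get(p, 0.0) > 0.9)
```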
  • for example, recommended software package A includes software version 1 and software version 2.
  • software version 1 is the initial version of recommended software package A.
  • software version 2 is a version based on software version 1 that adds fix patches for vulnerabilities. Therefore, software version 1 is affected by the vulnerability, while software version 2 is not affected by the vulnerability.
  • the analysis device determines, from the multiple recommended software packages, a target software package that matches the annotation object, and determines the software version of the target software package that is affected by the vulnerability.
  • the target software package matching the annotation object includes, but is not limited to, the recommended software package selected by the annotation object.
  • the analysis device can send information about the multiple recommended software packages to the terminal, which displays this information and returns information about the target software package that matches the annotation object; the analysis device then receives the information about the target software package sent by the terminal.
  • the information of the recommended software package is, for example, the name of the recommended software package, or the consequences of the recommended software package being affected by the vulnerability, etc.
  • Figure 5 is a schematic diagram of another page provided by an embodiment of the present application.
  • Figure 5 shows a second page that displays multiple recommended software packages.
  • in addition to the recommended software packages, the second page also displays basic information about the vulnerability to help the annotation object understand the vulnerability being analyzed.
  • basic information about the vulnerability includes, for example, the vulnerability description information, the code repository related to the vulnerability, and the vulnerability's Common Platform Enumeration (CPE) information.
  • the second page also provides selection controls corresponding to each recommended software package.
  • the annotation object can trigger a selection control to determine the target software package.
  • the annotation object can choose to annotate only the recommended software package, for example by triggering the "Use this software" control shown in Figure 5.
  • the annotation object can also choose to annotate the software cluster in which the recommended software package is located, for example by triggering the "Use software cluster" control shown in Figure 5.
  • after the annotation object selects, through the terminal, the recommended software package to be annotated as the target software package, the terminal sends the information about the target software package to the analysis device.
  • after receiving the information about the target software package returned by the terminal, the analysis device obtains the version information of the target software package from an information library that includes version information of multiple software packages, sends this version information to the terminal, and receives the software version returned by the terminal.
  • the above information library storing version information of multiple software packages may be an open source library or a proprietary library collected and maintained by the organization to which the annotation object belongs.
  • the version information of the target software package includes at least one software version, and the annotation object selects the software version affected by the vulnerability from at least one software version.
  • the terminal can display the version information of the target software package for the annotation object to select.
  • the third page shown in Figure 6 is the page displayed by the terminal after the annotation object completes the selection of the target software package on the second page.
  • the annotation object can trigger the display control 601 on the third page to display the version information of the target software package, and then mark, among the software versions included in the version information, those affected by the vulnerability.
  • Figure 7 is a schematic diagram of yet another page provided by an embodiment of the present application.
  • the fourth page shown in Figure 7 is what the terminal displays after the annotation object triggers the display control 601 on the third page.
  • the version information includes seven software versions, with version numbers 1.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5 and 1.0.6.
  • the annotation object selects the software version that will be affected by the vulnerability from the above seven software versions.
  • when the annotation object checks the annotation controls corresponding to the software versions, the terminal provides multiple annotation modes, for example multi-check, check-all and interval check.
  • multi-check means that multiple individual software versions can be checked
  • check-all means checking all software versions included in the version information
  • interval check means checking all software versions within an interval.
  • for example, after the annotation object triggers the interval check control on the fourth page, the terminal enters interval check mode, and the annotation object checks version 1.0.2 and then version 1.0.5 as the endpoints of the version interval.
  • the four software versions 1.0.2, 1.0.3, 1.0.4 and 1.0.5 are thereby selected as the software versions affected by the vulnerability.
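The interval check on the fourth page can be expressed as a range filter over version numbers. This sketch assumes purely numeric dotted version strings such as those in Figure 7; real-world schemes with suffixes like `-rc1` would need a fuller parser. The function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, ...]:
    # Assumes purely numeric dotted versions such as "1.0.5"; "1.0" sorts before "1.0.1".
    return tuple(int(part) for part in version.split("."))


def interval_check(versions: list[str], first: str, second: str) -> list[str]:
    """Return the versions between the two checked endpoints, inclusive, in either click order."""
    low, high = sorted((parse_version(first), parse_version(second)))
    return [v for v in versions if low <= parse_version(v) <= high]


if __name__ == "__main__":
    versions = ["1.0", "1.0.1", "1.0.2", "1.0.3", "1.0.4", "1.0.5", "1.0.6"]
    print(interval_check(versions, "1.0.2", "1.0.5"))  # ['1.0.2', '1.0.3', '1.0.4', '1.0.5']
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "1.0.10" sorting before "1.0.2".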
  • the determination of the software version may be performed interactively between the analysis device and the terminal as shown in the above embodiment, or the determination of the software version may be performed independently by the terminal.
  • after determining the target software package, the terminal obtains and displays the version information of the target software package.
  • the annotation object selects the software version affected by the vulnerability based on the version information displayed by the terminal.
  • in this case, the analysis device determines the software version of the target software package by receiving the software version returned by the terminal.
  • the audit object then reviews the software version; if the review result confirms that the software version is correct, the correspondence between that software version and the vulnerability is added to the vulnerability library, enriching the correspondences between vulnerabilities and software packages stored there.
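Recording an audited result in the vulnerability library might be sketched with a hypothetical in-memory class like the one below. The class name, the method names and the placeholder ID format are assumptions, and a real vulnerability library would be a persistent database.

```python
from collections import defaultdict


class VulnerabilityLibrary:
    """Hypothetical in-memory library: vulnerability ID -> set of (package, version) pairs."""

    def __init__(self) -> None:
        self._affected = defaultdict(set)

    def add_audited(self, vuln_id: str, package: str, versions: list[str], approved: bool) -> None:
        # Only correspondences confirmed by the audit object enter the library.
        if approved:
            self._affected[vuln_id].update((package, version) for version in versions)

    def affected(self, vuln_id: str) -> list[tuple[str, str]]:
        # .get() does not create an empty entry for unknown IDs.
        return sorted(self._affected.get(vuln_id, set()))
```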
  • after the analysis device determines the software version of one recommended software package that is affected by the vulnerability, it continues with the next recommended software package.
  • the third page also includes display controls of other target software packages, such as the display control 602 shown in FIG. 6.
  • when the display control 602 is triggered, the terminal displays the other recommended software packages located in the same target software cluster as the first target software package, and the annotation object selects from them a second recommended software package to be annotated as the target software package.
  • the vulnerability analysis method provided by the embodiments of this application obtains initial software packages through search and then determines candidate software packages from the multiple initial software packages.
  • therefore, even for vulnerabilities whose correspondences are absent from the training set, the method can still find candidate software packages affected by the vulnerability, so it is highly versatile. Moreover, after the candidate software packages are determined, they undergo same-origin expansion to obtain a larger number of recommended software packages, which improves the recall rate of vulnerability analysis.
  • the analysis device determines recommended software packages automatically, which speeds up automated impact analysis for Common Vulnerabilities and Exposures (CVE), shortens manual vulnerability analysis, improves the response speed of vulnerability warnings, and effectively reduces the exposure time window of vulnerabilities.
  • Figure 8 is a flow chart of another vulnerability analysis method provided by an embodiment of the present application. It shows the interaction process between the terminal, the annotation object and the audit object when the analysis device is a terminal.
  • Step 801: the terminal crawls the vulnerability description information.
  • the process of the terminal crawling the vulnerability description information is similar to the process of the analysis device crawling the vulnerability description information in S201 in the embodiment shown in FIG. 2, and will not be described again here.
  • Step 802: the terminal calls the software entity extraction model to extract software entities from the vulnerability description information.
  • the process of the terminal extracting the software entity from the vulnerability description information is similar to the process of the analysis device extracting the software entity from the vulnerability description information in S201 of the embodiment shown in FIG. 2, and will not be described again here.
  • Step 803: the terminal determines candidate software packages through the software package screening model based on the software entities.
  • the process by which the terminal determines the candidate software package through the software package screening model based on the software entity is similar to the process by which the analysis device determines the candidate software package based on the software entity through the software package screening model in S202 in the embodiment shown in FIG. 2.
  • the software entity extraction model in step 802 and the software package screening model in step 803 are pre-trained models.
  • Step 804: the terminal performs homologous expansion on the candidate software packages through the initial software clusters and determines the recommended software packages.
  • the process of performing homologous extension on candidate software packages to obtain multiple recommended software packages is shown in Figure 9.
  • the description information of the software package is processed by sentence embedding to obtain the description vector of the software package.
  • the software packages included in the initial software cluster are screened to obtain candidate software clusters.
  • the candidate software clusters including the candidate software packages are used as the target software cluster.
  • the candidate software packages are expanded based on the target software cluster to obtain multiple recommended software packages.
  • the terminal displays multiple recommended software packages, the annotation object selects a target software package from the multiple recommended software packages, and the terminal performs the operation of step 805 based on the selection of the annotation object.
  • Step 805: the terminal obtains the version information of the target software package from the information library.
  • the terminal also displays the version information of the target software package, and the annotation object selects the software version of the target software package.
  • the database providing the initial software clusters in step 804 and the information library providing version information in step 805 are both open source libraries.
  • the audit object reviews the annotation results to determine whether the software version marked by the annotation object is accurate. After determining that the annotated software version is correct, the terminal performs the operation of step 806 to add the corresponding relationship between the software version and the vulnerability to the vulnerability database.
  • FIG. 10 is a schematic structural diagram of a vulnerability analysis device provided by an embodiment of the present application. Based on the modules shown in Figure 10, the vulnerability analysis device can perform all or some of the operations shown in Figure 2 above. It should be understood that the device may include more modules than those shown, or omit some of them; the embodiments of the present application are not limited in this respect. As shown in Figure 10, the device includes:
  • the acquisition module 1001 is used to obtain the software environment information of the vulnerability, and the software environment information is used to describe the software affected by the vulnerability;
  • the search module 1002 is used to search to obtain n candidate software packages based on software environment information, where n is an integer greater than or equal to 1;
  • the extension module 1003 is used to perform homologous expansion on n candidate software packages to obtain multiple recommended software packages.
  • the number of multiple recommended software packages is greater than n.
  • the recommended software packages are used to assist the annotation object in determining the scope of impact of the vulnerability.
  • the acquisition module 1001 is used to obtain the vulnerability description information of the vulnerability, identify the software entity in the vulnerability description information, extract the context of the software entity from the vulnerability description information, and obtain the software environment information based on the extracted context.
  • the search module 1002 is used to search, based on the software environment information, a database including description information of multiple software packages to obtain m initial software packages related to the vulnerability, where m is an integer greater than n, and to select n of the m initial software packages as the n candidate software packages based on the software environment information.
  • the search module 1002 is used to sort the m initial software packages based on the software description information of each initial software package and the software environment information to obtain a sorting result, where the position of any initial software package in the sorting result indicates the correlation between that initial software package and the vulnerability, and to select n initial software packages from the m initial software packages according to the sorting result.
  • the acquisition module 1001 is also used to obtain the software package screening model; the search module 1002 is used to call the software package screening model to select n initial software packages from the m initial software packages based on the software environment information.
  • the extension module 1003 is configured to cluster multiple software packages based on the description information of the software packages in the database to obtain at least one target software cluster.
  • the target software cluster is a software cluster that includes a candidate software package and also includes software packages with the same origin as the candidate software package; the candidate software packages in the at least one target software cluster and the same-origin software packages that meet the conditions are used as recommended software packages.
  • the extension module 1003 is configured to obtain a description vector for each of the multiple software packages based on the description information of the software packages in the database; cluster the multiple software packages based on their description vectors to obtain multiple initial software clusters; calculate the code similarity between the software packages included in each initial software cluster; screen the software packages included in each initial software cluster based on that code similarity to obtain candidate software clusters; and use the candidate software clusters that include candidate software packages as target software clusters.
  • the device further includes: a determination module, configured to determine a target software package that matches the annotation object from multiple recommended software packages; and determine a software version affected by the vulnerability in the target software package.
  • the determination module is configured to send information about multiple recommended software packages to the terminal, and the terminal is configured to display information about multiple recommended software packages and return information about target software packages that match the annotation object; Receive information about the target software package sent by the terminal.
  • the determination module is configured to obtain, based on the information about the target software package, the version information of the target software package from an information library including version information of multiple software packages; send the version information to the terminal; and receive the software version returned by the terminal.
  • candidate software packages are searched based on software environment information, so even when the training set used by related technologies does not include the vulnerability, the candidate software packages it affects can still be found, giving high versatility. Moreover, after the candidate software packages are determined, they undergo same-origin expansion to obtain a larger number of recommended software packages, improving the recall rate of vulnerability analysis.
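The division of labor between the acquisition, search and extension modules can be sketched as a composition of three pluggable callables. The class and field names are hypothetical; the point is only the data flow from software environment information to n candidate packages to a larger set of recommended packages.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VulnerabilityAnalysisDevice:
    """Hypothetical composition of modules 1001-1003 as pluggable callables."""

    acquire: Callable[[str], str]             # acquisition module 1001: description -> environment info
    search: Callable[[str, int], list[str]]   # search module 1002: (environment info, n) -> candidates
    extend: Callable[[list[str]], list[str]]  # extension module 1003: candidates -> recommended packages

    def analyze(self, vulnerability_description: str, n: int) -> list[str]:
        if n < 1:
            raise ValueError("n must be an integer >= 1")
        environment = self.acquire(vulnerability_description)
        candidates = self.search(environment, n)[:n]  # never pass more than n candidates on
        return self.extend(candidates)
```

Each field could be backed by any of the earlier sketches or by a trained model, mirroring the statement that the modules may be implemented in software or hardware.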
  • the acquisition module 1001, the search module 1002 and the expansion module 1003 can all be implemented by software, or can be implemented by hardware. Illustratively, the following takes the acquisition module 1001 as an example to introduce the implementation of the acquisition module 1001. Similarly, the implementation of the search module 1002 and the expansion module 1003 can refer to the implementation of the acquisition module 1001.
  • the acquisition module 1001 may include code running on a computing instance.
  • the computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Furthermore, the above computing instance may be one or more.
  • the acquisition module 1001 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code can be distributed in the same region or in different regions. Furthermore, they can be distributed in the same availability zone (AZ) or in different AZs. Each AZ includes one data center or multiple geographically close data centers. Usually, a region includes multiple AZs.
  • the multiple hosts/VMs/containers used to run the code can be distributed in the same virtual private cloud (VPC), or across multiple VPCs.
  • communication between two VPCs in the same region, as well as cross-region communication between VPCs in different regions, requires a communication gateway in each VPC, and the VPCs are interconnected through these communication gateways.
  • the acquisition module 1001 may include at least one computing device, such as a server.
  • the acquisition module 1001 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the above PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • multiple computing devices included in the acquisition module 1001 can be distributed in the same region or in different regions. Multiple computing devices included in the acquisition module 1001 may be distributed in the same AZ or in different AZs. Similarly, multiple computing devices included in the acquisition module 1001 may be distributed in the same VPC or in multiple VPCs.
  • the plurality of computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
  • the acquisition module 1001 can be used to perform any step in the vulnerability analysis method.
  • the search module 1002 can be used to perform any step in the vulnerability analysis method.
  • the extension module 1003 can be used to perform any step in the vulnerability analysis method.
  • the steps that the acquisition module 1001, the search module 1002, and the extension module 1003 are responsible for implementing can be specified as needed.
  • the acquisition module 1001, the search module 1002 and the extension module 1003 respectively implement different steps in the vulnerability analysis method, so that together they realize all functions of the vulnerability analysis device.
  • computing device 1100 includes: bus 1102, processor 1104, memory 1106, and communication interface 1108.
  • the processor 1104, the memory 1106 and the communication interface 1108 communicate through a bus 1102.
  • Computing device 1100 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 1100.
  • the bus 1102 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one line is used in Figure 11, but it does not mean that there is only one bus or one type of bus.
  • bus 1102 may include a path that carries information between the components of computing device 1100 (e.g., memory 1106, processor 1104, communication interface 1108).
  • the processor 1104 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP) or a digital signal processor (DSP).
  • Memory 1106 may include volatile memory, such as random access memory (RAM).
  • the memory 1106 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • ROM read-only memory
  • HDD hard disk drive
  • SSD solid state drive
  • the memory 1106 stores executable program code, and the processor 1104 executes the executable program code to respectively implement the functions of the aforementioned acquisition module, search module and expansion module, thereby implementing the vulnerability analysis method. That is, the memory 1106 stores instructions for executing the vulnerability analysis method.
  • the communication interface 1108 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between the computing device 1100 and other devices or communication networks.
  • An embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • the computing device may be a server, such as a central server, an edge server, or a local server in a local data center.
  • the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
  • the structure of at least one computing device included in the computing device cluster may refer to the computing device 1100 shown in FIG. 11.
  • the same instructions for performing the vulnerability analysis method may be stored in the memory 1106 of one or more computing devices 1100 in the computing device cluster.
  • the memories 1106 of one or more computing devices 1100 in the computing device cluster may also each store part of the instructions for executing the vulnerability analysis method.
  • a combination of one or more computing devices 1100 may collectively execute the instructions for performing the vulnerability analysis method.
  • the memories 1106 in different computing devices 1100 in the computing device cluster can store different instructions, which are respectively used to execute part of the functions of the vulnerability analysis device. That is, the instructions stored in the memory 1106 in different computing devices 1100 may implement the functions of one or more modules among the acquisition module, the search module, and the expansion module.
  • one or more computing devices in a cluster of computing devices may be connected through a network.
  • the network may be a wide area network or a local area network, etc.
  • Figure 12 shows a possible implementation.
  • two computing devices 1200A and 1200B are connected through a network.
  • the connection to the network is made through a communication interface in each computing device.
  • computing devices 1200A and 1200B include a bus 1202, a processor 1204, a memory 1206, and a communication interface 1208.
  • Stored in memory 1206 in computing device 1200A are instructions for performing the functions of the acquisition module.
  • memory 1206 in computing device 1200B stores instructions for performing the functions of the search module and the expansion module.
  • the connection mode of the computing device cluster shown in Figure 12 takes into account that the vulnerability analysis method provided by this application needs to search for candidate software packages and perform homologous expansion on them; therefore, the functions implemented by the search module and the extension module are assigned to computing device 1200B.
  • the functions of computing device 1200A shown in FIG. 12 may also be performed by multiple computing devices 1200.
  • the functions of computing device 1200B may also be performed by multiple computing devices 1200.
  • An embodiment of the present application also provides a communication device, which includes: a transceiver, a memory, and a processor.
  • the transceiver, the memory and the processor communicate with each other through an internal connection path, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory to control the transceiver to receive signals and control the transceiver to send signals.
  • when the processor executes the instructions stored in the memory, the processor is caused to execute the vulnerability analysis method.
  • the processor can be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor can be a microprocessor or any conventional processor, etc. It is worth noting that the processor may be a processor that supports advanced RISC machines (ARM) architecture.
  • ARM advanced RISC machines
  • the above-mentioned memory may include a read-only memory and a random access memory, and provide instructions and data to the processor.
  • Memory may also include non-volatile random access memory.
  • the memory may also store device type information.
  • the memory may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory.
  • Volatile memory may be random access memory (RAM), which is used as an external cache. By way of illustration, but not limitation, many forms of RAM are available.
  • SRAM static random access memory
  • DRAM dynamic random access memory
  • SDRAM synchronous dynamic random access memory
  • DDR SDRAM double data rate synchronous dynamic random access memory
  • ESDRAM enhanced synchronous dynamic random access memory
  • SLDRAM synchlink dynamic random access memory
  • DR RAM direct rambus RAM
  • Embodiments of the present application also provide a computer program (product) containing instructions.
  • the computer program (product) may be a software or program product containing instructions that can be run on a computing device or stored in any available medium.
  • when the computer program (product) runs on at least one computing device, the at least one computing device is caused to execute the vulnerability analysis method.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be any usable medium that a computing device can access, or a data storage device, such as a data center, that contains one or more usable media.
  • the usable media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state drive), etc.
  • the computer-readable storage medium includes instructions that instruct a computing device to perform a vulnerability analysis method.
  • Embodiments of the present application also provide a chip, including a processor configured to call and run instructions stored in a memory, so that a communication device in which the chip is installed executes any of the vulnerability analysis methods described above.
  • An embodiment of the present application also provides another chip, including: an input interface, an output interface, a processor, and a memory.
  • the input interface, the output interface, the processor, and the memory are connected through an internal connection path, and the processor is configured to execute the code in the memory.
  • the processor is configured to execute any of the vulnerability analysis methods described above.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; e.g., the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more usable media.
  • the usable media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk), etc.
  • Computer program code for implementing the methods of the embodiments of the present application may be written in one or more programming languages. This program code may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that, when executed by the computer or other programmable data processing apparatus, it causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the computer, partly on the computer, as a standalone software package, partly on the computer and partly on a remote computer, or entirely on a remote computer or server.
  • the computer program code or related data may be carried by any appropriate carrier, so that an apparatus, device, or processor can perform the various processes and operations described above.
  • carriers include signals, computer-readable media, and the like.
  • signals may include electrical, optical, radio, acoustic, or other forms of propagated signals, such as carrier waves, infrared signals, and the like.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the modules is only a logical function division. In actual implementation, there may be other division methods.
  • multiple modules or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be indirect coupling or communication connection through some interfaces, devices or modules, or may be electrical, mechanical or other forms of connection.
  • the modules described as separate components may or may not be physically separated.
  • the components shown as modules may or may not be physical modules, that is, they may be located in one place, or they may be distributed to multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the embodiments of the present application.
  • each functional module in each embodiment of the present application can be integrated into one processing module, or each module can exist physically alone, or two or more modules can be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or software function modules.
  • the words “first”, “second”, and the like are used to distinguish identical or similar items whose functions and effects are basically the same. It should be understood that “first”, “second”, and “nth” have no logical or temporal dependency on each other and limit neither the number nor the execution order. It should also be understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another. For example, a first link may be referred to as a second link, and similarly, a second link may be referred to as a first link, without departing from the scope of the various described examples.
  • the size of the sequence number of each process does not mean the order of execution.
  • the execution order of each process should be determined by its function and internal logic.
  • the sequence numbers should not constitute any limitation on the implementation process of the embodiments of the present application.
  • determining B based on A does not mean determining B only based on A, and B can also be determined based on A and/or other information.
  • references throughout this specification to “one embodiment”, “an embodiment”, and “a possible implementation” mean that specific features, structures, or characteristics related to the embodiment or implementation are included in at least one embodiment of this application. Therefore, “in one embodiment”, “in an embodiment”, or “a possible implementation” appearing in various places throughout this specification do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

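A vulnerability analysis method, apparatus, device, and computer-readable storage medium, belonging to the field of computer technology. The method includes: an analysis device obtains software environment information of a vulnerability, where the software environment information describes the software affected by the vulnerability (201); the analysis device searches, based on the software environment information, to obtain n candidate software packages, where n is an integer greater than or equal to 1 (202); and the analysis device performs homologous expansion on the n candidate software packages to obtain multiple recommended software packages, where the number of recommended software packages is greater than n, and the recommended software packages are used to assist an annotator in determining the impact scope of the vulnerability (203). Because the candidate software packages are found by searching based on the software environment information, the method is highly general. In addition, after the candidate software packages are determined, homologous expansion is performed on them to obtain a larger number of recommended software packages, which improves the recall rate of vulnerability analysis.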

Description

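Vulnerability analysis method, apparatus, device, and computer-readable storage medium
This application claims priority to Chinese Patent Application No. 202210877922.5, filed on July 25, 2022 and entitled "Intelligent analysis method, apparatus, device and medium for vulnerability-affected packages", and to Chinese Patent Application No. 202211055558.0, filed on August 31, 2022 and entitled "Vulnerability analysis method, apparatus, device and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a vulnerability analysis method, apparatus, device, and computer-readable storage medium.
Background
A vulnerability is a security flaw in a computer system. Vulnerabilities affect the security of the computer system, so discovered vulnerabilities need to be repaired to reduce the risks they bring. Before a vulnerability is repaired, determining which software packages in the computer system it affects is an indispensable step, because this tells the repairing party which packages to fix. Therefore, a vulnerability analysis method is needed to determine the software packages affected by a vulnerability.
In the related art, a classification model that maps vulnerabilities to software packages is trained on the existing vulnerability-to-package correspondences in a training set, and the model is then used to determine, from the description information of a vulnerability to be analyzed, the software packages that vulnerability affects.
However, the training set contains only a limited number of vulnerability-to-package correspondences, so the trained classification model is only suitable for vulnerabilities already included in the training set. Its generality is poor and its analysis accuracy is low.
Summary
This application provides a vulnerability analysis method, apparatus, device, and computer-readable storage medium to solve the problems in the related art. The technical solutions are as follows: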
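According to a first aspect, a vulnerability analysis method is provided. The method includes: an analysis device obtains software environment information of a vulnerability, where the software environment information describes the software affected by the vulnerability; the analysis device searches, based on the software environment information, to obtain n candidate software packages, where n is an integer greater than or equal to 1; and the analysis device performs homologous expansion on the n candidate software packages to obtain multiple recommended software packages, where the number of recommended software packages is greater than n, and the recommended software packages are used to assist an annotator in determining the impact scope of the vulnerability.
In the vulnerability analysis method provided in the embodiments of this application, the candidate software packages are found by searching based on the software environment information. Even for a vulnerability whose correspondence is not included in the training set of the related art, the affected candidate software packages can still be found, so the method is highly general. In addition, after the candidate software packages are determined, homologous expansion is performed on them to obtain a larger number of recommended software packages, which improves the recall rate of vulnerability analysis.
In a possible implementation, obtaining the software environment information of the vulnerability includes: obtaining vulnerability description information of the vulnerability; identifying a software entity in the vulnerability description information; and extracting the context of the software entity from the vulnerability description information and obtaining the software environment information based on the extracted context. Because the search uses software environment information derived from the entity's context, more content is available for the search and the results are more comprehensive.
In a possible implementation, searching based on the software environment information to obtain n candidate software packages includes: searching, based on the software environment information, a database that includes description information of multiple software packages to obtain m initial software packages related to the vulnerability, where m is an integer greater than n; and selecting n of the m initial software packages based on the software environment information as the n candidate software packages. Screening the initial software packages returned by the search makes the resulting candidate software packages more precise.
In a possible implementation, selecting n of the m initial software packages based on the software environment information includes: sorting the m initial software packages based on the software description information of each initial software package and the software environment information to obtain a sorting result, where the position of an initial software package in the sorting result indicates its degree of relevance to the vulnerability; and selecting n of the m initial software packages according to the sorting result. Because the initial software packages are screened by relevance to the vulnerability, the resulting candidate software packages are all highly relevant and likely to be affected, which improves search accuracy.
In a possible implementation, the method further includes obtaining a software package screening model, and selecting n of the m initial software packages based on the software environment information includes: invoking the software package screening model to select n of the m initial software packages based on the software environment information. Using a screening model makes the screening more convenient.
In a possible implementation, performing homologous expansion on the n candidate software packages to obtain multiple recommended software packages includes: clustering multiple software packages based on the description information of the software packages in the database to obtain at least one target software cluster, where a target software cluster is a software cluster that includes a candidate software package and also includes software packages homologous to that candidate; and using the candidate software packages in the at least one target software cluster, together with the homologous software packages that satisfy a condition, as the recommended software packages. When a vulnerability affects a candidate software package, packages homologous to it may also be affected. Homologous expansion therefore widens the set of recommended software packages and safeguards the recall rate of vulnerability analysis.
In a possible implementation, clustering multiple software packages based on the description information of the software packages in the database to obtain at least one target software cluster includes: obtaining a description vector for each of the multiple software packages based on their description information in the database; clustering the multiple software packages based on their description vectors to obtain multiple initial software clusters; calculating the code similarity between the software packages within each initial software cluster; screening the software packages in each initial software cluster based on that code similarity to obtain candidate software clusters; and using a candidate software cluster that includes a candidate software package as a target software cluster. Screening the initial clusters after clustering ensures that packages with high code similarity end up in the same candidate software cluster; packages with high code similarity to a candidate software package are then treated as recommended packages likely to be affected, which improves the precision of homologous expansion.
In a possible implementation, after the multiple recommended software packages are obtained, the method further includes: determining, from the multiple recommended software packages, a target software package that matches the annotator; and determining the software versions of the target software package that are affected by the vulnerability. Determining the affected versions makes the impact scope of the vulnerability finer-grained and more precise.
In a possible implementation, determining, from the multiple recommended software packages, a target software package that matches the annotator includes: sending information about the multiple recommended software packages to a terminal, where the terminal displays the information and returns information about the target software package that matches the annotator; and receiving the information about the target software package sent by the terminal. Letting the annotator's match define the target software package provides a good interactive experience.
In a possible implementation, determining the software versions of the target software package that are affected by the vulnerability includes: obtaining version information of the target software package, based on the information about the target software package, from an information base that includes version information of multiple software packages; and sending the version information of the target software package to the terminal and receiving the software versions returned by the terminal.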
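According to a second aspect, a vulnerability analysis apparatus is provided. The apparatus is applied to an analysis device and includes: an acquisition module, configured to obtain software environment information of a vulnerability, where the software environment information describes the software affected by the vulnerability; a search module, configured to search, based on the software environment information, to obtain n candidate software packages, where n is an integer greater than or equal to 1; and an extension module, configured to perform homologous expansion on the n candidate software packages to obtain multiple recommended software packages, where the number of recommended software packages is greater than n, and the recommended software packages are used to assist an annotator in determining the impact scope of the vulnerability.
In a possible implementation, the acquisition module is configured to: obtain vulnerability description information of the vulnerability; identify a software entity in the vulnerability description information; and extract the context of the software entity from the vulnerability description information and obtain the software environment information based on the extracted context.
In a possible implementation, the search module is configured to: search, based on the software environment information, a database that includes description information of multiple software packages to obtain m initial software packages related to the vulnerability, where m is an integer greater than n; and select n of the m initial software packages based on the software environment information as the n candidate software packages.
In a possible implementation, the search module is configured to: sort the m initial software packages based on the software description information of each initial software package and the software environment information to obtain a sorting result, where the position of an initial software package in the sorting result indicates its degree of relevance to the vulnerability; and select n of the m initial software packages according to the sorting result.
In a possible implementation, the acquisition module is further configured to obtain a software package screening model, and the search module is configured to invoke the software package screening model to select n of the m initial software packages based on the software environment information.
In a possible implementation, the extension module is configured to: cluster multiple software packages based on the description information of the software packages in the database to obtain at least one target software cluster, where a target software cluster is a software cluster that includes a candidate software package and also includes software packages homologous to that candidate; and use the candidate software packages in the at least one target software cluster, together with the homologous software packages that satisfy a condition, as the recommended software packages.
In a possible implementation, the extension module is configured to: obtain a description vector for each of the multiple software packages based on their description information in the database; cluster the multiple software packages based on their description vectors to obtain multiple initial software clusters; calculate the code similarity between the software packages within each initial software cluster; screen the software packages in each initial software cluster based on that code similarity to obtain candidate software clusters; and use a candidate software cluster that includes a candidate software package as a target software cluster.
In a possible implementation, the apparatus further includes a determining module, configured to: determine, from the multiple recommended software packages, a target software package that matches the annotator; and determine the software versions of the target software package that are affected by the vulnerability.
In a possible implementation, the determining module is configured to: send information about the multiple recommended software packages to a terminal, where the terminal displays the information and returns information about the target software package that matches the annotator; and receive the information about the target software package sent by the terminal.
In a possible implementation, the determining module is configured to: obtain version information of the target software package, based on the information about the target software package, from an information base that includes version information of multiple software packages; and send the version information of the target software package to the terminal and receive the software versions returned by the terminal.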
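According to a third aspect, a computing device cluster is provided. The computing device cluster includes at least one computing device, and each computing device includes a processor and a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the vulnerability analysis method according to any implementation of the first aspect.
According to a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium includes computer program instructions, and when the computer program instructions are executed by a computing device cluster, the computing device cluster performs the vulnerability analysis method according to any implementation of the first aspect.
According to a fifth aspect, a computer program (product) containing instructions is provided. When the instructions are run by a computing device cluster, the computing device cluster is caused to perform the vulnerability analysis method according to any implementation of the first aspect.
According to a sixth aspect, a communication apparatus is provided. The apparatus includes a transceiver, a memory, and a processor, which communicate with each other through an internal connection path. The memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory to control the transceiver to receive and send signals. When the processor executes the instructions stored in the memory, the processor is caused to perform the method according to the first aspect or any possible implementation of the first aspect.
Optionally, there are one or more processors and one or more memories.
Optionally, the memory may be integrated with the processor, or the memory and the processor may be disposed separately.
In a specific implementation, the memory may be a non-transitory memory, such as a read-only memory (ROM), which may be integrated on the same chip as the processor or disposed on a separate chip. This application does not limit the type of the memory or how the memory and the processor are disposed.
According to a seventh aspect, a chip is provided, including a processor configured to call, from a memory, and run the instructions stored in the memory, so that a communication device in which the chip is installed performs the methods in the foregoing aspects.
According to an eighth aspect, another chip is provided, including an input interface, an output interface, a processor, and a memory. The input interface, the output interface, the processor, and the memory are connected through an internal connection path. The processor is configured to execute code in the memory, and when the code is executed, the processor performs the methods in the foregoing aspects.
It should be understood that, for the beneficial effects of the technical solutions of the second to eighth aspects of the embodiments of this application and their possible implementations, refer to the technical effects of the first aspect and its possible implementations described above. Details are not repeated here.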
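Brief Description of Drawings
FIG. 1 is a schematic diagram of an implementation environment according to an embodiment of this application;
FIG. 2 is a flowchart of a vulnerability analysis method according to an embodiment of this application;
FIG. 3 is a schematic diagram of a page according to an embodiment of this application;
FIG. 4 is a schematic diagram of searching for candidate software packages according to an embodiment of this application;
FIG. 5 is a schematic diagram of another page according to an embodiment of this application;
FIG. 6 is a schematic diagram of still another page according to an embodiment of this application;
FIG. 7 is a schematic diagram of yet another page according to an embodiment of this application;
FIG. 8 is a flowchart of another vulnerability analysis method according to an embodiment of this application;
FIG. 9 is a schematic diagram of a homologous expansion process according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of a vulnerability analysis apparatus according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a computing device according to an embodiment of this application;
FIG. 12 is a schematic diagram of the connection of computing devices according to an embodiment of this application.
Detailed Description
The terms used in the Detailed Description of this application are only intended to explain specific embodiments of this application and are not intended to limit this application. To make the objectives, technical solutions, and advantages of this application clearer, the following describes the implementations of this application in further detail with reference to the accompanying drawings.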
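A vulnerability is a security flaw in a computer system. By lifecycle stage, vulnerabilities can be classified as 0-day, 1-day, or n-day vulnerabilities. A 0-day vulnerability has no corresponding fix patch or mitigation; that is, it has just been discovered. A 1-day vulnerability already has a fix patch or mitigation, but most users have not yet applied it. An n-day vulnerability already has a fix patch or mitigation, and most users have already applied it. Because vulnerabilities affect the security of computer systems, discovered vulnerabilities need to be repaired to reduce the risks they bring. The embodiments of this application provide a vulnerability analysis method for determining the software packages affected by a vulnerability, to assist the repairing party in fixing those packages.
Refer to FIG. 1, which is a schematic diagram of the implementation environment of the vulnerability analysis method according to an embodiment of this application. The implementation environment is a computing device cluster 10 that includes at least one analysis device, and the analysis devices can communicate with each other over a wired or wireless network. Optionally, the vulnerability analysis method may be performed by a single analysis device, or jointly by multiple analysis devices in the computing device cluster 10. The embodiments of this application do not limit the number of analysis devices in the computing device cluster 10; FIG. 1 shows two analysis devices only as an example.
For example, an analysis device in the computing device cluster 10 may be a server, such as a central server, an edge server, or a local server in a local data center. The server may be a physical server or a cloud server that provides cloud computing services. In some embodiments, an analysis device in the computing device cluster 10 may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
An embodiment of this application provides a vulnerability analysis method that can be applied to the implementation environment shown in FIG. 1 and performed by an analysis device. As shown in the flowchart in FIG. 2, the method includes S201 to S203.
S201: The analysis device obtains software environment information of a vulnerability, where the software environment information describes the software affected by the vulnerability.
The embodiments of this application do not limit how the vulnerability to be analyzed is determined. For example, the computer system may be checked periodically, and a vulnerability found in the computer system by such a check is used as the vulnerability to be analyzed. Alternatively, a vulnerability included in security information periodically published by the provider of the computer system, or by the provider of software used in the computer system, is used as the vulnerability to be analyzed. In a possible implementation, these operations find multiple vulnerabilities; for example, the security information published by the computer system provider lists several. The analysis device then needs to select one of them as the vulnerability to be analyzed. It may select randomly, select based on discovery time, or choose a vulnerability that matches the annotator, where the annotator is the party that annotates the impact scope of a vulnerability. A vulnerability that matches the annotator may be one the annotator selected, one assigned to the annotator for annotation, or the like.
For example, FIG. 3, a schematic diagram of a page according to an embodiment of this application, shows a first page displaying multiple vulnerabilities. The first page is a tab page. The analysis device sends information about the vulnerabilities to the terminal, and the terminal displays it so that the annotator can select one vulnerability as the vulnerability to be analyzed. As shown in FIG. 3, the information about a vulnerability may include its name, its description information, and its publication date, and may also include other content. After selecting a vulnerability, the annotator can trigger its annotation control; the terminal then sends the identifier of the vulnerability to be analyzed to the analysis device, which accordingly determines the vulnerability that matches the annotator as the vulnerability to be analyzed. The first page in FIG. 3 includes an annotation control 301 for each vulnerability. The embodiments of this application do not limit how the annotation control, or the other controls mentioned later, are triggered; they may be triggered by a click or by voice. A click may be made with a mouse and keyboard connected to the terminal or, if the terminal's display supports touch, by tapping the screen.
Regardless of how the vulnerability to be analyzed is determined, the analysis device can obtain its software environment information. The obtaining process includes but is not limited to: obtaining the vulnerability description information; identifying a software entity in the vulnerability description information; and extracting the context of the software entity from the vulnerability description information and obtaining the software environment information based on the extracted context.
In a possible implementation, the analysis device obtains vulnerability description information from the Internet with a web crawler. Take the case in which the vulnerabilities analyzed in this application are 1-day and n-day vulnerabilities as an example: these are not newly discovered vulnerabilities but already have fix patches or mitigations, so the Internet contains description information for them and the analysis device can obtain it directly by crawling. Alternatively, the analysis device may receive vulnerability description information sent by another network device. Taking a terminal as that network device, the terminal crawls the description information of each vulnerability, and after the annotator selects the vulnerability to be analyzed, the terminal sends the vulnerability identifier and its description information to the analysis device.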
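Optionally, the vulnerability description information is a string of characters describing the vulnerability, such as the description shown in FIG. 3, and these characters include software entities. A software entity is a text fragment that can identify a software package, such as the package's coordinates. The coordinates of a software package are preset fields used for importing the package. Take project development with the Maven central repository as an example: a project usually needs to import multiple software packages. When the coordinates of the required packages are written into the project file, Maven automatically searches the Internet for the packages based on their coordinates and downloads them locally, completing the import.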
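To make the idea of a coordinate-style software entity concrete, the sketch below pulls Maven-style `groupId:artifactId[:version]` strings out of a vulnerability description with a regular expression. It is only a rule-based stand-in, not the deep-learning software entity extraction model described next; the pattern, the assumption that a groupId is dotted, and the name `find_coordinates` are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rule-based stand-in for the software entity extraction model.
# Matches "groupId:artifactId" or "groupId:artifactId:version".
# Assumption: the groupId contains at least one dot (e.g. "org.apache.logging.log4j").
COORD_RE = re.compile(
    r"\b([A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)+)"             # groupId (dotted)
    r":([A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*)"               # artifactId
    r"(?::([0-9][A-Za-z0-9_-]*(?:\.[A-Za-z0-9_-]+)*))?"     # optional version; no trailing dot
)

Coordinate = Tuple[int, int, str, str, Optional[str]]

def find_coordinates(text: str) -> List[Coordinate]:
    """Return (start, end, group_id, artifact_id, version_or_None) for each match."""
    return [
        (m.start(), m.end(), m.group(1), m.group(2), m.group(3))
        for m in COORD_RE.finditer(text)
    ]
```

A regex only finds well-formed coordinates; free-text mentions such as product names are one reason the text relies on a trained extraction model instead.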
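The analysis device identifies the position of the software entity in the vulnerability description information and extracts it. Because a software entity identifies a software package, the software entities in the description identify the packages affected by the vulnerability. Entity identification can be implemented with a pre-trained software entity extraction model: the vulnerability description information is input into the model, which outputs the software entities. The model may be trained with a deep learning method. For example, multiple vulnerability descriptions from a training set, in which the positions of reference software entities are known, are input into an initial software entity extraction model, which outputs initial software entities for each description. An error function computes the error loss between the initial and reference software entities of each description, and the parameters of the initial model are adjusted iteratively based on that loss to obtain the software entity extraction model used in this application. The error function includes but is not limited to the cross-entropy loss function. The training set may be obtained from open-source databases, and the reference software entities may be extracted from the vulnerability descriptions by manual annotation.
Optionally, after locating the software entity, the analysis device further extracts the entity's context from the vulnerability description information and uses the software entity together with its context as the software environment information. For example, the context of a software entity is the first number of words before and after it. The first number may be any positive integer set based on experience; if it is 150, the 150 words before and the 150 words after the software entity are extracted, and these 300 words form its context. Because a software entity does not appear in isolation but is constrained or influenced by its context, searching with software environment information that includes the context uses more words than searching with the entity alone, so the search results are more comprehensive.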
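S202: The analysis device searches, based on the software environment information, to obtain n candidate software packages, where n is an integer greater than or equal to 1.
For example, the search for candidate software packages includes: searching, based on the software environment information, a database that includes description information of multiple software packages to obtain m initial software packages related to the vulnerability, where m is an integer greater than n; and selecting n of the m initial software packages based on the software environment information as the n candidate software packages. The embodiments of this application do not limit the search method used in the database; the term frequency-inverse document frequency (TF-IDF) algorithm or another text search method may be used. The number m of initial software packages may be set based on experience, for example, m = 512.
FIG. 4 is a schematic diagram of searching for candidate software packages according to an embodiment of this application. Referring to FIG. 4, the software environment information, denoted C1, is extracted from the vulnerability description information, and the entities and contexts in the description information of each software package in the database, denoted Di, are extracted. Here i is a positive integer identifying different entities and contexts, and its maximum value is the total number of software packages in the database. Entities and contexts are extracted from the database descriptions in the same way as the software environment information in S201, so details are not repeated. The entities and contexts that match the software environment information are then searched for, and the software packages corresponding to the matches are determined as initial software packages. With C1 in FIG. 4 as the software environment information, the matching entities and contexts are D1, D2, and Dk, where k is a positive integer. In FIG. 4, the pairing of C1 with D1 indicates that D1 matches C1; the other pairings have the same meaning and are not described one by one.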
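In a possible implementation, after obtaining the m initial software packages, the analysis device screens them. The screening includes: sorting the m initial software packages based on the software description information of each initial software package and the software environment information to obtain a sorting result, where the position of an initial software package in the sorting result indicates its degree of relevance to the vulnerability; and selecting n of the m initial software packages according to the sorting result. Retrieving initial software packages from the database relies on how often the same characters appear in the entities and contexts and in the software environment information, whereas sorting relies on what the characters in the software description information and the software environment information refer to: the relevance of each initial software package to the vulnerability is determined from that meaning, and the packages are ordered by relevance, as in the sorting result C1D2, C1D1, C1Dk shown in FIG. 4. The top n initial software packages are used as the candidate software packages. Compared with the retrieval step, this sorting is a deeper ranking, so the relevance it assigns to each initial software package is more accurate, and the candidate software packages selected from it are more accurate as well.
Optionally, the screening can be implemented with deep learning. For example, the analysis device obtains a software package screening model trained with deep learning and invokes it to select n of the m initial software packages based on the software environment information. n is a positive integer set based on experience, for example, n = 5. The screening model can be trained on a training set in a manner similar to the software entity extraction model in S201, so details are not repeated.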
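By combining search and deep learning techniques over the vulnerability description information and a database of software package descriptions, candidate software packages are determined automatically. Because a search has already produced multiple initial software packages before the screening model is applied, the initial packages affected by a vulnerability can be found in the database even when the training set has no correspondence for that vulnerability. This safeguards the accuracy of the candidate software packages and makes the method highly general.
S203: The analysis device performs homologous expansion on the n candidate software packages to obtain multiple recommended software packages, where the number of recommended software packages is greater than n, and the recommended software packages are used to assist the annotator in determining the impact scope of the vulnerability.
In a possible implementation, after determining the candidate software packages affected by the vulnerability, the analysis device further performs homologous expansion on them to improve the recall rate of vulnerability analysis. Optionally, homologous expansion includes: clustering multiple software packages based on the description information of the software packages in the database to obtain at least one target software cluster, where a target software cluster is a software cluster that includes a candidate software package and also includes software packages homologous to that candidate; and using the candidate software packages in the at least one target software cluster, together with the homologous software packages that satisfy a condition, as the recommended software packages.
For example, obtaining the target software clusters includes: obtaining a description vector for each of the multiple software packages based on their description information in the database; clustering the software packages based on their description vectors to obtain multiple initial software clusters; calculating the code similarity between the software packages within each initial software cluster; screening the software packages in each initial software cluster based on that code similarity to obtain candidate software clusters; and using a candidate software cluster that includes a candidate software package as a target software cluster.
Optionally, the description information of each software package is vectorized by sentence embedding to obtain its description vector; this can be done with a pre-trained sentence embedding model. A clustering algorithm then groups the software packages by similarity, placing highly similar packages in the same initial software cluster. This application does not limit the clustering algorithm; examples include mean-shift clustering, K-means clustering, and graph community detection.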
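The clustering of software packages in the database may be completed before the analysis device starts vulnerability analysis, and may be performed by the analysis device or by another network device. Clustering the initial software clusters in advance saves clustering time when homologous expansion is later performed on the candidate software packages, which makes homologous expansion more efficient.
After obtaining the multiple initial software clusters, the analysis device calculates the code similarity between the software packages in each initial software cluster. The database stores the source code of each software package. The code similarity between two packages is calculated from the proportion of overlapping characters in their source code, and packages whose code similarity is below a similarity threshold are filtered out to obtain a candidate software cluster. Alternatively, the packages are sorted by code similarity, and those ranked within a second number are kept in the candidate software cluster. Both the similarity threshold and the second number can be set based on experience.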
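Because the software packages in a candidate software cluster are highly related to each other, they can be called homologous software packages. When a candidate software cluster includes a candidate software package, the other packages in that cluster are highly related to it, so if the candidate is affected by the vulnerability, those packages may be affected too. Taking coordinates as the software entity as an example, a package's coordinates may change while its source code does not. Candidate software clusters group together packages that have different coordinates but highly similar or identical source code. If the vulnerability affects the candidate software package in a candidate software cluster, the packages in the same cluster with highly similar or identical source code are also affected. Therefore, a candidate software cluster that includes a candidate software package can be used as a target software cluster, and homologous expansion is performed on the candidate based on the packages in that target software cluster.
Optionally, the analysis device uses the candidate software packages, together with the packages in the target software cluster that satisfy a condition, as the recommended software packages. The embodiments of this application do not limit the condition. It may be that the similarity between a package and the candidate software package is higher than a first threshold set based on experience, that the package's coordinates and the candidate's coordinates are consecutive, or simply that the package belongs to the target software cluster. The analysis device may also obtain historical data of the vulnerability and determine qualifying packages from it; for example, if the historical data indicates the software in which the vulnerability was first discovered, packages homologous to that software are treated as qualifying. As these examples show, the qualifying packages may be all or only some of the packages in the target software cluster; the embodiments of this application do not limit this. Homologous expansion yields more recommended software packages affected by the vulnerability, so the analysis result is more comprehensive and the recall rate is higher.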
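A recommended software package may have multiple software versions, some affected by the vulnerability and some not. For example, recommended software package A has software version 1 and software version 2, where version 1 is the initial version of package A and version 2 is version 1 with a fix patch for the vulnerability applied. Version 1 is affected by the vulnerability, and version 2 is not. Determining the impact scope of a vulnerability therefore requires identifying not only the affected recommended software packages but also which of their software versions are affected, which refines the granularity and thus the accuracy of the analysis. Optionally, the analysis device determines, from the multiple recommended software packages, a target software package that matches the annotator, and determines the software versions of the target software package affected by the vulnerability.
A target software package that matches the annotator includes but is not limited to a recommended software package selected by the annotator. The analysis device may send information about the multiple recommended software packages to the terminal; the terminal displays this information and returns information about the target software package that matches the annotator, which the analysis device then receives. The information about a recommended software package is, for example, its name, or the consequences of its being affected by the vulnerability.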
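FIG. 5 is a schematic diagram of another page according to an embodiment of this application, showing a second page that displays multiple recommended software packages. Referring to FIG. 5, besides the recommended software packages, the second page displays basic information about the vulnerability to help the annotator understand it, such as the vulnerability description information, the code repositories associated with the vulnerability, and the vulnerability's Common Platform Enumeration (CPE) information. The second page also provides a selection control for each recommended software package, which the annotator can trigger to determine the target software package. The annotator can choose to annotate only a recommended software package, for example by triggering the "Use this software" control shown in FIG. 5, or to annotate the software cluster containing it, for example by triggering the "Use software cluster" control shown in FIG. 5.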
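After the annotator selects, on the terminal, a recommended software package to annotate as the target software package, the terminal sends information about the target software package to the analysis device. The analysis device then obtains the version information of the target software package from an information base that includes version information of multiple software packages, sends that version information to the terminal, and receives the software versions returned by the terminal. Optionally, the information base may be an open-source repository or a proprietary repository collected and maintained by the annotator's organization. The version information of the target software package includes at least one software version, from which the annotator selects the versions affected by the vulnerability.
Optionally, after the version information of the target software package is sent to the terminal, the terminal can display it for the annotator to choose from. Take FIG. 6, still another page diagram according to an embodiment of this application, as an example: the third page in FIG. 6 is displayed by the terminal after the annotator finishes selecting the target software package on the second page. The annotator can trigger the display control 601 on the third page to display the version information of the target software package and mark the affected software versions among those listed. FIG. 7 is yet another page diagram according to an embodiment of this application; the fourth page in FIG. 7 is what the terminal displays after the annotator triggers the display control 601 on the third page. A first area of the fourth page displays the version information of the target software package, which lists seven software versions: 1.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, and 1.0.6. The annotator selects the versions affected by the vulnerability from these seven.
In addition, when the annotator checks the annotation controls for software versions, the terminal offers several annotation modes, such as multi-select, select-all, and range select. Multi-select lets the annotator select several software versions, select-all selects every version in the version information, and range select selects the versions within a range. For example, after the annotator triggers the range-select control on the fourth page, the terminal enters range-select mode; the annotator then checks version 1.0.2 and version 1.0.5 as the range boundaries, which selects the four versions 1.0.2, 1.0.3, 1.0.4, and 1.0.5 as the software versions affected by the vulnerability.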
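Optionally, the software versions may be determined interactively by the analysis device and the terminal, as in the foregoing embodiment, or by the terminal alone. In the latter case, after determining the target software package, the terminal obtains and displays its version information, the annotator selects the affected versions, and the analysis device determines the software versions of the target software package by receiving them from the terminal. However the software versions are determined, a reviewer then reviews them, and the vulnerability-to-version correspondences confirmed as correctly annotated are added to the vulnerability database, enriching the vulnerability-to-package correspondences it stores.
In addition, after determining the affected software versions of one recommended software package, the analysis device moves on to the next. If the annotator triggered the "Use software cluster" control when selecting the target software package, the third page also includes display controls for other target software packages, such as the display control 602 in FIG. 6. After annotating the versions of the first target software package, the annotator triggers the display control 602; the terminal displays the other recommended software packages in the same target software cluster, and the annotator selects the next one to annotate as the target software package.
In summary, in the vulnerability analysis method provided in the embodiments of this application, initial software packages are obtained by search, and candidate software packages are then selected from them. Even for a vulnerability whose correspondence is not included in the training set of the related art, the method can still find the candidate software packages it affects, so it is highly general. In addition, homologous expansion of the candidate software packages yields a larger number of recommended software packages, which improves the recall rate of vulnerability analysis. Because the analysis device determines the recommended software packages automatically, the automated impact-analysis workflow for Common Vulnerabilities and Exposures (CVE) entries is accelerated, manual analysis time is shortened, vulnerability alerts are answered faster, and the exposure window of vulnerabilities is effectively reduced.
FIG. 8 is a flowchart of another vulnerability analysis method according to an embodiment of this application, showing the interaction among the terminal, the annotator, and the reviewer when the analysis device is a terminal.
Step 801: The terminal crawls vulnerability description information.
Optionally, this is similar to how the analysis device crawls vulnerability description information in S201 of the embodiment shown in FIG. 2, and details are not repeated here.
Step 802: The terminal invokes the software entity extraction model to extract software entities from the vulnerability description information.
Optionally, this is similar to how the analysis device extracts software entities from the vulnerability description information in S201 of the embodiment shown in FIG. 2, and details are not repeated here.
Step 803: The terminal determines candidate software packages with the software package screening model based on the software entities.
This is similar to how the analysis device determines candidate software packages with the software package screening model in S202 of the embodiment shown in FIG. 2, and details are not repeated here. Optionally, the software entity extraction model in step 802 and the software package screening model in step 803 are pre-trained models.
Step 804: The terminal performs homologous expansion on the candidate software packages using the initial software clusters to determine the recommended software packages.
Optionally, homologous expansion of the candidate software packages proceeds as shown in FIG. 9: sentence embedding is applied to the description information of the software packages in the database to obtain their description vectors; the packages are clustered based on these vectors to obtain initial software clusters; the code similarity between the packages in each initial software cluster is calculated; the packages in the initial software clusters are screened based on code similarity to obtain candidate software clusters; a candidate software cluster that includes a candidate software package is used as a target software cluster; and the candidate software packages are expanded based on the target software clusters to obtain multiple recommended software packages.
For example, the terminal displays the multiple recommended software packages, the annotator selects a target software package from them, and the terminal performs step 805 based on that selection.
Step 805: The terminal obtains the version information of the target software package from the information base.
The terminal also displays the version information of the target software package, and the annotator selects its software versions. Optionally, the database that provides the initial software clusters in step 804 and the information base that provides the version information in step 805 are both open-source repositories.
For example, the reviewer reviews the annotation result to check whether the software versions annotated by the annotator are accurate. Once the annotated versions are confirmed to be correct, the terminal performs step 806 and adds the correspondence between the software versions and the vulnerability to the vulnerability database.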
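The foregoing describes the vulnerability analysis method of the embodiments of this application. Corresponding to the method, the embodiments of this application further provide a vulnerability analysis apparatus. FIG. 10 is a schematic structural diagram of a vulnerability analysis apparatus according to an embodiment of this application. With the modules shown in FIG. 10, the apparatus can perform all or some of the operations shown in FIG. 2. It should be understood that the apparatus may include more modules than those shown, or omit some of them; the embodiments of this application do not limit this. As shown in FIG. 10, the apparatus includes:
an acquisition module 1001, configured to obtain software environment information of a vulnerability, where the software environment information describes the software affected by the vulnerability;
a search module 1002, configured to search, based on the software environment information, to obtain n candidate software packages, where n is an integer greater than or equal to 1; and
an extension module 1003, configured to perform homologous expansion on the n candidate software packages to obtain multiple recommended software packages, where the number of recommended software packages is greater than n, and the recommended software packages are used to assist the annotator in determining the impact scope of the vulnerability.
In a possible implementation, the acquisition module 1001 is configured to: obtain vulnerability description information of the vulnerability; identify a software entity in the vulnerability description information; and extract the context of the software entity from the vulnerability description information and obtain the software environment information based on the extracted context.
In a possible implementation, the search module 1002 is configured to: search, based on the software environment information, a database that includes description information of multiple software packages to obtain m initial software packages related to the vulnerability, where m is an integer greater than n; and select n of the m initial software packages based on the software environment information as the n candidate software packages.
In a possible implementation, the search module 1002 is configured to: sort the m initial software packages based on the software description information of each initial software package and the software environment information to obtain a sorting result, where the position of an initial software package in the sorting result indicates its degree of relevance to the vulnerability; and select n of the m initial software packages according to the sorting result.
In a possible implementation, the acquisition module 1001 is further configured to obtain a software package screening model, and the search module 1002 is configured to invoke the software package screening model to select n of the m initial software packages based on the software environment information.
In a possible implementation, the extension module 1003 is configured to: cluster multiple software packages based on the description information of the software packages in the database to obtain at least one target software cluster, where a target software cluster is a software cluster that includes a candidate software package and also includes software packages homologous to that candidate; and use the candidate software packages in the at least one target software cluster, together with the homologous software packages that satisfy a condition, as the recommended software packages.
In a possible implementation, the extension module 1003 is configured to: obtain a description vector for each of the multiple software packages based on their description information in the database; cluster the multiple software packages based on their description vectors to obtain multiple initial software clusters; calculate the code similarity between the software packages within each initial software cluster; screen the software packages in each initial software cluster based on that code similarity to obtain candidate software clusters; and use a candidate software cluster that includes a candidate software package as a target software cluster.
In a possible implementation, the apparatus further includes a determining module, configured to: determine, from the multiple recommended software packages, a target software package that matches the annotator; and determine the software versions of the target software package that are affected by the vulnerability.
In a possible implementation, the determining module is configured to: send information about the multiple recommended software packages to a terminal, where the terminal displays the information and returns information about the target software package that matches the annotator; and receive the information about the target software package sent by the terminal.
In a possible implementation, the determining module is configured to: obtain version information of the target software package, based on the information about the target software package, from an information base that includes version information of multiple software packages; and send the version information of the target software package to the terminal and receive the software versions returned by the terminal.
With the above apparatus, the candidate software packages are found by searching based on the software environment information. Even for a vulnerability whose correspondence is not included in the training set of the related art, the affected candidate software packages can still be found, so the apparatus is highly general. In addition, homologous expansion of the candidate software packages yields a larger number of recommended software packages, which improves the recall rate of vulnerability analysis.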
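The acquisition module 1001, the search module 1002, and the extension module 1003 may each be implemented in software or in hardware. The following takes the acquisition module 1001 as an example; the search module 1002 and the extension module 1003 can be implemented in the same way.
As an example of a module implemented as a software functional unit, the acquisition module 1001 may include code running on a compute instance. The compute instance may include at least one of a physical host (computing device), a virtual machine, and a container, and there may be one or more compute instances. For example, the acquisition module 1001 may include code running on multiple hosts, virtual machines, or containers. The hosts, virtual machines, or containers running the code may be distributed in the same region or in different regions. They may also be distributed in the same availability zone (AZ) or in different AZs, where each AZ includes one data center or multiple geographically close data centers. Generally, one region includes multiple AZs.
Similarly, the hosts, virtual machines, or containers running the code may be distributed in the same virtual private cloud (VPC) or in multiple VPCs. Generally, a VPC is set within one region. For communication between two VPCs in the same region, or between VPCs in different regions, a communication gateway must be set in each VPC, and the VPCs are interconnected through these gateways.
As an example of a module implemented as a hardware functional unit, the acquisition module 1001 may include at least one computing device, such as a server. Alternatively, the acquisition module 1001 may be a device implemented with an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The multiple computing devices included in the acquisition module 1001 may be distributed in the same region or in different regions, in the same AZ or in different AZs, and in the same VPC or in multiple VPCs. These computing devices may be any combination of servers, ASICs, PLDs, CPLDs, FPGAs, GALs, and the like.
It should be noted that in other embodiments, the acquisition module 1001, the search module 1002, and the extension module 1003 can each be used to perform any step of the vulnerability analysis method. The steps each module is responsible for can be specified as needed, and the three modules together realize all functions of the vulnerability analysis apparatus by each implementing different steps of the method.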
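This application further provides a computing device 1100. As shown in FIG. 11, the computing device 1100 includes a bus 1102, a processor 1104, a memory 1106, and a communication interface 1108. The processor 1104, the memory 1106, and the communication interface 1108 communicate through the bus 1102. The computing device 1100 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 1100.
The bus 1102 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses can be classified into address buses, data buses, control buses, and so on. For ease of representation, FIG. 11 uses only one line, but this does not mean that there is only one bus or one type of bus. The bus 1102 may include a path for transferring information between the components of the computing device 1100 (for example, the memory 1106, the processor 1104, and the communication interface 1108).
The processor 1104 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).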
The memory 1106 may include a volatile memory, such as a random access memory (RAM). The memory 1106 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
存储器1106中存储有可执行的程序代码,处理器1104执行该可执行的程序代码以分别实现前述获取模块、搜索模块和扩展模块的功能,从而实现漏洞分析方法。也即,存储器1106上存有用于执行漏洞分析方法的指令。
通信接口1108使用例如但不限于网络接口卡、收发器一类的收发模块,来实现计算设备1100与其他设备或通信网络之间的通信。
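An embodiment of this application further provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, for example a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop, or a smartphone.
Optionally, the at least one computing device included in the computing device cluster may have the structure of the computing device 1100 shown in FIG. 11. The memories 1106 of one or more computing devices 1100 in the cluster may store the same instructions for performing the vulnerability analysis method.
In some possible implementations, the memories 1106 of one or more computing devices 1100 in the cluster may each store part of the instructions for performing the vulnerability analysis method. In other words, a combination of one or more computing devices 1100 may jointly execute the instructions for performing the vulnerability analysis method.
It should be noted that the memories 1106 of different computing devices 1100 in the cluster may store different instructions, each used to perform part of the functions of the vulnerability analysis apparatus. That is, the instructions stored in the memories 1106 of different computing devices 1100 may implement the functions of one or more of the acquisition module, the search module, and the expansion module.
In some possible implementations, one or more computing devices in the cluster may be connected through a network, which may be a wide area network, a local area network, or the like. FIG. 12 shows one possible implementation. As shown in FIG. 12, two computing devices 1200A and 1200B are connected through a network; specifically, each device connects to the network through its communication interface. In this type of implementation, the computing devices 1200A and 1200B each include a bus 1202, a processor 1204, a memory 1206, and a communication interface 1208. The memory 1206 of the computing device 1200A stores instructions for performing the functions of the acquisition module, while the memory 1206 of the computing device 1200B stores instructions for performing the functions of the search module and the expansion module.
The connection arrangement of the computing device cluster shown in FIG. 12 may reflect the fact that the vulnerability analysis method provided in this application needs to search for candidate software packages and then perform homologous expansion on them; the functions implemented by the search module and the expansion module are therefore assigned to the computing device 1200B.
It should be understood that the functions of the computing device 1200A shown in FIG. 12 may also be performed by multiple computing devices 1200. Likewise, the functions of the computing device 1200B may also be performed by multiple computing devices 1200.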
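The FIG. 12 split can be sketched in Python as two functions that share nothing but a serialized message. This is an illustration under stated assumptions: the JSON payload stands in for network traffic between the communication interfaces, the regex tokenizer stands in for real software-environment extraction, and `CATALOG` is a hypothetical, hard-coded table of homologous packages held only on the device playing the role of 1200B.

```python
import json
import re
from typing import Dict, List

# Hypothetical homologous-package catalog held only by device 1200B.
CATALOG: Dict[str, List[str]] = {
    "openssl": ["openssl", "libssl3", "libssl-dev"],
    "zlib": ["zlib", "zlib1g", "zlib1g-dev"],
}


def device_a_acquire(description: str) -> bytes:
    """Device 1200A (acquisition module): tokenize the description and serialize it."""
    env = re.findall(r"[a-z0-9][a-z0-9.+-]*", description.lower())
    # The JSON bytes stand in for the message sent over the network in FIG. 12.
    return json.dumps({"software_env": env}).encode("utf-8")


def device_b_search_and_expand(payload: bytes) -> List[str]:
    """Device 1200B (search + expansion modules): match candidates, then expand them."""
    env = set(json.loads(payload.decode("utf-8"))["software_env"])
    candidates = [name for name in CATALOG if name in env]  # search module
    recommended: List[str] = []
    for name in candidates:  # expansion module
        for pkg in CATALOG[name]:
            if pkg not in recommended:
                recommended.append(pkg)
    return recommended


payload = device_a_acquire("Buffer overflow in OpenSSL before 3.0.7")
print(device_b_search_and_expand(payload))  # ['openssl', 'libssl3', 'libssl-dev']
```

Placing the catalog only on the second device is one way to read the rationale above: the search and expansion steps are the ones that work against package data, so they sit together on 1200B.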
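An embodiment of this application further provides a communication apparatus that includes a transceiver, a memory, and a processor. The transceiver, the memory, and the processor communicate with one another through an internal connection path. The memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory to control the transceiver to receive and send signals. When the processor executes the instructions stored in the memory, the processor performs the vulnerability analysis method.
It should be understood that the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. Notably, the processor may be a processor that supports the Advanced RISC Machines (ARM) architecture.
Further, in an optional embodiment, the memory may include read-only memory and random access memory and provide instructions and data to the processor. The memory may also include non-volatile random access memory. For example, the memory may also store device type information.
The memory may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
An embodiment of this application further provides a computer program (product) containing instructions. The computer program (product) may be software or a program product that contains instructions and that can run on a computing device or be stored in any usable medium. When the computer program (product) runs on at least one computing device, the at least one computing device performs the vulnerability analysis method.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium may be any usable medium accessible to a computing device, or a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state drive). The computer-readable storage medium includes instructions that instruct a computing device to perform the vulnerability analysis method.
An embodiment of this application further provides a chip that includes a processor configured to call and run instructions stored in a memory, so that a communication device on which the chip is installed performs any of the vulnerability analysis methods described above.
An embodiment of this application further provides another chip that includes an input interface, an output interface, a processor, and a memory, which are connected to one another through an internal connection path. The processor is configured to execute code in the memory, and when the code is executed, the processor performs any of the vulnerability analysis methods described above.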
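The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in this application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
To clearly illustrate the interchangeability of hardware and software, the steps and components of each embodiment have been described above generally in terms of their functions. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A person of ordinary skill in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
Computer program code for implementing the methods of the embodiments of this application may be written in one or more programming languages. The computer program code may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable rule lookup apparatus, so that the program code, when executed by the computer or the other programmable rule lookup apparatus, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a computer, partly on a computer, as a standalone software package, partly on a computer and partly on a remote computer, or entirely on a remote computer or server.
In the context of the embodiments of this application, the computer program code or related data may be carried by any appropriate carrier so that a device, apparatus, or processor can perform the various processes and operations described above. Examples of the carrier include signals, computer-readable media, and the like. Examples of signals may include electrical, optical, radio, acoustic, or other forms of propagated signals, such as carrier waves and infrared signals.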
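A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, devices, and modules described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules is merely a logical functional division, and other divisions are possible in actual implementation. For example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or modules, and may be electrical, mechanical, or other forms of connection.
Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of this application.
In addition, the functional modules in the embodiments of this application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.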
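In this application, the terms "first", "second", and the like are used to distinguish between identical or similar items whose roles and functions are essentially the same. It should be understood that "first", "second", and "nth" imply no logical or temporal dependency and limit neither quantity nor execution order. It should also be understood that although the following description uses the terms first, second, and so on to describe various elements, these elements should not be limited by the terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the various described examples, a first link could be termed a second link, and similarly, a second link could be termed a first link.
It should also be understood that, in the embodiments of this application, the sequence numbers of the processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of this application.
In this application, the term "at least one" means one or more, and the term "multiple" means two or more; for example, multiple second packets means two or more second packets. The terms "system" and "network" are often used interchangeably herein.
It should be understood that the terminology used in the description of the various described examples is intended only to describe particular examples and is not intended to be limiting. As used in the description of the various described examples and in the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "include" (also "includes", "including", "comprises", and/or "comprising"), when used in this specification, specifies the presence of the stated features, integers, steps, operations, elements, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that, depending on the context, the phrase "if it is determined..." or "if [a stated condition or event] is detected" may be interpreted to mean "upon determining..." or "in response to determining..." or "upon detecting [the stated condition or event]" or "in response to detecting [the stated condition or event]".
It should be understood that determining B based on A does not mean determining B based only on A; B may also be determined based on A and/or other information.
It should also be understood that references throughout the specification to "one embodiment", "an embodiment", or "a possible implementation" mean that a particular feature, structure, or characteristic related to the embodiment or implementation is included in at least one embodiment of this application. Therefore, occurrences of "in one embodiment", "in an embodiment", or "a possible implementation" in various places throughout the specification do not necessarily refer to the same embodiment. Furthermore, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Finally, it should be noted that the foregoing embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the protection scope of the technical solutions of the embodiments of this application.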

Claims (22)

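  1. A vulnerability analysis method, wherein the method comprises:
    obtaining, by an analysis device, software environment information of a vulnerability, wherein the software environment information describes software affected by the vulnerability;
    searching, by the analysis device based on the software environment information, to obtain n candidate software packages, wherein n is an integer greater than or equal to 1; and
    performing, by the analysis device, homologous expansion on the n candidate software packages to obtain multiple recommended software packages, wherein the number of the multiple recommended software packages is greater than n, and the recommended software packages are used to assist an annotation object in determining an impact scope of the vulnerability.
  2. The method according to claim 1, wherein the obtaining software environment information of a vulnerability comprises:
    obtaining vulnerability description information of the vulnerability;
    identifying a software entity in the vulnerability description information; and
    extracting a context of the software entity from the vulnerability description information, and obtaining the software environment information based on the extracted context.
  3. The method according to claim 1 or 2, wherein the searching based on the software environment information to obtain n candidate software packages comprises:
    searching, based on the software environment information, a database that includes description information of multiple software packages, to obtain m initial software packages related to the vulnerability, wherein m is an integer greater than n; and
    selecting n initial software packages from the m initial software packages based on the software environment information as the n candidate software packages.
  4. The method according to claim 3, wherein the selecting n initial software packages from the m initial software packages based on the software environment information comprises:
    ranking the m initial software packages based on the software description information of each initial software package and the software environment information to obtain a ranking result, wherein the position of any initial software package in the ranking result indicates the degree of relevance between that initial software package and the vulnerability; and
    selecting the n initial software packages from the m initial software packages according to the ranking result.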
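  5. The method according to claim 3 or 4, wherein the method further comprises:
    obtaining a software package screening model; and
    the selecting n initial software packages from the m initial software packages based on the software environment information comprises:
    invoking the software package screening model to select the n initial software packages from the m initial software packages based on the software environment information.
  6. The method according to any one of claims 3 to 5, wherein the performing homologous expansion on the n candidate software packages to obtain multiple recommended software packages comprises:
    clustering the multiple software packages based on the description information of the software packages in the database to obtain at least one target software cluster, wherein the target software cluster is a software cluster that includes the candidate software package and further includes software packages homologous to the candidate software package; and
    using the candidate software packages in the at least one target software cluster, together with the homologous software packages that satisfy a condition, as the recommended software packages.
  7. The method according to claim 6, wherein the clustering the multiple software packages based on the description information of the software packages in the database to obtain at least one target software cluster comprises:
    obtaining a description vector of each of the multiple software packages based on the description information of the software packages in the database;
    clustering the multiple software packages based on the description vectors of the software packages to obtain multiple initial software clusters;
    calculating code similarity between the software packages included in each of the multiple initial software clusters;
    screening the software packages included in any initial software cluster based on the code similarity between the software packages included in that initial software cluster, to obtain a candidate software cluster; and
    using a candidate software cluster that includes the candidate software package as the target software cluster.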
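  8. The method according to any one of claims 1 to 7, wherein after the obtaining multiple recommended software packages, the method further comprises:
    determining, from the multiple recommended software packages, a target software package that matches the annotation object; and
    determining a software version of the target software package that is affected by the vulnerability.
  9. The method according to claim 8, wherein the determining, from the multiple recommended software packages, a target software package that matches the annotation object comprises:
    sending information of the multiple recommended software packages to a terminal, wherein the terminal is configured to display the information of the multiple recommended software packages and return information of the target software package that matches the annotation object; and
    receiving the information of the target software package sent by the terminal.
  10. The method according to claim 9, wherein the determining a software version of the target software package that is affected by the vulnerability comprises:
    obtaining version information of the target software package, based on the information of the target software package, from an information library that includes version information of multiple software packages; and
    sending the version information of the target software package to the terminal, and receiving the software version returned by the terminal.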
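  11. A vulnerability analysis apparatus, wherein the apparatus is applied to an analysis device and comprises:
    an acquisition module, configured to obtain software environment information of a vulnerability, wherein the software environment information describes software affected by the vulnerability;
    a search module, configured to search based on the software environment information to obtain n candidate software packages, wherein n is an integer greater than or equal to 1; and
    an expansion module, configured to perform homologous expansion on the n candidate software packages to obtain multiple recommended software packages, wherein the number of the multiple recommended software packages is greater than n, and the recommended software packages are used to assist an annotation object in determining an impact scope of the vulnerability.
  12. The apparatus according to claim 11, wherein the acquisition module is configured to: obtain vulnerability description information of the vulnerability; identify a software entity in the vulnerability description information; and extract a context of the software entity from the vulnerability description information and obtain the software environment information based on the extracted context.
  13. The apparatus according to claim 11 or 12, wherein the search module is configured to: search, based on the software environment information, a database that includes description information of multiple software packages, to obtain m initial software packages related to the vulnerability, wherein m is an integer greater than n; and select n initial software packages from the m initial software packages based on the software environment information as the n candidate software packages.
  14. The apparatus according to claim 13, wherein the search module is configured to: rank the m initial software packages based on the software description information of each initial software package and the software environment information to obtain a ranking result, wherein the position of any initial software package in the ranking result indicates the degree of relevance between that initial software package and the vulnerability; and select the n initial software packages from the m initial software packages according to the ranking result.
  15. The apparatus according to claim 13 or 14, wherein the acquisition module is further configured to obtain a software package screening model, and the search module is configured to invoke the software package screening model to select the n initial software packages from the m initial software packages based on the software environment information.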
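  16. The apparatus according to any one of claims 13 to 15, wherein the expansion module is configured to: cluster the multiple software packages based on the description information of the software packages in the database to obtain at least one target software cluster, wherein the target software cluster is a software cluster that includes the candidate software package and further includes software packages homologous to the candidate software package; and use the candidate software packages in the at least one target software cluster, together with the homologous software packages that satisfy a condition, as the recommended software packages.
  17. The apparatus according to claim 16, wherein the expansion module is configured to: obtain a description vector of each of the multiple software packages based on the description information of the software packages in the database; cluster the multiple software packages based on the description vectors of the software packages to obtain multiple initial software clusters; calculate code similarity between the software packages included in each of the multiple initial software clusters; screen the software packages included in any initial software cluster based on the code similarity between the software packages included in that initial software cluster, to obtain a candidate software cluster; and use a candidate software cluster that includes the candidate software package as the target software cluster.
  18. The apparatus according to any one of claims 11 to 17, wherein the apparatus further comprises a determining module, configured to: determine, from the multiple recommended software packages, a target software package that matches the annotation object; and determine a software version of the target software package that is affected by the vulnerability.
  19. The apparatus according to claim 18, wherein the determining module is configured to: send information of the multiple recommended software packages to a terminal, wherein the terminal is configured to display the information of the multiple recommended software packages and return information of the target software package that matches the annotation object; and receive the information of the target software package sent by the terminal.
  20. The apparatus according to claim 19, wherein the determining module is configured to: obtain version information of the target software package, based on the information of the target software package, from an information library that includes version information of multiple software packages; and send the version information of the target software package to the terminal and receive the software version returned by the terminal.
  21. A computing device cluster, wherein the computing device cluster comprises at least one computing device, each computing device comprises a processor and a memory, and the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the vulnerability analysis method according to any one of claims 1 to 10.
  22. A computer-readable storage medium, wherein the computer-readable storage medium comprises computer program instructions, and when the computer program instructions are executed by a computing device cluster, the computing device cluster performs the vulnerability analysis method according to any one of claims 1 to 10.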
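The method of claims 1 to 7 can be sketched end to end in Python. This is a toy under explicit assumptions, not the applicant's implementation: a fixed entity dictionary stands in for software-entity recognition, Jaccard token overlap stands in for the relevance ranking (no screening model is used), bag-of-words cosine similarity with greedy single-pass clustering stands in for the description vectors and the clustering algorithm, and Jaccard similarity over hypothetical identifier sets stands in for code similarity. Cluster members are filtered by their code similarity to the candidate, which is only one possible reading of the screening step in claim 7. All package names, descriptions, and thresholds are invented.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import AbstractSet, Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Package:
    desc: str             # package description text
    code: FrozenSet[str]  # toy code fingerprint (e.g. identifiers)


# Hypothetical package database; the entries are invented for illustration.
DB: Dict[str, Package] = {
    "log4j-core": Package("apache log4j core logging library with jndi lookup",
                          frozenset({"JndiLookup", "Logger", "Appender", "Layout"})),
    "log4j-core-fork": Package("fork of apache log4j core logging library",
                               frozenset({"JndiLookup", "Logger", "Appender", "Layout", "Patch"})),
    "log4j-api": Package("apache log4j api", frozenset({"Logger", "LogManager"})),
    "zlib": Package("zlib compression library", frozenset({"inflate", "deflate"})),
}
KNOWN_ENTITIES = {"log4j", "zlib"}  # hypothetical software-entity dictionary


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def software_environment(description: str, window: int = 2) -> Set[str]:
    """Claim 2: find software entities and keep +/- `window` tokens of context."""
    toks = tokens(description)
    env: Set[str] = set()
    for i, tok in enumerate(toks):
        if tok in KNOWN_ENTITIES:
            env.update(toks[max(0, i - window):i + window + 1])
    return env


def search(env: Set[str], n: int) -> List[str]:
    """Claims 3-4: retrieve m related packages, rank by relevance, keep the top n."""
    initial = [name for name, pkg in DB.items() if env & set(tokens(pkg.desc))]
    relevance = {name: jaccard(env, set(tokens(DB[name].desc))) for name in initial}
    return sorted(initial, key=lambda name: (-relevance[name], name))[:n]


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_descriptions(threshold: float) -> List[List[str]]:
    """Claim 7, first part: greedy clustering of bag-of-words description vectors."""
    vectors = {name: Counter(tokens(pkg.desc)) for name, pkg in DB.items()}
    clusters: List[List[str]] = []
    for name in DB:
        for members in clusters:
            if cosine(vectors[name], vectors[members[0]]) >= threshold:
                members.append(name)
                break
        else:  # no existing cluster was similar enough
            clusters.append([name])
    return clusters


def expand(candidates: List[str], desc_threshold: float = 0.4,
           code_threshold: float = 0.5) -> List[str]:
    """Claims 6-7: add cluster members whose code is similar to a candidate."""
    recommended = list(candidates)
    for members in cluster_descriptions(desc_threshold):
        for cand in candidates:
            if cand not in members:
                continue
            for name in members:
                similar = jaccard(DB[name].code, DB[cand].code) >= code_threshold
                if similar and name not in recommended:
                    recommended.append(name)
    return recommended


env = software_environment("Remote code execution in Apache Log4j JNDI lookup before 2.15.0")
candidates = search(env, n=1)
print(expand(candidates))  # ['log4j-core', 'log4j-core-fork']
```

With n = 1 the search step keeps only `log4j-core`. The description clustering groups it with `log4j-core-fork` and `log4j-api`, and the code-similarity filter then keeps the fork (Jaccard 0.8) and drops `log4j-api` (Jaccard 0.2). The result is two recommended packages, more than n, as claim 1 requires.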
PCT/CN2023/098487 2022-07-25 2023-06-06 Vulnerability analysis method, apparatus, device, and computer-readable storage medium WO2024021874A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210877922 2022-07-25
CN202210877922.5 2022-07-25
CN202211055558.0 2022-08-31
CN202211055558.0A CN117521069A (zh) 2022-07-25 2022-08-31 Vulnerability analysis method, apparatus, device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2024021874A1 true WO2024021874A1 (zh) 2024-02-01

Family

ID=89705316

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/098487 WO2024021874A1 (zh) 2022-07-25 2023-06-06 Vulnerability analysis method, apparatus, device, and computer-readable storage medium

Country Status (1)

Country Link
WO (1) WO2024021874A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
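CN106446691A (zh) * 2016-11-24 2017-02-22 Telecommunication Research Institute of the Ministry of Industry and Information Technology Method and apparatus for detecting vulnerabilities of open-source projects integrated or customized in software
CN108763928A (zh) * 2018-05-03 2018-11-06 Beijing University of Posts and Telecommunications Open-source software vulnerability analysis method, apparatus, and storage medium
CN111310178A (zh) * 2020-01-20 2020-06-19 Wuhan University of Technology Firmware vulnerability detection method and system for cross-platform scenarios
US20200202005A1 (en) * 2018-12-19 2020-06-25 Blackberry Limited Automated Software Vulnerability Determination
CN112579476A (zh) * 2021-02-23 2021-03-30 Beijing Beida Software Engineering Co., Ltd. Method, apparatus, and storage medium for aligning vulnerabilities with software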

Similar Documents

Publication Publication Date Title
US20210326348A1 (en) Website scoring system
AU2019203208B2 (en) Duplicate and similar bug report detection and retrieval using neural networks
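CN109697162B (zh) Automatic software defect detection method based on an open-source code library
WO2017063538A1 (zh) Method for mining related words, search method, and search system
US20190163742A1 Method and apparatus for generating information
US20200110842A1 Techniques to process search queries and perform contextual searches
US11409642B2 Automatic parameter value resolution for API evaluation
US20220222372A1 Automated data masking with false positive detection and avoidance
US20200159925A1 Automated malware analysis that automatically clusters sandbox reports of similar malware samples
US20160162507A1 Automated data duplicate identification
CN115827895A (zh) Vulnerability knowledge graph processing method, apparatus, device, and medium
CN113128209B (zh) Method and apparatus for generating a word library
CN104933044A (zh) Method and apparatus for classifying reasons for application uninstallation
CN113221032A (zh) Link risk detection method, apparatus, and storage medium
US10885188B1 Reducing false positive rate of statistical malware detection systems
CN112148305A (zh) Application detection method, apparatus, computer device, and readable storage medium
WO2016188334A1 (zh) Method and device for processing application access data
CN105468975A (zh) Method, apparatus, and system for tracing malicious code false positives
CN107085684B (zh) Method and apparatus for detecting program features
CN111177719A (zh) Address category determination method, apparatus, computer-readable storage medium, and device
WO2024021874A1 (zh) Vulnerability analysis method, apparatus, device, and computer-readable storage medium
US9342795B1 Assisted learning for document classification
US11379669B2 Identifying ambiguity in semantic resources
CN117521069A (zh) Vulnerability analysis method, apparatus, device, and computer-readable storage medium
CN111178072A (zh) Method, apparatus, and storage medium for determining legal provisions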

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23845087

Country of ref document: EP

Kind code of ref document: A1