CN116974947A - Component detection method and device, electronic equipment and storage medium
- Publication number
- CN116974947A (CN202311103950.2A)
- Authority
- CN
- China
- Prior art keywords
- information
- code file
- open source
- source code
- component
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/3628—Software debugging of optimised code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/577—Assessing vulnerabilities and evaluating computer system security
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Stored Programmes (AREA)
Abstract
The application discloses a component detection method and apparatus, an electronic device, and a storage medium, belonging to the technical field of the Internet. The method comprises: extracting directed function call information from a source code file to be detected; generating a call graph of the directed function based on the directed function call information; determining overlapping communities based on the call graph and performing clustering optimization on the overlapping communities to obtain target overlapping communities, wherein any two overlapping communities share at least one identical code segment and the identical code segment contains identical directed function call information; performing software component analysis on each target overlapping community and each function in the source code file to obtain feature information of each open source component in the source code file to be detected; and determining each target open source component included in the source code file based on the feature information. Open source components in the source code file can thereby be detected more accurately, which improves component detection accuracy.
Description
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method and apparatus for detecting a component, an electronic device, and a storage medium.
Background
As open source components are applied ever more widely in software research and development, they have entered development in many fields. Everyday electronic products such as mobile phones, tablets, televisions, and wristbands, as well as systems built from mainframe computers, servers, and the like, all contain open source components. Because open source components may have vulnerabilities, detecting and protecting the open source components in a software supply chain system is particularly important.
In the prior art, developers usually process (for example, modify) open source components when developing software, so open source components identified solely by performing software component analysis on the source code are inaccurate, and false positives may even occur. Because the vulnerabilities that may exist in the source code cannot be accurately assessed, they cannot be repaired, which exposes the project to unpredictable risks.
Disclosure of Invention
The embodiment of the application provides a component detection method, a device, electronic equipment and a storage medium, which are used for improving the detection accuracy of an open source component.
In a first aspect, an embodiment of the present application provides a component detection method, including:
extracting directed function call information of a source code file to be detected;
generating a call graph of the directed function based on the directed function call information;
determining each overlapping community based on the call graph of the directed function, and performing clustering optimization on each overlapping community to obtain a target overlapping community, wherein at least one identical code segment exists in any two overlapping communities, and the identical code segment contains identical directed function call information;
performing software component analysis on each target overlapping community and each function in the source code file to obtain characteristic information of each open source component in the source code file to be detected;
and determining each target open source component included in the source code file based on the characteristic information.
In some embodiments, the performing cluster optimization on each overlapping community includes:
based on the internal structure and calling relation of each overlapping community and the hierarchical structure of each overlapping community, clustering and optimizing each overlapping community to obtain the target overlapping community; the internal structure represents code formation information of the overlapped communities, the calling relationship represents function pointing information in the overlapped communities, and the hierarchical structure represents dependency relationship among the overlapped communities.
In some embodiments, the software component analysis is performed on each target overlapping community and each function in the source code file to obtain feature information of each open source component in the source code file to be detected, including:
performing software component analysis on each target overlapping community and each function in the source code file to obtain a plurality of characteristic elements in the source code file to be detected, wherein the characteristic elements are used for representing the characteristics of an open source component;
extracting word vector features of the plurality of feature elements;
and calculating the extracted word vector features by using a Euclidean distance formula to obtain feature information of each open source component in the source code file to be detected.
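For reference, the standard Euclidean distance between two n-dimensional word vectors is given below. The symbols u, v, and n are notation introduced here for illustration; the source does not state which vectors are compared or how the distances are aggregated into the feature information.

```latex
d(\mathbf{u}, \mathbf{v}) = \sqrt{\sum_{i=1}^{n} \left(u_i - v_i\right)^2}
```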
In some embodiments, the determining, based on the feature information, each target open source component included in the source code file includes:
based on the feature information, determining an open source component corresponding to the feature information from a mapping relation of the pre-constructed open source component and the feature information corresponding to the open source component, and taking the determined open source component as a target open source component included in the source code file.
In some embodiments, after detecting the target open source component included in the source code file, the method further includes:
based on the identification information of each target open source component, determining vulnerability information of each target open source component from a pre-constructed correspondence between the identification information of each target open source component and its vulnerability information; the vulnerability information comprises some or all of a vulnerability number, a vulnerability description, and a vulnerability repair suggestion.
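The vulnerability lookup described above can be sketched as a simple keyed lookup. This is a minimal illustration only: the component identifiers, the record fields, and the placeholder entry `EXAMPLE-0001` are all hypothetical and do not come from the source or from any real vulnerability database.

```python
# Hypothetical pre-constructed correspondence between component
# identification information and vulnerability information.
VULNERABILITY_DB = {
    "component_x@1.0": [
        {
            "number": "EXAMPLE-0001",  # placeholder, not a real identifier
            "description": "Illustrative vulnerability description.",
            "repair_suggestion": "Upgrade to a fixed version.",
        }
    ],
}


def lookup_vulnerabilities(component_ids, db):
    """Return the vulnerability records for each detected component.

    Components without a known entry map to an empty list.
    """
    return {cid: db.get(cid, []) for cid in component_ids}


report = lookup_vulnerabilities(["component_x@1.0", "component_y@2.3"], VULNERABILITY_DB)
```

Returning an empty list for unknown components keeps the report shape uniform, so a caller can iterate over every detected component without special cases.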
In a second aspect, an embodiment of the present application provides a component detection apparatus, including:
the extraction module is used for extracting the directed function call information of the source code file to be detected;
the generation module is used for generating a call graph of the directed function based on the directed function call information;
the first determining module is used for determining each overlapping community based on the call graph of the directed function, and carrying out clustering optimization on each overlapping community to obtain a target overlapping community, wherein at least one identical code segment exists in any two overlapping communities, and the identical code segment contains identical directed function call information;
the analysis module is used for performing software component analysis on each target overlapping community and each function in the source code file to obtain feature information of each open source component in the source code file to be detected;
and the second determining module is used for determining each target open source component included in the source code file based on the feature information.
In some embodiments, the first determining module is specifically configured to:
based on the internal structure and calling relation of each overlapping community and the hierarchical structure of each overlapping community, clustering and optimizing each overlapping community to obtain the target overlapping community; the internal structure represents code formation information of the overlapped communities, the calling relationship represents function pointing information in the overlapped communities, and the hierarchical structure represents dependency relationship among the overlapped communities.
In some embodiments, the analysis module is specifically configured to:
performing software component analysis on each target overlapping community and each function in the source code file to obtain a plurality of characteristic elements in the source code file to be detected, wherein the characteristic elements are used for representing the characteristics of an open source component;
extracting word vector features of the plurality of feature elements;
and calculating the extracted word vector features by using a Euclidean distance formula to obtain feature information of each open source component in the source code file to be detected.
In some embodiments, the second determining module is specifically configured to:
based on the feature information, determining an open source component corresponding to the feature information from a mapping relation of the pre-constructed open source component and the feature information corresponding to the open source component, and taking the determined open source component as a target open source component included in the source code file.
In some embodiments, further comprising:
the third determining module is configured to determine vulnerability information of each target open source component from a corresponding relationship between the pre-constructed identifier information of each target open source component and the vulnerability information corresponding to the identifier information of each target open source component after the second determining module detects the target open source component included in the source code file; the vulnerability information comprises part or all of a vulnerability number, a vulnerability description and a vulnerability restoration suggestion.
In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor, and a memory communicatively coupled to the at least one processor, wherein:
the memory stores a computer program executable by at least one processor to enable the at least one processor to perform the component detection method described above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor of an electronic device, enables the electronic device to perform the above-described component detection method.
In the embodiment of the application, directed function call information of the source code file to be detected is extracted, and a call graph of the directed function is generated based on that information. Overlapping communities are determined based on the call graph, and clustering optimization is performed on them to obtain target overlapping communities, wherein any two overlapping communities share at least one identical code segment and the identical code segment contains identical directed function call information. Software component analysis is then performed on each target overlapping community and each function in the source code file to obtain feature information of each open source component in the source code file to be detected, and each target open source component included in the source code file is determined based on the feature information. Because clustering optimization analysis of the overlapping communities in the source code file is combined with software component analysis, the open source components in the source code file can be detected more accurately, which improves detection accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
Fig. 1 is a schematic diagram of a system architecture of a component detection method according to an embodiment of the present application;
Fig. 2 is a flowchart of a component detection method according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a component detection device according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the hardware structure of an electronic device for implementing a component detection method according to an embodiment of the present application.
Detailed Description
In order to improve the detection accuracy of an open source component, the embodiment of the application provides a component detection method, a device, electronic equipment and a storage medium.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and not for limitation of the present application, and embodiments of the present application and features of the embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in other sequences than those illustrated or otherwise described herein. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
In the following, some terms in the embodiments of the present application are explained for easy understanding by those skilled in the art.
(1) The term "plurality" in embodiments of the present application means two or more, and other adjectives are similar.
(2) "And/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may indicate: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
(3) Open source component: a third-party component suitable for developing software applications. It is open, diverse, and convenient, and is widely used in the software development process.
(4) Software supply chain system: the chain structure between the software provider and the software user, formed by connecting the links of the whole process by which software travels from the provider to the user and is used by the user: from software design, through code writing and software generation, to software distribution and user download, and finally to use by the user.
As open source components are applied ever more widely in software research and development, they have entered development in many fields. Everyday electronic products such as mobile phones, tablets, televisions, and wristbands, as well as systems built from mainframe computers, servers, and the like, all contain open source components. Because open source components may have vulnerabilities, detecting and protecting the open source components in a software supply chain system is particularly important.
In the prior art, developers usually process (for example, modify) open source components when developing software, so open source components identified solely by performing software component analysis on the source code are inaccurate, and false positives may even occur. Because the vulnerabilities that may exist in the source code cannot be accurately assessed, they cannot be repaired, which exposes the project to unpredictable risks.
Therefore, the application provides a component detection method, device, electronic equipment and storage medium, which can enable the detection of the open source component in the source code file to be more accurate and improve the detection accuracy by carrying out clustering optimization analysis on overlapping communities in the source code file and then extracting the characteristics of the open source component by combining software component analysis.
After the design idea of the embodiment of the present application is introduced, some simple descriptions are made below for application scenarios applicable to the technical solution of the embodiment of the present application, and it should be noted that the application scenarios described below are only used for illustrating the embodiment of the present application and are not limiting. In the specific implementation, the technical scheme provided by the embodiment of the application can be flexibly applied according to actual needs.
Fig. 1 is a schematic diagram of a system architecture of a component detection method according to an embodiment of the present application.
As shown in Fig. 1, the system architecture may include a network 101, a terminal device 102, and a server 103. The network 101 is the medium that provides a communication link between the terminal device 102 and the server 103, and may be a wired network or a wireless network.
Optionally, the wireless network or wired network described above uses standard communication techniques and/or protocols. The network is typically the Internet, but may be any network, including, but not limited to, a local area network (Local Area Network, LAN), a metropolitan area network (Metropolitan Area Network, MAN), a wide area network (Wide Area Network, WAN), a mobile, wired, or wireless network, a private network, a virtual private network, or any combination thereof. In some embodiments, data exchanged over the network is represented using techniques and/or formats including HyperText Markup Language (HTML), Extensible Markup Language (XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), Internet Protocol Security (IPsec), and the like. In other embodiments, custom and/or dedicated data communication techniques may also be used in place of or in addition to the data communication techniques described above.
The terminal device 102 may be any of a variety of electronic devices, including, but not limited to, smart phones, tablet computers, laptop computers, desktop computers, wearable devices, augmented reality devices, virtual reality devices, and the like.
Optionally, the application clients installed on different terminal devices 102 are the same client, or are clients of the same type of application built for different operating systems. The specific form of the application client may also differ by terminal platform; for example, the application client may be a mobile phone client, a PC client, and so on.
The server 103 may be a server providing various services, such as a background management server providing support for devices operated by the user with the terminal device 102. The background management server can analyze and process the received data such as the request and the like, and feed back the processing result to the terminal equipment.
Optionally, the server 103 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms.
To address at least one of the above problems, an embodiment of the present application provides a component detection method. The server 103 extracts directed function call information of a source code file to be detected and generates a call graph of the directed function based on that information. It determines overlapping communities based on the call graph and performs clustering optimization on them to obtain target overlapping communities, wherein any two overlapping communities share at least one identical code segment and the identical code segment contains identical directed function call information. It then performs software component analysis on each target overlapping community and each function in the source code file to obtain feature information of each open source component in the source code file to be detected, and determines each target open source component included in the source code file based on the feature information. The method can be applied to component detection scenarios, such as security detection of APP (application software) components in the communication field, component composition analysis, and component detection. Because clustering optimization analysis of the overlapping communities in the source code file is combined with software component analysis, the open source components in the source code file can be detected more accurately. This improves detection accuracy and reduces the rate of missed detections, so that more complete component vulnerability information is obtained.
The technical scheme provided by the embodiment of the application will be described.
Those skilled in the art will appreciate that the number of terminal devices 102, networks 101, and servers 103 in fig. 1 is merely illustrative, and that any number of terminal devices 102, networks 101, and servers 103 may be provided as desired. The embodiment of the present application is not limited thereto.
After the system architecture provided by the embodiment of the present application is initially introduced, a component detection method provided by the embodiment of the present application is described next. The method may be performed by any electronic device having computing processing capabilities. In some embodiments, the component detection method provided in the embodiments of the present application may be performed in the server 103 shown in fig. 1.
Fig. 2 is a flowchart of a component detection method according to an embodiment of the present application, where the method includes the following steps.
In step 201, directed function call information of a source code file to be detected is extracted.
In specific implementation, the set of all functions in the source code file to be detected can be determined first. The call relationships among the functions and the direction of each call are then determined; for example, a call points from caller a to callee b. The directed function call information is determined from these call relationships and call directions.
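The extraction step can be sketched as follows. This is a minimal illustration that assumes the file to be detected is Python source and uses the standard `ast` module; the source does not name a language or a parser, and the helper name `extract_directed_calls` and the sample input are hypothetical. Only direct calls by bare name inside top-level functions are captured.

```python
import ast


def extract_directed_calls(source: str) -> list[tuple[str, str]]:
    """Return (caller, callee) pairs for calls made inside top-level functions."""
    tree = ast.parse(source)
    edges = []
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            # Walk the function body and record every call to a bare name.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.append((node.name, inner.func.id))
    return edges


# Hypothetical sample input: function a calls function b twice.
SAMPLE = """
def b():
    return 1

def a():
    return b() + b()
"""

calls = extract_directed_calls(SAMPLE)
```

Each pair records the direction of the call, from caller to callee, and repeated calls are kept so that call counts can be derived later.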
In step 202, a call graph of the directed function is generated based on the directed function call information.
In specific implementation, the call graph of the directed function is generated from the directed function call information, the call weights between functions, and the number of calls.
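A weighted directed call graph can be sketched as a nested dictionary. As an assumption for illustration, the edge weight here is simply the number of times the caller invokes the callee; the source mentions call weights and call counts but does not define how they are combined. The function name `build_call_graph` is hypothetical.

```python
from collections import Counter, defaultdict


def build_call_graph(edges):
    """Build {caller: {callee: call_count}} from (caller, callee) pairs."""
    counts = Counter(edges)
    graph = defaultdict(dict)
    for (caller, callee), n in counts.items():
        graph[caller][callee] = n  # assumed weight: number of calls
    return dict(graph)


edges = [("a", "b"), ("a", "b"), ("b", "c")]
graph = build_call_graph(edges)
```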
In step 203, based on the call graph of the directed function, each overlapping community is determined, and clustering optimization is performed on each overlapping community, so as to obtain a target overlapping community, wherein at least one identical code segment exists in any two overlapping communities, and the identical code segment contains identical directed function call information.
In specific implementation, because code segments with different numbers of statements implement different functionality, the source code file can be regarded as a plurality of communities of different sizes, and the code segments in different communities generally differ. If any two communities share at least one identical code segment, and that code segment contains identical directed function call information, the two communities can be regarded as overlapping communities. For example, if community 1 and community 2 both contain code segment A, and code segment A carries the same directed function call information in both (for example, function a points to function b), then community 1 and community 2 are overlapping communities.
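The overlap condition can be sketched as a set intersection. In this illustration each community is modeled as a set of directed call edges, and two communities overlap when they share at least one edge. This models only the shared call information, not the code segments themselves, and it assumes the communities have already been found; the source does not specify the community detection algorithm. All names and edges below are hypothetical.

```python
from itertools import combinations


def overlapping_pairs(communities):
    """Return (name_1, name_2, shared_edges) for communities sharing an edge."""
    pairs = []
    for (n1, e1), (n2, e2) in combinations(sorted(communities.items()), 2):
        shared = e1 & e2
        if shared:
            pairs.append((n1, n2, shared))
    return pairs


communities = {
    "community_1": {("a", "b"), ("b", "c")},
    "community_2": {("a", "b"), ("f", "d")},
    "community_3": {("x", "y")},
}
pairs = overlapping_pairs(communities)
```

Here `community_1` and `community_2` both contain the call from a to b, so they form the only overlapping pair.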
When the method is implemented, after each overlapping community is determined, clustering optimization can be carried out on each overlapping community based on the internal structure and calling relation of each overlapping community and the hierarchical structure of each overlapping community to obtain a target overlapping community; the internal structure represents code composition information of the overlapped communities, the calling relationship represents function pointing information in the overlapped communities, and the hierarchical structure represents dependency relationship among the overlapped communities.
For example, suppose community 1 and community 2 are overlapping communities and each contains 10 lines of code. The directed function call information in community 1 includes a call pointing to function a, function b pointing to function c, and function d pointing to function e. The directed function call information in community 2 includes a call pointing to function a, and function f pointing to function c and function d. After community 1 and community 2 are clustered according to their code composition information, their function pointing information, and the dependency relationship between the communities, redundant code information can be discarded to obtain the target overlapping community. The function call information in the target overlapping community then includes a call pointing to function a, function b pointing to function c, function a pointing to function c, function f pointing to function d, and function d pointing to function e. Before clustering optimization, the data is discrete, undirected, and unrelated. After clustering optimization, the global relationship structure, the pointing relationships, and the cross-domain relationships can be sorted out, and the possible relationships of each line of data in the code are fully mined. Local fitness optimization is performed on complex functions, and densely stacked functions can be effectively untangled, which further improves the accuracy of component detection.
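One way to picture the discarding of redundant information is as an order-preserving union of the call edges of two overlapping communities. This is only a sketch under that assumption: the source's optimization also draws on internal structure and hierarchical dependencies, which are not modeled here, and the edge lists below are illustrative rather than taken verbatim from the example above.

```python
def merge_overlapping(community_1, community_2):
    """Merge two edge lists, dropping duplicate edges and keeping first-seen order."""
    merged = []
    seen = set()
    for edge in list(community_1) + list(community_2):
        if edge not in seen:
            seen.add(edge)
            merged.append(edge)
    return merged


# Hypothetical edge lists; ("a", "b") is the shared, redundant call.
community_1 = [("a", "b"), ("b", "c"), ("d", "e")]
community_2 = [("a", "b"), ("a", "c"), ("f", "d")]
target = merge_overlapping(community_1, community_2)
```

The shared edge appears once in the result, so the merged community carries each call relationship exactly once.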
In step 204, software component analysis is performed on each target overlapping community and each function in the source code file, so as to obtain feature information of each open source component in the source code file to be detected.
In specific implementation, after determining the target overlapping communities, each function in the target overlapping communities and the source code file may be used as an input information stream to perform software component analysis, for example, component analysis may be performed from the following aspects:
1. Code composition: by examining the source code of the software, its components, including modules, classes, and functions, are analyzed to help understand the overall structure and functional implementation of the software.
2. Third-party libraries and dependencies: much software uses third-party libraries and dependencies during development, which provide additional functionality and toolsets. By examining the libraries and dependencies used by the software, the external resources on which the software depends can be identified.
3. Database and storage: some software needs a database or another form of data storage to hold user information, configuration data, and the like. Analyzing the database structure and data storage approach of the software helps in understanding how the data is organized and managed.
Based on the results of the above kinds of analysis, a plurality of feature elements characterizing the open source components in the source code file to be detected can be determined. The feature elements comprise at least an assembly code instruction set, a code structure, a control flow, and function call relationship data flow information. Word vector features of the feature elements are extracted, and the extracted word vector features are computed with the Euclidean distance formula to obtain the feature information of each open source component in the source code file to be detected. The feature information of an open source component can be a feature value, with one open source component corresponding to one feature value.
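For reference, the Euclidean distance between two n-dimensional vectors u and v is the square root of the sum of (u_i - v_i)^2 over all i. The text does not say how the distance is turned into a single feature value, so the sketch below makes an explicit assumption: the feature value is taken as the mean distance of a component's element vectors from a fixed reference vector. The names `euclidean_distance` and `feature_value` and the choice of reference vector are hypothetical.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Standard Euclidean distance between two vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def feature_value(
    element_vectors: Sequence[Sequence[float]], reference: Sequence[float]
) -> float:
    """Assumed reduction: mean distance of the element vectors to a reference vector.

    The patent text only states that a feature value is obtained with the
    Euclidean distance formula; this particular reduction is an illustration.
    """
    if not element_vectors:
        raise ValueError("at least one element vector is required")
    distances = [euclidean_distance(vec, reference) for vec in element_vectors]
    return sum(distances) / len(distances)
```

For instance, `feature_value([[3.0, 4.0], [6.0, 8.0]], [0.0, 0.0])` averages the distances 5.0 and 10.0.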
In step 205, each target open source component included in the source code file is determined based on each feature information.
In a specific implementation, for each piece of feature information, the corresponding open source component is determined from a pre-constructed mapping between open source components and their feature information, and the determined open source component is taken as a target open source component included in the source code file.
For example, an open source component knowledge base storing open source components and their corresponding feature information is constructed in advance. After the feature information of the source code file to be detected is determined, each piece of feature information can be matched against the open source component knowledge base, and the open source component whose stored feature information matches is determined as a target open source component included in the source code file.
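The knowledge base lookup can be sketched as follows. This is a minimal illustration, assuming the knowledge base is an in-memory dictionary from component name to a numeric feature value and that "matching" means equality within a small tolerance; the function name `match_components`, the tolerance, and the sample entries are all hypothetical.

```python
import math
from typing import Dict, Iterable, List


def match_components(
    feature_values: Iterable[float],
    knowledge_base: Dict[str, float],
    tolerance: float = 1e-9,
) -> List[str]:
    """Return the components whose stored feature value matches a detected one.

    Floating point values are compared with math.isclose instead of ==,
    which is an assumption of this sketch rather than something the text states.
    """
    matched: List[str] = []
    for value in feature_values:
        for component, known_value in knowledge_base.items():
            if math.isclose(value, known_value, abs_tol=tolerance):
                if component not in matched:
                    matched.append(component)
    return matched


# Hypothetical knowledge base entries.
example_knowledge_base = {"libfoo 1.2.0": 7.5, "libbar 0.9.1": 3.25}
example_targets = match_components([7.5, 1.0], example_knowledge_base)
```

A detected value with no counterpart in the knowledge base, such as 1.0 above, simply produces no target component.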
According to this embodiment of the application, cluster optimization analysis is first performed on the overlapping communities in the source code file and the result is then combined with software component analysis, which makes detection of the open source components in the source code file more accurate.
In a specific implementation, after the open source components included in the source code file to be detected are detected, the vulnerability information of each target open source component can be determined, based on the identification information of that component, from a pre-constructed correspondence between the identification information of open source components and their vulnerability information. The vulnerability information comprises some or all of a vulnerability number, a vulnerability description, and a vulnerability restoration suggestion. An open source component carries information such as its name, version, vendor, and license, and its identification information can be formed from its name and version.
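A minimal sketch of this lookup, assuming the identification information is a string built from the component name and version and the correspondence is a dictionary keyed by that string. The `name@version` format, the function names, and the placeholder record fields are assumptions made for illustration.

```python
from typing import Dict, Iterable, List

VulnerabilityRecord = Dict[str, str]  # keys: number, description, fix_suggestion


def make_component_id(name: str, version: str) -> str:
    """Form identification information from name and version (assumed format)."""
    return f"{name.strip().lower()}@{version.strip()}"


def lookup_vulnerabilities(
    component_ids: Iterable[str],
    correspondence: Dict[str, List[VulnerabilityRecord]],
) -> Dict[str, List[VulnerabilityRecord]]:
    """Map each target component to its known vulnerability records.

    Components without an entry in the correspondence get an empty list.
    """
    return {cid: list(correspondence.get(cid, [])) for cid in component_ids}
```

Normalizing case and whitespace in `make_component_id` is a design choice of the sketch, so that the same component detected with slightly different spelling still hits the same entry.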
The pre-constructed correspondence between the identification information of each target open source component and its vulnerability information can be obtained by crawling vulnerability publishing platforms. The application does not limit which platforms are used. For example, the vulnerability publishing platforms include one or more of the National Vulnerability Database (NVD), Common Vulnerabilities and Exposures (CVE), SecurityFocus, the China National Vulnerability Database of Information Security (CNNVD), the China National Vulnerability Database (CNVD), and WooYun. A crawler crawls these platforms, and the crawled information is built into a component vulnerability information base. Building the base can include steps such as deduplication, translation, project association, component association, and manual review. The vulnerability data sources are authoritative, broad in coverage, and rich in content.
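The deduplication and component association steps can be sketched without any network access. The sketch assumes each crawled record is a dictionary with the hypothetical fields `source`, `number`, `component`, `version`, and `description`, and that two records describe the same vulnerability when they share a number for the same component and version. Crawling, translation, and manual review are not modeled.

```python
from typing import Dict, Iterable, List, Tuple

ComponentKey = Tuple[str, str]  # (component name, version)


def build_vulnerability_base(records: Iterable[dict]) -> Dict[ComponentKey, List[dict]]:
    """Group crawled records by component and collapse duplicates.

    A vulnerability reported by several platforms is stored once, with the
    list of platforms that reported it kept under "sources".
    """
    grouped: Dict[ComponentKey, Dict[str, dict]] = {}
    for rec in records:
        component = (rec["component"], rec["version"])
        by_number = grouped.setdefault(component, {})
        entry = by_number.setdefault(
            rec["number"],
            {"number": rec["number"], "description": rec["description"], "sources": []},
        )
        if rec["source"] not in entry["sources"]:
            entry["sources"].append(rec["source"])
    return {component: list(by_number.values()) for component, by_number in grouped.items()}
```

Keeping the reporting platforms on each entry is a design choice of the sketch; it preserves provenance for a later manual review step.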
It should be noted that the component vulnerability information base can be updated as needed. It can be updated at a preset interval, for example once a day, or on the day before the component to be detected is checked. This embodiment does not limit when the base is updated. Because multiple vulnerability publishing platforms are crawled, the collected vulnerability information is more complete, which reduces the missed-report rate of the component detection results.
In a specific implementation, after the vulnerability information corresponding to each detected open source component is determined, an analysis report for the source code file to be detected can be generated by combining the vulnerability number, vulnerability description, and vulnerability restoration suggestion in the vulnerability information of each open source component. Based on the analysis report, a user can determine each open source component contained in the source code file to be detected, the vulnerabilities involved in each open source component, and the information on each vulnerability, which makes it convenient to check and repair the vulnerabilities of each open source component. In this way, the manpower, time, and resource costs of subsequent project risk handling can be reduced.
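Report generation can be sketched as plain text rendering. The layout, the function name `render_report`, and the record fields `number`, `description`, and `fix_suggestion` are assumptions of this sketch; the text does not prescribe a report format.

```python
from typing import Dict, List


def render_report(findings: Dict[str, List[Dict[str, str]]]) -> str:
    """Render one block per component, listing its vulnerability records.

    Components are sorted by identifier so the report is reproducible.
    """
    lines: List[str] = []
    for component in sorted(findings):
        vulnerabilities = findings[component]
        lines.append(f"{component}: {len(vulnerabilities)} vulnerability record(s)")
        for vuln in vulnerabilities:
            lines.append(
                f"  - {vuln['number']}: {vuln['description']} "
                f"(suggested fix: {vuln['fix_suggestion']})"
            )
    return "\n".join(lines)
```

Each record line combines the three fields named in the text: the vulnerability number, the description, and the restoration suggestion.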
Based on the same technical concept, the embodiment of the application also provides a component detection device, and the principle of solving the problem of the component detection device is similar to that of the component detection method, so that the implementation of the component detection device can be referred to the implementation of the component detection method, and the repetition is omitted.
Fig. 3 is a schematic structural diagram of a component detection device according to an embodiment of the present application, which includes an extracting module 301, a generating module 302, a first determining module 303, an analyzing module 304, and a second determining module 305.
The extracting module 301 is configured to extract directed function call information of a source code file to be detected.
The generating module 302 is configured to generate a call graph of the directed function based on the directed function call information.
The first determining module 303 is configured to determine each overlapping community based on the call graph of the directed function, and to perform cluster optimization on each overlapping community to obtain a target overlapping community, where at least one identical code segment exists in any two overlapping communities, and the identical code segment contains identical directed function call information.
The analysis module 304 is configured to perform software component analysis on each target overlapping community and each function in the source code file to obtain feature information of each open source component in the source code file to be detected.
The second determining module 305 is configured to determine, based on the feature information, each target open source component included in the source code file.
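As an illustration of what the extracting module 301 and the generating module 302 compute, the sketch below extracts directed call information and builds a call graph. It assumes the source code file is Python and uses the standard `ast` module; the patent does not name a source language or a parser, and the function names `extract_call_edges` and `build_call_graph` are hypothetical. Only direct calls by simple name are captured, and calls inside a nested function are also attributed to the enclosing function.

```python
import ast
from collections import defaultdict
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def extract_call_edges(source: str) -> Set[Edge]:
    """Collect (caller, callee) pairs for calls made inside function bodies."""
    tree = ast.parse(source)
    edges: Set[Edge] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(node):
                # Only plain-name calls such as f(); attribute calls like obj.f() are skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.add((node.name, inner.func.id))
    return edges


def build_call_graph(edges: Set[Edge]) -> Dict[str, Set[str]]:
    """Turn the edge set into an adjacency mapping: caller -> set of callees."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in edges:
        graph[caller].add(callee)
    return dict(graph)


SAMPLE_SOURCE = """
def a():
    b()
    c()

def b():
    c()

def c():
    pass
"""

sample_graph = build_call_graph(extract_call_edges(SAMPLE_SOURCE))
```

The adjacency mapping is one possible representation of the call graph of the directed function; functions that call nothing, such as `c` in the sample, do not appear as keys.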
In some embodiments, the first determining module 303 is specifically configured to:
based on the internal structure and calling relation of each overlapping community and the hierarchical structure of each overlapping community, clustering and optimizing each overlapping community to obtain the target overlapping community; the internal structure represents code formation information of the overlapped communities, the calling relationship represents function pointing information in the overlapped communities, and the hierarchical structure represents dependency relationship among the overlapped communities.
In some embodiments, the analysis module 304 is specifically configured to:
performing software component analysis on each target overlapping community and each function in the source code file to obtain a plurality of characteristic elements in the source code file to be detected, wherein the characteristic elements are used for representing the characteristics of an open source component;
extracting word vector features of the plurality of feature elements;
and calculating the extracted word vector features by using an Euclidean distance formula to obtain feature information of each open source component in the source code file to be detected.
In some embodiments, the second determining module 305 is specifically configured to:
based on the feature information, determining an open source component corresponding to the feature information from a mapping relation of the pre-constructed open source component and the feature information corresponding to the open source component, and taking the determined open source component as a target open source component included in the source code file.
In some embodiments, the device further comprises:
a third determining module 306, configured to determine, after the second determining module detects the target open source components included in the source code file, vulnerability information of each target open source component from a corresponding relationship between the pre-constructed identifier information of each target open source component and the vulnerability information corresponding to the identifier information of each target open source component based on the identifier information of each target open source component; the vulnerability information comprises part or all of a vulnerability number, a vulnerability description and a vulnerability restoration suggestion.
The division of modules in the embodiments of the present application is schematic and is merely a division by logical function; other divisions are possible in actual implementation. In addition, the functional modules in the embodiments of the present application may be integrated in one processor, may each exist separately and physically, or two or more modules may be integrated in one module. The modules may be coupled to each other through interfaces, which are typically electrical communication interfaces, although mechanical or other forms of interface are not excluded. Thus, the modules described as separate components may or may not be physically separate, and may be located in one place or distributed in different locations on the same or different devices. The integrated modules may be implemented in hardware or as software functional modules.
Having described the component detection method and apparatus of an exemplary embodiment of the present application, next, an electronic device according to another exemplary embodiment of the present application is described.
An electronic device 130 implemented according to such an embodiment of the present application is described below with reference to fig. 4. The electronic device 130 shown in fig. 4 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present application.
As shown in fig. 4, the electronic device 130 takes the form of a general-purpose electronic device. Components of the electronic device 130 may include, but are not limited to: at least one processor 131, at least one memory 132, and a bus 133 connecting the various system components (including the memory 132 and the processor 131).
Bus 133 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a processor or local bus using any of a variety of bus architectures.
Memory 132 may include readable media in the form of volatile memory such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), one or more devices that enable a user to interact with the electronic device 130, and/or any device (e.g., router, modem, etc.) that enables the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur through an input/output (I/O) interface 135. Also, electronic device 130 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 130, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In an exemplary embodiment, a storage medium is also provided. When a computer program in the storage medium is executed by a processor of an electronic device, the electronic device is capable of performing the above-described component detection method. Optionally, the storage medium may be a non-transitory computer readable storage medium, for example a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.
In an exemplary embodiment, the electronic device of the present application may include at least one processor, and a memory communicatively coupled to the at least one processor, where the memory stores a computer program executable by the at least one processor, and the computer program when executed by the at least one processor causes the at least one processor to perform the steps of any of the component detection methods provided by the embodiments of the present application.
In an exemplary embodiment, a computer program product is also provided, which, when executed by an electronic device, is capable of carrying out any one of the exemplary methods provided by the application.
Also, a computer program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), flash memory, optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for component detection in embodiments of the present application may take the form of a CD-ROM and include program code that can run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In cases involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, such as a local area network (LAN) or wide area network (WAN), or may be connected to an external computing device (e.g., over the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (10)
1. A component detection method, comprising:
extracting directed function call information of a source code file to be detected;
generating a call graph of the directed function based on the directed function call information;
determining each overlapping community based on the call graph of the directed function, and performing clustering optimization on each overlapping community to obtain a target overlapping community, wherein at least one identical code segment exists in any two overlapping communities, and the identical code segment contains identical directed function call information;
performing software component analysis on each target overlapping community and each function in the source code file to obtain characteristic information of each open source component in the source code file to be detected;
and determining each target open source component included in the source code file based on the characteristic information.
2. The method of claim 1, wherein said performing cluster optimization on said overlapping communities comprises:
based on the internal structure and calling relation of each overlapping community and the hierarchical structure of each overlapping community, clustering and optimizing each overlapping community to obtain the target overlapping community; the internal structure represents code formation information of the overlapped communities, the calling relationship represents function pointing information in the overlapped communities, and the hierarchical structure represents dependency relationship among the overlapped communities.
3. The method of claim 1, wherein the performing software component analysis on each target overlapping community and each function in the source code file to obtain feature information of each open source component in the source code file to be detected comprises:
performing software component analysis on each target overlapping community and each function in the source code file to obtain a plurality of characteristic elements in the source code file to be detected, wherein the characteristic elements are used for representing the characteristics of an open source component;
extracting word vector features of the plurality of feature elements;
and calculating the extracted word vector features by using an Euclidean distance formula to obtain feature information of each open source component in the source code file to be detected.
4. The method of claim 1, wherein determining, based on the characteristic information, each target open source component included in the source code file comprises:
based on the feature information, determining an open source component corresponding to the feature information from a mapping relation of the pre-constructed open source component and the feature information corresponding to the open source component, and taking the determined open source component as a target open source component included in the source code file.
5. The method of claim 1, wherein after detecting the target open source component included in the source code file, further comprising:
based on the identification information of each target open source component, determining vulnerability information of each target open source component from the pre-constructed corresponding relation between the identification information of each target open source component and the corresponding vulnerability information; the vulnerability information comprises part or all of a vulnerability number, a vulnerability description and a vulnerability restoration suggestion.
6. A component detection apparatus, comprising:
the extraction module is used for extracting the directed function call information of the source code file to be detected;
the generation module is used for generating a call graph of the directed function based on the directed function call information;
the first determining module is used for determining each overlapping community based on the call graph of the directed function, and carrying out clustering optimization on each overlapping community to obtain a target overlapping community, wherein at least one identical code segment exists in any two overlapping communities, and the identical code segment contains identical directed function call information;
the analysis module is used for carrying out software component analysis on each target overlapping community and each function in the source code file to obtain characteristic information of each open source component in the source code file to be detected;
and the second determining module is used for determining each target open source component included in the source code file based on each characteristic information.
7. The apparatus of claim 6, wherein the first determining module is specifically configured to:
based on the internal structure and calling relation of each overlapping community and the hierarchical structure of each overlapping community, clustering and optimizing each overlapping community to obtain the target overlapping community; the internal structure represents code formation information of the overlapped communities, the calling relationship represents function pointing information in the overlapped communities, and the hierarchical structure represents dependency relationship among the overlapped communities.
8. The apparatus of claim 6, wherein the analysis module is configured to:
performing software component analysis on each target overlapping community and each function in the source code file to obtain a plurality of characteristic elements in the source code file to be detected, wherein the characteristic elements are used for representing the characteristics of an open source component;
extracting word vector features of the plurality of feature elements;
and calculating the extracted word vector features by using an Euclidean distance formula to obtain feature information of each open source component in the source code file to be detected.
9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein:
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
10. A computer readable storage medium, characterized in that a computer program in the computer readable storage medium, when executed by a processor of an electronic device, is capable of performing the method of any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311103950.2A CN116974947A (en) | 2023-08-29 | 2023-08-29 | Component detection method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311103950.2A CN116974947A (en) | 2023-08-29 | 2023-08-29 | Component detection method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116974947A (en) | 2023-10-31 |
Family
ID=88481581
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311103950.2A Pending CN116974947A (en) | 2023-08-29 | 2023-08-29 | Component detection method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116974947A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117806624A (en) * | 2023-12-11 | 2024-04-02 | 北京北大软件工程股份有限公司 | Extraction method of reusable component |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11281751B2 (en) | Digital asset traceability and assurance using a distributed ledger | |
US20210357211A1 (en) | Meta-indexing, search, compliance, and test framework for software development | |
US10169005B2 (en) | Consolidating and reusing portal information | |
US20170249126A1 (en) | Easy storm topology design and execution | |
US11449408B2 (en) | Method, device, and computer program product for obtaining diagnostic information | |
CN116974947A (en) | Component detection method and device, electronic equipment and storage medium | |
CN112016138A (en) | Method and device for automatic safe modeling of Internet of vehicles and electronic equipment | |
CN111865927B (en) | Vulnerability processing method and device based on system, computer equipment and storage medium | |
CN111563257A (en) | Data detection method and device, computer readable medium and terminal equipment | |
US9569335B1 (en) | Exploiting software compiler outputs for release-independent remote code vulnerability analysis | |
CN111367791B (en) | Method, device, medium and electronic equipment for generating test case | |
CN111124541B (en) | Configuration file generation method, device, equipment and medium | |
CN113377342A (en) | Project construction method and device, electronic equipment and storage medium | |
CN112988607B (en) | Application program component detection method and device and storage medium | |
CN113052305B (en) | Method for operating a neural network model, electronic device and storage medium | |
EP4276665A1 (en) | Analyzing scripts to create and enforce security policies in dynamic development pipelines | |
US9069643B1 (en) | Creating a prerequisite checklist corresponding to a software application | |
US12019749B2 (en) | Intelligent detection of cyber supply chain anomalies | |
CN115203674A (en) | Automatic login method, system, device and storage medium for application program | |
CN114840429A (en) | Method, apparatus, device, medium and program product for identifying version conflicts | |
CN114021133A (en) | Code processing method and device, electronic equipment and storage medium | |
CN112015394B (en) | Android function module development method and device, computer system and storage medium | |
US9250870B2 (en) | Automated creation of shim programs and interfaces | |
CN116452208B (en) | Method, device, equipment and medium for determining change transaction code | |
CN116955184A (en) | Open source component detection method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||