CN111338622B - Supply chain code identification method, device, server and readable storage medium - Google Patents

Supply chain code identification method, device, server and readable storage medium

Info

Publication number
CN111338622B
Authority
CN
China
Prior art keywords
name information
target
code
node
supply chain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010413984.1A
Other languages
Chinese (zh)
Other versions
CN111338622A (en)
Inventor
赵豪
李文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010413984.1A priority Critical patent/CN111338622B/en
Publication of CN111338622A publication Critical patent/CN111338622A/en
Application granted granted Critical
Publication of CN111338622B publication Critical patent/CN111338622B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/36 Software reuse
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3608 Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Abstract

The embodiment of the specification discloses a supply chain code identification method, a supply chain code identification device, a supply chain code identification server and a computer readable storage medium. By the scheme, the supply chain code can be accurately identified in the target code, and further effective preconditions can be provided for avoiding potential safety hazards caused by the untrusted supply chain code.

Description

Supply chain code identification method, device, server and readable storage medium
Technical Field
The embodiment of the specification relates to the technical field of computers, in particular to a supply chain code identification method, a supply chain code identification device, a supply chain code identification server and a readable storage medium.
Background
With the continuous development of scientific technology, electronic technology has also been developed rapidly, and more functions can be realized by application programs installed in electronic devices. In the prior art, during the development process of the application program, a third-party SDK, namely supply chain code, is introduced, for example, in the whole application program development project, a large number of teams are involved, and different teams may need to use different supply chain code to complete respective development.
Disclosure of Invention
The embodiment of the specification provides a supply chain code identification method, a supply chain code identification device, a supply chain code identification server and a computer readable storage medium.
In a first aspect, an embodiment of the present specification provides a supply chain code identification method, including:
acquiring target code of a target application program, wherein the target code comprises supply chain code used by the target application program during development;
constructing a call graph corresponding to the target code, wherein the call graph comprises a plurality of nodes, and the node attributes of each node comprise the package name information of the code block to which the node belongs;
screening out, based on target package name information, the nodes whose package name information matches the target package name information from the plurality of nodes contained in the call graph, to form a target node set;
and acquiring a target characteristic value of the target node set, and, if the target characteristic value falls within a preset supply chain code node characteristic value range, determining the code block corresponding to the target package name information to be supply chain code.
In a second aspect, an embodiment of the present specification provides a supply chain code identification apparatus, including:
an acquisition module, configured to acquire target code of a target application program, wherein the target code comprises supply chain code used by the target application program during development;
a call graph construction module, configured to construct a call graph corresponding to the target code, wherein the call graph comprises a plurality of nodes, and the node attributes of each node comprise the package name information of the code block to which the node belongs;
a node screening module, configured to screen out, based on target package name information, the nodes whose package name information matches the target package name information from the plurality of nodes contained in the call graph, to form a target node set;
and an identification module, configured to acquire a target characteristic value of the target node set and, if the target characteristic value falls within a preset supply chain code node characteristic value range, determine the code block corresponding to the target package name information to be supply chain code.
In a third aspect, embodiments of the present specification provide a server, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor performs the steps of any one of the methods described above.
In a fourth aspect, the present specification provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of any of the above methods.
The embodiment of the specification has the following beneficial effects:
In the supply chain code identification method provided in the embodiments of the present specification, target code of a target application program is acquired, where the target code includes supply chain code used by the target application program during development. A call graph corresponding to the target code is constructed, in which the node attributes of each node include the package name information of the code block to which the node belongs. The nodes in the call graph are screened according to target package name information, and the nodes whose package name information matches the target package name information form a target node set. Whether the code block corresponding to the target package name information is supply chain code is then determined by checking whether a target characteristic value of the target node set falls within a preset supply chain code node characteristic value range. In this scheme, whether the target package name information is the package name information of supply chain code is determined by checking whether the node characteristics under the target package name information conform to the node characteristics of supply chain code, so supply chain code can be effectively screened out of a large body of target code. Some supply chain code is untrusted, and untrusted supply chain code that is not identified can threaten the security of the application program. With this scheme, supply chain code can be effectively identified, which provides a basis for further risk control of the supply chain code.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the specification. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart of a method for supply chain code identification provided in a first aspect of an embodiment of the present specification;
FIG. 2 is a schematic diagram of a supply chain code identification apparatus provided in a second aspect of an embodiment of the present specification;
FIG. 3 is a schematic diagram of a server provided in a third aspect of an embodiment of the present specification.
Detailed Description
In order to better understand the technical solutions, the technical solutions of the embodiments of the present specification are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments of the present specification and the specific features in the embodiments are detailed descriptions of the technical solutions of the present specification rather than limitations on them, and that the embodiments and the technical features in the embodiments may be combined with each other where no conflict arises.
In a first aspect, the present specification provides a method for identifying a supply chain code, which may be applied to an electronic device, such as a user terminal, a server, and the like, and by using the method provided by the present specification, it is possible to detect whether a supply chain code exists in an application program being developed or already developed, and mark the identified supply chain code to enhance risk control of the supply chain code.
As shown in fig. 1, a flowchart of a supply chain code identification method provided in an embodiment of the present specification is provided, where the method includes the following steps:
step S11: acquiring target code of a target application program, wherein the target code comprises supply chain code used by the target application program during development;
step S12: constructing a call graph corresponding to the target code, wherein the call graph comprises a plurality of nodes, and the node attributes of each node comprise the package name information of the code block to which the node belongs;
step S13: screening out, based on target package name information, the nodes whose package name information matches the target package name information from the plurality of nodes contained in the call graph, to form a target node set;
step S14: acquiring a target characteristic value of the target node set, and, if the target characteristic value falls within a preset supply chain code node characteristic value range, determining the code block corresponding to the target package name information to be supply chain code.
In the embodiment of the present specification, the target application may be any application that has already been developed, or an application that is still being developed. During development of the target application, untrusted supply chain code provided by a third party is usually introduced, that is, software code that is not developed by the target application developer itself but is supplied by a third party. For example, the supply chain code may be a third-party SDK (Software Development Kit), that is, a packaged code block with a specific function provided by the third party. In order to secure the target application, additional security measures are required for the supply chain code, and the precondition for applying such measures is that the supply chain code is accurately identified.
In this embodiment, the target code may be the source code of the target application, or other code that includes the supply chain code. The target code may be obtained in various ways, for example by directly obtaining the source code written during development of the target application, or by obtaining the target code from an APK package of the target application, which is not limited herein. In either case, the target code includes the introduced supply chain code.
After the target code is obtained, the target code is traversed to generate a call graph of the target code. The call graph is a complex directed graph, composed of nodes and the relationships among the nodes, that represents the call links inside the code. In a specific implementation process, each node in the call graph is obtained by analyzing the target code. The node types in the call graph may include function nodes, variable nodes and the like. Each node has its own node attributes, and the node attributes include at least the package name information of the code block to which the node belongs.
Further, after the call graph is generated, the supply chain code is screened based on the call graph. In a specific implementation process, the target package name information may be package name information corresponding to any node in the calling map, or may be package name information obtained by processing, such as cutting, extracting, and the like, the package name information corresponding to any node in the calling map, which is not limited herein. In this embodiment of the present specification, since the packet name of the supply chain code is unknown, it may be determined whether the target packet name information is the packet name information of the supply chain code by determining whether the node characteristics in the code block corresponding to the target packet name information satisfy the node characteristics of the supply chain code, and thus determining whether the code block corresponding to the target packet name information is the supply chain code.
In a specific implementation process, all nodes matched with the target package name information are screened out from the call graph based on the target package name information, for example, when the package name information of a certain node is completely or partially the same as the target package name information, the node is extracted and added into a target node set. Further, a target characteristic value of the target node set is obtained, and the target characteristic value may include, but is not limited to, a total number of nodes, a number of ingress nodes, a number of egress nodes, a number of external invocation nodes, and the like included in the target node set. The entry node is a node inside a code block corresponding to the target packet name information, and the node is called externally; the exit node is a node inside the code block corresponding to the target packet name information, and the node calls an external node; the number of external calling nodes is the number of times of external calling of the nodes in the code block corresponding to the target packet name information.
In this embodiment of the present specification, the target package name information may be one or more, and when there are a plurality of pieces of target package name information, the above steps are performed for each piece of target package name information, a plurality of target node sets are generated, and a target feature value of each target node set is obtained.
It should be noted that the preset range of the characteristic value of the supply chain code node may be set according to actual needs, and the preset range of the characteristic value of the supply chain code node may include one or more ranges, for example, the preset range of the characteristic value of the supply chain code node may include a total node number range, an entry node number range, an exit node number range, an external calling node number and an entry node number ratio value range, and the like, which is not limited herein.
When the target characteristic value of the target node set falls within the preset supply chain code node characteristic value range, the code block corresponding to the target package name information is supply chain code. The identified supply chain code may further be marked, so that risk control can be strengthened for the marked supply chain code when risk control is performed.
In this embodiment of the present specification, since supply chain code may be introduced without source code, the target code of the target application program may be acquired in the following manner: acquiring binary code of the target application program; and decompiling the binary code, and taking the decompilation result as the target code.
In a specific implementation process, a binary code file can be extracted from an APK package of a target application program, and the content in the binary code file is decompiled, wherein the decompiled result can be assembly code or code in the form of Java and the like, and the decompiled result is used as a target code.
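The extraction step described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it relies on the fact that an APK is a ZIP archive and assumes that the binary code files of interest are its `.dex` entries. The helper name `extract_dex_files` is hypothetical, and the decompilation itself, which needs an external tool, is not shown.

```python
import io
import zipfile


def extract_dex_files(apk_bytes: bytes) -> dict:
    """Return {entry name: raw bytes} for every .dex entry in an APK (a ZIP archive)."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        return {
            name: apk.read(name)
            for name in apk.namelist()
            if name.endswith(".dex")
        }


# Build a tiny in-memory stand-in for an APK so the sketch runs on its own.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as fake_apk:
    fake_apk.writestr("classes.dex", b"dex\n035\x00")
    fake_apk.writestr("AndroidManifest.xml", b"<manifest/>")

dex_files = extract_dex_files(buffer.getvalue())
print(sorted(dex_files))
```

The extracted bytes would then be handed to a decompiler of the implementer's choice, and the decompilation result would serve as the target code.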
In the embodiment of the present specification, when a calling graph is constructed, the following method may be implemented: performing code analysis on the target code, and extracting node attributes of each node and relations among the nodes to construct a graph database; generating the calling graph based on the graph database.
In a specific implementation process, the code analysis of the target code may be static analysis, such as lexical analysis, syntactic analysis, control flow analysis and data flow analysis, or dynamic analysis, such as analyzing functions, or another analysis manner or a combination of several analysis manners. In the process of generating the call graph, the node attributes of each node are obtained and recorded in a graph database, where the node attributes may include the package name information and node type information corresponding to the node. Meanwhile, the relationships between the nodes are obtained and recorded in the graph database, and these relationships may include an inheritance relationship, a direct calling relationship, an indirect calling relationship and the like. Since the graph database stores the node attributes and the relationships between the nodes, the call graph can be generated from the graph database. For example, when two nodes have a direct calling relationship, they are connected by an edge, and the edge is labeled as a direct call. The generated call graph represents the calls among all functions in the target code, so characteristic values such as the number of entry nodes, the number of exit nodes and the number of external calling nodes can be counted from it.
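As an illustration of the data recorded above, the following sketch keeps node attributes and labeled relationships in plain dictionaries. It stands in for the graph database, which the text does not specify. The class name `CallGraph`, its methods, and the package and function names in the example are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CallGraph:
    # node id -> node attributes (package name information, node type)
    nodes: dict = field(default_factory=dict)
    # (caller id, callee id) -> relationship label
    edges: dict = field(default_factory=dict)

    def add_node(self, node_id: str, package: str, node_type: str = "function") -> None:
        self.nodes[node_id] = {"package": package, "type": node_type}

    def add_edge(self, caller: str, callee: str, relation: str = "direct_call") -> None:
        self.edges[(caller, callee)] = relation

    def callees(self, node_id: str) -> list:
        """Nodes that `node_id` has a recorded relationship to."""
        return [dst for (src, dst) in self.edges if src == node_id]


# Hypothetical example: application code calling into a third-party SDK.
graph = CallGraph()
graph.add_node("com.example.app.Main.run", "com.example.app")
graph.add_node("org.vendor.sdk.Client.send", "org.vendor.sdk")
graph.add_edge("com.example.app.Main.run", "org.vendor.sdk.Client.send")
print(graph.callees("com.example.app.Main.run"))
```

Storing the relationship label on the edge mirrors the description that an edge between two directly related nodes is marked as a direct call.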
Further, since the package name of the supply chain code is unknown, and there may be multiple supply chain codes, in order to screen out all supply chain codes in the object code, in this embodiment of the present specification, it may be detected, one by one, whether a code block corresponding to the object package name information is a supply chain code, that is, whether a node feature corresponding to the object package name information conforms to a feature of a supply chain code node, by constructing multiple package names of possible supply chain codes, that is, constructing multiple object package name information.
In a specific implementation process, a plurality of pieces of target package name information can be constructed in the following way: generating a package name information set based on the package name information corresponding to each node in the call graph; performing M rounds of package name cutting on the package name information set based on M preset package name lengths, to generate M groups of cut package name information sets, wherein in each round, each piece of package name information in the package name information set is cut to the preset package name length corresponding to that round, producing that round's cut package name information set, and M is a positive integer; and determining the target package name information from the M groups of cut package name information sets.
For example, if the call graph includes 50,000 nodes, the package name information corresponding to the 50,000 nodes is deduplicated to obtain 5,000 pieces of package name information, and the package name information set is the set of these 5,000 pieces of package name information.
In addition, it should be noted that the package name information com.al.mobile.auth.activity may be the package name of a supply chain code, as may com.al.mobile.auth or com.al.mobile. Therefore, in order not to miss any possible supply chain code package name, in this embodiment of the present specification M rounds of cutting are performed on the package name information set based on M preset package name lengths, where the preset package name length may differ from round to round. It should be understood that M may be chosen according to actual needs; for example, M may be any value between 3 and 10, or another value, which is not limited herein.
For example, eight rounds of cutting are performed on the package name information set, where the preset package name length of the first round of cutting is 3 and the preset package name length of the second round of cutting is 4. Alternatively, the round number may be used as the preset package name length for that round of cutting.
For each round of cutting, a group of cut package name information sets are generated, the package name lengths in the same group of cut package name information sets are all the same, and if the cutting is performed with the preset package name length being 3, the length of each package name information in the generated cut package name information sets is 3.
When the packet name information is clipped, the end length of the packet name information may be clipped, and the start length of the packet name information may be reserved, for example, when the packet name com. Of course, other ways of cutting may be performed, such as cutting two ends of the packet name, reserving the middle length of the packet name, or reserving the end length of the packet name, which is not limited herein. When the package name is cut, only the package name information with the package name length larger than the preset package name length in the package name information set is cut, and the package name information with the package name length equal to the preset package name length is reserved to form the cut package name information set of the round cutting, and the package name information with the package name length smaller than the preset package name length is not included.
After M rounds of cutting are carried out on the packet name information sets in the mode, M groups of cut packet name information sets are generated, and each group of cut packet name information sets comprises a plurality of packet name information with the same length. Therefore, the package name clipping in the embodiment of the present specification can more comprehensively cover the package name information possibly existing in the supply chain code.
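The M rounds of cutting can be sketched as below. This is only an illustration under stated assumptions: package name length is taken to be the number of dot-separated segments, the start of each name is retained, and names shorter than the preset length are left out of that round. The function names `clip_package_name` and `clip_rounds` are hypothetical.

```python
from typing import Optional


def clip_package_name(package: str, length: int) -> Optional[str]:
    """Keep the first `length` dot-separated segments; None if the name is too short."""
    segments = package.split(".")
    if len(segments) < length:
        return None
    return ".".join(segments[:length])


def clip_rounds(packages, lengths):
    """Return {preset length: set of cut package names}, one entry per round."""
    unique = set(packages)  # deduplicate the package name information first
    rounds = {}
    for length in lengths:
        clipped = {clip_package_name(p, length) for p in unique}
        clipped.discard(None)  # names shorter than the preset length are excluded
        rounds[length] = clipped
    return rounds


packages = [
    "com.al.mobile.auth.activity",
    "com.al.mobile.auth",
    "com.al.mobile",
    "com.al",
]
rounds = clip_rounds(packages, lengths=[3, 4])
print(rounds[3])
print(rounds[4])
```

Each entry of a round's set is then a candidate for the target package name information.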
Further, after the package name information is clipped, a node corresponding to each clipped package name information is extracted in the call graph, so as to judge whether the code block corresponding to each package name information is a supply chain code or not based on the extracted node. Specifically, the process of judging the corresponding code may be executed after each round of cutting, or the process of judging the code may be executed after all the M rounds of cutting are completed, which is not limited herein.
In this embodiment of this specification, for each group of clipped packet name information sets, each packet name information included in the group of clipped packet name information sets may be packet name information of a supply chain code, and therefore, for each packet name information, a corresponding target node set needs to be acquired, and in a specific implementation process, the following method may be implemented: traversing each packet name information in the M groups of cut packet name information sets, and obtaining a target node set corresponding to each packet name information through the following steps: regarding each packet name information in each group of cut packet name information sets, taking the packet name information as the target packet name information, and cutting the packet name information of each node in the calling map based on the preset packet name length corresponding to the group of cut packet name information sets; clustering the cut package name information of each node based on the target package name information to obtain a target node set corresponding to the target package name information, wherein the similarity between the cut package name information of each node in the target node set and the target package name information is not less than a preset similarity.
For example, if the packet name information set is cut with the preset packet name length of 3, each piece of packet name information in the set is sequentially selected as the target packet name information in the obtained cut packet name information set, for example, the packet name information of com.
In the embodiment of the present specification, when determining the target node set, each node included in the call graph is subjected to a packet name information clipping operation in the same manner as the above-described clipping of the packet name information set, for example, when the packet name information set is clipped, the start length of the packet name information is retained, and when the packet name information of each node in the call graph is clipped, the start length of each packet name information is retained. Still continuing with the above example, when the preset packet name length is 3, the packet name information of each node in the calling graph is also cut by the length of 3, and a node whose similarity between the cut packet name information and com. Thereby forming a set of target nodes corresponding to com. The target node set can be regarded as a node set included in a code block with com. Therefore, each packet name information in the cut packet name information set can be used as the packet name information of one code block, and the target node set of each code block can be screened out through the steps.
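A minimal sketch of forming a target node set is given below. It simplifies the similarity test to exact equality of the cut package names, which corresponds to a preset similarity of 1. A real implementation could substitute any similarity measure. The node identifiers, the `com.example.app.ui` package and the function name `target_node_set` are hypothetical, and package name length is again assumed to be counted in dot-separated segments.

```python
def clip(package: str, length: int) -> str:
    """Keep the first `length` dot-separated segments of a package name."""
    return ".".join(package.split(".")[:length])


def target_node_set(node_packages: dict, target: str, length: int) -> set:
    """Nodes whose cut package name equals the target package name."""
    return {
        node
        for node, package in node_packages.items()
        if len(package.split(".")) >= length and clip(package, length) == target
    }


# Hypothetical node -> package name information mapping taken from a call graph.
node_packages = {
    "n1": "com.al.mobile.auth.activity",
    "n2": "com.al.mobile.net",
    "n3": "com.example.app.ui",
    "n4": "com.al",
}
selected = target_node_set(node_packages, target="com.al.mobile", length=3)
print(sorted(selected))
```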
Further, after the target node set is obtained, whether the code block corresponding to the target package name information is a supply chain code is judged according to the target characteristic value of the target node set. In the implementation of the present specification, the determination can be made in the following manner: acquiring the total number of nodes, the number of exit nodes, the number of entry nodes and the number of external calling nodes contained in the target node set; and matching the total number of the nodes, the number of the inlet nodes, the number of the outlet nodes and the number of the external calling nodes with the preset supply chain code node characteristic value range, and if the matching is successful, determining a code block corresponding to the target packet name information as a supply chain code.
In a specific implementation process, each target characteristic value of the target node set can be counted according to the calling map, and since the calling relationship among the nodes is clear in the calling map, the type of each node in the target node set can be determined from the calling map, for example, a node a in the target node set calls a node B, and packet name information of the node B is different from packet name information of the node a, so that the node a calls an external node, and the node a is an exit node. In this embodiment of the present description, the type (such as an exit node type and an entry node type) of each node in the call graph may be determined according to the call relationship between the nodes, and the type is labeled, so that the number of exit nodes, the number of entry nodes, and the number of external call nodes of the target node set may be directly read in the call graph.
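The counting described above can be sketched as follows. The sketch assumes one reading of the text: an entry node is a node of the set that is called from outside, an exit node is a node of the set that calls outside, and the number of external calling nodes is taken to be the number of calls from outside the set into it. That last reading is an assumption, and the function name `feature_values` and the node names are hypothetical.

```python
def feature_values(target_nodes: set, edges: list) -> dict:
    """Count total, entry and exit nodes and external calls for a target node set.

    `edges` is a list of (caller, callee) pairs taken from the call graph.
    """
    entry_nodes, exit_nodes = set(), set()
    external_calls = 0
    for caller, callee in edges:
        caller_inside = caller in target_nodes
        callee_inside = callee in target_nodes
        if callee_inside and not caller_inside:
            entry_nodes.add(callee)  # called from outside the code block
            external_calls += 1      # assumption: count each inbound call
        elif caller_inside and not callee_inside:
            exit_nodes.add(caller)   # calls a node outside the code block
    return {
        "total": len(target_nodes),
        "entry": len(entry_nodes),
        "exit": len(exit_nodes),
        "external_calls": external_calls,
    }


target = {"s1", "s2", "s3"}
edges = [("a1", "s1"), ("a2", "s1"), ("s1", "s2"), ("s2", "s3"), ("s3", "lib1")]
values = feature_values(target, edges)
print(values)
```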
Of course, other feature values may be obtained as needed in addition to the number of egress nodes, the number of ingress nodes, and the number of external call nodes, which is not limited herein.
Furthermore, because the supply chain code has the characteristics of few entry nodes, many external calling nodes and the like, the range of the characteristic value of the supply chain code node can be preset according to the node characteristics of the supply chain code, and the preset range of the characteristic value of the supply chain code node can be set according to actual needs. When the target characteristic value of the target node set meets the preset supply chain code node characteristic value range, it is indicated that the target packet name information corresponding to the target node set is the packet name of the supply chain code, and the code block under the target packet name information is the supply chain code.
In the specific implementation process, the preset range of the supply chain code node characteristic values includes: the total number of the nodes is larger than a first threshold value, the ratio of the number of the exit nodes to the total number of the nodes is smaller than a second threshold value, and the ratio of the number of the external calling nodes to the number of the entry nodes is larger than a third threshold value. The first threshold, the second threshold and the third threshold can be set according to actual needs.
For example, the preset supply chain code node characteristic value range is: the total number of nodes is greater than 600, the ratio of the number of exit nodes to the total number of nodes is less than 0.01, and the ratio of the number of external calling nodes to the number of entry nodes is greater than 5. The total number of nodes in the target node set, the ratio of the number of exit nodes to the total number of nodes, and the ratio of the number of external calling nodes to the number of entry nodes are acquired. If all of them fall within the preset supply chain code node characteristic value range, the node characteristics of the target node set are similar to those of supply chain code, and the code block corresponding to the target package name information of the target node set is determined to be supply chain code.
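The range check in this example can be written out as a short sketch. The default thresholds are the example values given above (600, 0.01 and 5). The names `FeatureRange` and `is_supply_chain` are hypothetical, and treating a set with no entry nodes as a non-match is an assumption made here because the ratio is undefined in that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRange:
    min_total_nodes: int = 600            # first threshold
    max_exit_ratio: float = 0.01          # second threshold
    min_external_per_entry: float = 5.0   # third threshold


def is_supply_chain(total: int, entry: int, exit_: int, external_calls: int,
                    rng: FeatureRange = FeatureRange()) -> bool:
    """True when all three conditions of the preset range hold."""
    if total <= rng.min_total_nodes:
        return False
    if exit_ / total >= rng.max_exit_ratio:
        return False
    if entry == 0:
        return False  # assumption: ratio undefined, treat as no match
    return external_calls / entry > rng.min_external_per_entry


print(is_supply_chain(total=1000, entry=10, exit_=5, external_calls=80))
print(is_supply_chain(total=500, entry=10, exit_=2, external_calls=80))
```

In the first call all three conditions hold (1000 > 600, 5/1000 < 0.01, 80/10 > 5). The second fails on the total number of nodes.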
Further, in this embodiment of the present disclosure, after the supply chain code is identified in the target code, security audit and security reinforcement may be performed on the supply chain code to ensure the security of the target application. In addition, if a plurality of supply chain codes are determined in the target code, the risk level of each of the plurality of supply chain codes is determined, and security reinforcement is performed on each supply chain code based on its risk level.
Specifically, the risk level of each supply chain code may be determined in various ways. For example, each supply chain code corresponds to its own target node set, and risk levels may be classified according to the matching degree between the target characteristic value of the target node set and the preset supply chain code node characteristic value range: the supply chain code corresponding to a target node set with a high matching degree has a high risk level, and the supply chain code corresponding to a target node set with a low matching degree has a low risk level. As another example, the identified supply chain code is matched against preset non-compliant code (such as backdoor code), and if the matching degree is high, the corresponding supply chain code is assigned a high risk level.
Further, the security reinforcement applied to supply chain codes with different risk levels may be the same or different. For example, supply chain code with a high risk level requires a high level of security reinforcement, while supply chain code with a low risk level may use a low level of security reinforcement. Alternatively, supply chain code with a high risk level may be removed directly, which is not limited herein.
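As a rough illustration of the risk grading and reinforcement selection just described, the following Python sketch maps a matching degree to a risk level and an action. Everything in it is an assumption made for illustration: the specification does not define how the matching degree is computed, so it is taken here as a given score between 0 and 1, and the function name, the cutoff of 0.8 and the action labels are hypothetical.

```python
def plan_reinforcement(match_scores, high_cutoff=0.8):
    """Map each package name to a (risk level, action) pair.

    match_scores: dict of package name -> matching degree in [0, 1]
    (how the degree is obtained is outside this sketch).
    """
    plan = {}
    for package, score in match_scores.items():
        if score >= high_cutoff:
            # High risk: strong reinforcement, or direct removal.
            plan[package] = ("high", "strong_reinforcement_or_removal")
        else:
            # Low risk: a lower level of reinforcement is sufficient.
            plan[package] = ("low", "basic_reinforcement")
    return plan
```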
In order to better understand the supply chain code identification method provided by the embodiments of the present disclosure, the process flow of a specific embodiment is described below. First, the binary code of the target application program is obtained and decompiled into Java code through reverse engineering. A calling map is then constructed based on the generated Java code: the code is analyzed and split into a plurality of code segments, namely code blocks. When the code is split, one class can be regarded as one code block and the name of the class is used as the package name, so that the package name information corresponding to each node and the relationships between the nodes are obtained and the calling map is generated. The supply chain code is then screened, which requires N rounds of calculation. In each round, the package name information set is cut using the preset package name length corresponding to that round to generate the plurality of target package name information screened in that round; for each target package name information, the matching target node set is determined, and the total number of nodes, the number of exit nodes, the number of entry nodes and the number of external calling nodes of the target node set are calculated as the target characteristic value. Each round of calculation thus produces a plurality of groups of target characteristic values. Each group is compared with the preset supply chain code node characteristic value range, the target package name information that satisfies the range is determined, and the corresponding code is taken as supply chain code.
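The characteristic value calculation in this flow can be sketched in Python as below. The definitions used are assumptions, since this passage does not restate them: an entry node is taken to be a node inside the target set that is called from outside it, an exit node is a node inside the set that calls a node outside it, and an external calling node is a node outside the set that calls into it. The function name `node_set_features` and the edge-list representation of the calling map are likewise hypothetical.

```python
def node_set_features(edges, target_nodes):
    """Count total, entry, exit and external calling nodes for a node set.

    edges: iterable of (caller, callee) pairs from the calling map.
    target_nodes: nodes matched to one target package name.
    """
    target = set(target_nodes)
    entry, exit_, external = set(), set(), set()
    for caller, callee in edges:
        if caller not in target and callee in target:
            entry.add(callee)      # called from outside the set
            external.add(caller)   # outside node calling into the set
        elif caller in target and callee not in target:
            exit_.add(caller)      # node in the set calling outward
    return {
        "total": len(target),
        "entry": len(entry),
        "exit": len(exit_),
        "external": len(external),
    }
```

The resulting counts are what would then be compared with the preset supply chain code node characteristic value range.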
In summary, the supply chain code identification method provided in the embodiments of the present specification can accurately identify and screen the supply chain code in the target application program, provide a basis for risk control of the supply chain code, and meanwhile, perform security reinforcement on the supply chain code, thereby ensuring the security of the target application program.
In a second aspect, based on the same inventive concept, an embodiment of the present specification provides a supply chain code identification apparatus. Referring to Fig. 2, the apparatus includes:
an obtaining module 21, configured to obtain an object code of a target application, where the object code includes a supply chain code used by the target application in a development process;
a calling map construction module 22, configured to construct a calling map corresponding to the target code, where the calling map includes multiple nodes, and a node attribute of each node includes packet name information of a code block to which the node belongs;
the node screening module 23 is configured to screen, based on the target packet name information, a node whose packet name information matches the target packet name information from a plurality of nodes included in the calling graph, so as to form a target node set;
and the identifying module 24 is configured to obtain a target characteristic value of the target node set, and determine that the code block corresponding to the target packet name information is a supply chain code if the target characteristic value meets a preset supply chain code node characteristic value range.
In an alternative implementation, the obtaining module 21 is configured to:
acquiring a binary code of the target application program;
and performing decompiling on the binary code, and taking a decompiling result as the target code.
In an alternative implementation, the calling map construction module 22 is configured to:
performing code analysis on the target code, and extracting node attributes of each node and relations among the nodes to construct a graph database;
generating the calling graph based on the graph database.
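A minimal Python sketch of these two steps is given below, with an in-memory dictionary standing in for the graph database. The representation is an assumption: each node is identified by a fully qualified method name such as `com.vendor.sdk.Client.send`, and its package name information is taken to be the enclosing class name (everything before the last dot), in line with the description that one class is treated as one code block. The function name `build_call_graph` is hypothetical.

```python
from collections import defaultdict


def build_call_graph(call_pairs):
    """Build node attributes and caller -> callees relations.

    call_pairs: iterable of (caller, callee) fully qualified method names
    extracted by code analysis (the extraction itself is not shown).
    """
    nodes = {}
    callees = defaultdict(set)
    for caller, callee in call_pairs:
        for method in (caller, callee):
            # Node attribute: package name of the code block it belongs to.
            nodes.setdefault(method, {"package": method.rpartition(".")[0]})
        callees[caller].add(callee)
    return nodes, dict(callees)
```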
In an alternative implementation, the apparatus further includes:
the package name information set generating module is used for generating a package name information set based on the package name information corresponding to each node in the calling map;
the processing module is used for respectively carrying out M rounds of package name cutting on the package name information sets based on M preset package name lengths to generate M groups of cut package name information sets, wherein for each round of package name cutting, the length of each piece of package name information in the package name information sets is cut into the preset package name length based on the preset package name length corresponding to the round of package name cutting, the cut package name information sets are generated, and M is a positive integer; and determining the target package name information in the M groups of cut package name information sets.
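The M rounds of package name cutting can be illustrated with the following Python sketch. It assumes that the preset package name length is counted in dot-separated segments, which this passage does not state explicitly; the function names `cut_package_name` and `cut_rounds` are hypothetical.

```python
def cut_package_name(name, length):
    """Keep only the first `length` dot-separated segments of a name."""
    return ".".join(name.split(".")[:length])


def cut_rounds(package_names, preset_lengths):
    """Run one cutting round per preset length.

    Returns a dict mapping each preset length to the set of cut names
    produced in that round (M preset lengths give M cut sets).
    """
    return {
        length: {cut_package_name(name, length) for name in package_names}
        for length in preset_lengths
    }
```

Each cut name in each round is then a candidate for the target package name information.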
In an optional implementation manner, the node screening module 23 is configured to:
traversing each packet name information in the M groups of cut packet name information sets, and obtaining a target node set corresponding to each packet name information through the following steps:
regarding each packet name information in each group of cut packet name information sets, taking the packet name information as the target packet name information, and cutting the packet name information of each node in the calling map based on the preset packet name length corresponding to the group of cut packet name information sets; clustering the cut package name information of each node based on the target package name information to obtain a target node set corresponding to the target package name information, wherein the similarity between the cut package name information of each node in the target node set and the target package name information is not less than a preset similarity.
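The node selection step above can be sketched in Python as follows. The similarity measure is an assumption: the specification only requires a similarity not less than a preset value, so `difflib.SequenceMatcher.ratio()` from the standard library is used here as a stand-in, and the package name length is again assumed to be counted in dot-separated segments. The function name `select_target_nodes` is hypothetical.

```python
from difflib import SequenceMatcher


def select_target_nodes(node_packages, target, length, min_similarity=1.0):
    """Collect nodes whose cut package name is similar enough to `target`.

    node_packages: dict of node id -> full package name information.
    target: target package name information from the cut set.
    length: preset package name length of the round, in segments.
    """
    selected = set()
    for node, package in node_packages.items():
        cut = ".".join(package.split(".")[:length])
        if SequenceMatcher(None, cut, target).ratio() >= min_similarity:
            selected.add(node)
    return selected
```

With `min_similarity=1.0` only exact matches of the cut name are kept; a lower value would also admit near matches, at the cost of possibly mixing unrelated packages.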
In an alternative implementation, the identifying module 24 is configured to:
acquiring the total number of nodes, the number of entry nodes, the number of exit nodes and the number of external calling nodes contained in the target node set;
and matching the total number of the nodes, the number of the inlet nodes, the number of the outlet nodes and the number of the external calling nodes with the preset supply chain code node characteristic value range, and if the matching is successful, determining a code block corresponding to the target packet name information as a supply chain code.
In an alternative implementation, if a plurality of supply chain codes are determined, the apparatus further includes:
a risk level determination module to determine a risk level for each of the plurality of supply chain codes;
a reinforcement module to perform security reinforcement on each supply chain code based on the risk level of each supply chain code.
With regard to the above-mentioned apparatus, the specific functions of the respective modules have been described in detail in the embodiments of the supply chain code identification method provided in the embodiments of the present specification, and will not be elaborated herein.
In a third aspect, based on the same inventive concept as the supply chain code identification method in the foregoing embodiments, the present specification further provides a server, as shown in fig. 3, including a memory 404, a processor 402, and a computer program stored on the memory 404 and executable on the processor 402, wherein the processor 402 executes the computer program to implement the steps of any one of the foregoing supply chain code identification methods.
In Fig. 3, a bus architecture is represented by bus 400. Bus 400 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors, represented by processor 402, and memory, represented by memory 404. The bus 400 may also link together various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface 406 provides an interface between the bus 400 and the receiver 401 and transmitter 403. The receiver 401 and the transmitter 403 may be the same element, i.e., a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 402 is responsible for managing the bus 400 and general processing, while the memory 404 may be used for storing data used by the processor 402 in performing operations.
In a fourth aspect, based on the same inventive concept as the supply chain code identification method in the foregoing embodiments, an embodiment of the present specification further provides a computer-readable storage medium on which a computer program is stored, and the program, when executed by a processor, implements the steps of any one of the foregoing supply chain code identification methods.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present specification have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all changes and modifications that fall within the scope of the specification.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present specification without departing from the spirit and scope of the specification. Thus, if such modifications and variations of the present specification fall within the scope of the claims of the present specification and their equivalents, the specification is intended to include such modifications and variations.

Claims (16)

1. A supply chain code identification method, the method comprising:
acquiring an object code of a target application program, wherein the object code comprises a supply chain code used by the target application program in a development process, and the supply chain code is a code provided by an untrusted third party introduced by the target application program in the development process;
constructing a calling map corresponding to the target code, wherein the calling map comprises a plurality of nodes, and the node attribute of each node comprises the packet name information of the code block to which the node belongs;
based on the target package name information, screening out nodes with package name information matched with the target package name information from a plurality of nodes contained in the calling map to form a target node set;
and acquiring a target characteristic value of the target node set, and if the target characteristic value meets a preset supply chain code node characteristic value range, determining a code block corresponding to the target package name information as a supply chain code.
2. The method of claim 1, the obtaining object code for a target application, comprising:
acquiring a binary code of the target application program;
and performing decompiling on the binary code, and taking a decompiling result as the target code.
3. The method of claim 1, the building a call graph corresponding to the object code, comprising:
performing code analysis on the target code, and extracting node attributes of each node and relations among the nodes to construct a graph database;
generating the calling graph based on the graph database.
4. The method of claim 1, before the filtering out nodes having packet name information matching the target packet name information from a plurality of nodes included in the calling graph based on the target packet name information, the method further comprising:
generating a packet name information set based on the packet name information corresponding to each node in the calling map;
respectively carrying out M rounds of package name cutting on the package name information set based on M preset package name lengths to generate M groups of cut package name information sets, wherein for each round of package name cutting, the length of each piece of package name information in the package name information set is cut to the preset package name length corresponding to that round of package name cutting to generate the cut package name information set of that round, and M is a positive integer;
and determining the target package name information in the M groups of cut package name information sets.
5. The method of claim 4, wherein the screening out nodes with packet name information matching with the target packet name information from a plurality of nodes included in the calling graph based on the target packet name information to form a target node set comprises:
traversing each packet name information in the M groups of cut packet name information sets, and obtaining a target node set corresponding to each packet name information through the following steps:
regarding each packet name information in each group of cut packet name information sets, taking the packet name information as the target packet name information, and cutting the packet name information of each node in the calling map based on the preset packet name length corresponding to the group of cut packet name information sets; clustering the cut package name information of each node based on the target package name information to obtain a target node set corresponding to the target package name information, wherein the similarity between the cut package name information of each node in the target node set and the target package name information is not less than a preset similarity.
6. The method according to claim 1, wherein the obtaining of the target characteristic value of the target node set, and if the target characteristic value satisfies a preset supply chain code node characteristic value range, determining that the code block corresponding to the target packet name information is a supply chain code, includes:
acquiring the total number of nodes, the number of entry nodes, the number of exit nodes and the number of external calling nodes contained in the target node set;
and matching the total number of the nodes, the number of the inlet nodes, the number of the outlet nodes and the number of the external calling nodes with the preset supply chain code node characteristic value range, and if the matching is successful, determining a code block corresponding to the target packet name information as a supply chain code.
7. The method of claim 1, if a plurality of supply chain codes are determined, further comprising:
determining a risk level for each of the plurality of supply chain codes;
performing security reinforcement on each supply chain code based on the risk level of each supply chain code.
8. A supply chain code identification apparatus, the apparatus comprising:
an obtaining module, configured to obtain an object code of a target application program, wherein the object code comprises a supply chain code used by the target application program in a development process, and the supply chain code is a code provided by an untrusted third party introduced by the target application program in the development process;
the calling map building module is used for building a calling map corresponding to the target code, wherein the calling map comprises a plurality of nodes, and the node attribute of each node comprises the packet name information of the code block to which the node belongs;
the node screening module is used for screening out nodes with packet name information matched with the target packet name information from a plurality of nodes contained in the calling map based on the target packet name information to form a target node set;
and the identification module is used for acquiring a target characteristic value of the target node set, and if the target characteristic value meets a preset supply chain code node characteristic value range, determining a code block corresponding to the target packet name information as a supply chain code.
9. The apparatus of claim 8, the obtaining module being configured to:
acquiring a binary code of the target application program;
and performing decompiling on the binary code, and taking a decompiling result as the target code.
10. The apparatus of claim 8, the call graph building module to:
performing code analysis on the target code, and extracting node attributes of each node and relations among the nodes to construct a graph database;
generating the calling graph based on the graph database.
11. The apparatus of claim 8, the apparatus further comprising:
the package name information set generating module is used for generating a package name information set based on the package name information corresponding to each node in the calling map;
the processing module is used for respectively carrying out M rounds of package name cutting on the package name information sets based on M preset package name lengths to generate M groups of cut package name information sets, wherein for each round of package name cutting, the length of each piece of package name information in the package name information sets is cut into the preset package name length based on the preset package name length corresponding to the round of package name cutting, the cut package name information sets are generated, and M is a positive integer; and determining the target package name information in the M groups of cut package name information sets.
12. The apparatus of claim 11, the node screening module to:
traversing each packet name information in the M groups of cut packet name information sets, and obtaining a target node set corresponding to each packet name information through the following steps:
regarding each packet name information in each group of cut packet name information sets, taking the packet name information as the target packet name information, and cutting the packet name information of each node in the calling map based on the preset packet name length corresponding to the group of cut packet name information sets; clustering the cut package name information of each node based on the target package name information to obtain a target node set corresponding to the target package name information, wherein the similarity between the cut package name information of each node in the target node set and the target package name information is not less than a preset similarity.
13. The apparatus of claim 8, the identification module to:
acquiring the total number of nodes, the number of entry nodes, the number of exit nodes and the number of external calling nodes contained in the target node set;
and matching the total number of the nodes, the number of the inlet nodes, the number of the outlet nodes and the number of the external calling nodes with the preset supply chain code node characteristic value range, and if the matching is successful, determining a code block corresponding to the target packet name information as a supply chain code.
14. The apparatus of claim 8, if a plurality of supply chain codes are determined, further comprising:
a risk level determination module to determine a risk level for each of the plurality of supply chain codes;
a reinforcement module to perform security reinforcement on each supply chain code based on the risk level of each supply chain code.
15. A server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 7 when executing the program.
16. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010413984.1A 2020-05-15 2020-05-15 Supply chain code identification method, device, server and readable storage medium Active CN111338622B (en)

Publications (2)

Publication Number Publication Date
CN111338622A (en) 2020-06-26
CN111338622B (en) 2020-08-11

Family

ID=71184910

Country Status (1)

Country Link
CN (1) CN111338622B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant