CN115934517A - Method for supplementing static analysis missing semantics based on network framework - Google Patents

Method for supplementing static analysis missing semantics based on network framework

Info

Publication number
CN115934517A
Authority
CN
China
Prior art keywords
semantics
missing
configuration
supplementing
static analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211482223.7A
Other languages
Chinese (zh)
Inventor
孟海宁
李昊峰
曹立庆
刘晨
陆杰
李炼
高琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Tianqi Shanxi Software Security Technology Research Institute Co ltd
Original Assignee
Zhongke Tianqi Shanxi Software Security Technology Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Tianqi Shanxi Software Security Technology Research Institute Co ltd filed Critical Zhongke Tianqi Shanxi Software Security Technology Research Institute Co ltd
Priority to CN202211482223.7A priority Critical patent/CN115934517A/en
Publication of CN115934517A publication Critical patent/CN115934517A/en
Pending legal-status Critical Current

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method for supplementing static analysis missing semantics based on a network framework, which comprises the following steps: acquiring an execution log of a target network application program, wherein the target network application program is generated based on a target network framework; extracting configuration marks from the target network application program based on the execution log; performing correlation analysis on the configuration marks to obtain supplementary semantics; and supplementing the missing semantics in the static analysis based on the supplementary semantics. The method obtains the configuration marks by automatically mining the execution log produced at run time, and then obtains the missing semantics through correlation analysis so that they can be supplemented, which improves the analysis capability of a static analysis tool. Compared with having researchers write the rules manually, the method addresses the semantic loss that network framework configuration causes in static analysis and also saves considerable time and human resources.

Description

Method for supplementing static analysis missing semantics based on network framework
Technical Field
The invention relates to the technical field of network services, in particular to a method for supplementing static analysis missing semantics based on a network framework.
Background
Modern Web applications are built on one or more Web frameworks. These frameworks usually offer the user many configuration options and rely on dynamic language features in their implementation, which makes static analysis of the whole program, or even of the application code alone, difficult. Existing approaches to strengthening static analysis of framework-based Web applications simulate the potential behavior of the framework by adding mapping rules for the framework configuration to the static analysis semantics. The semantics that a Web framework causes static program analysis to miss fall into two types: behavior semantics missing because the framework actively calls application code, and object semantics missing because the framework manages objects at runtime. Both describe information that static analysis requires but that generally cannot be obtained directly from pure code analysis. In the prior art, the large amount of configuration arising from different framework types and version changes is handled manually by technicians, which requires specialist knowledge of both the characteristics of each framework and the field of static analysis. Manually supplementing the missing semantics in static analysis according to the network framework configuration is therefore a demanding and time-consuming task.
Disclosure of Invention
In view of this, the embodiment of the present invention provides a method for supplementing static analysis missing semantics based on a network framework, so as to solve the problem in the prior art that the efficiency of supplementing static analysis missing semantics is low.
In order to achieve the purpose, the invention provides the following technical scheme:
the embodiment of the invention provides a method for supplementing static analysis missing semantics based on a network framework, which comprises the following steps:
acquiring an execution log of a target network application program, wherein the target network application program is generated based on a target network framework;
extracting a configuration mark corresponding to preset missing semantics from the target network application program based on the execution log;
performing correlation analysis on the configuration marks to obtain supplementary semantics;
and supplementing the missing semantics in the static analysis based on the supplementary semantics.
Optionally, the obtaining the execution log of the target network application includes:
matching the target network application program with a preset instrumentation tool;
and inserting, by the preset instrumentation tool, a plurality of recorders into the target network application program according to the matching result, recording running information while the target network application program runs after it is started, and generating the execution log.
Optionally, the extracting, from the target network application program based on the execution log, of configuration marks corresponding to preset missing semantics includes:
performing semantic analysis on the execution log to obtain program execution semantics;
obtaining a program code segment corresponding to the program execution semantics according to the corresponding relation between the program execution semantics and the program code in the target network application program;
matching the program code segments with the code segments of the preset missing semantics, and establishing mapping between the preset missing semantics and the program code segments according to a matching result;
and extracting the configuration mark corresponding to the program code segment from the target network application program.
Optionally, the performing semantic analysis on the execution log to obtain a program execution semantic includes:
extracting execution sequence information and calling point information from the execution log to construct a dynamic call graph with auxiliary information, wherein the auxiliary information is information for embodying the mapping relation between a calling point and a calling target in the execution log;
traversing the dynamic call graph according to the call relation to extract entry point semantics and non-entry point semantics;
traversing the dynamic call graph according to the auxiliary information to extract indirect call semantics;
and extracting operation information related to the field semantics from the execution log to obtain the injection field semantics and the common field semantics.
Optionally, the traversing the dynamic call graph according to the call relationship to extract entry point semantics and non-entry point semantics, including:
performing node analysis on the dynamic call graph to obtain a root node and a non-root node;
extracting entry point semantics from the target network application program based on the root node, and extracting non-entry point semantics from the target network application program based on the non-root node.
Optionally, the performing correlation analysis on the configuration marks to obtain supplementary semantics includes:
extracting the predicted missing semantics associated with each configuration mark from a preset association database;
and comparing the association degree between each predicted missing semantic and the corresponding configuration mark with a preset threshold value, and taking the predicted missing semantics whose association degree is higher than the preset threshold value as supplementary semantics.
Optionally, the method further includes:
acquiring historical execution logs of a plurality of network applications, and extracting historical configuration marks corresponding to preset missing semantics from the plurality of network applications on the basis of the historical execution logs;
extracting key configuration marks from the historical configuration marks according to the occurrence frequency of the configuration marks;
calculating the association degree of all preset missing semantics based on the key configuration marks;
and establishing the association database based on the calculated association degree between the historical configuration marks.
The embodiment of the invention also provides a device for supplementing static analysis missing semantics based on a network framework, which comprises:
the acquisition module is used for acquiring an execution log of a target network application program, and the target network application program is generated based on a target network framework;
an extraction module, configured to extract a configuration tag corresponding to a preset missing semantic from the target network application based on the execution log;
the analysis module is used for carrying out correlation analysis on the configuration marks to obtain supplementary semantics;
and the supplement module is used for supplementing the missing semantics in the static analysis based on the supplement semantics.
An embodiment of the present invention further provides an electronic device, including:
the device comprises a memory and a processor, wherein the memory and the processor are connected with each other in a communication manner, the memory stores computer instructions, and the processor executes the computer instructions so as to execute the method for supplementing static analysis missing semantics based on a network framework, which is provided by the embodiment of the invention.
Embodiments of the present invention further provide a computer-readable storage medium, where the computer-readable storage medium stores computer instructions, where the computer instructions are used to enable a computer to execute the method for supplementing static analysis missing semantics based on a network framework according to embodiments of the present invention.
The technical scheme of the invention has the following advantages:
The invention provides a method for supplementing static analysis missing semantics based on a network framework, which comprises: acquiring an execution log of a target network application program, wherein the target network application program is generated based on the target network framework; extracting configuration marks from the target network application program based on the execution log; performing correlation analysis on the configuration marks to obtain supplementary semantics; and supplementing the missing semantics in the static analysis based on the supplementary semantics. The method obtains the configuration marks by automatically mining the execution log produced at run time, and then obtains the missing semantics through correlation analysis so that they can be supplemented, which improves the analysis capability of a static analysis tool. Compared with having researchers write the rules manually, the method addresses the semantic loss that network framework configuration causes in static analysis and also saves considerable time and human resources.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart of a method for supplementing static analysis missing semantics based on a network framework according to an embodiment of the present invention;
FIG. 2 is a flowchart of generating an execution log according to an embodiment of the present invention;
FIG. 3 is a flowchart of extracting configuration marks from a target network application program according to an embodiment of the present invention;
FIG. 4 is a flowchart of analyzing program execution semantics according to an embodiment of the present invention;
FIG. 5 is a flowchart of extracting entry point semantics and non-entry point semantics according to an embodiment of the present invention;
FIG. 6 is a flowchart of correlation analysis to obtain supplementary semantics according to an embodiment of the present invention;
FIG. 7 is a flowchart of establishing an association database according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an apparatus for supplementing static analysis missing semantics based on a network framework according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In accordance with an embodiment of the present invention, a method embodiment for supplementing static analysis missing semantics based on a network framework is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
In this embodiment, a method for supplementing static analysis missing semantics based on a network framework is provided, which can be used in the terminal device, such as a computer, and as shown in fig. 1, the method for supplementing static analysis missing semantics based on a network framework includes the following steps:
Step S1: acquiring an execution log of the target network application program, wherein the target network application program is generated based on the target network framework.
Step S2: extracting configuration marks corresponding to preset missing semantics from the target network application program based on the execution log.
Step S3: performing correlation analysis on the configuration marks to obtain supplementary semantics.
Step S4: supplementing the missing semantics in the static analysis based on the supplementary semantics.
Through steps S1 to S4, the method for supplementing static analysis missing semantics based on the network framework provided by the embodiment of the invention obtains the configuration marks by automatically mining the execution log produced at run time, and then obtains the missing semantics through correlation analysis so that they can be supplemented, thereby improving the analysis capability of the static analysis tool.
Specifically, in an embodiment, the step S1, as shown in fig. 2, specifically includes the following steps:
Step S11: matching the target network application program with a preset instrumentation tool. Specifically, a Web application archive (.war) or an executable Java archive (.jar) is input into the preset instrumentation tool, which installs a plurality of recorders at specific positions in the program.
Step S12: the preset instrumentation tool inserts the plurality of recorders into the target network application program according to the matching result, records running information while the target network application program runs after it is started, and generates the execution log. Specifically, the running information reflects the real running state of the program, and key traces are recorded with special labels. A high-performance bytecode tool may be used; for example, the Javassist tool operates on Java bytecode so that classes can be modified when the JVM (Java virtual machine) loads them. The recorders are inserted at the following positions:
The entry and exit points of each application method.
All call and return points in each application method.
All field access instructions (read/write) in each application method.
In addition to a special identifier indicating the location, each record requires the runtime thread number to preserve ordering. Each recorder records the declaration information of the instruction and the values of the objects used at runtime. In particular, at the entry of a method, if the method is not static, the runtime information of all fields belonging to the current (this) object is also recorded when the method executes.
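The recorder placement described above can be pictured with a small Python analogue. This is only a sketch under stated assumptions: the embodiment instruments Java bytecode (for example with Javassist), whereas the decorator below merely imitates an entry/exit recorder; the names LOG, record, recorded, and find_user are hypothetical.

```python
import threading

LOG = []  # in-memory stand-in for the execution log (hypothetical)


def record(kind, location, **values):
    """Append one record tagged with its kind, location, and thread number."""
    LOG.append({"kind": kind, "loc": location,
                "thread": threading.get_ident(), **values})


def recorded(func):
    """Wrap a function so that its entry and exit points are logged."""
    name = func.__name__

    def wrapper(*args, **kwargs):
        record("ENTER", name, args=repr(args))  # entry-point recorder
        try:
            return func(*args, **kwargs)
        finally:
            record("EXIT", name)                # exit-point recorder

    return wrapper


@recorded
def find_user(user_id):
    return {"id": user_id}


find_user(7)
```

After the call, LOG holds one ENTER and one EXIT record carrying the same thread number, which is the information the later grouping step relies on.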
Specifically, in an embodiment, before step S2, the method further includes: traversing each record in the execution log, and identifying and deleting abnormal records. Specifically, abnormal records may be deleted by means of bracket matching. Deleting abnormal records makes the log content more reliable and improves the precision of subsequent processing.
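One possible reading of the bracket-matching clean-up is sketched below, assuming each record is a hypothetical (kind, method) pair in which ENTER acts as an opening bracket and EXIT as a closing bracket. The text does not specify the record format or the exact matching rule, so this only illustrates the idea.

```python
def drop_unmatched(records):
    """Keep only ENTER/EXIT records that pair up like balanced brackets."""
    stack = []    # indices of ENTER records still waiting for their EXIT
    keep = set()
    for i, (kind, method) in enumerate(records):
        if kind == "ENTER":
            stack.append(i)
        elif kind == "EXIT":
            if stack and records[stack[-1]][1] == method:
                keep.update((stack.pop(), i))
            # otherwise this EXIT has no matching ENTER and is dropped
    # ENTER records left on the stack never closed and are dropped as well
    return [rec for i, rec in enumerate(records) if i in keep]
```

An EXIT that does not close the most recent open ENTER, and an ENTER that is never closed, are both treated as abnormal in this sketch.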
The method further includes: acquiring the thread identification number recorded in each line of the execution log; grouping the records by thread identification number to obtain a plurality of record sets; and sorting the record sets by thread identification number. Specifically, the execution log contains runtime call and call point information, and each line of text that records runtime information is one record. The raw output of the execution log can be confusing because records from multiple execution traces and different threads may be interleaved. Grouping the records by thread identification number resolves this interleaving, and the ordered records of one thread are regarded as one record set. A record set describes the behavior of application code triggered by the framework or by external actions. To facilitate subsequent analysis, all runtime traces are used to build a dynamic call graph (DCG) of the application, in which each node represents an application method and each directed edge represents a call. In the call graph, call targets are treated as distinct even when they share the same signature. The call point, return point, and field read/write information belonging to a method are also recorded in its node, and calls are distinguished by their call points. The roots of the call graph and the children of any one node are ordered according to their runtime order.
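The grouping and graph construction can be sketched as follows. The record layout (dictionaries with thread, kind, and method keys) is an assumption made for illustration; the sketch shows how interleaved records separate into per-thread record sets and how repeated calls to the same method become distinct nodes.

```python
from collections import defaultdict


def group_by_thread(records):
    """Split interleaved records into record sets, sorted by thread id."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["thread"]].append(rec)  # runtime order kept per thread
    return [groups[tid] for tid in sorted(groups)]


def build_call_graph(record_set):
    """Turn one thread's ENTER/EXIT records into a list of call-tree roots."""
    roots, stack = [], []
    for rec in record_set:
        if rec["kind"] == "ENTER":
            node = {"method": rec["method"], "children": []}
            # attach to the method currently executing, or start a new root
            (stack[-1]["children"] if stack else roots).append(node)
            stack.append(node)
        elif rec["kind"] == "EXIT" and stack:
            stack.pop()
    return roots
```

Because a new node is created for every ENTER record, two calls to the same method from one caller appear as two ordered children, matching the statement that targets with the same signature are treated as different nodes.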
Specifically, in an embodiment, the step S2, as shown in fig. 3, specifically includes the following steps:
Step S21: performing semantic analysis on the execution log to obtain program execution semantics.
Step S22: obtaining the program code segments corresponding to the program execution semantics according to the correspondence between the program execution semantics and the program code in the target network application program. Specifically, semantics correspond to program code segments, and the program corresponds to the network framework configuration. Existing static analysis methods usually do not take the characteristics of the network framework into account, and the network framework itself is difficult to analyze statically, so the semantics missing because of the framework limit the capability of static analysis.
Step S23: matching the program code segments with the preset missing semantics, and establishing a mapping between the preset missing semantics and the program code segments according to the matching result. Specifically, the preset missing semantics are defined in advance according to existing work and the characteristics of object-oriented languages, and can be extended as the actual situation requires.
Step S24: extracting the configuration marks corresponding to the program code segments from the target network application program. Specifically, the types of configuration mark include annotations and external files (for example, external files in XML format). Where neither is present, extending the functions of the framework by subclassing is also taken into consideration. Annotation and subtype configuration marks and their values can be obtained directly, while XML configuration marks and values are obtained by traversing the XML node structure. A dynamic call graph is constructed from the input binary file by a class-level algorithm using a static analysis framework. For the entry node semantics and the non-entry node semantics, the corresponding program code segments are located, and the corresponding configuration marks are extracted from them. An entry point mark has the form:
<{x | x ∈ mark_class}, {y | y ∈ mark_method}>
The positive set is the set of associations between entry node semantics and configuration marks, and the negative set is the set of associations between non-entry node semantics and configuration marks.
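As an illustration of obtaining XML configuration marks by traversing the node structure, and of the pair form of an entry point mark, consider the sketch below. The XML snippet, the helper names xml_marks and entry_marker, and the choice of (tag, attribute, value) triples are assumptions made for this example, not details fixed by the embodiment.

```python
import xml.etree.ElementTree as ET

# Hypothetical framework configuration file content
XML = """
<beans>
  <bean id="userService2" class="demo.Service2">
    <property name="repo" ref="userRepo"/>
  </bean>
</beans>
"""


def xml_marks(text):
    """Collect (tag, attribute, value) triples from every XML node."""
    root = ET.fromstring(text)
    return [(node.tag, key, value)
            for node in root.iter()          # walks the whole node tree
            for key, value in node.attrib.items()]


def entry_marker(class_marks, method_marks):
    """Build the <class marks, method marks> pair for one entry method."""
    return (frozenset(class_marks), frozenset(method_marks))
```

A positive set could then be a collection of such pairs gathered from entry methods, and a negative set the same for non-entry methods.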
A common field configuration mark set is obtained from the configuration marks on the program fragment corresponding to each common field semantic. Each configuration mark has the form {x | x ∈ mark_field}. An injected field configuration mark set is obtained from the configuration marks on the program fragment corresponding to each injected field semantic.
The relationship between field references and objects can be resolved by querying the mapping from each field to all of its possible runtime types. For each pair of field and runtime type, <field, type_runtime>, the connection between the field and the class is looked up in the feature set shown in Table I. For example, the annotation value (@Qualifier("userService2")) on the field "Service" matches the annotation value (@Service("userService2")) on the type "Service2". A connection is considered to exist when any feature on the field matches a feature on the class. For each injected field, if a connection exists, the connection is a points-to mark. The mark on the field and the mark matching the feature value are extracted and recorded in the form:
<{x | x ∈ mark_inject}, {y | y ∈ mark_points-to}>
Meanwhile, matching features on the class are treated as alias marks. If no connection is found but the field belongs to the set of injected fields, only the mark on the field is recorded. For a field that is not injected but for which a connection is found, the connection is not a points-to mark.
Table I. Configuration features associated with fields and classes
Field | Class
Configuration value (annotation) | Configuration value (annotation)
Configuration value (XML attribute) | Configuration value (XML attribute)
Declared type | Simple name of class
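The row-by-row comparison suggested by Table I can be sketched as below. The dictionary keys and the helper matching_features are hypothetical, and the row-wise pairing is one interpretation of the table; the example values follow the @Qualifier("userService2") and @Service("userService2") case from the text.

```python
# Each pair names a field-side feature and the class-side feature it is
# compared with, following the three rows of Table I (assumed layout).
FEATURE_PAIRS = [
    ("annotation_values", "annotation_values"),
    ("xml_values", "xml_values"),
    ("declared_type", "simple_name"),
]


def matching_features(field, cls):
    """Return (feature, value) pairs on the field that match the class."""
    matches = []
    for field_key, class_key in FEATURE_PAIRS:
        shared = set(field.get(field_key, ())) & set(cls.get(class_key, ()))
        matches.extend((field_key, value) for value in sorted(shared))
    return matches


field = {"annotation_values": {"userService2"}, "declared_type": {"Service"}}
cls = {"annotation_values": {"userService2"}, "simple_name": {"Service2"}}
connection = matching_features(field, cls)
```

Here connection is non-empty because the annotation values agree, so for an injected field the match would be recorded as a points-to mark; an empty result would mean no connection was found.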
Specifically, in an embodiment, the step S21, as shown in fig. 4, specifically includes the following steps:
Step S211: extracting execution sequence information and call point information from the execution log to construct a dynamic call graph with auxiliary information, wherein the auxiliary information reflects the mapping between call points and call targets in the execution log. Specifically, each node in the dynamic call graph represents an application function, and each directed edge represents a call. In the call graph, nodes with the same signature are treated as different nodes even if they occur multiple times, because their call contexts differ. The auxiliary information is the call point, return point, and field read/write information recorded on each node.
Step S212: traversing the dynamic call graph according to the call relation to extract entry point semantics and non-entry point semantics.
Step S213: traversing the dynamic call graph according to the auxiliary information to extract indirect call semantics. Specifically, pairs of call point and call target are extracted from the auxiliary information of the dynamic call graph, and the call semantics whose call point is a framework interface are selected. An indirect call is the process in which a framework API indirectly triggers a call to an application method. The log records of indirect calls are found by matching the pattern pair (application invokes framework, framework invokes application); the configuration of the method containing the framework call, the configuration of the framework call statement, and the configuration of the indirect target are then extracted. The caller configuration, the call statement configuration, and the target configuration together constitute an indirect call configuration. The calls from application code to framework APIs can be obtained from the call point records in the execution log. An application method triggered by calling a framework API satisfies the following conditions:
It is the target of a call on the execution path after a framework API call point and before that call returns.
It does not match the call point.
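The two conditions above can be sketched over a single thread's record sequence. The record kinds (CALL, RETURN, ENTER, EXIT), the name-equality test used for "matching", and the is_framework predicate are simplifying assumptions for illustration; the embodiment's matching also considers subclasses.

```python
def indirect_calls(records, is_framework):
    """Pair framework API call points with application methods they trigger."""
    pairs, open_calls = [], []
    for kind, name in records:
        if kind == "CALL":                 # a call point was reached
            open_calls.append(name)
        elif kind == "RETURN" and open_calls:
            open_calls.pop()               # the innermost call has returned
        elif kind == "ENTER" and open_calls:
            site = open_calls[-1]
            # entered while a framework API call is still open, and the
            # entered method is not the declared target of that call point
            if is_framework(site) and name != site:
                pairs.append((site, name))
    return pairs
```

A method entered directly from application code is skipped because the innermost open call point is then an application call that matches it.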
Step S214: extracting the operation information related to field semantics from the execution log to obtain injected field semantics and common field semantics. Specifically, the configuration marks that determine which target a field points to are identified by analyzing the execution log for references injected by the framework and the objects injected into fields. Determining the configuration connection between a reference and the injected object indicates why the object was injected into the field. The configuration marks on the injected field and on the injected object are extracted as a candidate semantic mark set, and framework-injected fields are distinguished from common fields for the subsequent mining of key configuration marks. All field access (read/write) records and field runtime information can be collected by building and traversing the dynamic call graph. In the execution log, an injected field is reflected in the following conditions:
The field is used at runtime, or its reference does not point to a null value.
At runtime, the method that sets the field value is called by the framework or is never called.
All fields that satisfy the above conditions are collected, and each field is then mapped to all of its possible runtime types using the execution records. Likewise, a common field, that is, a field not injected by the framework, satisfies any of the following conditions:
The field reference always points to a null value at runtime.
At least one method that sets the value of the field is called by application code, so that the field points to an object.
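The injected/common distinction can be sketched as a predicate over a per-field summary of the log. The summary layout (used, runtime_values, setter_callers) is hypothetical, and the reading of the conditions, in particular that both injection conditions must hold together, is an interpretation of the text rather than a confirmed rule.

```python
def is_injected(field):
    """Apply the two injection conditions to one field's log summary."""
    non_null = any(value is not None for value in field["runtime_values"])
    used_or_non_null = field["used"] or non_null
    # setter_callers lists who invoked a setter: "framework" or "application"
    set_by_application = "application" in field["setter_callers"]
    return used_or_non_null and not set_by_application


def classify_fields(fields):
    """Split field names into (injected, common) lists."""
    injected = [f["name"] for f in fields if is_injected(f)]
    common = [f["name"] for f in fields if not is_injected(f)]
    return injected, common
```

Under this reading, a field that is always null, or whose setter is called by application code, falls into the common group.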
The objects created and hosted by the framework include not only the base class objects and the parameters of entry methods, but also the objects bound to references. Because the configuration may support user-defined names for classes, methods, or fields, the different representations of one target are described by the concept of aliases.
Specifically, in an embodiment, the step S212, as shown in fig. 5, specifically includes the following steps:
Step S2121: performing node analysis on the dynamic call graph to obtain a root node and non-root nodes. Specifically, the node information and the call situation of each node can be obtained from the actual calls made at the call points in the dynamic call graph. A call point is considered to match a callee when the two share the same method signature, or when the class of the callee is a subclass or implementation class of the class declared at the call point. The root node of the dynamic call graph is an entry node, and a node without any matching call point information is a non-root node.
Step S2122: extracting entry point semantics from the target network application program based on the root node, and extracting non-entry point semantics from the target network application program based on the non-root nodes.
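A sketch of the matching rule and of the root/non-root split follows. The dictionary layouts, the supertypes map, and the extra requirement that the method name agree in the subclass case are assumptions made for illustration; the tree shape is the hypothetical {"method", "children"} form.

```python
def call_matches(call_point, callee, supertypes):
    """Return True if the call point matches the callee."""
    if call_point["signature"] == callee["signature"]:
        return True
    # subclass / implementation case: the callee's class must have the
    # call point's declared class among its supertypes (assumed: same name)
    same_name = call_point["method"] == callee["method"]
    declared = call_point["declared_class"]
    return same_name and declared in supertypes.get(callee["class"], set())


def split_nodes(roots):
    """Separate root methods from non-root methods of a call tree."""
    root_methods, other_methods = [], []
    pending = [(node, True) for node in roots]
    while pending:
        node, is_root = pending.pop()
        (root_methods if is_root else other_methods).append(node["method"])
        pending.extend((child, False) for child in node["children"])
    return root_methods, other_methods
```

Entry point semantics would then be read off the root methods and non-entry point semantics off the remaining methods.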
Specifically, in an embodiment, the step S3, as shown in fig. 6, specifically includes the following steps:
Step S31: extracting the predicted missing semantics associated with each configuration mark from a preset association database.
Step S32: comparing the association degree between each predicted missing semantic and the corresponding configuration mark with a preset threshold, and taking the predicted missing semantics whose association degree is higher than the preset threshold as supplementary semantics.
Specifically, the predicted missing semantics associated with each configuration mark can be obtained according to the association degree, and comparing against the preset threshold selects the missing semantics with a high probability of occurrence as supplementary semantics.
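Steps S31 and S32 amount to a lookup followed by a threshold filter, as in the sketch below. The contents of ASSOCIATIONS, the semantic names, and the threshold value are invented for illustration only.

```python
# Hypothetical association database: mark -> {missing semantic: degree}
ASSOCIATIONS = {
    "@Controller": {"entry_point": 0.92, "injected_field": 0.10},
    "@Autowired": {"injected_field": 0.88},
}


def supplementary_semantics(marks, associations, threshold):
    """Keep predicted semantics whose association degree exceeds threshold."""
    result = {}
    for mark in marks:
        for semantic, degree in associations.get(mark, {}).items():
            if degree > threshold:
                result.setdefault(mark, []).append(semantic)
    return result


selected = supplementary_semantics(
    ["@Controller", "@Autowired", "@Unknown"], ASSOCIATIONS, 0.8)
```

Marks absent from the database, such as the hypothetical "@Unknown", simply contribute no supplementary semantics.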
Specifically, in an embodiment, the method, as shown in fig. 7, further includes the following steps for establishing the association database:
Step S301: acquiring historical execution logs of a plurality of network applications, and extracting historical configuration marks corresponding to the preset missing semantics from the plurality of network applications based on the historical execution logs. Specifically, this process follows steps S1 and S2 and is not described again here.
Step S302: extracting key configuration marks from the historical configuration marks according to the occurrence frequency of the configuration marks. Specifically, frequent item set mining may be performed using, for example, the Apriori algorithm, which takes a given database as input and uses the frequent k-item sets to mine the (k+1)-item sets. The search proceeds level by level, using the Apriori property to reduce the search space, and finally produces the frequent item sets. Configuration marks with a high occurrence frequency are selected as key configuration marks, for example by comparing the occurrence frequency of each configuration mark with a preset frequency threshold.
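A compact level-wise Apriori sketch is given below, assuming each transaction is the set of configuration marks observed on one code fragment. The annotation names in the usage example are illustrative only and the function is not taken from the embodiment.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every frequent item set."""
    n = len(transactions)

    def support(items):
        return sum(items <= t for t in transactions) / n

    level = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while level:
        current = {s: support(s) for s in level}
        current = {s: v for s, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
        # join step: merge frequent (k-1)-item sets into k-item candidates
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # prune step (Apriori property): every (k-1)-subset must be frequent
        level = {c for c in candidates
                 if all(frozenset(sub) in current
                        for sub in combinations(c, k - 1))}
    return frequent


marks = [
    {"@Controller", "@RequestMapping"},
    {"@Controller", "@RequestMapping", "@Autowired"},
    {"@Service", "@Autowired"},
    {"@Controller", "@RequestMapping"},
]
frequent = apriori(marks, 0.5)
```

The prune step is where the search space shrinks: a candidate is discarded without counting if any of its subsets was already found infrequent.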
Step S303: and calculating the association degree of all preset missing semantics based on the key configuration marks.
Step S304: establishing an association database based on the calculated association degree between the historical configuration marks. Specifically, establishing the association database is a process of discovering the relationships hidden behind a large amount of data. For example, an association rule is an implication expression, written as $A \Rightarrow B$, that reflects the relationship between elements in the database. Support and confidence are two key indicators in association rule mining. Support indicates how often an itemset appears in the dataset. The confidence of $A \Rightarrow B$ is the proportion of the transactions containing A that also contain B.
For example: table II is a database containing 5 transactions. When the support degree is set to 0.5, the frequent item sets are { A }, { B }, { C }, { D }, { A, B }, and { A, D }. For the frequent item set { A, B }, the support is 0.6, i.e., the probability of A and B occurring simultaneously is 0.6. The frequent item set has two association rules, which are recorded as
Figure BDA0003962173660000152
And &>
Figure BDA0003962173660000153
Confidence->
Figure BDA0003962173660000154
When A is represented, the probability of B being 0.75. For the second rule, a confidence of 1 indicates that when B occurs, A also occurs.
Table II mining example database
TID ITEMS
001 A,B,C
002 A,B,D
003 C,D
004 A,D
005 A,B,C,D
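The figures quoted for Table II can be checked with a few lines of Python. The sketch below uses brute-force enumeration rather than Apriori, which is adequate for four items, and uses exact fractions so that the values 0.6 and 0.75 are not affected by floating-point rounding. The helper names `support` and `confidence` are chosen for this example only.

```python
from fractions import Fraction
from itertools import combinations

# Transactions of Table II.
TABLE_II = {
    "001": {"A", "B", "C"},
    "002": {"A", "B", "D"},
    "003": {"C", "D"},
    "004": {"A", "D"},
    "005": {"A", "B", "C", "D"},
}


def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in TABLE_II.values() if itemset <= items)
    return Fraction(hits, len(TABLE_II))


def confidence(antecedent, consequent):
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


# Enumerate all itemsets and keep those reaching the 0.5 threshold.
all_items = sorted(set().union(*TABLE_II.values()))
frequent = [
    set(combo)
    for size in range(1, len(all_items) + 1)
    for combo in combinations(all_items, size)
    if support(combo) >= Fraction(1, 2)
]
```

Here `support({"A", "B"})` evaluates to 3/5, `confidence({"A"}, {"B"})` to 3/4 and `confidence({"B"}, {"A"})` to 1, in agreement with the values stated above.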
The association rules can effectively reflect the relationships between the configuration marks. Establishing the association database allows the configuration marks related to each configuration mark to be found more quickly and accurately in subsequent queries.
By analyzing the historical execution logs of a plurality of network applications and establishing a preset configuration database according to the semantic correspondence between the historical configuration marks and the code segments, the framework configuration corresponding to missing semantics can, when such missing semantics are found, be extracted from the preset configuration database to supplement the configuration of the target network framework.
Specifically, in an embodiment, the method further includes building the configuration specification from the configuration marks. Each entity in the set is tagged with an "IS" tag or a "NOT" tag according to whether it originates from the positive or the negative configuration. Each entity in the positive/negative configuration set is used for mining.
The minimum support used in the Apriori algorithm is set to 1/len(mark set), so that all item combinations are treated as relevant provided that they appear at least once in the database.
Computing frequent itemsets and association rules: all itemsets containing "IS" are extracted, and all possible association rules between the itemsets are computed. The association rules take a form similar to
[formula image BDA0003962173660000171]
in which, when the constraints on the left-hand side are satisfied, the given concept is satisfied. For each generated association rule, the confidence is calculated and recorded.
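The mining of rules whose right-hand side is "IS" can be sketched as follows. Several points are assumptions made for the illustration: each entity is modeled as a pair of a mark set and an "IS"/"NOT" tag, len(mark set) is read as the number of tagged entities, the rule shape is taken to be {marks} => IS, and brute-force subset enumeration replaces Apriori for brevity. The function name `mine_is_rules` and the marks `mark_a` and `mark_b` are invented.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Entity = Tuple[Set[str], str]  # (configuration marks, "IS" or "NOT")


def mine_is_rules(entities: List[Entity]) -> Dict[FrozenSet[str], Fraction]:
    """Return the confidence of every rule `marks => IS` seen at least once."""
    n = len(entities)
    if n == 0:
        return {}
    min_support = Fraction(1, n)  # a single occurrence is enough to be kept
    rules: Dict[FrozenSet[str], Fraction] = {}
    for marks, label in entities:
        if label != "IS":
            continue  # only itemsets containing "IS" produce rules
        for size in range(1, len(marks) + 1):
            for combo in combinations(sorted(marks), size):
                lhs = frozenset(combo)
                if lhs in rules:
                    continue
                # Tags of all entities whose marks satisfy the left-hand side.
                labels = [lab for m, lab in entities if lhs <= m]
                both = sum(1 for lab in labels if lab == "IS")
                if Fraction(both, n) >= min_support:
                    rules[lhs] = Fraction(both, len(labels))
    return rules


# Invented entities from a positive/negative configuration set.
entities: List[Entity] = [
    ({"mark_a", "mark_b"}, "IS"),
    ({"mark_a"}, "NOT"),
    ({"mark_b"}, "IS"),
]
is_rules = mine_is_rules(entities)
```

In this toy data `mark_b` alone implies "IS" with confidence 1, while `mark_a` alone reaches only 1/2 because it also appears on a "NOT" entity.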
Querying the configuration specification: after the frequent itemsets and association rules have been mined for each concept, the configuration specification, i.e. the mapping from analysis concepts to configuration marks, can be queried. The user may select the confidence value as desired; the higher the confidence value, the fewer results pass the filter. After a confidence value is selected, the association rules that satisfy this confidence under each semantic, of the form
[formula image BDA0003962173660000172]
are filtered out to form the configuration specification. The configuration marks of each concept are generated as follows:
Entry points: the association rules of interest contain "IS" and the configuration marks corresponding to the semantics. Under the given confidence condition, the marks that pass the filter are output as the key configuration marks of the entry point semantics.
Field injection and points-to: pairs of an injected field and its points-to target are expressed in a form similar to <{x | x ∈ mark_inject}, {y | y ∈ mark_points-to}> and are determined by selecting association rules such as
[formula images BDA0003962173660000173 and BDA0003962173660000174]
When a rule satisfies the given confidence, x indicates that the framework will inject a field carrying such a configuration mark, and y indicates that the field reference will point to the target configured in that mark.
Indirect calls: expressed in the form (m_callee, API, m_target), which covers the framework API, the configuration marks of the methods that call this API, and the methods that are triggered by this API.
Call sequence: the rules need to be filtered with the given confidence to obtain the configuration marks of the call sequence.
Others: for alias and framework-managed class semantics, the marks can be computed directly as the final result, because the composition of the sets is simple.
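The query step described above, in which the user chooses a confidence value and only the rules reaching it are kept for each concept, can be sketched as follows. The rule table layout, the function name `query_specification`, the concept names, the marks and the confidence values are all invented for the illustration.

```python
from typing import Dict, FrozenSet, List

# Hypothetical rule table: concept -> {left-hand-side marks: confidence}.
RuleTable = Dict[str, Dict[FrozenSet[str], float]]


def query_specification(
    rules: RuleTable, min_confidence: float
) -> Dict[str, List[FrozenSet[str]]]:
    """Keep, for each concept, the mark sets whose rule reaches the confidence."""
    spec: Dict[str, List[FrozenSet[str]]] = {}
    for concept, concept_rules in rules.items():
        kept = [m for m, conf in concept_rules.items() if conf >= min_confidence]
        # Sort for a deterministic result.
        spec[concept] = sorted(kept, key=lambda s: sorted(s))
    return spec


# Invented mined rules for two concepts.
mined: RuleTable = {
    "entry_point": {
        frozenset({"mark_route"}): 0.95,
        frozenset({"mark_public"}): 0.55,
    },
    "field_injection": {
        frozenset({"mark_inject"}): 0.90,
    },
}
loose = query_specification(mined, min_confidence=0.5)
strict = query_specification(mined, min_confidence=0.9)
```

Raising the confidence from 0.5 to 0.9 drops `mark_public` from the entry point specification, which matches the statement that a higher confidence value yields fewer results.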
How the analysis-level concepts are reflected in the framework configuration can be derived through the concrete implementation. In practical applications, the method of the embodiments of the present application only requires applications built and deployed on the relevant framework in order to generate the specification; the tedious framework reference documentation does not need to be read and understood.
In this embodiment, a device for supplementing static analysis missing semantics based on a network framework is also provided. The device is used to implement the foregoing embodiments and preferred embodiments, and descriptions that have already been given are not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The embodiment provides a device for supplementing static analysis missing semantics based on a network framework, as shown in fig. 8, including:
The acquisition module 101 is configured to acquire an execution log of a target network application, where the target network application is generated based on a target network framework. For details, refer to the description of step S1 in the foregoing method embodiment, which is not repeated here.
The extraction module 102 is configured to extract a configuration mark corresponding to preset missing semantics from the target network application based on the execution log. For details, refer to the description of step S2 in the foregoing method embodiment, which is not repeated here.
The analysis module 103 is configured to perform association analysis on the configuration marks to obtain supplementary semantics. For details, refer to the description of step S3 in the foregoing method embodiment, which is not repeated here.
The supplement module 104 is configured to supplement the missing semantics in the static analysis based on the supplementary semantics. For details, refer to the description of step S4 in the foregoing method embodiment, which is not repeated here.
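The way the four modules hand data to one another can be sketched as a simple pipeline. This only illustrates the data flow; the class name `SupplementDevice`, the callable signatures, the log line format and the stub behavior are assumptions and do not describe the actual modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SupplementDevice:
    """Chains four pluggable callables that stand in for modules 101 to 104."""

    acquire: Callable[[str], List[str]]                      # module 101 (step S1)
    extract: Callable[[List[str]], List[str]]                # module 102 (step S2)
    analyze: Callable[[List[str]], List[str]]                # module 103 (step S3)
    supplement: Callable[[List[str]], Dict[str, List[str]]]  # module 104 (step S4)

    def run(self, application: str) -> Dict[str, List[str]]:
        log = self.acquire(application)    # execution log
        marks = self.extract(log)          # configuration marks
        semantics = self.analyze(marks)    # supplementary semantics
        return self.supplement(semantics)  # result fed to the static analysis


# Stub modules with invented behavior, only to show the data flow.
device = SupplementDevice(
    acquire=lambda app: [f"{app}: call handler() mark=@Route"],
    extract=lambda log: [line.split("mark=")[1] for line in log if "mark=" in line],
    analyze=lambda marks: ["entry_point" for m in marks if m == "@Route"],
    supplement=lambda semantics: {"added_to_static_analysis": semantics},
)
result = device.run("demo-app")
```

Keeping each module behind a plain callable mirrors the statement that a module may be software, hardware, or a combination of both.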
The device for supplementing static analysis missing semantics based on a network framework in this embodiment is presented in the form of functional units, where a unit refers to an ASIC circuit, a processor and a memory that execute one or more software or firmware programs, and/or other devices capable of providing the above functions.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention also provides an electronic device. As shown in Fig. 9, the electronic device may include a processor 901 and a memory 902, where the processor 901 and the memory 902 may be connected by a bus or in another manner; Fig. 9 takes connection by a bus as an example.
The processor 901 may be a Central Processing Unit (CPU). The processor 901 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory 902, which is a non-transitory computer readable storage medium, may be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the methods in the method embodiments of the present invention. The processor 901 performs its various functional applications and data processing by executing the non-transitory software programs, instructions and modules stored in the memory 902, that is, it implements the methods in the above-described method embodiments.
The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 901, and the like. Further, the memory 902 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include memory located remotely from the processor 901, which may be connected to the processor 901 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 902 and, when executed by the processor 901, perform the methods in the above-described method embodiments.
The specific details of the electronic device may be understood by referring to the corresponding related description and effects in the above method embodiments, which are not described herein again.
Those skilled in the art will appreciate that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer readable storage medium and can include the processes of the embodiments of the methods described above when executed. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A method for supplementing static analysis missing semantics based on a network framework is characterized by comprising the following steps:
acquiring an execution log of a target network application program, wherein the target network application program is generated based on a target network framework;
extracting a configuration mark corresponding to preset missing semantics from the target network application program based on the execution log;
performing correlation analysis on the configuration marks to obtain supplementary semantics;
and supplementing the missing semantics in the static analysis based on the supplementary semantics.
2. The method for supplementing static analysis missing semantics based on a network framework according to claim 1, wherein said acquiring an execution log of a target network application program comprises:
matching the target network application program with a preset instrumentation tool;
and inserting, by the preset instrumentation tool, a plurality of recorders into the target network application program according to the matching result, recording running information during the running process after the target network application program is started, and generating the execution log.
3. The method for supplementing static analysis missing semantics based on a network framework according to claim 1, wherein said extracting a configuration mark corresponding to preset missing semantics from said target network application program based on said execution log comprises:
performing semantic analysis on the execution log to obtain program execution semantics;
obtaining a program code segment corresponding to the program execution semantics according to the corresponding relation between the program execution semantics and the program code in the target network application program;
matching the program code segments with the code segments of the preset missing semantics, and establishing mapping between the preset missing semantics and the program code segments according to a matching result;
and extracting the configuration mark corresponding to the program code segment from the target network application program.
4. The method for supplementing static analysis missing semantics based on a network framework according to claim 3, wherein the performing semantic analysis on the execution log to obtain program execution semantics comprises:
extracting execution sequence information and calling point information from the execution log to construct a dynamic call graph with auxiliary information, wherein the auxiliary information is information for embodying the mapping relation between a calling point and a calling target in the execution log;
traversing the dynamic call graph according to the call relation to extract entry point semantics and non-entry point semantics;
traversing the dynamic call graph according to the auxiliary information to extract indirect call semantics;
and extracting the operation information related to the field semantics from the execution log to obtain the injection field semantics and the common field semantics.
5. The method for supplementing static analysis missing semantics based on a network framework according to claim 4, wherein said traversing the dynamic call graph according to the call relation to extract entry point semantics and non-entry point semantics comprises:
performing node analysis on the dynamic call graph to obtain a root node and a non-root node;
extracting entry point semantics from the target network application program based on the root node, and extracting non-entry point semantics from the target network application program based on the non-root node.
6. The method for supplementing static analysis missing semantics based on a network framework according to claim 1, wherein the performing correlation analysis on the configuration marks to obtain supplementary semantics comprises:
extracting the prediction missing semantics associated with each configuration mark from a preset association database;
and comparing the relevance between the predicted missing semantics and the corresponding configuration marks with a preset threshold value, and taking the predicted missing semantics with the relevance higher than the preset threshold value as supplementary semantics.
7. The method for supplementing static analysis missing semantics based on a network framework according to claim 6, further comprising:
acquiring historical execution logs of a plurality of network applications and extracting historical configuration marks corresponding to preset missing semantics from the plurality of network applications based on the historical execution logs;
extracting key configuration marks from the historical configuration marks according to the occurrence frequency of the configuration marks;
calculating the association degree of all preset missing semantics based on the key configuration marks;
and establishing the association database based on the calculated association degree between the historical configuration marks.
8. An apparatus for supplementing static analysis missing semantics based on a network framework, comprising:
the acquisition module is used for acquiring an execution log of a target network application program, and the target network application program is generated based on a target network framework;
the extraction module is used for extracting a configuration mark corresponding to preset missing semantics from the target network application program based on the execution log;
the analysis module is used for carrying out correlation analysis on the configuration marks to obtain supplementary semantics;
and the supplement module is used for supplementing missing semantics in the static analysis based on the supplementary semantics.
9. An electronic device, comprising:
a memory and a processor, the memory and the processor being communicatively coupled to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the method for supplementing static analysis missing semantics based on a network framework according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method for supplementing static analysis missing semantics based on a network framework according to any one of claims 1 to 7.
CN202211482223.7A 2022-11-24 2022-11-24 Method for supplementing static analysis missing semantics based on network framework Pending CN115934517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211482223.7A CN115934517A (en) 2022-11-24 2022-11-24 Method for supplementing static analysis missing semantics based on network framework

Publications (1)

Publication Number Publication Date
CN115934517A true CN115934517A (en) 2023-04-07

Family

ID=86553172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211482223.7A Pending CN115934517A (en) 2022-11-24 2022-11-24 Method for supplementing static analysis missing semantics based on network framework

Country Status (1)

Country Link
CN (1) CN115934517A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination