CN114328208A - Code detection method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114328208A
Authority
CN
China
Prior art keywords
program
code
stain
detected
taint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111599410.9A
Other languages
Chinese (zh)
Inventor
纪妙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202111599410.9A priority Critical patent/CN114328208A/en
Publication of CN114328208A publication Critical patent/CN114328208A/en
Pending legal-status Critical Current

Landscapes

  • Stored Programmes (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The disclosure provides a code detection method and apparatus, an electronic device, and a storage medium. It relates to the technical field of information security and can be applied to taint information flow analysis. The code detection method comprises: parsing the syntax structure of program code to be detected to obtain program syntax rules and a data dependency relationship; extracting, from the program code to be detected, program code that matches a taint mark library to obtain taint matching information; abstracting away program elements unrelated to a target taint in the program code to be detected, based on the program syntax rules, the data dependency relationship, and the taint matching information, to generate target abstract code; and performing taint information flow analysis based on program element mutation on the target abstract code to obtain a detection result of the program code to be detected. The technical solution of the embodiments of the disclosure can improve the efficiency of code detection and the accuracy of taint information flow analysis.

Description

Code detection method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of information security technologies, and in particular, to a code detection method, a code detection apparatus, an electronic device, and a computer-readable storage medium.
Background
Taint analysis is an important practical technique for information flow analysis. Inaccurate taint analysis results can cause a large number of false positives in code information flow analysis.
Taint analysis in code detection can be divided into static taint analysis and dynamic taint analysis. Static taint analysis relies on alias detection and suffers from limited precision and efficiency. Dynamic taint analysis relies on code instrumentation and on program execution; each run covers a single path, so it cannot comprehensively and accurately locate taint propagation paths across the whole program.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the disclosed embodiments is to provide a code detection method, a code detection apparatus, an electronic device, and a computer-readable storage medium, which overcome the problem of low accuracy of the taint analysis result in the related art at least to some extent.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the embodiments of the present disclosure, there is provided a code detection method, including:
analyzing a grammar structure of a program code to be detected to obtain a program grammar rule and a data dependency relationship;
extracting, from the program code to be detected, program code that matches a taint mark library to obtain taint matching information;
abstracting away program elements unrelated to a target taint in the program code to be detected, based on the program syntax rules, the data dependency relationship, and the taint matching information, to generate target abstract code, wherein the target taint is a program element that conforms to the characteristics of the taint matching information;
and performing taint information flow analysis based on program element mutation on the target abstract code to obtain a detection result of the program code to be detected.
In some example embodiments of the present disclosure, based on the foregoing scheme, the performing, based on the target abstract code, a taint information flow analysis based on program element variation to obtain a detection result of the program code to be detected includes:
mutating the program elements of the target abstract code a set number of times;
for the target abstract code and for the target abstract code after each mutation, verifying whether an executable path exists between the taint source to be detected and the taint sink (convergence point), and recording a verification result;
calculating the suspiciousness of the program element based on the verification result;
and outputting the detection result of the program code to be detected according to the suspiciousness of the program element.
In some example embodiments of the present disclosure, based on the foregoing scheme, the target abstract code includes a taint assertion marking the position of the taint sink, and verifying, for the target abstract code and for the target abstract code after each mutation, whether an executable path exists between the taint source to be detected and the taint sink includes:
verifying whether the result of executing the target abstract code, and the target abstract code after each mutation, with the program entry parameters satisfies the taint assertion, wherein the program entry parameters are determined based on the taint matching information;
if so, recording the verification result as path reachable;
if not, recording the verification result as path unreachable.
In some example embodiments of the present disclosure, based on the foregoing scheme, the calculating the suspiciousness of the program element based on the verification result includes:
calculating the suspiciousness of the program element from the verification result according to a suspiciousness calculation formula, which is as follows:
Figure BDA0003432521560000021 (image of the suspiciousness formula, not reproduced in this text)
wherein m denotes any program element, F_m is the suspiciousness of program element m, m1 is the number of times the path corresponding to program element m is reachable in the verification result, m2 is the number of times the path corresponding to program element m is unreachable in the verification result, Σ_m m1 is the total number of times the paths of all verified program elements of the target abstract code are reachable in the verification result, and Σ_m m2 is the total number of times the paths of all verified program elements of the target abstract code are unreachable in the verification result.
In some example embodiments of the present disclosure, based on the foregoing scheme, the taint mark library includes taint source marking functions, taint sink marking functions, and sanitization (harmless-processing) marking functions; and extracting the program code that matches the taint mark library from the program code to be detected to obtain taint matching information includes:
extracting, from the program code to be detected, program code that matches the taint source marking functions, the taint sink marking functions, and the sanitization marking functions to obtain the taint matching information.
In some example embodiments of the present disclosure, based on the foregoing scheme, the parsing a syntax structure of the program code to be detected to obtain a program syntax rule and a data dependency relationship includes:
performing lexical analysis and syntactic analysis on the program code to be detected to generate an abstract syntax tree;
generating program syntax rules based on the abstract syntax tree;
and generating a program calling relation graph based on the abstract syntax tree, and extracting data dependency relations from the program calling relation graph.
In some example embodiments of the present disclosure, based on the foregoing scheme, abstracting away program elements unrelated to the target taint in the program code to be detected, based on the program syntax rules, the data dependency relationship, and the taint matching information, to generate the target abstract code includes:
removing, based on the program syntax rules, the data dependency relationship, and the taint matching information, and in combination with the abstract syntax tree and the program call relation graph, the program elements unrelated to the target taint from the program code to be detected to obtain abstract code to be processed;
and adding, to the abstract code to be processed, a taint assertion marking the position of the taint sink to obtain the target abstract code.
According to a second aspect of the embodiments of the present disclosure, there is provided a code detection apparatus including:
a parsing module, configured to parse the syntax structure of the program code to be detected to obtain program syntax rules and a data dependency relationship;
an extraction module, configured to extract, from the program code to be detected, program code that matches the taint mark library to obtain taint matching information;
an abstract processing module, configured to abstract away program elements unrelated to a target taint in the program code to be detected, based on the program syntax rules, the data dependency relationship, and the taint matching information, to generate target abstract code, wherein the target taint is a program element that conforms to the characteristics of the taint matching information;
and an analysis module, configured to perform taint information flow analysis on the target abstract code to obtain a detection result of the program code to be detected.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; and a memory having computer readable instructions stored thereon which, when executed by the processor, implement the code detection method of any of the above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a code detection method according to any one of the above.
According to the code detection method, the code detection apparatus, the electronic device, and the computer-readable storage medium provided by the embodiments of the disclosure, the syntax structure of the program code to be detected is parsed to obtain program syntax rules and a data dependency relationship; program code matching the taint mark library is extracted from the program code to be detected to obtain taint matching information; program elements unrelated to the target taint are abstracted away based on the program syntax rules, the data dependency relationship, and the taint matching information to generate target abstract code; and taint information flow analysis based on program element mutation is then performed on the target abstract code to obtain the detection result of the program code to be detected. On one hand, parsing and abstracting the program code to be detected removes irrelevant paths and program statements, which reduces the complexity of taint information flow analysis and improves code detection efficiency. On the other hand, because the analysis of the target abstract code is based on program element mutation, the target abstract code can be verified many times with different program elements, so the program code to be detected can be analyzed comprehensively along different paths and the taint path can be located, which improves the accuracy of taint information flow analysis in code detection.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 schematically illustrates a schematic diagram of a code detection method flow, according to some embodiments of the present disclosure;
FIG. 2 schematically illustrates a schematic diagram of a program call relationship diagram, in accordance with some embodiments of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of program element variation-based taint dataflow analysis of target abstract code, according to some embodiments of the present disclosure;
FIG. 4 schematically illustrates a schematic diagram of a code detection apparatus according to some embodiments of the present disclosure;
FIG. 5 schematically illustrates a structural schematic of an electronic device according to some embodiments of the present disclosure;
FIG. 6 schematically illustrates a schematic diagram of a computer-readable storage medium, according to some embodiments of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
Furthermore, the drawings are merely schematic illustrations and are not necessarily drawn to scale. The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
In the present exemplary embodiment, a code detection method is first provided, and the code detection method may be applied to a terminal device, such as an electronic device like a mobile phone or a computer. Fig. 1 schematically illustrates a schematic diagram of a code detection method flow, according to some embodiments of the present disclosure. Referring to fig. 1, the code detection method may include the steps of:
step S110, analyzing a grammar structure of a program code to be detected to obtain a program grammar rule and a data dependency relationship;
step S120, extracting, from the program code to be detected, program code that matches the taint mark library to obtain taint matching information;
step S130, abstracting away program elements unrelated to the target taint in the program code to be detected, based on the program syntax rules, the data dependency relationship, and the taint matching information, to generate target abstract code, wherein the target taint is a program element that conforms to the characteristics of the taint matching information;
and step S140, performing program element variation-based taint information flow analysis on the target abstract code to obtain a detection result of the program code to be detected.
According to the code detection method in this embodiment, on one hand, parsing and abstracting the program code to be detected removes irrelevant paths and program statements, which reduces the complexity of taint information flow analysis and improves code detection efficiency. On the other hand, because the analysis of the target abstract code is based on program element mutation, the target abstract code can be verified many times with different program elements, so the program code to be detected can be analyzed comprehensively along different paths and the taint path can be located, which improves the accuracy of taint information flow analysis in code detection.
Next, a code detection method in the present exemplary embodiment will be further described.
In step S110, the syntax structure of the program code to be detected is analyzed to obtain the program syntax rules and the data dependency relationship.
For example, for the program code to be detected, lexical analysis and syntax analysis may be performed on the program code to be detected to generate an abstract syntax tree, then a program syntax rule is generated based on the abstract syntax tree, a program call relation diagram is generated based on the abstract syntax tree, and a data dependency relationship is extracted from the program call relation diagram.
For example, the contents of a program code file to be detected are as follows:
Figure BDA0003432521560000071 (listing of the program code to be detected, not reproduced in this text)
Lexical analysis and syntax analysis can be performed on the program code to be detected, with each syntactic structure represented by a node of a tree, to generate an abstract syntax tree. Based on the abstract syntax tree, the expressions, characters, and other information in the program code to be detected can be expanded and defined according to the syntax to generate the program syntax rules. For example, an expression Expr may be obtained from the abstract syntax tree and expanded into the definitions Expr + Expr, Expr - Expr, Expr × Expr, and so on; characters, such as the character "s", may likewise be obtained from the abstract syntax tree; and the expression definitions and character definitions may be collected together to form the program syntax rules.
After the abstract syntax tree is generated, a program call relation graph can be generated from the function call relationships in the abstract syntax tree. For example, FIG. 2 schematically illustrates the program call relation graph of the program code to be detected. The data dependency relationships extracted from it are indicated by the dashed arrows in the figure. For example, the character "s" in the input program code is passed to the function source(s), so a data dependency exists between "s" and source(s); s can then propagate from source along paths II, III, and IV, forming a data dependency relationship on each path.
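The parsing step can be illustrated with a minimal Python sketch built on the standard `ast` module. It is not the patent's implementation: the sample program, the function names `build_call_graph` and `data_dependencies`, and the choice of Python as the analyzed language are all assumptions made for illustration. The sketch parses source text into an abstract syntax tree, derives a caller-to-callee map from the call nodes, and lists the names each simple assignment reads.

```python
import ast

# Hypothetical stand-in for the program code to be detected.
SAMPLE = '''
def source(s):
    return s

def sink(x):
    print(x)

def main(s):
    data = source(s)
    if len(data) > 10:
        sink(data)
    else:
        do_something_else()
'''

def build_call_graph(code):
    """Map each function name to the sorted names it calls directly."""
    graph = {}
    for func in ast.walk(ast.parse(code)):
        if isinstance(func, ast.FunctionDef):
            callees = {
                node.func.id
                for node in ast.walk(func)
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            }
            graph[func.name] = sorted(callees)
    return graph

def data_dependencies(code):
    """Return (target, names read on the right-hand side) per simple assignment."""
    deps = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Assign):
            reads = sorted({n.id for n in ast.walk(node.value)
                            if isinstance(n, ast.Name)})
            for target in node.targets:
                if isinstance(target, ast.Name):
                    deps.append((target.id, reads))
    return deps

call_graph = build_call_graph(SAMPLE)
dependencies = data_dependencies(SAMPLE)
```

Here `dependencies` records that `data` depends on `s` and `source`, which corresponds to the dependency between "s" and source(s) described above.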
In step S120, program code matching the taint mark library is extracted from the program code to be detected to obtain taint matching information.
A taint mark library may be preset. It may include taint source marking functions, taint sink marking functions, and sanitization (harmless-processing) marking functions, indicating which functions may be a taint source, a taint sink, or a sanitization function. Based on the taint mark library, program code that matches the taint source marking functions, the taint sink marking functions, and the sanitization marking functions can be extracted from the program code to be detected to obtain the taint matching information. The program code in the taint matching information may include at least one of a possible taint source, a possible taint sink, and a possible sanitization function. For example, the taint matching information may be extracted from the program code to be detected based on the known marking functions and program entry parameter characteristics in the taint mark library.
For example, after the program code to be detected in step S110 is matched against the taint mark library, the function source() may be identified as a possible taint source and the function sink() as a possible taint sink; the program code related to source() and sink() may then be extracted to obtain the taint matching information.
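As a rough illustration of the matching step, the following Python sketch looks up every call in the code against a small taint mark library. The library contents, the role names, and the function `match_taint_marks` are hypothetical; the source does not specify how the library is stored or matched.

```python
import ast

# Hypothetical taint mark library: function names grouped by role.
TAINT_MARK_LIBRARY = {
    "source": {"source", "read_input"},
    "sink": {"sink", "execute_sql"},
    "sanitizer": {"escape"},
}

def match_taint_marks(code, library=TAINT_MARK_LIBRARY):
    """Return (role, function name, line number) for each call found in the library."""
    matches = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            for role, names in library.items():
                if node.func.id in names:
                    matches.append((role, node.func.id, node.lineno))
    # Order by line number so the result follows the program text.
    return sorted(matches, key=lambda match: match[2])

# Hypothetical program code to be detected.
SAMPLE = """data = source(s)
if len(data) > 10:
    sink(data)
else:
    do_something_else()
"""

taint_matches = match_taint_marks(SAMPLE)
```

The resulting list plays the role of the taint matching information: it identifies source() as a possible taint source and sink() as a possible taint sink, together with their locations.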
In step S130, program elements unrelated to the target taint in the program code to be detected are abstracted away based on the program syntax rules, the data dependency relationship, and the taint matching information to generate target abstract code.
The target taint is a program element that conforms to the taint matching information. For example, if the taint matching information includes source(), source, and sink(), the program elements involving source(), source, and sink() can be taken as the target taint.
For example, based on the program syntax rules and the data dependency relationship obtained in step S110 and the taint matching information obtained in step S120, and in combination with the abstract syntax tree and the program call relation graph obtained in step S110, the program elements unrelated to the target taint are removed from the program code to be detected to obtain abstract code to be processed; a taint assertion marking the position of the taint sink is then added to the abstract code to be processed to obtain the target abstract code.
For example, for the above program code to be detected, the obtained target abstract code may be represented as follows:
Figure BDA0003432521560000081 (listing of the target abstract code, not reproduced in this text)
The target abstract code retains the program code related to source(), source, and sink(), and omits the judgment condition "if (source.length() > MAX)" and the branch "else { doSomethingElse(); }", thereby eliminating irrelevant paths and irrelevant program statements and reducing the complexity of the subsequent information flow analysis. If the program code to be detected contains multiple branch paths, abstraction can be performed separately for different path combinations.
Here, sink() is a possible taint trigger point, that is, a taint sink, and an assertion alert(true) can be added at the position of the possible taint sink.
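The abstraction step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patent's algorithm: the `SOURCES` and `SINKS` sets, the function `abstract_code`, and the sample program are hypothetical, a Python `assert True` stands in for the taint assertion, and `ast.unparse` requires Python 3.9 or later. The sketch keeps only statements that carry tainted data, drops branch conditions while recording them, and appends an assertion at each tainted sink call.

```python
import ast

SOURCES = {"source"}  # hypothetical taint source marking functions
SINKS = {"sink"}      # hypothetical taint sink marking functions

def called_names(node):
    """Names of functions called anywhere inside the node."""
    return {n.func.id for n in ast.walk(node)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}

def read_names(node):
    """All identifiers that appear inside the node."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}

def abstract_code(code):
    """Return (abstract code text, list of dropped branch conditions)."""
    tainted, kept, dropped = set(), [], []

    def visit(statements):
        for stmt in statements:
            if isinstance(stmt, ast.If):
                # Omit the condition but remember it for later mutation.
                dropped.append(ast.unparse(stmt.test))
                visit(stmt.body)
                visit(stmt.orelse)
            elif isinstance(stmt, ast.Assign):
                from_source = called_names(stmt.value) & SOURCES
                from_tainted = read_names(stmt.value) & tainted
                if from_source or from_tainted:
                    tainted.update(t.id for t in stmt.targets
                                   if isinstance(t, ast.Name))
                    kept.append(ast.unparse(stmt))
            elif isinstance(stmt, ast.Expr) and called_names(stmt) & SINKS:
                if read_names(stmt) & tainted:
                    kept.append(ast.unparse(stmt))
                    kept.append("assert True")  # taint assertion at the sink

    visit(ast.parse(code).body)
    return "\n".join(kept), dropped

# Hypothetical program code to be detected.
SAMPLE = """data = source(s)
count = 0
if len(data) > MAX:
    sink(data)
else:
    do_something_else()
"""

abstract_text, dropped_conditions = abstract_code(SAMPLE)
```

Recording the dropped conditions mirrors the later mutation step, where omitted conditional statements can be changed or restored.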
In step S140, a taint information flow analysis based on program element variation is performed on the target abstract code to obtain a detection result of the program code to be detected.
After the target abstract code is obtained, program entry parameters can be input into the target abstract code, dynamic symbolic execution can be used to verify whether the path between the program entry and the taint sink is reachable, and the verification result can be recorded.
For example, possible taint sources in the program code to be detected may be obtained through the matching process of step S120, and the relevant program entry parameters may be input into the target abstract code according to the entry parameter characteristics of those possible taint sources.
Then, the program elements of the target abstract code can be mutated a set number of times; for each mutated target abstract code, whether an executable path exists between the taint source to be detected and the taint sink is verified again, and the verification result is recorded.
In an example embodiment of the present disclosure, as illustrated with reference to fig. 3, step S140 may include step S310 to step S360.
In step S310, program entry parameters are input into the target abstract code.
In step S320, whether the path between the program entry and the taint sink is reachable is verified based on the program entry parameters, and the verification result is recorded.
In step S330, the program elements of the target abstract code are mutated a set number of times.
For example, the program elements in the target abstract code may be mutated a set number of times according to the program syntax rules and the taint matching information obtained in step S110 and step S120. The program elements may include the entry parameters of functions in the code and the program statements in the code. Accordingly, mutating a program element may mean mutating an entry parameter, mutating a conditional statement in the program, or mutating both.
For example, the values of the entry parameters of a function in the taint matching information may be changed, or the types of the entry parameters may be modified. As another example, when the program code to be detected is abstracted, some conditional statements, such as judgment conditions and loop conditions, are omitted and recorded in the program syntax rules; when the program elements are mutated, these conditional statements can be changed or restored as appropriate according to that record.
By mutating the program elements in the target abstract code a set number of times, different execution paths and abstraction combinations can be selected to verify the execution paths of the target abstract code, and multiple tests verify whether the program can actually execute through to the taint sink. In this way the program code to be detected can be analyzed comprehensively and the taint path located, which effectively improves the accuracy of taint information flow analysis in code detection and reduces the false alarm rate of the detection results.
In step S340, for the target abstract code after each mutation, whether an executable path exists between the taint source to be detected and the taint sink is verified, and the verification result is recorded.
For example, the target abstract code may include a taint assertion marking the position of the taint sink, and whether an executable path exists between the taint source and the taint sink in the target abstract code and in the target abstract code after each mutation may be verified as follows: verify whether the result of executing the target abstract code, and the target abstract code after each mutation, with the program entry parameters satisfies the taint assertion. If it does, the program entry parameter may pose a threat to the program and is regarded as a taint source, and the verification result is recorded as path reachable. If it does not, the program entry parameter may not pose a threat to the program and is not regarded as a taint source, and the verification result is recorded as path unreachable. The program entry parameters can be determined based on the taint matching information, for example from the entry parameter characteristics of potential taint sources in the taint matching information.
Mutating program elements allows the data security of the program code to be detected to be verified comprehensively and accurately, and code abstraction allows the states of the program elements to be covered.
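The mutation and verification loop can be sketched in Python as follows. The source describes dynamic symbolic execution; this sketch substitutes plain concrete execution to keep the example short, so it illustrates only the record-keeping, not the execution technique. The function `abstract_program`, the exception `SinkReached`, the mutation set, and the threshold `max_len` are all hypothetical.

```python
class SinkReached(Exception):
    """Raised when execution arrives at the taint assertion."""

def abstract_program(entry, max_len=10, restore_condition=False):
    """Hypothetical target abstract code with an optionally restored condition."""
    data = entry  # stands in for source(entry)
    if restore_condition and not (len(data) > max_len):
        return  # restored branch: execution leaves before the sink
    raise SinkReached  # taint assertion at the sink position

def is_reachable(**params):
    """Run one variant and report whether the sink was reached."""
    try:
        abstract_program(**params)
    except SinkReached:
        return True
    return False

def mutations():
    """Yield a fixed set of mutated entry parameters and condition settings."""
    for entry in ("", "short", "x" * 32):
        for restore in (False, True):
            yield {"entry": entry, "restore_condition": restore}

# Each record pairs a mutation with its verification result.
verification_results = [(m, is_reachable(**m)) for m in mutations()]
reachable_count = sum(1 for _, ok in verification_results if ok)
unreachable_count = len(verification_results) - reachable_count
```

The two counts correspond to the per-element reachable and unreachable totals that the suspiciousness calculation consumes.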
In an example embodiment, for each verified program element, its propagation path may be recorded simultaneously.
In step S350, the degree of suspicion of the program element is calculated based on the verification result.
In an example embodiment, the degree of suspicion of the program element may be calculated according to a suspicion degree calculation formula based on the verification result. For example, the suspicious degree calculation formula may be formula (1) as follows.
Figure BDA0003432521560000101 (image of formula (1), not reproduced in this text)
In formula (1), m denotes any program element, F_m is the suspiciousness of program element m, m1 is the number of times the path corresponding to program element m is reachable in the verification result, m2 is the number of times the path corresponding to program element m is unreachable in the verification result, Σ_m m1 is the total number of times the paths of all verified program elements of the target abstract code are reachable in the verification result, and Σ_m m2 is the total number of times the paths of all verified program elements of the target abstract code are unreachable in the verification result.
By calculating the suspiciousness of program elements, the likelihood that the program code to be detected contains a taint source can be measured. The higher the suspiciousness of a program element, the more likely it is that the corresponding program input point is a taint source; the lower the suspiciousness, the less likely.
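Because the formula image is not reproduced in this text, the exact expression for F_m cannot be confirmed. The Python sketch below therefore substitutes a Tarantula-style ratio as an assumption: F_m = (m1 / Σ_m m1) / (m1 / Σ_m m1 + m2 / Σ_m m2). It uses the same four quantities defined above and grows with the share of reachable verifications, but it may differ from the patent's formula (1). The element names and counts are made up for illustration.

```python
def suspiciousness(counts):
    """Score each element from (m1 reachable count, m2 unreachable count).

    Assumed Tarantula-style formula, not confirmed by the source:
    F_m = (m1 / sum m1) / (m1 / sum m1 + m2 / sum m2)
    """
    total_reachable = sum(m1 for m1, _ in counts.values())
    total_unreachable = sum(m2 for _, m2 in counts.values())
    scores = {}
    for element, (m1, m2) in counts.items():
        reach_ratio = m1 / total_reachable if total_reachable else 0.0
        unreach_ratio = m2 / total_unreachable if total_unreachable else 0.0
        denominator = reach_ratio + unreach_ratio
        # An element that was never verified gets a score of 0.
        scores[element] = reach_ratio / denominator if denominator else 0.0
    return scores

# Hypothetical verification counts per program element.
counts = {"entry_s": (4, 2), "entry_t": (1, 5)}
scores = suspiciousness(counts)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under this assumed formula, an element whose mutated variants mostly reach the sink ranks above one whose variants mostly do not, which matches the qualitative behavior described above.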
In step S360, a detection result of the program code to be detected is output according to the suspicious degree of the program element.
Finally, the detection result of the program code to be detected can be output according to the suspiciousness of the program elements. The detection result can include the detected correspondence among taint sources, taint sinks, and data propagation paths. Illustratively, the detection result may further include the set of program elements and their propagation paths.
In an example embodiment of the disclosure, after the taint information flow analysis based on program element mutation is performed on the target abstract code, sanitization (harmless-processing) function identification may also be performed by matching against the sanitization marking functions in the taint mark library, after which the detection result is output. Identifying sanitization functions can further reduce the false alarm rate of the taint information flow analysis result.
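A minimal Python sketch of this post-filtering step follows. The structure of a finding, the `SANITIZERS` set, and the function `filter_sanitized` are assumptions for illustration; the source only states that results are matched against the sanitization marking functions before output.

```python
# Hypothetical sanitization (harmless-processing) marking functions.
SANITIZERS = {"escape", "sanitize"}

def filter_sanitized(findings, sanitizers=SANITIZERS):
    """Drop findings whose propagation path passes through a sanitizer."""
    return [finding for finding in findings
            if not sanitizers & set(finding["path"])]

# Hypothetical findings: source, sink, and the functions on the propagation path.
findings = [
    {"source": "source", "sink": "sink", "path": ["source", "sink"]},
    {"source": "source", "sink": "sink", "path": ["source", "escape", "sink"]},
]
reported = filter_sanitized(findings)
```

Only the first finding remains in `reported`, since the second passes through `escape` before reaching the sink.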
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Further, in the present exemplary embodiment, a code detection apparatus is also provided. Referring to fig. 4, the code detection apparatus 400 may include a parsing module 410, an extracting module 420, an abstract processing module 430 and an analysis module 440, wherein:
the parsing module 410 may be configured to parse a syntax structure of the program code to be detected to obtain a program syntax rule and a data dependency relationship;
the extracting module 420 can be used for extracting program codes matched with the stain mark library from the program codes to be detected to obtain stain matching information;
the abstract processing module 430 may be configured to perform abstract processing on a program element, which is irrelevant to a target stain, in a program code to be detected based on a program syntax rule, a data dependency relationship, and stain matching information, so as to generate a target abstract code, where the target stain is a program element conforming to the characteristics of the stain matching information;
the analysis module 440 may be configured to perform taint information flow analysis on the target abstract code to obtain a detection result of the program code to be detected.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the analysis module 440 may include a variation unit, a verification recording unit, a calculation unit, and an output unit. Wherein:
the variation unit may be configured to perform variation for a set number of times on the program element of the target abstract code;
the verification recording unit can be used for verifying whether an executable path exists between a stain source to be detected and a stain convergent point for the target abstract code and the target abstract code after each variation, and recording a verification result;
the calculation unit may be configured to calculate a suspicious degree of the program element based on the verification result;
the output unit may be configured to output a detection result of the program code to be detected according to the suspicious degree of the program element.
In an exemplary embodiment of the disclosure, based on the foregoing solution, the target abstract code may include a taint assertion marking the location of a taint convergence point, and the verification recording unit may be specifically configured to: verify whether the result of executing the target abstract code, and the target abstract code after each variation, with the program entry parameter satisfies the taint assertion, wherein the program entry parameter is determined based on the taint matching information; if so, record the verification result as the path being reachable; if not, record the verification result as the path being unreachable.
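The variation and verification steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the target abstract code is modeled as a list of named program elements acting on a variable table, the only mutation operator is "skip one element", and the program entry parameter is a hypothetical marker string `<TAINT>`. All names (`PROGRAM`, `taint_assertion`, `skip_element`, `verify`) are illustrative.

```python
from typing import Callable, Dict, List, Tuple

TAINT = "<TAINT>"  # hypothetical marker used as the tainted entry parameter

Env = Dict[str, str]
Element = Tuple[str, Callable[[Env], None]]  # (element name, effect on variables)


def read_param(env: Env) -> None:
    env["x"] = env.get("entry", "")          # taint source: copies the entry parameter


def set_banner(env: Env) -> None:
    env["banner"] = "hello"                  # unrelated to the tainted data


def build_query(env: Env) -> None:
    env["sink"] = "id=" + env.get("x", "")   # value arriving at the convergence point


PROGRAM: List[Element] = [
    ("read_param", read_param),
    ("set_banner", set_banner),
    ("build_query", build_query),
]


def taint_assertion(env: Env) -> bool:
    """Satisfied when the tainted marker arrives at the convergence point."""
    return TAINT in env.get("sink", "")


def execute(program: List[Element], entry: str) -> Env:
    env: Env = {"entry": entry}
    for _, effect in program:
        effect(env)
    return env


def skip_element(program: List[Element], index: int) -> List[Element]:
    """Assumed mutation operator: drop a single program element."""
    return program[:index] + program[index + 1:]


def verify(program: List[Element], rounds: int) -> Dict[str, Dict[str, int]]:
    """Mutate a set number of times and record reachable/unreachable per element."""
    records = {name: {"reachable": 0, "unreachable": 0} for name, _ in program}
    for i in range(rounds):
        index = i % len(program)
        name = program[index][0]
        env = execute(skip_element(program, index), TAINT)
        outcome = "reachable" if taint_assertion(env) else "unreachable"
        records[name][outcome] += 1
    return records
```

Calling `verify(PROGRAM, 6)` mutates each of the three elements twice and records, per element, how often the taint assertion was satisfied. A realistic tool would use a richer set of mutation operators than the single one assumed here.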
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the calculation unit may specifically be configured to: calculate the suspicious degree of the program element according to a suspicious degree calculation formula based on the verification result; the suspicious degree calculation formula is as follows:
[Suspicious degree calculation formula; rendered as an image in the original (Figure BDA0003432521560000121) and not reproduced in this text]
wherein m denotes any program element; F_m is the suspicious degree of program element m; m1 is the number of times the path corresponding to program element m is reachable in the verification result; m2 is the number of times the path corresponding to program element m is unreachable in the verification result; Σ_m m1 is the total number of times the paths of all verified program elements of the target abstract code are unreachable in the verification result; and Σ_m m2 is the total number of times the paths of all verified program elements of the target abstract code are reachable in the verification result.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the taint mark library may include a taint source marking function, a taint convergence point marking function, and a harmless-processing marking function; the extracting module 420 may be specifically configured to extract, from the program code to be detected, program code matched with the taint source marking function, the taint convergence point marking function, and the harmless-processing marking function, so as to obtain taint matching information.
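Matching against a taint mark library can be illustrated with a short Python sketch built on the standard `ast` module. The library contents below (`input` as a source, `os.system` and `eval` as convergence points, `shlex.quote` as a harmless-processing function) are assumptions picked for the example, and `MARK_LIBRARY`, `dotted_name` and `match_marks` are hypothetical names; the disclosure does not specify a concrete library or target language.

```python
import ast
from typing import Dict, List, Optional, Set, Tuple

# Assumed example library: kind of mark -> fully qualified function names.
MARK_LIBRARY: Dict[str, Set[str]] = {
    "source": {"input"},
    "sink": {"os.system", "eval"},
    "sanitizer": {"shlex.quote"},
}

SAMPLE = "\n".join([
    "import os, shlex",
    "cmd = input()",
    "safe = shlex.quote(cmd)",
    "os.system(safe)",
])


def dotted_name(node: ast.AST) -> Optional[str]:
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def match_marks(source: str) -> List[Tuple[str, str, int]]:
    """Collect (kind, function name, line number) for every matching call."""
    matches: List[Tuple[str, str, int]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            for kind, names in MARK_LIBRARY.items():
                if name in names:
                    matches.append((kind, name, node.lineno))
    return sorted(matches, key=lambda m: m[2])
```

Running `match_marks(SAMPLE)` yields one record per marked call, which plays the role of the taint matching information in this sketch.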
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the parsing module 410 may include an analyzing unit, a first generating unit, and a second generating unit. Wherein:
the analysis unit can be used for performing lexical analysis and syntactic analysis on the program code to be detected to generate an abstract syntax tree;
the first generating unit may be configured to generate a program syntax rule based on the abstract syntax tree;
the second generating unit may be configured to generate a program call relationship diagram based on the abstract syntax tree, and extract the data dependency relationship from the program call relationship diagram.
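The three units above can be approximated in Python with the standard `ast` module, which performs lexical and syntactic analysis and returns an abstract syntax tree. The sketch below is an assumption-laden simplification: the call relationship diagram is reduced to a mapping from each function to the names it calls, and data dependencies are taken only from simple assignments. `call_graph`, `data_dependencies` and `SAMPLE` are hypothetical names.

```python
import ast
from typing import Dict, Set

SAMPLE = "\n".join([
    "def fetch():",
    "    return input()",
    "",
    "def handle():",
    "    raw = fetch()",
    "    query = 'id=' + raw",
    "    run(query)",
])


def call_graph(tree: ast.AST) -> Dict[str, Set[str]]:
    """Map each function definition to the plain function names it calls."""
    graph: Dict[str, Set[str]] = {}
    for func in ast.walk(tree):
        if isinstance(func, ast.FunctionDef):
            callees: Set[str] = set()
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    callees.add(node.func.id)
            graph[func.name] = callees
    return graph


def data_dependencies(tree: ast.AST) -> Dict[str, Set[str]]:
    """Map each assigned variable to the names read on the right-hand side.

    Called function names are included, which links a variable to the
    call graph node that produced its value.
    """
    deps: Dict[str, Set[str]] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            reads = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            for target in node.targets:
                if isinstance(target, ast.Name):
                    deps.setdefault(target.id, set()).update(reads)
    return deps
```

For `SAMPLE`, the call graph links `handle` to `fetch`, and the dependency map links `query` to `raw` and `raw` to `fetch`, so a value returned by the source-calling function can be followed to the call that consumes it.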
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the abstraction processing module 430 may include an element removing unit and an adding unit, where:
the element removing unit can be used for removing program elements irrelevant to the target stain in the program code to be detected based on the program syntax rule, the data dependency relationship and the stain matching information and by combining the abstract syntax tree and the program calling relationship diagram to obtain an abstract code to be processed;
the adding unit can be used for adding the taint assertion for marking the taint convergence point position in the abstract code to be processed to obtain the target abstract code.
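One possible reading of the element removing unit and the adding unit is a backward slice from the convergence point followed by insertion of an assertion. The Python sketch below follows that reading under strong assumptions: the input is straight-line code with one statement per line, only simple assignments are tracked, the convergence point call is replaced by the assertion, and the tainted entry value is the hypothetical marker `<TAINT>`. `abstract_code` and `SAMPLE` are illustrative names.

```python
import ast
from typing import List, Set

SAMPLE = "\n".join([
    "user = read_entry()",
    "banner = 'hello'",
    "count = len(banner)",
    "query = 'id=' + user",
    "execute(query)",
])


def abstract_code(source: str, sink_name: str, marker: str = "<TAINT>") -> str:
    """Keep only statements feeding the sink and add a taint assertion there."""
    lines = source.splitlines()
    needed: Set[str] = set()   # variables whose values flow into the sink
    output: List[str] = []     # abstract code, built from last line to first
    for stmt in reversed(ast.parse(source).body):
        is_sink = (
            isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Call)
            and isinstance(stmt.value.func, ast.Name)
            and stmt.value.func.id == sink_name
        )
        if is_sink:
            args = [a.id for a in stmt.value.args if isinstance(a, ast.Name)]
            needed.update(args)
            for arg in args:
                # Taint assertion: holds when the marker reaches the sink.
                output.append(f"assert {marker!r} in {arg}")
        elif isinstance(stmt, ast.Assign):
            targets = {t.id for t in stmt.targets if isinstance(t, ast.Name)}
            if targets & needed:
                needed -= targets
                needed |= {n.id for n in ast.walk(stmt.value)
                           if isinstance(n, ast.Name)}
                output.append(lines[stmt.lineno - 1])
        # Every other statement is treated as irrelevant to the target taint.
    return "\n".join(reversed(output))
```

For `SAMPLE` with `execute` as the convergence point, the `banner` and `count` statements are dropped and an assertion on `query` is appended, so the resulting code can be executed directly with a tainted entry value to check reachability.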
The specific details of each module of the code detection apparatus have been described in detail in the corresponding code detection method, and therefore are not described herein again.
It should be noted that although several modules or units of the code detection apparatus are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the code detection method is also provided.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module" or "system."
An electronic device 500 according to such an embodiment of the present disclosure is described below with reference to fig. 5. The electronic device 500 shown in fig. 5 is only an example and should not bring any limitations to the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one storage unit 520, a bus 530 connecting various system components (including the storage unit 520 and the processing unit 510), and a display unit 540.
Wherein the storage unit stores program code that is executable by the processing unit 510 to cause the processing unit 510 to perform steps according to various exemplary embodiments of the present disclosure as described in the above section "exemplary methods" of this specification. For example, the processing unit 510 may perform the steps as shown in fig. 1: step S110, analyzing a grammar structure of a program code to be detected to obtain a program grammar rule and a data dependency relationship; step S120, extracting a program code matched with the stain mark library from the program code to be detected to obtain stain matching information; step S130, abstracting program elements irrelevant to the target stain in the program codes to be detected based on the program syntax rules, the data dependency relationship and the stain matching information to generate target abstract codes, wherein the target stain is the program elements conforming to the stain matching information characteristics; and step S140, performing program element variation-based taint information flow analysis on the target abstract code to obtain a detection result of the program code to be detected.
The storage unit 520 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM)521 and/or a cache memory unit 522, and may further include a read only memory unit (ROM) 523.
The storage unit 520 may also include a program/utility 524 having a set (at least one) of program modules 525, such program modules 525 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The bus 530 may be one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit bus or local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 570 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 500, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 500 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 550. Also, the electronic device 500 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic device 500 over the bus 530. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the present disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above of this specification, when the program product is run on the terminal device.
Referring to fig. 6, a program product 600 for implementing the code detection method according to an embodiment of the present disclosure is described. The program product may employ a portable compact disc read only memory (CD-ROM), include program code, and be run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A code detection method, comprising:
analyzing a grammar structure of a program code to be detected to obtain a program grammar rule and a data dependency relationship;
extracting a program code matched with a stain mark library in the program code to be detected to obtain stain matching information;
abstracting program elements irrelevant to a target stain in the program code to be detected based on the program syntax rules, the data dependency relationship and the stain matching information to generate target abstract codes, wherein the target stain is the program elements conforming to the stain matching information;
and carrying out program element variation-based taint information flow analysis on the target abstract code to obtain a detection result of the program code to be detected.
2. The code detection method of claim 1, wherein performing a program element variation-based taint information flow analysis based on the target abstract code to obtain a detection result of the program code to be detected comprises:
performing variation of set times on the program elements of the target abstract codes;
verifying whether an executable path exists between a stain source to be detected and a stain convergent point of the target abstract code and the target abstract code after each variation, and recording a verification result;
calculating the suspiciousness of the program element based on the verification result;
and outputting the detection result of the program code to be detected according to the suspicious degree of the program element.
3. The code detection method according to claim 2, wherein the target abstract code includes a taint assertion marking a taint convergence point position, and the verifying whether an executable path exists between a taint source to be detected and the taint convergence point for the target abstract code and the target abstract code after each mutation comprises:
verifying whether the execution result of the program entry parameter of the target abstract code and the target abstract code after each mutation meets the taint assertion, wherein the program entry parameter is determined based on the taint matching information;
if yes, recording the verification result as that the path is reachable;
if not, recording the verification result as that the path is not reachable.
4. The code detection method according to claim 2, wherein the calculating the suspiciousness of the program element based on the verification result comprises:
calculating the suspicious degree of the program element according to a suspicious degree calculation formula based on the verification result; the suspicious degree calculation formula is as follows:
[Suspicious degree calculation formula; rendered as an image in the original (Figure FDA0003432521550000021) and not reproduced in this text]
wherein m represents any program element, F_m is the suspicious degree of the program element m, m1 is the number of times the path corresponding to the program element m is reachable in the verification result, m2 is the number of times the path corresponding to the program element m is unreachable in the verification result, Σ_m m1 is the total number of times the paths of all verified program elements of the target abstract code are unreachable in the verification result, and Σ_m m2 is the total number of times the paths of all verified program elements of the target abstract code are reachable in the verification result.
5. The code detection method according to claim 1, wherein the taint mark library comprises a taint source marking function, a taint convergence point marking function and a harmless-processing marking function; and the extracting a program code matched with a stain mark library in the program code to be detected to obtain stain matching information comprises:
extracting program codes matched with the taint source marking function, the taint convergence point marking function and the harmless-processing marking function from the program code to be detected to obtain the stain matching information.
6. The code detection method of claim 1, wherein the parsing the syntactic structure of the program code to be detected to obtain program syntactic rules and data dependencies, comprises:
performing lexical analysis and syntactic analysis on the program code to be detected to generate an abstract syntax tree;
generating program syntax rules based on the abstract syntax tree;
and generating a program calling relation graph based on the abstract syntax tree, and extracting data dependency relations from the program calling relation graph.
7. The code detection method of claim 6, wherein abstracting, based on the program syntax rules, the data dependencies, and the taint matching information, program elements that are not related to a target taint in the program code to be detected to generate a target abstract code, comprises:
based on the program syntax rule, the data dependency relationship and the stain matching information, and in combination with the abstract syntax tree and the program calling relationship graph, removing program elements irrelevant to the target stain in the program code to be detected to obtain an abstract code to be processed;
and adding a taint assertion for marking the taint convergence point position in the abstract code to be processed to obtain a target abstract code.
8. A code detection apparatus, comprising:
the parsing module is used for parsing the syntactic structure of the program code to be detected to obtain a program syntactic rule and a data dependency relationship;
the extraction module is used for extracting the program codes matched with the stain mark library in the program codes to be detected to obtain stain matching information;
the abstract processing module is used for carrying out abstract processing on program elements irrelevant to a target stain in the program code to be detected based on the program syntax rules, the data dependency relationship and the stain matching information to generate a target abstract code, wherein the target stain is a program element conforming to the stain matching information characteristics;
and the analysis module is used for carrying out stain information flow analysis on the target abstract code to obtain a detection result of the program code to be detected.
9. An electronic device, comprising:
a processor; and
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the code detection method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a code detection method according to any one of claims 1 to 7.
CN202111599410.9A 2021-12-24 2021-12-24 Code detection method and device, electronic equipment and storage medium Pending CN114328208A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111599410.9A CN114328208A (en) 2021-12-24 2021-12-24 Code detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111599410.9A CN114328208A (en) 2021-12-24 2021-12-24 Code detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114328208A true CN114328208A (en) 2022-04-12

Family

ID=81013736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111599410.9A Pending CN114328208A (en) 2021-12-24 2021-12-24 Code detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114328208A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115296895A (en) * 2022-08-02 2022-11-04 中国电信股份有限公司 Request response method and device, storage medium and electronic equipment
CN115296895B (en) * 2022-08-02 2024-02-23 中国电信股份有限公司 Request response method and device, storage medium and electronic equipment
CN116167048A (en) * 2023-01-20 2023-05-26 北京长亭未来科技有限公司 Webshell detection method and device for EL expression
CN116167048B (en) * 2023-01-20 2023-08-11 北京长亭未来科技有限公司 Webshell detection method and device for EL expression
CN116303042A (en) * 2023-03-22 2023-06-23 中国人民解放军国防科技大学 Software configuration fault detection method based on stain analysis
CN116303042B (en) * 2023-03-22 2023-09-12 中国人民解放军国防科技大学 Software configuration fault detection method based on stain analysis
CN116340186A (en) * 2023-05-25 2023-06-27 中汽研软件测评(天津)有限公司 Automobile electronic software detection system, method and medium
CN116340186B (en) * 2023-05-25 2023-09-19 中汽研软件测评(天津)有限公司 Automobile electronic software detection system, method and medium
CN117421252A (en) * 2023-12-18 2024-01-19 荣耀终端有限公司 Code detection method, device and computer readable storage medium
CN117421252B (en) * 2023-12-18 2024-05-31 荣耀终端有限公司 Code detection method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination