CN112965747A - Method, device, equipment and computer readable medium for mining code vulnerability

Info

Publication number: CN112965747A
Application number: CN202110340078.8A
Authority: CN
Inventors: 刘文宇; 阳骁尧; 邹为; 夏伟; 涂耀旭; 郑娜威
Original assignee: CCB Finetech Co Ltd
Current assignee: CCB Finetech Co Ltd
Priority date: 2021-03-30
Filing date: 2021-03-30
Publication date: 2021-06-15
Anticipated expiration: 2041-03-30
Also published as: CN112965747B

Abstract

The invention discloses a method, an apparatus, a device, and a computer-readable medium for mining code vulnerabilities, and relates to the technical field of automatic programming. One embodiment of the method comprises: formatting the code and analyzing the composition of the formatted code; determining, in combination with the composition and according to the development framework of the code, the start position of the user input data stream; searching the formatted code from the start position of the data stream to locate the data stream of a user-controllable variable; and determining a code vulnerability based on a deserialization function in the data stream of the user-controllable variable. This embodiment can reduce the time needed to mine code vulnerabilities and lower the false-negative rate.

Description

Method, device, equipment and computer readable medium for mining code vulnerability

Technical Field

The present invention relates to the field of automatic programming technologies, and in particular, to a method, an apparatus, a device, and a computer-readable medium for mining code vulnerabilities.

Background

Testing for Java deserialization vulnerabilities currently relies on black-box, gray-box, and white-box testing methods.

The black-box method attempts to trigger a generic, non-command-execution gadget taken from an existing knowledge base, for example by making the application sleep in order to judge whether a given parameter has a deserialization vulnerability. Alternatively, relying on the generic vulnerabilities recorded in an existing knowledge base, it audits whether the Java project references a Java library with a known Java deserialization vulnerability, and mines vulnerabilities by checking for vulnerable versions.

In the process of implementing the invention, the inventors found that the prior art has at least the following problem: mining code vulnerabilities is time-consuming and has a high false-negative rate.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method, an apparatus, a device, and a computer-readable medium for mining a code vulnerability, which can reduce time consumption for mining a code vulnerability and reduce a false negative rate.

To achieve the above object, according to an aspect of the embodiments of the present invention, there is provided a method for mining a code vulnerability, including:

formatting the code and analyzing the composition of the formatted code;

determining, in combination with the composition and according to the development framework of the code, the start position of the user input data stream;

searching the formatted code from the start position of the data stream to locate the data stream of a user-controllable variable;

and determining a code vulnerability based on a deserialization function in the data stream of the user-controllable variable.

Formatting the code includes one or more of: removing comments from the code, deleting unnecessary spaces in the code, and deleting unnecessary carriage returns in the code.

Analyzing the composition of the formatted code comprises:

analyzing the composition of the formatted code by means of regular expressions.

The composition includes one or more of: basic class information, class content, basic method information, method content, basic interface information, constructor content, and basic attribute information.

Determining, in combination with the composition and according to the development framework of the code, the start position of the user input data stream includes:

placing each component of the composition into a corresponding index array, and using the file to which each index array entry belongs as an index;

and determining, in combination with the index and according to the development framework of the code, the start position of the user input data stream.

The development framework of the code is SpringMVC;

determining, in combination with the index and according to the development framework of the code, the start position of the user input data stream comprises:

looking up, in combination with the index, the methods of the classes annotated with @Controller to determine the start position of the user input data stream.

Searching the formatted code from the start position of the data stream to locate the data stream of the user-controllable variable includes:

starting from the start position of the data stream, globally searching the formatted code to locate the data stream of the user-controllable variable.

starting from the start position of the data stream, obtaining the input parameter name of the target function;

and searching the formatted code according to the input parameter name of the target function to locate the data stream of the user-controllable variable.

Obtaining the input parameter name of the target function comprises:

obtaining the input parameter name of the target function;

determining that the input parameter name of the target function is a temporarily defined variable of the method containing the target function;

and, after the target function or the input parameter name is updated, obtaining the input parameter name of the target function again.

Obtaining the input parameter name of the target function comprises:

obtaining the input parameter name of the target function;

determining that the input parameter name of the target function is not a temporarily defined variable of the method containing the target function;

determining that the input parameter name of the target function is neither a system constant nor a non-user-controllable variable;

Determining that the input parameter name of the target function is not a system constant includes:

determining that the input parameter name of the target function belongs to the input parameters of the target function.

The non-user-controllable variables include one or more of: entries found in the constructor content array, the method content array, and the class static method array.

Determining a code vulnerability based on a deserialization function in the data stream of the user-controllable variable includes:

determining the code vulnerability based on a deserialization function in the method at which the user input data stream starts, within the data stream of the user-controllable variable.

judging, based on the deserialization function, whether a deserialization vulnerability attack surface exists in the data stream of the user-controllable variable;

and determining the code vulnerability on the deserialization vulnerability attack surface.

The deserialization function includes a plurality of preset functions.

Determining the code vulnerability on the deserialization vulnerability attack surface comprises:

extracting the input parameters of the deserialization function on the deserialization vulnerability attack surface to determine the code vulnerability.

Extracting the input parameters of the deserialization function comprises:

extracting the input parameters of the deserialization function by means of a regular expression.

According to a second aspect of the embodiments of the present invention, there is provided an apparatus for mining a code vulnerability, including:

an analysis module for formatting the code and analyzing the composition of the formatted code;

a position module for determining, in combination with the composition and according to the development framework of the code, the start position of the user input data stream;

a positioning module for searching the formatted code from the start position of the data stream to locate the data stream of a user-controllable variable;

and a determining module for determining a code vulnerability based on a deserialization function in the data stream of the user-controllable variable.

According to a third aspect of the embodiments of the present invention, there is provided an electronic device for mining a code vulnerability, including:

one or more processors;

a storage device for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.

According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method as described above.

One embodiment of the above invention has the following advantages or benefits: the code is formatted and the composition of the formatted code is analyzed; the start position of the user input data stream is determined in combination with the composition and according to the development framework of the code; the formatted code is searched from the start position of the data stream to locate the data stream of a user-controllable variable; and a code vulnerability is determined based on a deserialization function in the data stream of the user-controllable variable. Because each type of deserialization function can be found in the code by a simple search, the time needed to mine code vulnerabilities can be reduced and the false-negative rate lowered.

Further effects of the optional implementations mentioned above are described below in connection with the specific embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic diagram of a main flow of a method for mining code vulnerabilities according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart illustrating a process for determining a start position of a user input data stream according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart of locating the data stream of a user-controllable variable according to an embodiment of the present invention;

FIG. 4 is a schematic flow chart illustrating a process of determining a code vulnerability based on a deserialization function according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the main structure of an apparatus for mining code vulnerabilities according to an embodiment of the present invention;

FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 7 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

When the black-box, gray-box, and white-box testing methods are used to test for Java deserialization vulnerabilities, testing is time-consuming and vulnerabilities are missed. For example, a deserialization method that developers have wrapped a second time has characteristics that are difficult to detect by gray-box or fuzzing means. Alternatively, the way a deserialized parameter is input may be unusual and may be missed in black-box testing. Or no command-execution gadget is available, yet a Java deserialization vulnerability still exists; such a vulnerability becomes harmful once a researcher discovers a new gadget. For example, native Java deserialization may give rise to denial-of-service (DoS) attacks.

Currently, injection-type vulnerabilities are mostly found by manual tracing, which is time-consuming and prone to omissions. Relying entirely on tools also misses vulnerabilities, for example with fuzz testing or with direct application of a vulnerability knowledge base. Because such tools do not consider how the input data stream changes inside the application, a conventional payload may not produce the same attack effect on different applications.

In addition, the call analysis of common code auditing tools reaches only the relatively coarse granularity of the call-stack level, and their capability for backward analysis of code is insufficient.

Therefore, mining code vulnerabilities suffers from the technical problems of long time consumption and a high false-negative rate.

To solve the technical problems of long time consumption and a high false-negative rate in mining code vulnerabilities, the following technical scheme of the embodiments of the invention can be adopted.

Referring to FIG. 1, FIG. 1 is a schematic diagram of the main flow of a method for mining code vulnerabilities according to an embodiment of the present invention, which searches a data stream and determines a code vulnerability based on a deserialization function. As shown in FIG. 1, the method specifically comprises the following steps:

S101, formatting the code and analyzing the composition of the formatted code.

In white-box testing for Java deserialization vulnerabilities, the trigger functions have obvious characteristics, so each type of deserialization function can be found in the source code by a simple search. Examples are the native Java deserialization function readObject() and the fromXML() function in XStream.

Such obvious deserialization features help a security test engineer quickly locate the input parameters of a deserialization function, after which only the data stream of those input parameters needs to be combed through. The question of whether an existing payload can be used directly, or whether one must be constructed according to the characteristics of the application's data stream, can then be addressed in a more targeted way.

To mine code vulnerabilities, the source code is obtained first. The embodiment of the invention focuses on mining Java deserialization vulnerabilities; whether a gadget can be exploited successfully is left to the security engineer's subsequent proof-of-concept (POC) writing. Therefore only files with the .java suffix are needed, that is, the source code consists of .java files.

In one embodiment of the invention, the code is formatted before its composition is analyzed. Formatting the code means unifying the format of the code.

As one example, formatting the code includes one or more of: removing comments from the code, deleting unnecessary spaces in the code, and deleting unnecessary carriage returns in the code. Specifically, the comments in each Java code file are removed, and unnecessary spaces, carriage returns, and line feeds are deleted. A new list of source code files is then built from the formatted code.
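The formatting step can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `format_java` and the regular expression are assumptions, and Java text blocks (`"""`) are not handled. String and character literals are matched first so that comment markers inside them (such as `//` in a URL) are left alone.

```python
import re

# Hypothetical pattern: either a literal (kept as-is) or a run of
# comments and whitespace (collapsed to a single space).
_TOKEN = re.compile(
    r'(?P<literal>"(?:\\.|[^"\\\n])*"'   # string literal
    r"|'(?:\\.|[^'\\\n])*')"             # char literal
    r"|(?:/\*.*?\*/|//[^\n]*|\s)+",      # block/line comments and whitespace
    re.DOTALL,
)


def format_java(source):
    """Remove comments and redundant spaces/line breaks from Java source."""
    def replace(match):
        literal = match.group("literal")
        # Keep literals untouched; everything else becomes one space so
        # neighbouring tokens do not merge.
        return literal if literal is not None else " "

    return _TOKEN.sub(replace, source).strip()
```

The result is a single-line version of each file, which makes the later regular-expression analysis simpler because declarations are no longer split across lines.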

In one embodiment of the invention, the composition of the formatted code may be analyzed by regular expressions. Regular expressions are typically used to retrieve or replace text that conforms to a certain pattern or rule.

As one example, the composition includes one or more of: basic class information, class content, basic method information, method content, basic interface information, constructor content, and basic attribute information.

Table 1 lists the regular expressions used to split the code. Table 2 is a schematic table of the results of analyzing the formatted code.

TABLE 1

TABLE 2
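As a rough illustration of regex-based composition analysis, the sketch below extracts basic class information, basic method information, and method content from formatted code. The patterns `CLASS_RE` and `METHOD_RE` and the helper names are assumptions made for this example; they are not the expressions of Table 1, and they deliberately ignore constructors, generic types containing spaces, and annotated parameters.

```python
import re

# Hypothetical patterns, for illustration only.
CLASS_RE = re.compile(
    r"(?P<annotations>(?:@\w+(?:\([^)]*\))?\s*)*)"
    r"(?P<modifiers>(?:(?:public|protected|private|abstract|final|static)\s+)*)"
    r"\bclass\s+(?P<name>\w+)"
)
METHOD_RE = re.compile(
    r"(?P<modifiers>(?:(?:public|protected|private|static|final|synchronized)\s+)+)"
    r"(?P<return_type>[\w<>\[\],.?]+)\s+"
    r"(?P<name>\w+)\s*\((?P<params>[^)]*)\)\s*(?:throws\s+[\w.,\s]+)?\{"
)


def block_end(code, open_idx):
    """Return the index just past the brace closing code[open_idx]."""
    depth = 0
    for i in range(open_idx, len(code)):
        if code[i] == "{":
            depth += 1
        elif code[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return len(code)  # unbalanced input: take the rest


def analyze(code):
    """Split formatted Java code into class and method records."""
    classes = [
        {"name": m.group("name"),
         "annotations": re.findall(r"@\w+", m.group("annotations")),
         "modifiers": m.group("modifiers").split()}
        for m in CLASS_RE.finditer(code)
    ]
    methods = []
    for m in METHOD_RE.finditer(code):
        open_idx = m.end() - 1  # position of the method's opening brace
        methods.append({
            "name": m.group("name"),
            "return_type": m.group("return_type"),
            "params": [p.strip() for p in m.group("params").split(",") if p.strip()],
            "content": code[open_idx:block_end(code, open_idx)],
        })
    return {"classes": classes, "methods": methods}
```

Brace counting is used for method content because a regular expression alone cannot match nested braces; braces inside string literals would confuse this simple counter.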

S102, determining, in combination with the composition and according to the development framework of the code, the start position of the user input data stream.

Once the composition of the formatted code is known, the start position of the user input data stream can be determined in combination with the composition and according to the development framework of the code.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of determining the start position of a user input data stream according to an embodiment of the present invention, which specifically includes the following steps:

S201, placing each component of the composition into a corresponding index array, and using the file to which each index array entry belongs as an index.

After the composition of the formatted code is obtained, its components may each be placed in a corresponding index array, and the file to which each entry belongs may be used as the index.

S202, determining, in combination with the index and according to the development framework of the code, the start position of the user input data stream.

The start position of the user input data stream is determined in combination with the index and according to the development framework of the code. As one example, the development framework of the code is SpringMVC. The methods of the classes annotated with @Controller are looked up in combination with the index to determine the start position of the user input data stream. In particular, all classes annotated with @Controller are found by searching, and the input parameters of their methods are the start positions of the data stream.

User input in mainstream Java frameworks basically takes the form described above. The input parameters of these methods are the start positions of the user input data stream, and the methods can be added to a start array of the user input data stream.
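A minimal sketch of this step is shown below, assuming the formatted code is available as a mapping from file path to single-line source. The function `find_start_positions`, the two patterns, and the record layout are hypothetical; the file path plays the role of the index, and parameters that carry annotations with parentheses are not handled.

```python
import re

# Hypothetical patterns, for illustration only.
CONTROLLER_RE = re.compile(r"@Controller\b[^{]*?\bclass\s+(\w+)")
METHOD_RE = re.compile(
    r"(?:(?:public|protected|private|static|final)\s+)+"
    r"[\w<>\[\],.?]+\s+(\w+)\s*\(([^)]*)\)\s*(?:throws\s+[\w.,\s]+)?\{"
)


def find_start_positions(files):
    """Collect method parameters of @Controller classes, keyed by file."""
    starts = []
    for path, code in files.items():
        controller = CONTROLLER_RE.search(code)
        if controller is None:
            continue  # not a controller: no user input enters here
        for method in METHOD_RE.finditer(code, controller.end()):
            for param in method.group(2).split(","):
                if param.strip():
                    starts.append({
                        "file": path,
                        "class": controller.group(1),
                        "method": method.group(1),
                        # last token of "Type name" is the parameter name
                        "param": param.split()[-1],
                    })
    return starts
```

Each returned record corresponds to one entry of the start array: a method parameter through which user input may enter the application.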

S103, starting from the start position of the data stream, searching the formatted code to locate the data stream of the user-controllable variable.

Starting from the start position of the data stream, the formatted code is searched to locate the data stream of the user-controllable variable.

In one embodiment of the invention, a global search may be employed to improve search efficiency. That is, starting from the start position of the data stream, the formatted code is globally searched to locate the data stream of the user-controllable variable.

Referring to FIG. 3, FIG. 3 is a schematic flow chart of locating the data stream of a user-controllable variable according to an embodiment of the present invention, which specifically includes:

S301, starting from the start position of the data stream, obtaining the input parameter name of the target function.

The leaf nodes involved in the data stream fall into two categories: non-user-controllable variables and user-controllable variables. When the data stream of a user-controllable variable has been traced, the tester can construct a payload that attempts to trigger deserialization.

Then, starting from the start position of the data stream, the input parameter name of the target function is obtained. It should be noted that there are multiple target functions and therefore multiple input parameter names, so obtaining the input parameter name of the target function is a step performed many times. As an example, the target function is the deserialization function described above.

For a temporarily defined variable, the target function or its input parameter may be updated, and the input parameter of the target function is then obtained again.

As one example, the input parameter name of the target function is obtained; the input parameter name is determined to be a temporarily defined variable of the method containing the target function; and after the target function or the input parameter name is updated, the input parameter name of the target function is obtained again.

In addition, when the input parameter name of the target function is determined to be a temporarily defined variable of the method containing the target function, it is judged again whether the construct that effectively assigns the temporarily defined variable is itself a temporarily defined variable.

For input parameters that are neither system constants nor non-user-controllable variables, the target function or its input parameter may be updated, and the input parameter of the target function is then obtained again.

As one example, the input parameter name of the target function is obtained; the input parameter name is determined not to be a temporarily defined variable of the method containing the target function; the input parameter name is determined to be neither a system constant nor a non-user-controllable variable; and after the target function or the input parameter name is updated, the input parameter name of the target function is obtained again. At this point the input parameter name is not yet a user-controllable variable; the data stream of the user-controllable variable can be located only once the input parameter name belongs to a user-controllable variable.

Determining that the input parameter name of the target function is not a system constant specifically means determining that the input parameter name belongs to the input parameters of the target function. Of course, once the input parameter name is determined to be a system constant, it does not need to be updated, because a system constant presents no deserialization vulnerability attack surface.

In one embodiment of the invention, the non-user-controllable variables include one or more of: entries found in the constructor content array, the method content array, and the class static method array. That is, variables found by searching the constructor content array, the method content array, and the class static method array all belong to non-user-controllable variables.

S302, searching the formatted code according to the input parameter name of the target function to locate the data stream of the user-controllable variable.

Based on the input parameter name of the target function, the formatted code is searched, and the data stream of the user-controllable variable can then be located.

In the embodiment of FIG. 3, the formatted code is searched based on the input parameter name of the target function, thereby locating the data stream of the user-controllable variable.
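The repeated "obtain the name, update it, obtain it again" loop can be sketched as a backward trace inside a single method. This is a simplified assumption-laden illustration, not the patent's procedure: the function `is_user_controllable` is hypothetical, all-uppercase names are treated as system constants, only plain `name = expression;` assignments are followed, and calls into other methods are not resolved.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')


def is_user_controllable(arg_name, method_params, method_body):
    """Trace arg_name backwards through local assignments.

    Returns True if the value can be traced to one of the enclosing
    method's parameters (assumed here to be a start position).
    """
    pending, seen = [arg_name.strip()], set()
    while pending:
        name = pending.pop()
        if name in seen:
            continue
        seen.add(name)
        if name in method_params:
            return True   # reached a start position of the data stream
        if name.isupper():
            continue      # treated as a system constant: stop tracing
        # Temporarily defined variable: follow its assignments and
        # re-acquire the names used on the right-hand side.
        assignment = re.compile(r"\b" + re.escape(name) + r"\s*=(?!=)\s*([^;]+);")
        for match in assignment.finditer(method_body):
            expression = STRING_LITERAL.sub('""', match.group(1))
            pending.extend(IDENTIFIER.findall(expression))
    return False
```

For example, given a method body in which `xml` is assigned from `decoded`, and `decoded` is in turn derived from the parameter `data`, tracing `xml` reaches `data` and the argument is reported as user-controllable, whereas a variable assigned only from a string literal is not.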

S104, determining a code vulnerability based on a deserialization function in the data stream of the user-controllable variable.

After the data stream of the user-controllable variable has been located, a code vulnerability can be determined based on the deserialization function in that data stream.

It should be noted that a code vulnerability does not necessarily exist in the data stream of a user-controllable variable; rather, code vulnerabilities, when present, are generally found in such data streams.

Referring to FIG. 4, FIG. 4 is a schematic flowchart of a process for determining a code vulnerability based on a deserialization function according to an embodiment of the present invention, which specifically includes:

S401, judging, based on the deserialization function, whether a deserialization vulnerability attack surface exists in the data stream of the user-controllable variable.

Based on the deserialization function, it is judged whether a deserialization vulnerability attack surface exists in the data stream of the user-controllable variable. It should be noted that the deserialization function includes a plurality of preset functions. See Table 3, which is a table of deserialization attack surfaces.

TABLE 3
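The attack-surface check can be illustrated with a preset list of deserialization function names searched for in the methods that make up the data stream. The list below contains only the two functions named earlier in the description (`readObject` and `fromXML`); it is not the content of Table 3, and the function `find_attack_surface` and its input layout are assumptions for this sketch.

```python
import re

# Illustrative preset list; further sinks would be added here.
DESERIALIZATION_SINKS = ("readObject", "fromXML")

SINK_RE = re.compile(r"\.\s*(%s)\s*\(" % "|".join(DESERIALIZATION_SINKS))


def find_attack_surface(data_flow_methods):
    """Return (method name, sink name) pairs for every sink call found.

    data_flow_methods is a list of (method_name, method_body) pairs for
    the methods reached by the user-controllable data stream.
    """
    hits = []
    for method_name, body in data_flow_methods:
        for match in SINK_RE.finditer(body):
            hits.append((method_name, match.group(1)))
    return hits
```

An empty result means no preset deserialization function appears in the traced methods, so no attack surface is reported for that data stream.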

S402, determining the code vulnerability on the deserialization vulnerability attack surface.

The code vulnerability can be determined on the deserialization vulnerability attack surface.

In the embodiment of FIG. 4, a code vulnerability can be determined in the data stream of the user-controllable variable.

In one embodiment of the invention, the input parameters of the deserialization function are extracted from the data stream of the user-controllable variable to determine the code vulnerability. In a specific implementation, a regular expression is used to extract the input parameters of the deserialization function.
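Extracting the input parameters of a deserialization call might look like the sketch below. The pattern `CALL_RE` and the function `extract_sink_arguments` are hypothetical. A regular expression locates the call; a small parenthesis counter, which is an addition for this example, then finds the end of the argument list so that nested calls are kept whole.

```python
import re

# Hypothetical pattern: receiver expression, sink name, opening parenthesis.
CALL_RE = re.compile(r"(?P<receiver>[\w.]+)\s*\.\s*(?P<sink>readObject|fromXML)\s*\(")


def extract_sink_arguments(body):
    """Return receiver, sink name and raw argument text of each sink call."""
    calls = []
    for match in CALL_RE.finditer(body):
        start = i = match.end()
        depth = 1
        # Walk forward until the parenthesis opened by the call is closed.
        while i < len(body) and depth:
            depth += {"(": 1, ")": -1}.get(body[i], 0)
            i += 1
        end = i - 1 if depth == 0 else len(body)
        calls.append({
            "receiver": match.group("receiver"),
            "sink": match.group("sink"),
            "args": body[start:end].strip(),
        })
    return calls
```

For a call such as `readObject()`, which takes no arguments, the receiver (the input stream object) is the value whose origin would need to be traced.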

In the above embodiment, the code is formatted and the composition of the formatted code is analyzed; the start position of the user input data stream is determined in combination with the composition and according to the development framework of the code; the formatted code is searched from the start position of the data stream to locate the data stream of the user-controllable variable; and a code vulnerability is determined based on a deserialization function in the data stream of the user-controllable variable. Because each type of deserialization function can be found in the code by a simple search, the time needed to mine code vulnerabilities can be reduced and the false-negative rate lowered.

In addition, when the scheme of the embodiment of the invention is used to mine code vulnerabilities, the data streams of suspected Java deserialization vulnerabilities are combed out for a security engineer to audit, which removes a large amount of code auditing work from the security engineer's testing process.

The data stream is traced from the starting point of the vulnerability, which improves the accuracy of vulnerability mining. Deployment and use are simple: the developer only needs to provide the source code, after which potential vulnerability points can be analyzed and the data stream traced backward until it is determined whether the data is input by a user.

Referring to FIG. 5, FIG. 5 is a schematic diagram of the main structure of an apparatus for mining code vulnerabilities according to an embodiment of the present invention. The apparatus can implement the method for mining code vulnerabilities and, as shown in FIG. 5, specifically includes:

an analysis module 501, configured to format the code and analyze the composition of the formatted code;

a position module 502, configured to determine, in combination with the composition and according to the development framework of the code, the start position of the user input data stream;

a positioning module 503, configured to search the formatted code from the start position of the data stream to locate the data stream of a user-controllable variable;

a determining module 504, configured to determine a code vulnerability based on a deserialization function in the data stream of the user-controllable variable.

In one embodiment of the invention, formatting the code includes one or more of: removing comments from the code, deleting unnecessary spaces in the code, and deleting unnecessary carriage returns in the code.

In an embodiment of the present invention, the analysis module 501 is specifically configured to analyze the composition of the formatted code by means of regular expressions.

In one embodiment of the invention, the composition includes one or more of: basic class information, class content, basic method information, method content, basic interface information, constructor content, and basic attribute information.

In an embodiment of the present invention, the position module 502 is specifically configured to place each component of the composition into a corresponding index array, and to use the file to which each index array entry belongs as an index;

In one embodiment of the invention, the development framework of the code is SpringMVC;

the position module 502 is specifically configured to look up, in combination with the index, the methods of the classes annotated with @Controller, and to determine the start position of the user input data stream.

In an embodiment of the present invention, the positioning module 503 is specifically configured to globally search the formatted code, starting from the start position of the data stream, to locate the data stream of the user-controllable variable.

In an embodiment of the present invention, the positioning module 503 is specifically configured to obtain the input parameter name of the target function, starting from the start position of the data stream;

In an embodiment of the present invention, the positioning module 503 is specifically configured to obtain the input parameter name of the target function;

In an embodiment of the present invention, the positioning module 503 is specifically configured to determine that the input parameter name of the target function belongs to the input parameters of the target function.

In one embodiment of the invention, the non-user-controllable variables include one or more of: entries found in the constructor content array, the method content array, and the class static method array.

In an embodiment of the present invention, the determining module 504 is specifically configured to determine a code vulnerability based on a deserialization function in the method at which the user input data stream starts, within the data stream of the user-controllable variable.

In an embodiment of the present invention, the determining module 504 is specifically configured to judge, based on a deserialization function, whether a deserialization vulnerability attack surface exists in the data stream of the user-controllable variable;

In one embodiment of the invention, the deserialization function comprises a plurality of preset functions.

In an embodiment of the present invention, the determining module 504 is specifically configured to extract the input parameters of the deserialization function on the deserialization vulnerability attack surface to determine the code vulnerability.

In an embodiment of the present invention, the determining module 504 is specifically configured to extract the input parameters of the deserialization function by means of a regular expression.

Fig. 6 illustrates an exemplary system architecture 600 to which the method for mining code vulnerabilities or the apparatus for mining code vulnerabilities of the present invention may be applied.

As shown in FIG. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 serves to provide a medium for communication links between the terminal devices 601, 602, 603 and the server 605. Network 604 may include various types of connections, such as wired or wireless communication links, or fiber optic cables, to name a few.

A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages or the like. The terminal devices 601, 602, 603 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).

The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 605 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 601, 602, 603. The background management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example, target push information or product information; just an example) to the terminal device.

It should be noted that the method for mining code vulnerabilities provided in the embodiment of the present invention is generally executed by the server 605, and accordingly, the apparatus for mining code vulnerabilities is generally disposed in the server 605.

It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 701.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as follows: a processor includes an analysis module, a location module, a positioning module, and a determination module. The names of these modules do not, in some cases, limit the modules themselves; for example, the analysis module may also be described as "a module for formatting code and analyzing the composition of the formatted code".

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform:

formatting the codes and analyzing the composition of the formatted codes;

According to the technical solution of the embodiment of the invention, the code is formatted and the composition of the formatted code is analyzed; the starting position of the user input data stream is determined according to the development framework of the code, in combination with the composition; the formatted code is searched from the starting position of the data stream to locate the data stream of the user controllable variable; and a code vulnerability is determined based on a deserialization function in the data stream of the user controllable variable. Because each type of deserialization function can be found in the code by a simple search, both the time spent mining code vulnerabilities and the rate of missed reports (false negatives) can be reduced.
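The first and last steps summarized above (formatting the code, then finding deserialization functions by simple search) can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed method: the class `SimpleSinkSearch`, its methods, and the three function names are hypothetical, the comment stripping is naive (it does not account for comment markers inside string literals), and the data-stream tracking of user controllable variables is omitted entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SimpleSinkSearch {
    // Hypothetical predetermined set of deserialization function names.
    static final List<String> SINKS = Arrays.asList("readObject", "parseObject", "fromXML");

    // Formatting step: clear comments, then collapse unnecessary spaces and line breaks.
    // Naive sketch: "//" or "/*" inside a string literal would be stripped as well.
    static String format(String code) {
        String noBlock = code.replaceAll("(?s)/\\*.*?\\*/", " ");   // block comments
        String noLine = noBlock.replaceAll("//[^\\n]*", " ");       // line comments
        return noLine.replaceAll("\\s+", " ").trim();               // whitespace runs
    }

    // Simple search: report every sink name that is called somewhere in the code.
    static List<String> findSinks(String formattedCode) {
        List<String> hits = new ArrayList<>();
        for (String sink : SINKS) {
            if (formattedCode.contains(sink + "(")) {
                hits.add(sink);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String raw = "Object o = in.readObject(); // JSON.parseObject(x)";
        System.out.println(findSinks(format(raw)));
    }
}
```

In this sketch, formatting before searching keeps a call that appears only inside a comment from being reported, which is one reason the formatting step precedes the search.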

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for mining code vulnerabilities, comprising:

formatting the codes and analyzing the composition of the formatted codes;

2. The method of mining code vulnerabilities according to claim 1, wherein the formatting of the code includes one or more of clearing annotations in the code, deleting unnecessary spaces in the code, and deleting unnecessary carriage returns in the code.

3. The method for mining code vulnerabilities according to claim 1, wherein the analyzing the composition of the formatted code comprises:

4. The method of mining code vulnerabilities according to claim 1, wherein the composition includes one or more of class primitives, class content, method primitives, method content, interface primitives, build method content, and attribute primitives.

5. The method of mining code vulnerabilities according to claim 1, wherein said determining, in conjunction with said composition, a starting position of a user input data stream according to a development framework of said code comprises:

6. The method of mining code vulnerabilities according to claim 5, wherein the development framework for the code is SpringMVC;

7. The method for mining code vulnerabilities according to claim 1, wherein said searching said formatted code from a start of said data stream to locate a data stream of user controllable variables comprises:

8. The method for mining code vulnerabilities according to claim 1, wherein said searching said formatted code from a start of said data stream to locate a data stream of user controllable variables comprises:

9. The method for mining code vulnerabilities according to claim 8, wherein said obtaining an entry name of a target function from said source comprises:

obtaining the entry name of the target function;

10. The method for mining code vulnerabilities according to claim 8, wherein said obtaining an entry name of a target function from said source comprises:

obtaining the entry name of the target function;

11. The method for mining code vulnerabilities according to claim 10, wherein said determining that the entry name of the target function does not belong to a system constant comprises:

12. The method for mining code vulnerabilities according to claim 10, wherein the non-user controllable variables include one or more of: searching and constructing a method content array, a method content array and a static-like method array.

13. The method for mining code vulnerabilities according to claim 1, wherein determining code vulnerabilities based on a deserialization function in the data stream of user-controllable variables comprises:

14. The method for mining code vulnerabilities according to claim 1, wherein determining code vulnerabilities based on a deserialization function in the data stream of user-controllable variables comprises:

15. The method of mining code vulnerabilities according to claim 1, wherein the deserialization function includes a predetermined plurality of functions.

16. The method of mining code vulnerabilities according to claim 14, wherein determining code vulnerabilities in the deserialization vulnerability attack surface comprises:

17. The method for mining code vulnerabilities according to claim 16, wherein said extracting the entries of said deserialization function comprises:

18. An apparatus for mining code vulnerabilities, comprising:

19. An electronic device for mining code vulnerabilities, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-17.

20. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-17.