CN112632423A - URL extraction method and device - Google Patents


Info

Publication number
CN112632423A
Authority
CN
China
Prior art keywords
url
source code
target parameter
processing
syntax tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110258227.6A
Other languages
Chinese (zh)
Other versions
CN112632423B (en
Inventor
徐国爱
徐国胜
齐向东
纪胜龙
王少杰
王晨宇
张洪盈
毛庆梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qax Technology Group Inc
Beijing University of Posts and Telecommunications
Original Assignee
Qax Technology Group Inc
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qax Technology Group Inc and Beijing University of Posts and Telecommunications
Priority to CN202110258227.6A
Publication of CN112632423A
Application granted
Publication of CN112632423B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

One or more embodiments of the present disclosure provide a URL extraction method and apparatus. The URL extraction method comprises the following steps: acquiring a source code file; constructing an abstract syntax tree based on the source code file; traversing the abstract syntax tree to obtain Web-API; determining target parameters according to the Web-API; traversing the abstract syntax tree again, and judging whether the target parameter exists in the source code file; if yes, performing first processing; and if not, performing second processing to obtain the URL. The URL present in the source code can be directly retrieved. The method has the advantages of high URL extraction accuracy, flexible response to different scenes, no need of real-time maintenance and the like, and greatly improves the working efficiency of suspicious URL detection in the source code.

Description

URL extraction method and device
Technical Field
One or more embodiments of the present disclosure relate to the technical field of software security, and in particular, to a URL extraction method and apparatus.
Background
With the continuous development of network communication technology, more and more applications rely on accessing URLs to provide services, and the information network security problem related to the applications is increasingly highlighted. People are confronted with a large number of malicious network attacks such as junk mails, phishing, click fraud and the like while enjoying convenience brought by network development, and most of the malicious network attacks complete attack behaviors by means of malicious URLs.
Extracting URLs from applications is the basis for many anomaly detection tasks. Accurately identifying and analyzing the network communication URLs present in source code plays a very important role in security detection tasks such as hidden service identification, malicious website detection, and vulnerability detection. Because URL formats are varied and the surrounding text is complex, extraction is difficult. As information technology develops and network attack and defense techniques change, the forms in which URLs appear in applications have become more variable, which makes extraction harder still. Existing extraction methods are not accurate enough to meet practical requirements.
Disclosure of Invention
In view of the above, one or more embodiments of the present disclosure are directed to a URL extraction method and apparatus, so as to solve the problems in the prior art.
In view of the above, one or more embodiments of the present disclosure provide a URL extracting method, including:
acquiring a source code file;
constructing an abstract syntax tree based on the source code file;
traversing the abstract syntax tree to obtain Web-API;
determining target parameters according to the Web-API;
traversing the abstract syntax tree again, and judging whether the target parameter exists in the source code file;
if yes, performing first processing to obtain a URL; and if not, performing second processing to obtain the URL.
In one embodiment, the first processing includes: performing corresponding extraction processing according to the assignment type of the target parameter; and the assignment types of the target parameters comprise constant assignment, function assignment and user input.
In one embodiment, in the first process: when the target parameter is constant assignment, performing first extraction processing: directly extracting constant character strings corresponding to the target parameters; and when the target parameter is input by a user, performing second extraction processing: and setting user input, and extracting the constant character string corresponding to the user input.
In one embodiment, when the target parameter is a function assignment, performing a third extraction process; the third extraction process includes:
tracking assignment and method calling construction of parameters corresponding to a construction function on which the target parameters depend, and analyzing the values of the target parameters to obtain constant character strings;
and executing the obtained constant character string according to the constructor to obtain a complete constant character string.
In one embodiment, tracking assignment of a parameter corresponding to a constructor that a target parameter depends on and a method call structure, and analyzing a value of the target parameter to obtain a constant string may specifically include:
judging the assignment type of the parameter corresponding to the constructor on which the extracted target parameter depends;
when the parameter is a constant assignment, performing the first extraction processing; when the parameter is a user input, performing the second extraction processing; and when the parameter is a function assignment, performing the third extraction processing.
In one embodiment, the second processing includes: and performing corresponding extraction processing according to the type of the external resource file introduced by the source code file.
In one embodiment, the second processing specifically includes: when the type of the external resource file is a source code type, the step of constructing an abstract syntax tree based on the source code file is circulated until a URL is obtained; and when the type of the external resource file is a non-source code type, searching the value of the target parameter to obtain the constant character string.
In one embodiment, the determining the target parameter according to the Web-API specifically includes:
and analyzing the nodes depended on by the Web-API according to the information of the Web-API calling module to obtain the objects depended on by the calling method and the parameters thereof, namely the target parameters.
In one embodiment, the method further comprises storing the URL in a file, and outputting the URL and a file path where the URL is located.
The embodiment of the present disclosure further provides a URL extracting apparatus, including:
the source code file acquisition module is used for acquiring a source code file;
the abstract syntax tree construction module is used for constructing an abstract syntax tree based on the source code file;
the Web-API acquisition module is used for traversing the abstract syntax tree to acquire a Web-API;
the target parameter determining module is used for determining target parameters according to the Web-API;
the target parameter position judging module is used for traversing the abstract syntax tree again and judging whether the target parameter exists in the source code file or not;
the processing module is used for performing first processing to obtain the URL if the target parameter exists in the source code file, and for performing second processing to obtain the URL if the target parameter does not exist in the source code file.
As can be seen from the foregoing, the URL extracting method and apparatus provided in one or more embodiments of the present disclosure include: acquiring a source code file; constructing an abstract syntax tree based on the source code file; traversing the abstract syntax tree to obtain Web-API; determining target parameters according to the Web-API; traversing the abstract syntax tree again, and judging whether the target parameter exists in the source code file; if yes, performing first processing; and if not, performing second processing to obtain the URL. The URL present in the source code can be directly retrieved. The method has the advantages of high URL extraction accuracy, flexible response to different scenes, no need of real-time maintenance and the like, and greatly improves the working efficiency of suspicious URL detection in the source code.
Drawings
In order to more clearly illustrate one or more embodiments or prior art solutions of the present disclosure, reference will now be made briefly to the attached drawings, which are used in the description of the embodiments or prior art, and it should be apparent that the attached drawings in the description below are only one or more embodiments of the present disclosure, and that other drawings may be obtained by those skilled in the art without inventive effort.
FIG. 1 is a schematic flowchart of the overall URL extraction method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating a URL extraction method according to an embodiment of the disclosure;
FIG. 3 is a diagram of an abstract syntax tree according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a specific URL extraction method according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a URL extraction apparatus according to an embodiment of the disclosure;
FIG. 6 is a schematic diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present disclosure should have the ordinary meaning as understood by one of ordinary skill in the art to which the present disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the present disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items.
As described in the background section, URL is the abbreviation of Uniform Resource Locator. A URL is a string used on the Internet to describe an information resource, and is used mainly in WWW client and server programs. URLs describe many kinds of information resources in a uniform format, including files, server addresses, and directories.
In the course of implementing the present disclosure, the applicant found that current URL extraction work mostly relies on regular expression matching. A regular expression defines a pattern used to search for matching strings; it can be used for pattern matching on character strings and for validating values in input fields. Extracting URLs by regular expression matching has two disadvantages. First, the detection target of a regular expression is narrow: extraction performs poorly on short URLs, URLs that contain an IP address, URLs in a special format, and URLs in a complex context, and a URL that uses different characters or is obfuscated by switching letter case may not be identified at all, which causes many erroneous reports. Second, regular expression matching is inefficient and prone to abnormal matching, which can lead to crashes or resource exhaustion.
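The first disadvantage can be illustrated with a small sketch. The pattern and the sample source text below are assumptions chosen for illustration, not taken from any particular tool; the pattern is deliberately simple.

```python
import re

# A deliberately simple, case-sensitive URL pattern (illustrative assumption).
URL_PATTERN = re.compile(r"https?://[\w.-]+(?:/[\w./-]*)?")

# Hypothetical source text: one literal URL, and one URL that is both
# case-switched and split across a string concatenation.
SOURCE = '''
a = "http://test.com"
b = "HTTP://" + "www." + "baidu.com"
'''

matches = URL_PATTERN.findall(SOURCE)
print(matches)
```

Only the literal URL on the first line is found; the second URL is missed because its scheme is upper-case and its parts are joined only at run time.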
The applicant proposes a method for extracting URLs by analyzing the abstract syntax tree (AST) of source code. The method mainly comprises three parts: abstract syntax tree generation, URL extraction, and output. An abstract syntax tree is generated for the source code and then analyzed to obtain the URLs present in it. The overall flow is shown in FIG. 1. The method offers high URL extraction accuracy, responds flexibly to different scenarios, and does not require real-time maintenance, which greatly improves the efficiency of detecting suspicious URLs in source code.
Hereinafter, the technical means of the present disclosure will be described in further detail with reference to specific examples.
Referring to fig. 2, an embodiment of the present disclosure provides a URL extraction method, including:
S100, acquiring a source code file;
S200, constructing an abstract syntax tree based on the source code file;
S300, traversing the abstract syntax tree to obtain Web-API;
S400, determining target parameters according to the Web-API;
S500, traversing the abstract syntax tree again, and judging whether the target parameter exists in the source code file;
S600, according to the storage position of the target parameter, corresponding processing is carried out to obtain the URL.
In one or more embodiments of the present disclosure, in step S100, the source code file may be written in at least one common programming language, such as C or Java. The source code file can be obtained by accessing the path of the source code folder to be tested.
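Step S100 might be sketched as a directory walk over the folder to be tested. The extension set, the helper name `collect_source_files`, and the demo file names are assumptions made for illustration; the disclosure does not specify how files are selected.

```python
import os
import tempfile

# Assumed mapping from file extension to "source code file" (illustrative only).
SOURCE_EXTENSIONS = {".c", ".java", ".js"}

def collect_source_files(root):
    """Walk the folder under test and return the paths of source code files."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in SOURCE_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
    return sorted(found)

# Demo on a temporary folder with hypothetical file names.
root = tempfile.mkdtemp()
for name in ("Main.java", "util.c", "notes.txt"):
    with open(os.path.join(root, name), "w") as fh:
        fh.write("")

files = [os.path.basename(p) for p in collect_source_files(root)]
print(files)
```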
In one or more embodiments of the present disclosure, in step S200, constructing an abstract syntax tree based on the source code file may include: constructing a corresponding abstract syntax tree with an abstract syntax tree construction tool chosen according to the development language of the source code file. That is, an abstract syntax tree is built for each source code file. A suitable construction tool is selected according to the development language of the source code: for example, Antlr for C, Java Parser for Java, and JavaScript Parser for JavaScript. In the constructed abstract syntax tree, each node represents a structure in the source code, such as a function declaration, a function call, a variable declaration, or a variable assignment.
In an application scenario, the source code may be as follows:
function test() {
  var a = "http://test.com";
  function visit(x) {
    // visit the website
    return 0;
  }
  visit(a);
}
The source code is JavaScript, so the abstract syntax tree can be constructed with JavaScript Parser. The corresponding abstract syntax tree may be as shown in FIG. 3, in which the Function primitive module is an explicit function definition; the Function call module is a function call; the Variable declaration module is a variable declaration; the Return status module is a return statement; and the combination of Assignment and Literal assigns the value "http://test.com" to the variable a.
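As a minimal sketch of step S200, the following uses Python's standard `ast` module as a stand-in for the language-specific construction tools named above; it is not one of the tools the disclosure lists. The source string is a Python transliteration of the sample, and the helper name `node_kinds` is hypothetical.

```python
import ast

# Python transliteration of the JavaScript sample (an assumption for illustration).
SOURCE = '''
def test():
    a = "http://test.com"
    def visit(x):
        # visit the website
        return 0
    visit(a)
'''

def node_kinds(source):
    """Parse the source into an AST and return the sorted set of node type names."""
    tree = ast.parse(source)
    return sorted({type(node).__name__ for node in ast.walk(tree)})

kinds = node_kinds(SOURCE)
print(kinds)
```

The node types reported by Python (`FunctionDef`, `Call`, `Assign`, `Return`, `Constant`) play roughly the same roles as the function definition, function call, assignment, return, and literal modules described for FIG. 3.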
In one or more embodiments of the present disclosure, in step S300, by traversing all nodes of the abstract syntax tree, the call flow of each API (Application Programming Interface) in the source code, the operations on each variable, and so on can be read. The API may be a Web-API, that is, an API that operates on the Web. Web-APIs include the URL access functions built into the programming language as well as common third-party Web libraries.
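A possible sketch of step S300, again assuming Python's `ast` module in place of the parsers named in the text. The `WEB_APIS` set is a hypothetical, hand-picked list; a real tool would need a curated list per language and library.

```python
import ast

# Hypothetical list of fully qualified Web-API names (illustrative assumption).
WEB_APIS = {"urllib.request.urlopen", "requests.get"}

def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None

def find_web_api_calls(source):
    """Traverse every node and collect (api name, line number) for Web-API calls."""
    calls = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in WEB_APIS:
                calls.append((name, node.lineno))
    return calls

SOURCE = '''
import urllib.request
cache = {}
value = cache.get("k")
page = urllib.request.urlopen("http://test.com")
'''

calls = find_web_api_calls(SOURCE)
print(calls)
```

Matching on the qualified name keeps the unrelated `cache.get` call from being reported as a Web-API.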
In one or more embodiments of the present disclosure, in step S400, for each Web-API, the node on which the Web-API depends is analyzed according to the information of the Web-API calling module, and an object and a parameter of the object on which the calling method depends are obtained, that is, the target parameter is obtained. Wherein a node may be a calling method. That is, according to different types of Web-APIs, parameters or objects of a specific location are extracted.
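Step S400 could be sketched as a lookup of the argument at a known position for each Web-API. The `URL_ARGUMENT` table and the function name `target_parameters` are assumptions for illustration, and Python's `ast` module again stands in for the parsers named in the text.

```python
import ast

# Hypothetical table: Web-API name -> (positional index, keyword name) of its URL argument.
URL_ARGUMENT = {"urlopen": (0, "url")}

def target_parameters(source):
    """Return the variable names passed as the URL argument of known Web-API calls."""
    targets = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name not in URL_ARGUMENT:
            continue
        index, keyword = URL_ARGUMENT[name]
        arg = None
        if len(node.args) > index:
            arg = node.args[index]          # URL passed positionally
        else:
            for kw in node.keywords:        # URL passed by keyword
                if kw.arg == keyword:
                    arg = kw.value
        if isinstance(arg, ast.Name):
            targets.append(arg.id)
    return targets

SOURCE = '''
from urllib.request import urlopen
a = "http://test.com"
b = "http://example.org"
urlopen(a)
urlopen(url=b)
'''

targets = target_parameters(SOURCE)
print(targets)
```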
In one or more embodiments of the present disclosure, in step S500, it can be determined whether the target parameter exists in the source code file by traversing the abstract syntax tree again.
Referring to FIG. 4, in one or more embodiments of the present disclosure, in step S600, corresponding processing is performed according to the storage location of the target parameter. Specifically: if the target parameter exists in the source code file, first processing is performed to obtain the URL; if it does not, that is, if the target parameter exists in an external resource file, second processing is performed to obtain the URL.
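The decision made in steps S500 and S600 might look like the following sketch, which treats "exists in the source code file" as "is assigned somewhere in this file's AST". That interpretation, the helper names, and the sample sources are assumptions; Python's `ast` module stands in for the parsers named in the text.

```python
import ast

def is_defined_in_source(source, target):
    """Traverse the AST again and report whether `target` is assigned in this file."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for t in node.targets:
                if isinstance(t, ast.Name) and t.id == target:
                    return True
    return False

def choose_processing(source, target):
    """Pick first processing (in-file) or second processing (external resource)."""
    return "first" if is_defined_in_source(source, target) else "second"

# Hypothetical sources: one assigns `a` locally, the other imports it.
LOCAL = 'a = "http://test.com"\nvisit(a)\n'
EXTERNAL = 'from config import a\nvisit(a)\n'

local_choice = choose_processing(LOCAL, "a")
external_choice = choose_processing(EXTERNAL, "a")
print(local_choice, external_choice)
```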
In one or more embodiments of the present disclosure, the first process includes: and performing corresponding extraction processing according to the assignment type of the target parameter. And the assignment types of the target parameters comprise constant assignment, function assignment and user input.
In one or more embodiments of the present disclosure, when the target parameter is a constant value, a first extraction process is performed. The first extraction processing comprises directly extracting the constant character strings corresponding to the target parameters.
In an application scenario, for example in Java, when the Web-API is openStream(), it is called on the variable url. The variable url is then analyzed: if its assignment is url = new URL("http://www.baidu.com"), the value of url is the constant string "http://www.baidu.com", which is the URL to be extracted.
In one or more embodiments of the present disclosure, when the target parameter is a user input, a second extraction process is performed, where the second extraction process includes setting the user input and extracting a constant character string corresponding to the set user input. In an application scenario, the USER INPUT may be set to "USER INPUT URL".
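The first and second extraction processing can be sketched as a classification of the assigned expression. Here a call to Python's built-in `input()` is assumed to represent user input, and `ast` stands in for the parsers named in the text; the helper names are hypothetical.

```python
import ast

USER_INPUT_PLACEHOLDER = "USER INPUT URL"  # placeholder value taken from the text

def find_assignment(tree, name):
    """Return the value expression of the first assignment to `name` found, or None."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            if any(isinstance(t, ast.Name) and t.id == name for t in node.targets):
                return node.value
    return None

def classify(value):
    """Classify an assigned expression as constant, user_input, or function."""
    if isinstance(value, ast.Constant) and isinstance(value.value, str):
        return "constant"
    if (isinstance(value, ast.Call) and isinstance(value.func, ast.Name)
            and value.func.id == "input"):
        return "user_input"
    return "function"

def extract_simple(source, name):
    """Apply the first or second extraction; return None when the third is needed."""
    value = find_assignment(ast.parse(source), name)
    kind = classify(value)
    if kind == "constant":
        return value.value             # first extraction: take the constant string
    if kind == "user_input":
        return USER_INPUT_PLACEHOLDER  # second extraction: substitute the placeholder
    return None

constant_url = extract_simple('url = "http://www.baidu.com"', "url")
input_url = extract_simple("url = input()", "url")
deferred = extract_simple('url = a + "baidu.com"', "url")
print(constant_url, input_url, deferred)
```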
In one or more embodiments of the present disclosure, when the target parameter is a function assignment, third extraction processing is performed. A function assignment may depend on string concatenation or on construction by other functions, for example a string builder constructed from strings.
In one or more embodiments of the present disclosure, the third extraction process includes:
tracking assignment and method calling construction of parameters corresponding to a construction function on which the target parameters depend, and analyzing the values of the target parameters to obtain constant character strings;
and executing the obtained constant character string according to the constructor to obtain a complete constant character string.
In one or more embodiments of the present disclosure, tracking assignment of a parameter and a method call structure corresponding to a constructor that a target parameter depends on, analyzing a value of the target parameter, and obtaining a constant string may specifically include:
judging the assignment type of the parameter corresponding to the construction function which is relied on when the target parameter is extracted;
and performing corresponding extraction processing according to the assignment type of the parameter. Wherein the assignment types of the parameters comprise constant assignments, function assignments and user inputs.
When the parameter is a constant assignment, the first extraction processing is performed; when the parameter is a user input, the second extraction processing is performed; and when the parameter is a function assignment, the third extraction processing is performed. Each is the same as described above and is not repeated here.
In an application scenario, for example in Java, when the Web-API is openStream(), it is called on the variable url. The variable url is analyzed; if its assignment is url = a.append("baidu.com"), the value of the parameter a is analyzed next. If a is a constant string, a and "baidu.com" are concatenated to obtain the complete constant string, which is the extracted URL. If a is a user input, a is set to "USER INPUT URL" and concatenated with "baidu.com" to obtain the complete constant string, which is the extracted URL. If a is a function assignment, the parameter analysis is repeated until a constant string is obtained, which is then concatenated with "baidu.com" to obtain the complete constant string, which is the extracted URL.
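The recursive analysis in this scenario might be sketched as follows. String concatenation with `+` is used in place of the Java `append` call, `input()` is assumed to represent user input, and the depth limit is an arbitrary guard; all helper names are hypothetical and Python's `ast` module stands in for the parsers named in the text.

```python
import ast

USER_INPUT_PLACEHOLDER = "USER INPUT URL"  # placeholder value taken from the text

def collect_assignments(tree):
    """Map each simple variable name to the expression assigned to it."""
    table = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for t in node.targets:
                if isinstance(t, ast.Name):
                    table[t.id] = node.value
    return table

def resolve(expr, table, depth=0):
    """Recursively reduce an expression to a constant string, or None if unresolved."""
    if depth > 20:  # arbitrary guard against cyclic definitions
        return None
    if isinstance(expr, ast.Constant) and isinstance(expr.value, str):
        return expr.value                                  # constant assignment
    if isinstance(expr, ast.Name) and expr.id in table:
        return resolve(table[expr.id], table, depth + 1)   # follow the variable
    if (isinstance(expr, ast.Call) and isinstance(expr.func, ast.Name)
            and expr.func.id == "input"):
        return USER_INPUT_PLACEHOLDER                      # user input
    if isinstance(expr, ast.BinOp) and isinstance(expr.op, ast.Add):
        left = resolve(expr.left, table, depth + 1)        # function assignment:
        right = resolve(expr.right, table, depth + 1)      # resolve both operands
        if left is not None and right is not None:
            return left + right
    return None

def extract_url(source, name):
    table = collect_assignments(ast.parse(source))
    return resolve(ast.Name(id=name, ctx=ast.Load()), table)

CHAINED = 'scheme = "http://"\nhost = scheme + "www."\nurl = host + "baidu.com"\n'
WITH_INPUT = 'a = input()\nurl = a + "baidu.com"\n'

chained_url = extract_url(CHAINED, "url")
input_url = extract_url(WITH_INPUT, "url")
print(chained_url)
print(input_url)
```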
In one or more embodiments of the present disclosure, the second processing includes: performing corresponding extraction processing according to the type of the external resource file introduced by the source code file. The mechanisms that introduce external resource files may include include <> and import statements, as well as common file read/write functions such as File() and fopen().
The method specifically comprises the following steps: when the type of the external resource file is a source code type, the step of constructing an abstract syntax tree based on the source code file is circulated until a URL is obtained; and when the type of the external resource file is a non-source code type, searching the value of the target parameter to obtain the constant character string. It should be understood that constructing an abstract syntax tree based on a source code file herein refers to constructing an abstract syntax tree based on an external resource file of the source code type.
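A rough sketch of the second processing follows. The in-memory `FILES` table, the file names, and the `key=value` resource format are all assumptions made so the example is self-contained; the sketch resolves only one level of indirection rather than looping until a URL is found, and Python's `ast` module stands in for the parsers named in the text.

```python
import ast

# Hypothetical in-memory "project": file name -> contents.
FILES = {
    "main.py": "from settings import api_url\nvisit(api_url)\n",
    "settings.py": 'api_url = "http://test.com/api"\n',
    "app.properties": "home_url=http://test.com/home\n",
}

def constant_from_source(text, name):
    """Source-code type: build an AST for the external file and read the constant."""
    for node in ast.walk(ast.parse(text)):
        if isinstance(node, ast.Assign):
            for t in node.targets:
                if isinstance(t, ast.Name) and t.id == name:
                    value = node.value
                    if isinstance(value, ast.Constant) and isinstance(value.value, str):
                        return value.value
    return None

def constant_from_properties(text, name):
    """Non-source type: search the resource for the target parameter's value."""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == name:
            return value.strip()
    return None

def second_processing(resource, name):
    text = FILES[resource]
    if resource.endswith(".py"):
        return constant_from_source(text, name)
    return constant_from_properties(text, name)

from_source = second_processing("settings.py", "api_url")
from_resource = second_processing("app.properties", "home_url")
print(from_source, from_resource)
```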
In one or more embodiments of the present disclosure, the method further includes storing the URL in a file, and outputting the URL and a file path where the URL is located.
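The output step could be sketched as below. The CSV layout, the helper name `save_results`, and the sample path `src/test.js` are assumptions; the text does not fix an output format, and the "file path" is taken here to mean the source file in which the URL was found.

```python
import csv
import os
import tempfile

def save_results(results, out_path):
    """Write (url, source file path) pairs as CSV rows and return the row count."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["url", "source_file"])
        writer.writerows(results)
    return len(results)

# Hypothetical extraction result: one URL and the file it was found in.
results = [("http://test.com", "src/test.js")]
out_path = os.path.join(tempfile.mkdtemp(), "urls.csv")
count = save_results(results, out_path)

# Output each URL together with its file path.
for url, path in results:
    print(url, path)
```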
The embodiment of the disclosure also provides the following URL extraction method:
Taking the aforementioned source code as an example, the abstract syntax tree corresponding to the source code segment is shown in FIG. 3. By traversing the syntax tree, the function definition module (Function primitive) is read and the Web-API is determined to be visit(). The function call module (Function call), that is, the call of visit(), is then analyzed to obtain the target parameter a. The syntax tree is traversed again and the combination of the Variable declaration, Assignment, and Literal modules is analyzed, which shows that the variable a is assigned "http://test.com"; the URL is thus successfully extracted from the source code.
According to the URL extraction method provided by the embodiments of the present disclosure, the path of the source code folder to be tested is saved, the source code is read, and URL extraction is performed on it. Specifically, an abstract syntax tree (AST) is first created for each source code file; the abstract syntax tree is then traversed to find the Web-APIs related to communication. For each Web-API call found, the nodes on which the API depends are analyzed recursively to determine the objects on which the calling method depends and their parameters. All assignments and method call constructions of each variable in the calling method are tracked. The objects and parameters corresponding to the Web-API are extracted and analyzed to obtain the URLs present in the source code. Each extracted URL is stored in a file, and the URL is output together with the file path where it is located.
The URL extraction method provided by the embodiments of the present disclosure extracts URLs from the data flow, starting from the API calls that use them, so changes in URL format do not affect the extraction and there are no rules that need regular maintenance. This avoids the drawbacks of regular expression matching, whose detection target is narrow, whose coverage is limited, and whose rules must be updated whenever URLs in a new format appear. Extracting URLs from the data flow also sidesteps the uncertainty of the many forms in which URLs can appear, which makes extraction simple and efficient. Because the method analyzes the source code obtained by decompiling software, the call relationships are more complete and many missed reports can be avoided, so detection is more comprehensive. The method avoids the problems that regular expression matching can only detect URLs in preset formats and extracts poorly after URL format updates or URL splicing. The method can also respond flexibly to new URLs and achieves a better balance among manpower, computation, speed, and effectiveness in practical use. It is therefore highly feasible and avoids the rule maintenance burden of rule matching methods.
It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It is noted that the above describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, corresponding to any embodiment method, one or more embodiments of the present disclosure further provide a URL extraction apparatus.
Referring to fig. 5, the URL extracting apparatus 700 includes:
a source code file obtaining module 710, configured to obtain a source code file;
an abstract syntax tree construction module 720, configured to construct an abstract syntax tree based on the source code file;
a Web-API obtaining module 730, configured to traverse the abstract syntax tree to obtain a Web-API;
a target parameter determining module 740, configured to determine a target parameter according to the Web-API;
a target parameter position determining module 750, configured to traverse the abstract syntax tree again, and determine whether a target parameter exists in the source code file;
the processing module 760 is configured to perform first processing to obtain the URL if the target parameter exists in the source code file, and to perform second processing to obtain the URL if the target parameter does not exist in the source code file.
In one or more embodiments of the present disclosure, when the processing module 760 is used for the first processing, specifically, the processing module is configured to: performing corresponding extraction processing according to the assignment type of the target parameter; and the assignment types of the target parameters comprise constant assignment, function assignment and user input.
In one or more embodiments of the present disclosure, when the processing module 760 is used for the first processing, specifically, the processing module is configured to: when the target parameter is a constant assignment, performing first extraction processing, namely directly extracting a constant character string corresponding to the target parameter; and when the target parameter is the user input, performing second extraction processing, namely setting the user input, and extracting the constant character string corresponding to the user input.
In one or more embodiments of the present disclosure, the processing module 760 is specifically configured to: when the target parameter is the function assignment, performing third extraction processing; the third extraction process includes:
tracking assignment and method calling construction of parameters corresponding to a construction function on which the target parameters depend, and analyzing the values of the target parameters to obtain constant character strings;
and executing the obtained constant character string according to the constructor to obtain a complete constant character string.
In one or more embodiments of the present disclosure, the processing module 760 is specifically configured to, when the processing module is configured to track assignment and method call configuration of a parameter corresponding to a constructor that a target parameter depends on, analyze a value of the target parameter, and obtain a constant string:
judging the assignment type of the parameter corresponding to the constructor on which the extracted target parameter depends;
when the parameter is a constant assignment, performing the first extraction processing; when the parameter is a user input, performing the second extraction processing; and when the parameter is a function assignment, performing the third extraction processing.
In one or more embodiments of the present disclosure, when performing the second processing, the processing module 760 is configured to perform the corresponding extraction processing according to the type of the external resource file introduced by the source code file.
In one or more embodiments of the present disclosure, when performing the second processing, the processing module 760 is specifically configured to: when the external resource file is a source code file, repeat the step of constructing an abstract syntax tree based on the source code file until a URL is obtained; and when the external resource file is a non-source code file, search it for the value of the target parameter to obtain the constant character string.
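The second processing can be illustrated with the following minimal sketch. It assumes, as hypothetical choices not stated in the disclosure, that source-type external resources are Python files (recognized by a `.py` suffix), that non-source resources are JSON files whose top-level keys are parameter names, and that the resource contents are supplied in memory as a mapping from file name to text. For brevity, the source-type branch looks only for a constant assignment and performs a single pass rather than the full repeated processing.

```python
import ast
import json
from typing import Dict, Optional


def lookup_in_source(text: str, name: str) -> Optional[str]:
    """Source-type resource: build an AST and look for `name = "<string>"`."""
    for stmt in ast.parse(text).body:
        if (isinstance(stmt, ast.Assign)
                and isinstance(stmt.value, ast.Constant)
                and isinstance(stmt.value.value, str)
                and any(isinstance(t, ast.Name) and t.id == name
                        for t in stmt.targets)):
            return stmt.value.value
    return None


def lookup_in_data(text: str, name: str) -> Optional[str]:
    """Non-source resource (assumed JSON): search for the parameter's value."""
    data = json.loads(text)
    value = data.get(name) if isinstance(data, dict) else None
    return value if isinstance(value, str) else None


def second_processing(name: str, resources: Dict[str, str]) -> Optional[str]:
    """Dispatch on the (assumed) resource type until a string is found."""
    for filename, text in resources.items():
        if filename.endswith(".py"):
            found = lookup_in_source(text, name)
        elif filename.endswith(".json"):
            found = lookup_in_data(text, name)
        else:
            found = None                      # unknown resource type: skip
        if found is not None:
            return found
    return None
```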
In one or more embodiments of the present disclosure, when determining the target parameter according to the Web-API, the target parameter determining module 740 is specifically configured to:
analyze the nodes on which the Web-API depends according to the information of the Web-API calling module, to obtain the object on which the calling method depends and its parameters, namely the target parameters.
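Determining the target parameter from a Web-API call can be sketched as below. The sketch assumes Python source and Python 3.9 or later (for `ast.unparse`), and treats `requests.get` and `requests.post` as the Web-APIs of interest. That list, the `WEB_API_METHODS` name, and the choice of the first positional argument as the target parameter are illustrative assumptions rather than part of the disclosure.

```python
import ast
from typing import List, Tuple

# Assumed set of (object, method) pairs regarded as Web-API calls.
WEB_API_METHODS = {("requests", "get"), ("requests", "post")}


def find_target_parameters(source: str) -> List[Tuple[str, str, str]]:
    """Return (object, method, first-argument source text) for each Web-API call."""
    results = []
    for node in ast.walk(ast.parse(source)):
        # Only calls of the form <name>.<method>(...) are considered.
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        owner = node.func.value
        if not isinstance(owner, ast.Name):
            continue
        if (owner.id, node.func.attr) not in WEB_API_METHODS:
            continue
        if node.args:
            # The first positional argument is taken as the target parameter.
            results.append((owner.id, node.func.attr, ast.unparse(node.args[0])))
    return results
```

The returned argument text (for example `url`) is what a later traversal would look up among the assignments in the source code file.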
In one or more embodiments of the present disclosure, the URL is further stored in a file, and the URL and the file path where the URL is located are output.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the various modules may be implemented in the same one or more software and/or hardware implementations in implementing one or more embodiments of the present disclosure.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above embodiments, one or more embodiments of the present disclosure further provide an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the URL extracting method according to any of the above embodiments is implemented.
Fig. 6 shows a more specific hardware structure diagram of an electronic device according to an embodiment of the present disclosure, where the device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided by the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 can store an operating system and other application programs, and when the technical solution provided by the embodiments of the present disclosure is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The input/output module may be configured as a component in the device (not shown) or may be external to the device to provide the corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present device and other devices. The communication module can communicate in a wired mode (such as USB or a network cable) or in a wireless mode (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. Moreover, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present disclosure, and need not include all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above-described embodiment methods, one or more embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the URL extraction method according to any of the above-described embodiments.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the spirit of the disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of different aspects of one or more embodiments of the disclosure as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures, for simplicity of illustration and discussion, and so as not to obscure one or more embodiments of the disclosure. Furthermore, devices may be shown in block diagram form in order to avoid obscuring one or more embodiments of the present disclosure, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which one or more embodiments of the present disclosure are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that one or more embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The one or more embodiments of the present disclosure are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the disclosure are intended to be included within the scope of the disclosure.

Claims (10)

1. A URL extraction method, comprising:
acquiring a source code file;
constructing an abstract syntax tree based on the source code file;
traversing the abstract syntax tree to obtain Web-API;
determining target parameters according to the Web-API;
traversing the abstract syntax tree again, and judging whether the target parameter exists in the source code file;
if yes, performing first processing to obtain a URL; and if not, performing second processing to obtain the URL.
2. The URL extraction method according to claim 1, wherein the first process includes: performing corresponding extraction processing according to the assignment type of the target parameter; and the assignment types of the target parameters comprise constant assignment, function assignment and user input.
3. The URL extraction method according to claim 1, wherein in the first process: when the target parameter is a constant assignment, performing first extraction processing, namely directly extracting a constant character string corresponding to the target parameter; and when the target parameter is the user input, performing second extraction processing, namely setting the user input, and extracting the constant character string corresponding to the user input.
4. The URL extraction method according to claim 3, wherein when the target parameter is a function assignment, a third extraction process is performed; the third extraction process includes:
tracking assignment and method calling construction of parameters corresponding to a construction function on which the target parameters depend, and analyzing the values of the target parameters to obtain constant character strings;
and executing the obtained constant character string according to the constructor to obtain a complete constant character string.
5. The URL extraction method according to claim 4, wherein tracking assignment and method invocation structure of parameters corresponding to a constructor on which the target parameter depends, and analyzing values of the target parameter to obtain the constant string specifically includes:
judging the assignment type of the parameter corresponding to the constructor on which the target parameter depends;
when the parameter is a constant assignment, performing the first extraction processing; when the parameter is user input, performing the second extraction processing; and when the parameter is a function assignment, performing the third extraction processing.
6. The URL extraction method according to claim 1, wherein the second process includes: and performing corresponding extraction processing according to the type of the external resource file introduced by the source code file.
7. The URL extraction method according to claim 6, wherein the second processing specifically includes: when the type of the external resource file is a source code type, the step of constructing an abstract syntax tree based on the source code file is circulated until a URL is obtained; and when the type of the external resource file is a non-source code type, searching the value of the target parameter to obtain the constant character string.
8. The URL extraction method according to claim 1, wherein the determining the target parameter according to the Web-API specifically includes:
and analyzing the nodes depended on by the Web-API according to the information of the Web-API calling module to obtain the objects depended on by the calling method and the parameters thereof, namely the target parameters.
9. The URL extraction method as claimed in claim 1, further comprising storing the URL in a file and outputting the URL and a file path where the URL is located.
10. An apparatus for extracting URL, comprising:
the source code file acquisition module is used for acquiring a source code file;
the abstract syntax tree construction module is used for constructing an abstract syntax tree based on the source code file;
the Web-API acquisition module is used for traversing the abstract syntax tree to acquire a Web-API;
the target parameter determining module is used for determining target parameters according to the Web-API;
the target parameter position judging module is used for traversing the abstract syntax tree again and judging whether the target parameter exists in the source code file or not;
the processing module is used for performing first processing to obtain a URL if the target parameter exists in the source code file; and performing second processing to obtain the URL if the target parameter does not exist in the source code file.
CN202110258227.6A 2021-03-10 2021-03-10 URL extraction method and device Active CN112632423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110258227.6A CN112632423B (en) 2021-03-10 2021-03-10 URL extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110258227.6A CN112632423B (en) 2021-03-10 2021-03-10 URL extraction method and device

Publications (2)

Publication Number Publication Date
CN112632423A true CN112632423A (en) 2021-04-09
CN112632423B CN112632423B (en) 2021-06-29

Family

ID=75297814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110258227.6A Active CN112632423B (en) 2021-03-10 2021-03-10 URL extraction method and device

Country Status (1)

Country Link
CN (1) CN112632423B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051253A (en) * 2021-04-15 2021-06-29 广州云族佳科技有限公司 Method and device for constructing tag database

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9805009B2 (en) * 2010-12-30 2017-10-31 Opera Software As Method and device for cascading style sheet (CSS) selector matching
CN107463376A (en) * 2017-07-21 2017-12-12 珠海牛角科技有限公司 The method and device for automatically generating back end interface document based on Javadoc
CN109462583A (en) * 2018-10-31 2019-03-12 南京邮电大学 A kind of reflection-type leak detection method combined based on static and dynamic
CN110362996A (en) * 2019-06-03 2019-10-22 中国科学院信息工程研究所 A kind of method and system of offline inspection PowerShell Malware
CN110472165A (en) * 2019-08-20 2019-11-19 深圳前海微众银行股份有限公司 URL extracting method, device, equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李敬涛: ""基于机器学习的JavaScript恶意代码检测方案研究"", 《中国优秀硕士学位论文全文数据库》 *

Also Published As

Publication number Publication date
CN112632423B (en) 2021-06-29

Similar Documents

Publication Publication Date Title
US11068382B2 (en) Software testing and verification
US11650905B2 (en) Testing source code changes
Chen et al. DroidCIA: A novel detection method of code injection attacks on HTML5-based mobile apps
CN112733158A (en) Android system vulnerability detection method, electronic equipment and storage medium
CN111260336B (en) Service checking method, device and equipment based on rule engine
CN111435393A (en) Object vulnerability detection method, device, medium and electronic equipment
CN107347076A (en) The detection method and device of SSRF leaks
CN115146282A (en) AST-based source code anomaly detection method and device
CN114328208A (en) Code detection method and device, electronic equipment and storage medium
CN113312618A (en) Program vulnerability detection method and device, electronic equipment and medium
CN112632423B (en) URL extraction method and device
CN113778897B (en) Automatic test method, device and equipment for interface and storage medium
CN109698814A (en) Botnet finds that method and Botnet find device
CN113419971A (en) Android system service vulnerability detection method and related device
US9398041B2 (en) Identifying stored vulnerabilities in a web service
CN115659344B (en) Software supply chain detection method and device
CN116880847A (en) Source tracing method and device based on open source project, electronic equipment and storage medium
CN115618363B (en) Vulnerability path mining method and related equipment
CN115421831A (en) Method, device, equipment and storage medium for generating calling relation of activity component
CN110968500A (en) Test case execution method and device
CN114625372A (en) Automatic component compiling method and device, computer equipment and storage medium
CN111309311B (en) Vulnerability detection tool generation method, device, equipment and readable storage medium
CN114528552A (en) Security event correlation method based on vulnerability and related equipment
CN112000573A (en) Code quality monitoring method and device, computer equipment and medium
US20230315862A1 (en) Method and apparatus for identifying dynamically invoked computer code using literal values

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant