CN114626062B - Website application user interaction point discovery method and system based on dynamic and static combination - Google Patents


Publication number
CN114626062B
Authority
CN
China
Prior art keywords
point
interaction
website application
webpage
dynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210160099.6A
Other languages
Chinese (zh)
Other versions
CN114626062A (en)
Inventor
李阳
陈远超
于璐
沈毅
张利群
马慧敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202210160099.6A
Publication of CN114626062A
Application granted
Publication of CN114626062B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a method and system for discovering user interaction points of a website application based on a combination of dynamic and static analysis. The method comprises the following steps: S1, determining a webpage to be crawled from a plurality of webpages of the website application based on a similarity judgment strategy, wherein the similarity judgment strategy groups the plurality of webpages by similarity so that one webpage in each group is selected as the webpage to be crawled; S2, extracting data interaction points in the webpage to be crawled as first interaction points by using a dynamic crawler with rule matching; S3, generating a control flow graph of the website application from the source code of the website application through the lexical analysis and syntax abstraction of static code auditing; and S4, performing path analysis on the control flow graph by using the entry file information extracted from the first interaction point, so as to determine a data interaction point in the source code that differs from the first interaction point as a second interaction point.

Description

Website application user interaction point discovery method and system based on dynamic and static combination
Technical Field
The invention relates to the technical field of network security, in particular to a website application user interaction point discovery method and system based on dynamic and static combination.
Background
Discovering data interaction points in Web applications matters for the following reason: with the rapid development of the Internet and the rapid spread of Web applications, growing user bases lead to more frequent updates and upgrades, and Web applications include ever richer functions. This causes two problems:
(1) Because Web applications are updated so quickly, developers keep modifying and adding functions on top of the original framework, and redundant code accumulates across version iterations. As a result, many data interaction points of old functions remain in the code, and the more data interaction points there are, the easier it is for malicious users to exploit them for attack testing.
(2) Data interaction points contain a large number of parameter constraint relationships, but not all parameters are necessary; discovering the parameter relationships in the data interaction points makes further optimization of the Web application possible.
In general, data interaction point discovery in Web applications is mainly applied to Web fuzz testing and is mainly carried out by crawlers that crawl page information. Most existing research uses machine learning based on documentation and on feedback from requests to discover the parameter constraint relationships of a Web application, that is, its data interaction points. In these approaches, inter-parameter constraints are inferred with a decision tree, and the information used to populate the decision tree model is observed and inferred from the responses to candidate parameter constraints at a given data interaction point. The candidates are selected using a set of heuristics and by observing the feedback of the Web service. The process integrates information from the parameters of the Web service, error messages, and test results.
These methods crawl constraints from Web pages and infer parameter constraints from documentation manuals, but they do not take the code as input, so they cannot discover data interaction points of functions that appear in neither the Web pages nor the manuals. A comprehensive and convenient method and system for discovering data interaction points in Web applications is therefore urgently needed.
Disclosure of Invention
The invention provides a website application user interaction point discovery scheme based on dynamic and static combination, which aims to solve the technical problems in the prior art.
A first aspect of the invention discloses a website application user interaction point discovery method based on dynamic and static combination. The interaction point comprises a first interaction point in a web page of the website application and a second interaction point in source code of the website application, and the method comprises the following steps: S1, determining a webpage to be crawled from a plurality of webpages of the website application based on a similarity judgment strategy, wherein the similarity judgment strategy groups the plurality of webpages by similarity so that one webpage in each group is selected as the webpage to be crawled; S2, extracting data interaction points in the webpage to be crawled as first interaction points by using a dynamic crawler with rule matching; S3, generating a control flow graph of the website application from the source code of the website application through the lexical analysis and syntax abstraction of static code auditing; and S4, performing path analysis on the control flow graph by using the entry file information extracted from the first interaction point, so as to determine a data interaction point in the source code that differs from the first interaction point as a second interaction point.
According to the method of the first aspect of the present invention, the step S1 specifically includes: acquiring information of each webpage in a login state by using a headless browser; extracting protocols, domain names and virtual paths of all the webpages from the acquired information; based on the similarity judgment strategy, classifying the webpages with the same protocol, domain name and virtual path into the same group; and selecting one webpage from each group as the webpage to be crawled.
According to the method of the first aspect of the present invention, web pages having the same protocol, domain name, and virtual path are web pages whose protocol, domain name, and virtual path, including the parameter names, are the same.
According to the method of the first aspect of the present invention, in step S2, the rule matching is regular-expression matching, and the first interaction point includes a data interaction point based on user editing and a data interaction point based on function triggering.
According to the method of the first aspect of the present invention, the step S3 specifically includes: extracting source codes of the website application; performing the lexical analysis on the source code to convert the source code into a corresponding Token stream; generating a syntax tree of the source code from the Token stream based on the syntax abstraction; converting the syntax tree into the control flow graph.
According to the method of the first aspect of the present invention, the step S4 specifically includes: extracting an input point and an output point for the path analysis from the source code based on the entry file information, wherein the input point is a superglobal variable used for acquiring user input data in the website application, and the output point is a function used for interacting with a system terminal, a database or a file system in the website application; determining an effective path between the input point and the output point to extract a data interaction point on the effective path; and deleting the interaction points which are overlapped with the first interaction points from the data interaction points on the effective path to obtain the second interaction points.
According to the method of the first aspect of the present invention, the data interaction point on the effective path is extracted based on the parameter constraint relationship of the effective path.
A second aspect of the invention discloses a website application user interaction point discovery system based on dynamic and static combination. The interaction point comprises a first interaction point in a web page of the website application and a second interaction point in source code of the website application, and the system comprises: a first processing unit configured to determine a webpage to be crawled from a plurality of webpages of the website application based on a similarity judgment strategy, wherein the similarity judgment strategy groups the plurality of webpages by similarity so that one webpage in each group is selected as the webpage to be crawled; a second processing unit configured to extract data interaction points in the webpage to be crawled as first interaction points by using a dynamic crawler with rule matching; a third processing unit configured to generate a control flow graph of the website application from the source code of the website application through the lexical analysis and syntax abstraction of static code auditing; and a fourth processing unit configured to perform path analysis on the control flow graph by using the entry file information extracted from the first interaction point, so as to determine a data interaction point in the source code that differs from the first interaction point as a second interaction point.
According to the system of the second aspect of the present invention, the first processing unit is specifically configured to perform information acquisition on each web page in a login state by using a headless browser; extracting protocols, domain names and virtual paths of all the webpages from the acquired information; based on the similarity judgment strategy, classifying the webpages with the same protocol, domain name and virtual path into the same group; and selecting one webpage from each group as the webpage to be crawled.
According to the system of the second aspect of the present invention, web pages having the same protocol, domain name, and virtual path are web pages whose protocol, domain name, and virtual path, including the parameter names, are the same.
According to the system of the second aspect of the present invention, in the second processing unit, the rule matching is regular-expression matching, and the first interaction point includes a data interaction point based on user editing and a data interaction point based on function triggering.
According to the system of the second aspect of the present invention, the third processing unit is specifically configured to extract the source code of the website application; performing the lexical analysis on the source code to convert the source code into a corresponding Token stream; generating a syntax tree of the source code from the Token stream based on the syntax abstraction; converting the syntax tree into the control flow graph.
According to the system of the second aspect of the present invention, the fourth processing unit is specifically configured to extract an input point and an output point for the path analysis from the source code based on the entry file information, where the input point is a superglobal variable used for acquiring user input data in the website application, and the output point is a function used for interacting with a system terminal, a database, or a file system in the website application; determine an effective path between the input point and the output point to extract a data interaction point on the effective path; and delete the interaction points that overlap with the first interaction points from the data interaction points on the effective path to obtain the second interaction points.
According to the system of the second aspect of the present invention, the fourth processing unit is specifically configured to extract the data interaction point on the effective path based on the parameter constraint relationship of the effective path.
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the website application user interaction point discovery method based on dynamic and static combination in any one of the first aspect of the disclosure when executing the computer program.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program, which when executed by a processor, implements the steps of the website application user interaction point discovery method based on dynamic and static combination according to any one of the first aspect of the disclosure.
In conclusion, the technical scheme of the invention combines static analysis with dynamic crawling and uses the dynamic data to guide the static analysis in mining the data interaction points of the target Web application. The data interaction points in the Web application are thereby discovered efficiently, accurately, and with high coverage, that is, the number of data interaction points mined from the Web application is maximized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description in the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart of a website application user interaction point discovery method based on dynamic and static combination according to an embodiment of the present invention;
FIG. 2a-1 is a schematic diagram of static code auditing for data interaction point discovery, according to an embodiment of the invention;
FIG. 2a-2 is a diagram of the abstract syntax tree of the source code according to an embodiment of the present invention;
FIG. 2a-3 is a control flow graph of the source code according to an embodiment of the invention;
FIG. 2b is a schematic diagram of a valid path including a constraint parameter relationship according to an embodiment of the present invention;
FIG. 2c is a schematic diagram of determining a first interaction point and a second interaction point according to an embodiment of the invention;
FIG. 3 is a structural diagram of a website application user interaction point discovery system based on dynamic and static combination according to an embodiment of the present invention;
FIG. 4 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Existing methods for discovering the data interaction points of a Web application mainly either use a dynamic crawler to traverse the pages of the target website or analyze the website's user manual. The efficiency of the crawler-based method depends on the performance design of the crawler. Moreover, the data interaction interfaces that a Web application displays in its web pages are limited and include only some of the interaction interfaces in the application, so the interfaces found by a dynamic crawler can ensure coverage of the data interaction points of normal functions but not of those present only in the code. The manual-based method is limited by the user manual and cannot mine interaction point paths that contain parameters not appearing in the manual. To solve these problems and improve the coverage of Web application data interaction points, that is, to discover as many data interaction points in the Web application as possible, the invention provides a scheme that combines a dynamic crawler with static code auditing, which can effectively increase the number and coverage of the discovered Web application data interaction points.
A first aspect of the invention discloses a website application user interaction point discovery method based on dynamic and static combination. The interaction point includes a first interaction point in a web page of the website application and a second interaction point in source code of the website application.
Fig. 1 is a flowchart of a website application user interaction point discovery method based on dynamic and static combination according to an embodiment of the present invention. As shown in fig. 1, the method includes: S1, determining a webpage to be crawled from a plurality of webpages of the website application based on a similarity judgment strategy, wherein the similarity judgment strategy groups the plurality of webpages by similarity so that one webpage in each group is selected as the webpage to be crawled; S2, extracting data interaction points in the webpage to be crawled as first interaction points by using a dynamic crawler with rule matching; S3, generating a control flow graph of the website application from the source code of the website application through the lexical analysis and syntax abstraction of static code auditing; and S4, performing path analysis on the control flow graph by using the entry file information extracted from the first interaction point, so as to determine a data interaction point in the source code that differs from the first interaction point as a second interaction point.
In step S1, a web page to be crawled is determined from a plurality of web pages of the website application based on a similarity judgment strategy, where the similarity judgment strategy groups the plurality of web pages by similarity so that one web page in each group is selected as the web page to be crawled.
In some embodiments, the step S1 specifically includes: acquiring information of each webpage in a login state by using a headless browser; extracting the protocol, domain name, and virtual path of each webpage from the acquired information; classifying the web pages with the same protocol, domain name, and virtual path into the same group based on the similarity judgment strategy; and selecting one webpage from each group as the webpage to be crawled. Based on the headless browser, the crawler program is able to trigger AJAX requests in the pages of the Web application and to simulate interaction between a user and the server.
In some embodiments, web pages having the same protocol, domain name, and virtual path are web pages whose protocol, domain name, and virtual path, including the parameter names, are the same.
Specifically, the source code of the target Web application is first downloaded from its official website, the Web application service is set up on a Web server from the source code according to the installation manual, and it is confirmed that the website can be accessed through a browser without errors, which facilitates testing with the dynamic crawler. Secondly, the dynamic crawler logs in to the Web application using login information provided by the user and keeps the user's login state, so that it can cover the pages of the website more comprehensively. It collects page information through a headless browser, which provides the ability to trigger AJAX, user interaction, and asynchronous requests. A URL (uniform resource locator) similarity judgment strategy improves the efficiency of full-site traversal and avoids unbounded crawling and crawl loops.
In some embodiments, for example, http://domain/path/?x=1 and http://domain/path/?x=2 are similar; the crawler crawls the information of only one of these URLs, which effectively improves crawling efficiency.
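The grouping in step S1 can be illustrated with a minimal Python sketch. It assumes that two URLs are "similar" when their protocol, domain name, virtual path, and query parameter names match, which is one reading of the example above; the function names `similarity_key` and `select_pages_to_crawl` are hypothetical and do not come from the patent.

```python
from urllib.parse import urlsplit, parse_qsl

def similarity_key(url):
    """Return (protocol, domain, virtual path, sorted parameter names)."""
    parts = urlsplit(url)
    # Parameter values are ignored on purpose: only the names take part in the key.
    names = tuple(sorted({name for name, _ in
                          parse_qsl(parts.query, keep_blank_values=True)}))
    return (parts.scheme, parts.netloc, parts.path, names)

def select_pages_to_crawl(urls):
    """Keep the first URL seen in each similarity group."""
    representatives = {}
    for url in urls:
        representatives.setdefault(similarity_key(url), url)
    return list(representatives.values())

urls = [
    "http://domain/path/?x=1",
    "http://domain/path/?x=2",   # same group as the first URL
    "http://domain/path/?y=1",   # different parameter name
    "http://domain/other/?x=1",  # different virtual path
]
selected = select_pages_to_crawl(urls)
print(selected)
```

Because the key ignores parameter values, a paginated or ID-driven listing collapses to a single representative, which is what keeps the traversal bounded.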
In step S2, the data interaction points in the webpage to be crawled are extracted as first interaction points by using a dynamic crawler with rule matching.
In some embodiments, in step S2, the rule matching is regular-expression matching, and the first interaction point includes a data interaction point based on user editing and a data interaction point based on function triggering.
Specifically, all data interaction points in the web page are captured by identification and matching, that is, all points at which the user submits data to or obtains data from the server. These include user-editable data interaction points, such as a user login box, a drop-down box, and a file upload form, as well as all trigger points of website functions that do not involve user interaction.
Rule matching means using regular-expression matching to extract the content of the Web application pages that satisfies the matching rules, which covers the data interaction points described above. For user-editable data interaction points, user input is simulated. For trigger points of non-user-interactive website functions, the JavaScript in the page is triggered and the resulting HTTP request messages are intercepted, and the URL information, cookie information, and request body content, that is, the data interaction point information, are extracted.
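As a rough illustration of regular-expression rule matching for user-editable interaction points, the following Python sketch extracts forms and their named fields from page content. The patterns are assumptions made for this example (they only handle double-quoted attributes), not the patent's actual rules, and function-triggered points would additionally require intercepting requests in a headless browser, which is not shown.

```python
import re

# Illustrative matching rules; real pages need more robust patterns.
FORM_RE = re.compile(r"<form\b([^>]*)>(.*?)</form>", re.IGNORECASE | re.DOTALL)
FIELD_RE = re.compile(r"<(?:input|select|textarea)\b([^>]*)>", re.IGNORECASE)
ATTR_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')

def attrs(text):
    """Parse double-quoted HTML attributes into a dict."""
    return {key.lower(): value for key, value in ATTR_RE.findall(text)}

def extract_interaction_points(html):
    """Return one record per form: target URL, HTTP method, parameter names."""
    points = []
    for form_attrs, body in FORM_RE.findall(html):
        form = attrs(form_attrs)
        names = [attrs(field).get("name") for field in FIELD_RE.findall(body)]
        points.append({
            "action": form.get("action", ""),
            "method": form.get("method", "get").upper(),
            "params": [name for name in names if name],  # skip unnamed fields
        })
    return points

PAGE = """
<form action="/login.php" method="post">
  <input type="text" name="user">
  <input type="password" name="pass">
  <select name="role"><option>admin</option></select>
  <input type="submit" value="Go">
</form>
<form action="/upload.php" method="post" enctype="multipart/form-data">
  <input type="file" name="doc">
</form>
"""
points = extract_interaction_points(PAGE)
print(points)
```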
In step S3, a control flow graph of the website application is generated from the source code of the website application through the lexical analysis and syntax abstraction of static code auditing.
Fig. 2a-1 is a schematic diagram of data interaction point discovery during static code auditing according to an embodiment of the present invention, as shown in fig. 2a-1, in some embodiments, the step S3 specifically includes: extracting source codes of the website application; performing the lexical analysis on the source code to convert the source code into a corresponding Token stream; generating a syntax tree of the source code from the Token stream based on the syntax abstraction; converting the syntax tree into the control flow graph.
Specifically, the target Web application source code is converted into a Token stream, so that an Abstract Syntax Tree (AST) corresponding to the source code is generated, a corresponding control flow graph is generated based on the Abstract Syntax Tree, and further path analysis is performed based on the control flow graph.
Specifically, the source code is as follows:
[Figure GDA0003958547580000091: source code listing, image not reproduced]
Specifically, an abstract syntax tree of the source code may be generated through lexical analysis, as shown in FIG. 2a-2, and a control flow graph of the Web application source code is generated based on the abstract syntax tree, as shown in FIG. 2a-3.
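The three stages (token stream, abstract syntax tree, control flow graph) can be sketched in Python. Since the patent's example listing is not reproduced and targets PHP, this sketch swaps in Python's own `tokenize` and `ast` modules and analyzes a small Python snippet instead; the snippet, the `build_cfg` helper, and its node labels are all assumptions for illustration. It handles only straight-line statements and `if`/`else`, and `ast.unparse` requires Python 3.9 or later.

```python
import ast
import io
import tokenize

SOURCE = '''
x = read_input()
if x > 5:
    run(x)
else:
    log(x)
done()
'''

# Stage 1: lexical analysis, source text -> token stream.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
    if tok.type in (tokenize.NAME, tokenize.OP, tokenize.NUMBER)
]

# Stage 2: syntax abstraction, source text -> abstract syntax tree.
tree = ast.parse(SOURCE)

# Stage 3: abstract syntax tree -> control flow graph.
def build_cfg(statements):
    nodes = {}   # node id -> label
    edges = []   # (source id, target id, branch label)

    def new_node(label):
        nodes[len(nodes)] = label
        return len(nodes) - 1

    def walk(stmts, preds):
        """preds holds (node id, branch label) pairs that flow into the next statement."""
        for stmt in stmts:
            if isinstance(stmt, ast.If):
                cond = new_node("if " + ast.unparse(stmt.test))
                edges.extend((p, cond, label) for p, label in preds)
                then_exits = walk(stmt.body, [(cond, "true")])
                else_exits = walk(stmt.orelse, [(cond, "false")])
                preds = then_exits + else_exits
            else:
                node = new_node(ast.unparse(stmt))
                edges.extend((p, node, label) for p, label in preds)
                preds = [(node, "")]
        return preds

    walk(statements, [])
    return nodes, edges

nodes, edges = build_cfg(tree.body)
for src, dst, label in edges:
    print(f"{nodes[src]!r} -[{label}]-> {nodes[dst]!r}")
```

The branch labels on the edges are what a later path analysis can read off as parameter constraints along a path.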
In step S4 (as shown in fig. 1), using entry file information extracted from the first interaction point, performing path analysis on the control flow graph to determine a data interaction point different from the first interaction point in the source code as a second interaction point.
In some embodiments, the step S4 specifically includes: extracting an input point and an output point for the path analysis from the source code based on the entry file information (including the entry and parameter information in the dynamic crawler data, as shown in FIGS. 2a-1, 2a-2, and 2a-3), where the input point is a superglobal variable used for acquiring user input data in the website application, and the output point is a function used for interacting with a system terminal, a database, or a file system in the website application; determining an effective path between the input point and the output point to extract data interaction points on the effective path; and deleting the interaction points that overlap with the first interaction points from the data interaction points on the effective path to obtain the second interaction points.
Specifically, the input points are the common input points in the source code of a PHP-based Web application, namely the superglobal variables through which the Web application obtains user input data, as shown in Table 1; the output points are the sensitive functions in the Web application that interact with the system terminal, the database, or the file system, as shown in Table 2.
Table 1 Common input points
[Figure GDA0003958547580000101: table image, not reproduced]
Table 2 Common sensitive functions
[Figure GDA0003958547580000102: table image, not reproduced]
FIG. 2b is a schematic diagram of an effective path including a parameter constraint relationship according to an embodiment of the invention. As shown in FIG. 2b, in some embodiments, the data interaction point on the effective path is extracted based on the parameter constraint relationship of the effective path, so that externally input data can satisfy all the conditions along the effective path.
Specifically, to find a data interaction point, it is first necessary to determine the Source point, that is, the input point. For example, the input sources of PHP code include certain superglobal variables, file read operations, and database operations, where superglobal variables such as $_GET, $_POST, and $_COOKIE generally carry direct user input. Next, the Sink point, that is, the output point, is determined, for example sensitive functions in PHP code such as eval, exec, system, move_uploaded_file, and mysql_query. Finally, a path from the Source point to the Sink point is searched for.
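A minimal Python sketch of locating Source and Sink points in PHP text follows. It uses plain regular expressions rather than the token stream and syntax tree described earlier, so it is only a stand-in for the real analysis; the lists contain just the PHP superglobals and functions named in the text (Tables 1 and 2 are not reproduced), and the PHP sample and the `scan` helper are hypothetical.

```python
import re

SUPERGLOBALS = ("GET", "POST", "COOKIE")
SENSITIVE = ("eval", "exec", "system", "move_uploaded_file", "mysql_query")

# Source: $_GET['name'] style reads of user input.
SOURCE_RE = re.compile(r"\$_(" + "|".join(SUPERGLOBALS) + r")\[\s*['\"](\w+)['\"]\s*\]")
# Sink: a call to a sensitive function; the lookbehind avoids names that merely
# end with a sink name (my_exec), variables, and method calls ($db->exec).
SINK_RE = re.compile(r"(?<![\w$>])(" + "|".join(SENSITIVE) + r")\s*\(")

def scan(php_code):
    """Return (sources, sinks) as lists of tuples tagged with 1-based line numbers."""
    sources, sinks = [], []
    for lineno, line in enumerate(php_code.splitlines(), start=1):
        for match in SOURCE_RE.finditer(line):
            sources.append((lineno, "$_" + match.group(1), match.group(2)))
        for match in SINK_RE.finditer(line):
            sinks.append((lineno, match.group(1)))
    return sources, sinks

PHP = """<?php
$cmd = $_GET['cmd'];
$id = $_POST["id"];
if ($id > 5) {
    mysql_query("SELECT * FROM t WHERE id=" . $id);
}
system($cmd);
"""
sources, sinks = scan(PHP)
print(sources)
print(sinks)
```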
According to the entry and parameter information in the dynamic crawler data, the entry file information and the Sink point information of some functions can be located quickly. Starting from the Sink point, the path from the Source to the Sink is analyzed by backtracking, so that data interaction points that cannot be obtained by the dynamic crawler can be found quickly from this information. The dynamic crawler data also helps static code auditing with dynamic file inclusion, which it cannot resolve on its own: the inclusion relationships of some files can be determined, and cross-file path discovery is better supported, which further improves the coverage of data interaction point discovery. Finally, the parameter constraint relationships along each path from the Source to the Sink are calculated. As shown in FIG. 2b, the parameter constraint relationships of the two paths from the Source to the Sink are, respectively, "x > 5 and y = xxx and z = test" and "x > 5 and y = zzz and z = check". Combining the common entry file with the parameter constraint relationships yields the URL data interaction points.
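The backtracking from a Sink to a Source and the combination of an entry file with the collected constraints can be sketched as follows. The graph, its node names, the entry file `http://domain/index.php`, and the tiny constraint "solver" are all hypothetical; the graph is loosely modeled on the two constraint paths mentioned for FIG. 2b and is acyclic, so no loop handling is included.

```python
from urllib.parse import urlencode

# Hypothetical control flow graph stored as predecessor lists:
# node -> [(predecessor, condition on that edge)]. A condition is a
# (parameter, operator, value) triple, or None for an unconditional edge.
PREDECESSORS = {
    "sink": [("z_is_test", ("z", "==", "test")),
             ("z_is_check", ("z", "==", "check"))],
    "z_is_test": [("y_branch", ("y", "==", "xxx"))],
    "z_is_check": [("y_branch", ("y", "==", "zzz"))],
    "y_branch": [("source", ("x", ">", 5))],
    "source": [],
}

def backtrack(node, source, constraints=()):
    """Walk backwards from node to source, yielding each path's constraints in forward order."""
    if node == source:
        yield list(reversed(constraints))
        return
    for pred, cond in PREDECESSORS[node]:
        extra = (cond,) if cond is not None else ()
        yield from backtrack(pred, source, constraints + extra)

def solve(constraints):
    """Pick one concrete value per parameter; only '==' and integer '>' are handled."""
    values = {}
    for name, op, value in constraints:
        values[name] = value if op == "==" else value + 1
    return values

def to_url(entry, constraints):
    """Combine the entry file with parameter values that satisfy the path constraints."""
    return entry + "?" + urlencode(solve(constraints))

paths = list(backtrack("sink", "source"))
urls = [to_url("http://domain/index.php", path) for path in paths]
print(urls)
```

Starting from the sink means only paths that can actually reach a sensitive function are explored, rather than every path leaving an input point.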
If static code analysis is performed without deduplication, some of the data interaction points it finds will repeat those already obtained by the dynamic crawler. Therefore, during static code audit, it is judged whether the parameter relationship in a path repeats data from the dynamic crawler; if it does, analysis of that path is terminated, which further improves efficiency.
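A minimal sketch of this duplicate check is given below. It assumes, for illustration only, that two interaction points count as repeated when they share the same entry path and the same set of parameter names; the specification does not fix the exact comparison, and the helper names and example URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def signature(url):
    """Reduce a URL to (path, set of parameter names); values are ignored."""
    parts = urlsplit(url)
    names = frozenset(name for name, _ in parse_qsl(parts.query, keep_blank_values=True))
    return parts.path, names


def drop_duplicates(static_urls, dynamic_urls):
    """Keep only statically derived URLs that the dynamic crawler did not already find."""
    seen = {signature(url) for url in dynamic_urls}
    return [url for url in static_urls if signature(url) not in seen]


if __name__ == "__main__":
    dynamic = ["http://site.test/index.php?x=1&y=a"]
    static = ["/index.php?x=6&y=xxx", "/admin.php?cmd=ls"]
    print(drop_duplicates(static, dynamic))
```

In the described method the check happens during path analysis, so that a repeated path is abandoned early; the sketch applies the same test after the fact for brevity.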
FIG. 2c is a schematic diagram of determining a first interaction point and a second interaction point according to an embodiment of the invention. As shown in FIG. 2c, for the first interaction point (a data interaction point in a web page of the target Web application), a dynamic crawler is used to mine data interaction points, which acquires the data interaction point information of normal functions in the web pages of the Web application more efficiently. The crawler simulates a user to trigger all JS requests in the web page and captures all data interaction points in the page, that is, all points at which the user submits data to or obtains data from the server. These include user-editable data interaction points, such as a user login box, a drop-down box, and a file upload form, as well as all trigger points of non-user-interactive Web functions.
For the second interaction point (a data interaction point that cannot be mined from the web page, that is, an interaction point mined from the source code), the data interaction points in the Web application source code are extracted through static code audit, that is, by acquiring the path parameter constraint information from the source point to the sink point. The source code is converted into an abstract syntax tree, and a control flow graph of the source code is generated from the abstract syntax tree. The path information from the source point to the sink point is analyzed in the structure of the control flow graph, and the constraint values of all parameters on the path are calculated, that is, the parameter values that allow input data to reach the sink point are determined. Dynamic crawler data guides the static code audit in resolving the dynamic inclusion relationships of some files and in locating some entry files and parameters, which improves the data interaction point mining efficiency of the static code audit.
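The backtracking analysis from a sink point to a source point can be sketched on a small hand-built control flow graph. Everything in this example is assumed for illustration: the graph is stored as a predecessor map, each edge carries an optional condition string, and the node names (entry, A, B, C, sink) and conditions are hypothetical rather than taken from the figures.

```python
def backtrack_paths(preds, sink, sources):
    """Walk backwards from the sink and collect (path, constraints) for each source reached.

    preds maps a node to a list of (predecessor, condition) pairs, where condition
    is the guard on the edge predecessor -> node, or None if the edge is unguarded.
    """
    found = []
    stack = [(sink, [sink], [])]
    while stack:
        node, path, conds = stack.pop()
        if node in sources:
            # Reverse so the result reads from source to sink.
            found.append((path[::-1], conds[::-1]))
            continue
        for pred, cond in preds.get(node, []):
            if pred in path:  # ignore cycles in this simple sketch
                continue
            next_conds = conds + [cond] if cond else conds
            stack.append((pred, path + [pred], next_conds))
    return found


# Hypothetical graph with two guarded routes from the entry node to one sink.
PREDS = {
    "sink": [("B", "z = test"), ("C", "z = check")],
    "B": [("A", "y = xxx")],
    "C": [("A", "y = zzz")],
    "A": [("entry", "x > 5")],
}

if __name__ == "__main__":
    for path, constraints in backtrack_paths(PREDS, "sink", {"entry"}):
        print(" -> ".join(path), "|", " and ".join(constraints))
```

Starting at the sink keeps the search focused on code that can actually reach a sensitive function, which is the motivation the description gives for backtracking.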
The second aspect of the invention discloses a website application user interaction point discovery system based on dynamic and static combination. The interaction point includes a first interaction point in a web page of the website application and a second interaction point in source code of the website application.
FIG. 3 is a structural diagram of a website application user interaction point discovery system based on dynamic and static combination according to an embodiment of the present invention. As shown in FIG. 3, the system 300 includes: a first processing unit 301 configured to: determine a webpage to be crawled from a plurality of webpages of the website application based on a similarity judgment strategy, wherein the similarity judgment strategy is used for grouping the plurality of webpages according to similarity so as to select one webpage in each group as the webpage to be crawled; a second processing unit 302 configured to: extract data interaction points in the webpage to be crawled as first interaction points by using a dynamic crawler in a rule matching mode; a third processing unit 303 configured to: generate a control flow graph of the website application through lexical analysis and syntax abstraction in static code audit based on the source code of the website application; and a fourth processing unit 304 configured to: perform path analysis on the control flow graph by using the entry file information extracted from the first interaction point to determine a data interaction point different from the first interaction point in the source code as a second interaction point.
According to the system of the second aspect of the present invention, the first processing unit is specifically configured to perform information acquisition on each web page in a login state by using a headless browser; extracting protocols, domain names and virtual paths of all the webpages from the acquired information; based on the similarity judgment strategy, classifying the webpages with the same protocol, domain name and virtual path into the same group; and selecting one webpage from each group as the webpage to be crawled.
According to the system of the second aspect of the present invention, the web pages having the same protocol, domain name and virtual path mean that the parameter names of the protocol, domain name and virtual path are the same.
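The grouping step can be sketched as follows. The sketch assumes that two pages are similar when their URLs share the same protocol, domain name, virtual path, and set of query parameter names (values are ignored); this reading of the similarity judgment strategy, the helper names, and the example URLs are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def similarity_key(url):
    """Key a URL by protocol, domain name, virtual path, and sorted parameter names."""
    parts = urlsplit(url)
    names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
    return parts.scheme, parts.netloc, parts.path, tuple(sorted(names))


def pages_to_crawl(urls):
    """Group URLs by similarity key and keep the first member of each group."""
    groups = {}
    for url in urls:
        groups.setdefault(similarity_key(url), []).append(url)
    return [members[0] for members in groups.values()]


if __name__ == "__main__":
    collected = [
        "http://shop.test/item.php?id=1",
        "http://shop.test/item.php?id=2",
        "http://shop.test/item.php?id=3&sort=asc",
        "https://shop.test/item.php?id=1",
    ]
    print(pages_to_crawl(collected))
```

Crawling one representative per group avoids repeatedly visiting pages that differ only in parameter values, such as successive item identifiers.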
According to the system of the second aspect of the present invention, in the second processing unit, the rule matching is regular-expression matching, and the first interaction point includes a data interaction point based on user editing and a data interaction point based on function triggering.
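Regular-expression rule matching over page markup can be sketched as below. The three rules, their labels, and the sample HTML are assumptions chosen to show one user-editing case and one function-triggering case; the specification does not list the actual rules, and a production crawler would more likely query the rendered DOM in the headless browser than match raw HTML text.

```python
import re

# Hypothetical rules: form targets, user-editable fields, and click-triggered functions.
RULES = {
    "form_action": re.compile(r"<form\b[^>]*\baction=[\"']([^\"']*)[\"']", re.I),
    "editable_field": re.compile(
        r"<(?:input|select|textarea)\b[^>]*\bname=[\"']([^\"']*)[\"']", re.I
    ),
    "trigger": re.compile(r"\bonclick=[\"']([^\"']*)[\"']", re.I),
}

# Hypothetical page fragment used only to exercise the rules.
SAMPLE_HTML = """
<form action="/login.php" method="post">
  <input type="text" name="user">
  <input type="password" name="pass">
  <select name="lang"><option>en</option></select>
  <button onclick="refreshList()">Refresh</button>
</form>
"""


def match_rules(html):
    """Apply every rule to the markup and return the captured values per rule label."""
    return {label: rule.findall(html) for label, rule in RULES.items()}


if __name__ == "__main__":
    for label, hits in match_rules(SAMPLE_HTML).items():
        print(label, hits)
```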
According to the system of the second aspect of the present invention, the third processing unit is specifically configured to extract the source code of the website application; performing the lexical analysis on the source code to convert the source code into a corresponding Token stream; generating a syntax tree of the source code from the Token stream based on the syntax abstraction; converting the syntax tree into the control flow graph.
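The chain from source code to Token stream to syntax tree to control flow graph can be sketched with the Python standard library. Note the substitutions, which are assumptions of this example: Python source stands in for the website application source code (the description's examples are PHP), the tokenize and ast modules stand in for the lexical analysis and syntax abstraction, and the control flow graph builder handles only straight-line statements and if/else, using line numbers as node identifiers.

```python
import ast
import io
import tokenize

# Stand-in source code; get_input, run, log, and done are hypothetical names.
SOURCE = """\
x = get_input()
if x > 5:
    run(x)
else:
    log(x)
done()
"""

LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def token_stream(source):
    """Lexical analysis: return (token type name, text) pairs without layout tokens."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tokenize.tok_name[t.type], t.string) for t in tokens if t.type not in LAYOUT]


def build_cfg(statements, edges, preds):
    """Add control flow edges for a statement list; return the nodes control leaves from."""
    for stmt in statements:
        node = stmt.lineno
        for pred in preds:
            edges.setdefault(pred, []).append(node)
        edges.setdefault(node, [])
        if isinstance(stmt, ast.If):
            then_exits = build_cfg(stmt.body, edges, [node])
            else_exits = build_cfg(stmt.orelse, edges, [node])
            preds = then_exits + else_exits
        else:
            preds = [node]
    return preds


if __name__ == "__main__":
    print(token_stream(SOURCE)[:5])
    tree = ast.parse(SOURCE)  # syntax abstraction: Token stream to syntax tree
    cfg = {}
    build_cfg(tree.body, cfg, [])
    print(cfg)
```

In the resulting graph, the branch at line 2 has two successors (lines 3 and 5) that rejoin at line 6, which is the kind of structure the later path analysis walks.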
According to the system of the second aspect of the present invention, the fourth processing unit is specifically configured to extract an input point and an output point for the path analysis from the source code based on the entry file information, where the input point is a super-global variable used for acquiring user input data in the website application, and the output point is a function used for interacting with a system terminal, a database, or a file system in the website application; determining an effective path between the input point and the output point to extract a data interaction point on the effective path; and deleting the interaction points which are overlapped with the first interaction points from the data interaction points on the effective path to obtain the second interaction points.
According to the system of the second aspect of the present invention, the fourth processing unit is specifically configured to extract the data interaction point on the effective path based on the parameter constraint relationship of the effective path.
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the website application user interaction point discovery method based on dynamic and static combination according to any embodiment of the first aspect of the disclosure.
FIG. 4 is a block diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 4, the electronic device includes a processor, a memory, a communication interface, a display screen, and an input device, which are connected by a system bus. The processor of the electronic device provides computing and control capabilities. The memory of the electronic device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the electronic device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized through Wi-Fi, an operator network, Near Field Communication (NFC), or other technologies. The display screen of the electronic device can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic device can be a touch layer covering the display screen, a key, a trackball, or a touch pad arranged on the housing of the electronic device, or an external keyboard, touch pad, or mouse.
It will be understood by those skilled in the art that the structure shown in FIG. 4 is only a block diagram of the part related to the technical solution of the present disclosure and does not limit the electronic device to which the solution of the present application is applied; a specific electronic device may include more or fewer components than those shown in the drawings, combine some components, or have a different arrangement of components.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the website application user interaction point discovery method based on dynamic and static combination according to any embodiment of the first aspect of the disclosure.
In summary, the technical solutions of the aspects of the present invention mainly achieve the following effects: (1) the method makes full use of the Web application source code, improving its utilization by combining static analysis with dynamic crawling; (2) dynamic crawler data guides the static code audit and helps resolve the dynamic file inclusion that static code audit alone cannot handle, improving analysis efficiency; (3) the method effectively increases the number and coverage of Web application data interaction points, compensating for the limited coverage achieved when a dynamic crawler only traverses and explores the pages of the target website.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described, but any combination should be considered within the scope of this specification as long as the combined features do not contradict each other.
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (9)

1. A website application user interaction point discovery method based on dynamic and static combination is characterized in that interaction points comprise a first interaction point in a webpage of a website application and a second interaction point in a source code of the website application, and the method comprises the following steps:
S1, determining a webpage to be crawled from a plurality of webpages of the website application based on a similarity judgment strategy, wherein the similarity judgment strategy is used for grouping the plurality of webpages according to similarity so as to select one webpage in each group as the webpage to be crawled;
S2, extracting data interaction points in the webpage to be crawled as first interaction points by using a dynamic crawler in a rule matching mode;
S3, generating a control flow graph of the website application through lexical analysis and syntax abstraction in static code audit based on the source code of the website application;
S4, performing path analysis on the control flow graph by using the entry file information extracted from the first interaction point to determine a data interaction point different from the first interaction point in the source code as a second interaction point;
wherein, the step S4 specifically includes:
extracting an input point and an output point for the path analysis from the source code based on the entry file information, wherein the input point is a superglobal variable used for acquiring user input data in the website application, and the output point is a function used for interacting with a system terminal, a database or a file system in the website application;
determining an effective path between the input point and the output point to extract a data interaction point on the effective path;
and deleting the interaction points which are overlapped with the first interaction points from the data interaction points on the effective path to obtain the second interaction points.
2. The method for discovering website application user interaction points based on dynamic and static combination according to claim 1, wherein the step S1 specifically comprises:
acquiring information of each webpage in a login state by using a headless browser;
extracting protocols, domain names and virtual paths of all the webpages from the acquired information;
classifying the web pages with the same protocol, domain name and virtual path into the same group based on the similarity judgment strategy;
and selecting one webpage from each group as the webpage to be crawled.
3. The method for discovering website application user interaction points based on dynamic and static combination as claimed in claim 2, wherein the web pages having the same protocol, domain name and virtual path mean that the parameter names of the protocol, domain name and virtual path are the same.
4. The method for discovering website application user interaction points based on dynamic and static combination as claimed in claim 1, wherein in the step S2, the rule matching is regular-expression matching, and the first interaction point includes a data interaction point based on user editing and a data interaction point based on function triggering.
5. The method for discovering website application user interaction points based on dynamic and static combination according to claim 1, wherein the step S3 specifically comprises:
extracting source codes of the website application;
performing the lexical analysis on the source code to convert the source code into a corresponding Token stream;
generating a syntax tree of the source code from the Token stream based on the syntax abstraction;
converting the syntax tree into the control flow graph.
6. The method for discovering website application user interaction points based on dynamic and static combination as claimed in claim 1, wherein the data interaction points on the effective path are extracted based on a parameter constraint relationship of the effective path.
7. A website application user interaction point discovery system based on dynamic and static combination, wherein the interaction points comprise a first interaction point in a webpage of the website application and a second interaction point in a source code of the website application, the system comprising:
a first processing unit configured to: determining a webpage to be crawled from a plurality of webpages of the website application based on a similarity judgment strategy, wherein the similarity judgment strategy is used for grouping the plurality of webpages according to similarity so as to select one webpage in each group as the webpage to be crawled;
a second processing unit configured to: extracting data interaction points in the webpage to be crawled as first interaction points by using a dynamic crawler in a rule matching mode;
a third processing unit configured to: generating a control flow graph of the website application through lexical analysis and syntax abstraction in static code audit based on the source code of the website application;
a fourth processing unit configured to: performing path analysis on the control flow graph by using entry file information extracted from the first interaction point to determine a data interaction point different from the first interaction point in the source code as a second interaction point;
the fourth processing unit is specifically configured to:
extracting an input point and an output point for the path analysis from the source code based on the entry file information, wherein the input point is a super-global variable used for acquiring user input data in the website application, and the output point is a function used for interacting with a system terminal, a database or a file system in the website application;
determining an effective path between the input point and the output point to extract a data interaction point on the effective path;
and deleting the interaction points which are overlapped with the first interaction points from the data interaction points on the effective path to obtain the second interaction points.
8. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method for discovering a dynamic and static website application user interaction point according to any one of claims 1 to 6 when executing the computer program.
9. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when executed by a processor, the computer program implements the steps in the method for discovering user interaction points of a website application based on dynamic and static combination according to any one of claims 1 to 6.
CN202210160099.6A 2022-02-22 2022-02-22 Website application user interaction point discovery method and system based on dynamic and static combination Active CN114626062B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210160099.6A CN114626062B (en) 2022-02-22 2022-02-22 Website application user interaction point discovery method and system based on dynamic and static combination

Publications (2)

Publication Number Publication Date
CN114626062A CN114626062A (en) 2022-06-14
CN114626062B true CN114626062B (en) 2023-03-24

Family

ID=81899944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210160099.6A Active CN114626062B (en) 2022-02-22 2022-02-22 Website application user interaction point discovery method and system based on dynamic and static combination

Country Status (1)

Country Link
CN (1) CN114626062B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant