WO2021031902A1 - URL extraction method, apparatus, device, and computer-readable storage medium - Google Patents

URL extraction method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2021031902A1
Authority
WO
WIPO (PCT)
Prior art keywords
url
server
uri
root directory
preset
Prior art date
Application number
PCT/CN2020/108187
Other languages
English (en)
French (fr)
Inventor
邵樊
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021031902A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972 Access to data in other repository systems, e.g. legacy data or dynamic Web page generation

Definitions

  • This application relates to the technical field of financial technology (Fintech), in particular to a URL extraction method, device, equipment, and computer-readable storage medium.
  • the main purpose of this application is to provide a URL extraction method, device, equipment, and computer-readable storage medium, aiming to solve the problems of URL omissions and poor integrity in the prior art.
  • the present application provides a URL extraction method, and the URL extraction method includes:
  • the step of generating a corresponding uniform resource locator URL according to the detection result and a preset rule includes:
  • the first URI, the server_name information, and the port information are spliced in a first splicing manner to generate a corresponding URL.
  • the step of generating a corresponding uniform resource locator URL according to the detection result and a preset rule includes:
  • the routing configuration file of the Web framework is obtained, and the URI is obtained according to the routing configuration file, which is recorded as the second URI;
  • the location_value value, the server_name information, the port information, and the second URI are spliced in the second splicing manner to generate the corresponding URL.
  • the step of obtaining the routing configuration file of the Web framework and obtaining the URI according to the routing configuration file, which is marked as the second URI includes:
  • the routing configuration file is parsed to obtain the routing rules of the Web framework, and the URI is obtained according to the routing rules, which is recorded as the second URI.
  • the step of obtaining the root directory and the Common Gateway Interface (CGI) path from the location block by matching includes:
  • the root directory and the CGI path are matched from the location block based on a preset matching rule, where the preset matching rule includes one or more of exact matching, prefix matching, regular matching, normal matching and full matching.
  • this application also provides a URL extraction device, the URL extraction device including:
  • the obtaining module is configured to obtain the context pointer object ctx of the preset module in the preset array
  • the matching module is configured to traverse each server block in the ctx, locate the location block in the server block, and obtain the root directory and the Common Gateway Interface (CGI) path from the location block by matching;
  • the generating module is configured to detect whether the root directory matches the prefix string of the CGI path, obtain a detection result, and generate a corresponding uniform resource locator URL according to the detection result and a preset rule.
  • the generating module includes:
  • the first obtaining unit is configured to, after determining that the root directory matches the prefix string of the CGI path, obtain a first Uniform Resource Identifier (URI) according to the root directory and the CGI path, and obtain the server_name (name) information and port information of the server block corresponding to the location block;
  • the first generating unit is configured to splice the first URI, the server_name information, and the port information in a first splicing manner to generate a corresponding URL.
  • the generating module further includes:
  • the detection unit is configured to, after determining that the root directory does not match the prefix string of the CGI path, obtain the index.php entry file in the nginx.conf file according to the root directory and the CGI path, and detect whether the index.php entry file meets the preset conditions, so as to detect whether the Web framework corresponding to the CGI path is an MVC framework;
  • the second obtaining unit is configured to, after determining that the Web framework corresponding to the CGI path is an MVC framework, obtain the routing configuration file of the Web framework, and obtain a URI according to the routing configuration file, which is recorded as the second URI;
  • the second generating unit is configured to obtain the location_value value of the location block, obtain the server_name information and port information of the server block corresponding to the location block, and splice the location_value value, the server_name information, the port information, and the second URI in the second splicing manner to generate the corresponding URL.
  • the present application also provides a URL extraction device
  • the URL extraction device includes: a memory, a processor, and a URL extraction program stored on the memory and runnable on the processor; when the URL extraction program is executed by the processor, the steps of the URL extraction method described above are implemented.
  • the present application also provides a computer-readable storage medium with a URL extraction program stored on it; when the URL extraction program is executed by a processor, the steps of the URL extraction method described above are implemented.
  • This application provides a URL extraction method, apparatus, device, and computer-readable storage medium.
  • first, the context pointer object ctx of the preset module in the preset array is obtained; each server block in the ctx is traversed and the location block in the server block is located, and the root directory and the CGI path are then obtained from the location block by matching; finally, it is detected whether the root directory matches the prefix string of the CGI path, that is, whether the root directory is a prefix string of the CGI path, to obtain a detection result, and the corresponding URL is generated according to the detection result and the preset rules.
  • this application uses the nginx.conf configuration file on the server host to reversely parse the URL in combination with the CGI path.
  • the extraction result is very accurate, and there is no URL omission.
  • this application can improve the completeness of URL extraction.
  • FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the application;
  • Fig. 3 is a schematic diagram of functional modules of the first embodiment of the URL extraction device of the present application.
  • FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the application.
  • the URL extraction device in the embodiment of this application may be a smart phone, or a terminal device such as a PC (Personal Computer), a tablet computer, and a portable computer.
  • the URL extraction device may include a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface).
  • the memory 1005 may be a high-speed RAM or a non-volatile memory, such as disk storage.
  • the memory 1005 may also be a storage device independent of the foregoing processor 1001.
  • the structure of the URL extraction device shown in FIG. 1 does not constitute a limitation on the URL extraction device, which may include more or fewer components than shown in the figure, a combination of certain components, or a different arrangement of components.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a URL extraction program.
  • the network interface 1004 is mainly used to connect to a back-end server and communicate with the back-end server;
  • the user interface 1003 is mainly used to connect to a client and communicate with the client;
  • the processor 1001 can be used to call the URL extraction program stored in the memory 1005 and perform the following operations:
  • processor 1001 may call the URL extraction program stored in the memory 1005, and also perform the following operations:
  • the first URI, the server_name information, and the port information are spliced in a first splicing manner to generate a corresponding URL.
  • processor 1001 may call the URL extraction program stored in the memory 1005, and also perform the following operations:
  • the Web framework corresponding to the CGI path is an MVC framework, obtaining a routing configuration file of the Web framework, and obtaining a URI according to the routing configuration file, which is recorded as a second URI;
  • the location_value value, the server_name information, the port information, and the second URI are spliced in the second splicing manner to generate the corresponding URL.
  • processor 1001 may call the URL extraction program stored in the memory 1005, and also perform the following operations:
  • the routing configuration file is parsed to obtain the routing rules of the Web framework, and the URI is obtained according to the routing rules, which is recorded as the second URI.
  • processor 1001 may call the URL extraction program stored in the memory 1005, and also perform the following operations:
  • the root directory and the CGI path are matched from the location block based on a preset matching rule, where the preset matching rule includes one or more of exact matching, prefix matching, regular matching, normal matching and full matching.
  • This application provides a URL extraction method.
  • Fig. 2 is a schematic flowchart of a first embodiment of a method for extracting a URL according to this application.
  • the URL extraction method includes:
  • Step S10 obtaining the context pointer object ctx of the preset module in the preset array
  • the URL extraction method in this embodiment is implemented by a URL extraction device, and the device is described using a server as an example.
  • the web service and nginx service are deployed on the server.
  • the modified nginx executable file is also deployed.
  • the nginx.conf configuration file can be reversely parsed, and the following URL extraction method can be executed.
  • the modified nginx is built on the existing nginx, mainly by adding code to it: a function that writes URLs of the form server_name:port/uri or server_name:port/trv file value/index.php to storage on the server host is added to the existing nginx, yielding the modified nginx.
  • the preset array can be cycle->conf_ctx
  • cycle->conf_ctx is an array, each element of the array corresponds to the configuration context of a module, the module here refers to the server block and http block of the nginx.conf configuration file
  • the preset module can be the ngx_http_core_module module, the implementation of the http protocol in nginx, most of the core instructions of the http framework, the processing flow of the entire life cycle of the http request (or the processing framework of the http request) are all implemented in this module;
  • ctx points to the module context structure; different types of modules usually point to different types of structures, and the structure usually contains several function pointers.
  • Step S20 traverse each server block in the ctx, locate the location block in the server block, and obtain the root directory and the Common Gateway Interface (CGI) path from the location block by matching;
  • each server block in ctx is traversed and the location block in the server block is located, and the root directory and the CGI (Common Gateway Interface) path are then obtained from the location block by matching.
  • the step of "obtaining the root directory and the Common Gateway Interface (CGI) path from the location block by matching" includes: matching the root directory and the CGI path from the location block based on a preset matching rule, wherein the preset matching rule includes one or more of exact matching, prefix matching, regular matching, normal matching, and full matching.
  • when matching the root directory and the CGI path, any one of exact matching, prefix matching, regular matching, normal matching, and full matching can be used.
  • for regular matching, the modifiers used are ~ and ~*. The former means a case-sensitive regular expression match, and the latter a case-insensitive one.
  • for normal matching, the modifier is empty; that is, when no matching modifier is specified, normal matching applies. Full matching, like normal matching, has no matching modifier.
  • the specific matching method is the same as in the prior art and is not detailed here. Note that if multiple matching methods are used, one main principle and two minor details can be followed.
  • the main principle concerns the priority of the matching modes: exact matching > prefix matching > regular matching > normal matching > full matching.
  • the minor details apply within the same priority: first, regular matching stops as soon as one regular expression matches, whereas non-regular matching continues; second, among all successfully matched root directories and CGI paths, the one with the highest degree of match is selected.
  • Step S30 detecting whether the root directory matches the prefix character string of the CGI path, obtaining a detection result, and generating a corresponding uniform resource locator URL according to the detection result and a preset rule.
  • it is detected whether the root directory matches the prefix string of the CGI path, that is, whether the root directory is a prefix string of the CGI path; the detection result is obtained, and the corresponding Uniform Resource Locator (URL) is then generated according to the detection result and the preset rules.
  • if the root directory matches the prefix string of the CGI path, the first URI (Uniform Resource Identifier) is obtained according to the root directory and the CGI path, along with the server_name (name) information and port information of the server block corresponding to the location block; the first URI, the server_name information, and the port information are then spliced according to the first splicing method to generate the corresponding URL.
  • if the root directory does not match the prefix string of the CGI path, the index.php entry file in the nginx.conf file is obtained according to the root directory and the CGI path, and it is checked whether the index.php entry file meets the preset conditions, so as to detect whether the Web framework corresponding to the CGI path is an MVC framework; if the index.php entry file meets the preset conditions, the Web framework corresponding to the CGI path is an MVC framework.
  • the embodiment of the present application provides a URL extraction method. First, the context pointer object ctx of the preset module in the preset array is obtained; each server block in ctx is traversed and the location block in the server block is located, and the root directory and the CGI path are then obtained from the location block by matching; finally, it is detected whether the root directory matches the prefix string of the CGI path, that is, whether the root directory is a prefix string of the CGI path, to obtain a detection result, and the corresponding URL is generated according to the detection result and the preset rules. Since the result of nginx parsing its configuration is stored in a specific data structure, this embodiment uses the nginx.conf configuration file on the server host, combined with the CGI path, to reversely parse out the URLs. The extraction result is very accurate and no URLs are omitted; compared with the URL acquisition methods in the prior art, the embodiment of the present application can improve the completeness of URL extraction.
  • step S30 may include:
  • Step a1 after determining that the root directory matches the prefix string of the CGI path, obtain a first Uniform Resource Identifier (URI) according to the root directory and the CGI path, and obtain the server_name (name) information and port information of the server block corresponding to the location block;
  • after it is determined that the root directory matches the prefix string of the CGI path, that is, when the root directory is a prefix string of the CGI path, the CGI path corresponds to the URI of the location block. In this case, the first URI (Uniform Resource Identifier) is obtained according to the root directory and the CGI path: removing the prefix string (i.e. the root directory) from the CGI path yields the first URI. The server_name (name) information and port information of the server block corresponding to the location block are then obtained, so that a domain name can be formed from the server_name information and the port information.
  • Step a2 splicing the first URI, the server_name information, and the port information in a first splicing manner to generate a corresponding URL.
  • the specific first splicing method is: http://server_name:port/first URI.
  • the first URL generation method described above is suitable for reverse extraction and generation of URLs for non-MVC (Model View Controller) frameworks, that is, for the scenario in which the root directory matches the prefix string of the CGI path.
  • step S30 may also include:
  • Step a3 After determining that the root directory does not match the prefix string of the CGI path, obtain the index.php entry file in the nginx.conf file according to the root directory and the CGI path, and detect whether the index.php entry file meets the preset conditions, so as to detect whether the Web framework corresponding to the CGI path is an MVC framework;
  • the following method needs to be used to extract and generate the URL. Specifically, first obtain the index.php entry file in the nginx.conf file according to the root directory and CGI path, that is, determine the location of the index.php entry file on the server according to the root directory and CGI path, and then obtain the index.php entry file. After obtaining the index.php entry file, it is checked whether the index.php entry file meets the preset conditions, so as to detect whether the web framework corresponding to the CGI path is an MVC framework.
  • the preset conditions can be set according to the characteristics of each MVC framework. Since there are multiple types of MVC framework, there may correspondingly be one or more preset conditions. If the index.php entry file is detected to meet any one of them, the Web framework corresponding to the CGI path can be determined to be an MVC framework. For example, for the CodeIgniter framework (a simple and fast PHP MVC framework), a preset condition can be that core/CodeIgniter.php appears in the index.php entry file; if core/CodeIgniter.php is found in the index.php entry file by forward matching, the Web framework corresponding to the CGI path is an MVC framework.
  • Step a4 it is determined that the Web framework corresponding to the CGI path is the MVC framework, then the routing configuration file of the Web framework is obtained, and the URI is obtained according to the routing configuration file, which is recorded as the second URI;
  • step a4 includes:
  • Step a41 calling and executing the index.php entry file through the php executable of a preset PHP (Hypertext Preprocessor) extension plug-in to obtain the syntax tree of the index.php entry file's execution;
  • the preset php extension plug-in is modified from the source code of an existing php syntax parse tree extension plug-in; through it, php code can be parsed into an abstract syntax tree (AST), in which the structure of the code is defined.
  • Step a42 obtaining the routing configuration file of the Web framework from the syntax tree according to the type of the Web framework;
  • the routing configuration file of the web framework is obtained from the syntax tree according to the type of the web framework.
  • the type of the Web framework can be determined according to the preset conditions that the index.php entry file meets. For example, in the above example, if core/CodeIgniter.php is detected in the index.php entry file, the Web framework can be determined The type is CodeIgniter framework.
  • the routing configuration file of each framework has fixed characteristics and a fixed path. For the CodeIgniter framework, for example, the corresponding characteristic config/routes.php can be found by regular matching, so the routing configuration file information of the Web framework can be obtained from the syntax tree, such as "/data/htdocs/a_ci_application_4De/config/routes.php"; this gives the path where the routes.php file is located, and thus the routing configuration file of the Web framework.
  • Step a43 Parse the routing configuration file to obtain the routing rules of the Web framework, and obtain the URI according to the routing rules, which is recorded as the second URI.
  • the routing configuration file is parsed to obtain the routing rules of the Web framework, and the URI is obtained according to the routing rules, which is recorded as the second URI.
  • for example, continuing the example above, suppose the content read from the routes.php file is as follows:
  • Step a5 Obtain the location_value value of the location block, obtain the server_name information and port information of the server block corresponding to the location block, and splice the location_value value, the server_name information, the port information, and the second URI in the second splicing manner to generate the corresponding URL.
  • the location_value value of the location block is obtained, along with the server_name information and port information of the server block corresponding to the location block; the location_value value, the server_name information, the port information, and the second URI are then spliced in the second splicing manner to generate the corresponding URL.
  • the specific second splicing method is: http://server_name:port/location_value/index.php/second URI.
  • This application also provides a URL extraction device.
  • FIG. 3 is a schematic diagram of the functional modules of the first embodiment of the URL extraction device of this application.
  • the URL extraction device includes:
  • the obtaining module 10 is configured to obtain the context pointer object ctx of the preset module in the preset array;
  • the matching module 20 is configured to traverse each service server block in the ctx and locate the location block in the server block, and obtain the root root directory and the common gateway interface CGI path by matching from the location block;
  • the generating module 30 is configured to detect whether the root directory matches the prefix string of the CGI path, obtain a detection result, and generate a corresponding uniform resource locator URL according to the detection result and preset rules.
  • the generating module 30 includes:
  • the first obtaining unit is configured to, after determining that the root directory matches the prefix string of the CGI path, obtain a first Uniform Resource Identifier (URI) according to the root directory and the CGI path, and obtain the server_name (name) information and port information of the server block corresponding to the location block;
  • the first generating unit is configured to splice the first URI, the server_name information, and the port information in a first splicing manner to generate a corresponding URL.
  • the generating module 30 further includes:
  • the detection unit is configured to, after determining that the root directory does not match the prefix string of the CGI path, obtain the index.php entry file in the nginx.conf file according to the root directory and the CGI path, and detect whether the index.php entry file meets the preset conditions, so as to detect whether the Web framework corresponding to the CGI path is an MVC framework;
  • the second acquiring unit is configured to, after determining that the Web framework corresponding to the CGI path is the MVC framework, acquire the routing configuration file of the Web framework, and obtain the URI according to the routing configuration file, which is recorded as the second URI;
  • the second generating unit is configured to obtain the location_value value of the location block, and obtain the server_name information and port information of the server block corresponding to the location block, and combine the location_value value, the server_name information, the port information and The second URI is spliced in the second splicing manner to generate a corresponding URL.
  • the second acquiring unit includes:
  • the execution subunit is configured to call and execute the index.php entry file through the php execution file of the preset hypertext preprocessor php extension plug-in to obtain the syntax tree executed by the index.php entry file;
  • An obtaining subunit configured to obtain the routing configuration file of the web framework from the syntax tree according to the type of the web framework
  • the parsing subunit is configured to parse the routing configuration file to obtain the routing rule of the Web framework, and obtain a URI according to the routing rule, which is recorded as a second URI.
  • the matching module 20 is specifically configured to: match the root directory and the CGI path from the location block based on preset matching rules, where the preset matching rules include one or more of exact matching, prefix matching, regular matching, normal matching, and full matching.
  • each module in the above-mentioned URL extraction device corresponds to each step in the above-mentioned URL extraction method embodiment, and its functions and realization process are not repeated here.
  • the present application also provides a computer-readable storage medium having a URL extraction program stored on it; when the URL extraction program is executed by a processor, the steps of the URL extraction method described in any of the above embodiments are implemented.
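The MVC detection described in the definitions above (checking whether the index.php entry file contains a framework-specific marker such as core/CodeIgniter.php) can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: the function name `detect_framework` and the `FRAMEWORK_MARKERS` table are hypothetical, and only the CodeIgniter marker comes from the text.

```python
from typing import Optional

# Marker strings per framework. Only the CodeIgniter entry is taken from the
# text; the table structure itself is an assumption made for illustration.
FRAMEWORK_MARKERS = {
    "CodeIgniter": "core/CodeIgniter.php",
}


def detect_framework(index_php_source: str) -> Optional[str]:
    """Return the framework whose marker appears in index.php, else None."""
    for name, marker in FRAMEWORK_MARKERS.items():
        # Plain forward substring search over the entry file's source text.
        if marker in index_php_source:
            return name
    return None
```

A non-None result would correspond to the "MVC framework" branch, and the returned name could then select which routing configuration file to look for.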

Abstract

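A URL extraction method, apparatus, device, and computer-readable storage medium, relating to the technical field of financial technology. The URL extraction method includes: obtaining the context pointer object ctx of a preset module in a preset array (S10); traversing each server block in the ctx, locating the location block in the server block, and obtaining the root directory and the Common Gateway Interface (CGI) path from the location block by matching (S20); and detecting whether the root directory matches the prefix string of the CGI path to obtain a detection result, and generating the corresponding Uniform Resource Locator (URL) according to the detection result and a preset rule (S30).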

Description

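URL extraction method, apparatus, device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 201910776693.6, entitled "URL extraction method, apparatus, device, and computer-readable storage medium", filed with the Chinese Patent Office on August 20, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of financial technology (Fintech), and in particular to a URL extraction method, apparatus, device, and computer-readable storage medium.
Background
With the development of computer technology, more and more technologies (big data, distributed computing, blockchain, artificial intelligence, etc.) are being applied in the financial field, and the traditional financial industry is gradually transforming into financial technology (Fintech). However, the security and real-time requirements of the financial industry also place higher demands on these technologies.
With the rapid development of computer network technology, the Web and related technologies have come into wide use and demand for website design keeps growing; at the same time, security problems have become increasingly prominent, and how to ensure the security of Web applications has become a key concern. At present, Web vulnerability scanners are generally used to scan for and detect vulnerabilities in Web applications or websites. Scanning with a Web vulnerability scanner presupposes that the URLs (Uniform Resource Locators) are known, and the quality of the scan also depends on the completeness of those URLs.
At present there are two main ways to obtain URLs: 1) users submit URLs themselves; 2) URLs are crawled with crawler technology. With the first approach, however, many URLs are missed as the Web site is continually updated and iterated and as personnel change, so the URLs provided by users are incomplete. With the second approach, many pages that require authentication or special permissions cannot be fetched by the crawler, so the extracted URLs are again incomplete. A method that improves the completeness of URL extraction is therefore urgently needed, so that Web vulnerability scanners can find more vulnerabilities and the security of Web applications can be ensured.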
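Technical Solution
The main purpose of this application is to provide a URL extraction method, apparatus, device, and computer-readable storage medium, aiming to solve the problem in the prior art that many URLs are omitted and completeness is poor.
To achieve the above purpose, this application provides a URL extraction method, which includes:
obtaining the context pointer object ctx of a preset module in a preset array;
traversing each server block in the ctx, locating the location block in the server block, and obtaining the root directory and the Common Gateway Interface (CGI) path from the location block by matching;
detecting whether the root directory matches the prefix string of the CGI path to obtain a detection result, and generating the corresponding Uniform Resource Locator (URL) according to the detection result and a preset rule.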
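The three steps can be pictured with a small in-memory model. The sketch below is only an illustration under stated assumptions: the real method walks nginx's parsed configuration in C (via the module context pointer), whereas here the `Server` and `Location` dataclasses, their field names, and the `classify` function are hypothetical stand-ins for that parsed configuration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Location:
    location_value: str  # the value written after the `location` keyword
    root: str            # root directory matched from the location block
    cgi_path: str        # CGI script path matched from the location block


@dataclass
class Server:
    server_name: str
    port: int
    locations: List[Location] = field(default_factory=list)


def classify(servers: List[Server]) -> List[Tuple[str, str, str]]:
    """Walk every server block and its location blocks, then record which
    URL-generation rule applies to each location."""
    results = []
    for server in servers:
        for loc in server.locations:
            # Detection step: is the root directory a prefix string of the
            # CGI path? (Plain string prefix, as the text describes it.)
            if loc.cgi_path.startswith(loc.root):
                branch = "first"   # strip the root, splice the first way
            else:
                branch = "second"  # check for an MVC framework instead
            results.append((server.server_name, loc.location_value, branch))
    return results
```

The plain `startswith` check follows the wording of the text; a stricter check on path-segment boundaries may be preferable in practice.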
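In one embodiment, the step of generating the corresponding Uniform Resource Locator (URL) according to the detection result and the preset rule includes:
after determining that the root directory matches the prefix string of the CGI path, obtaining a first Uniform Resource Identifier (URI) according to the root directory and the CGI path, and obtaining the server_name (name) information and port information of the server block corresponding to the location block;
splicing the first URI, the server_name information, and the port information in a first splicing manner to generate the corresponding URL.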
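As a rough sketch of this branch: the definitions section gives the first splicing format as http://server_name:port/first URI, and describes the first URI as the CGI path with the root-directory prefix removed. The function names below are hypothetical, and the host name in the usage note is a made-up example.

```python
def first_uri(root: str, cgi_path: str) -> str:
    """Remove the root-directory prefix from the CGI path."""
    if not cgi_path.startswith(root):
        raise ValueError("root is not a prefix of the CGI path")
    # Drop the prefix, then any leading slash left over from the path.
    return cgi_path[len(root):].lstrip("/")


def splice_first(server_name: str, port: int, uri: str) -> str:
    """First splicing format: http://server_name:port/first URI."""
    return f"http://{server_name}:{port}/{uri}"
```

For instance, with root `/data/htdocs` and CGI path `/data/htdocs/app/login.php` on a hypothetical server `example.test:8080`, the two calls yield `app/login.php` and then `http://example.test:8080/app/login.php`.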
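In one embodiment, the step of generating the corresponding Uniform Resource Locator (URL) according to the detection result and the preset rule includes:
after determining that the root directory does not match the prefix string of the CGI path, obtaining the index.php entry file in the nginx.conf file according to the root directory and the CGI path, and detecting whether the index.php entry file meets preset conditions, so as to detect whether the Web framework corresponding to the CGI path is an MVC framework;
if the Web framework corresponding to the CGI path is determined to be an MVC framework, obtaining the routing configuration file of the Web framework, and obtaining a URI according to the routing configuration file, which is recorded as the second URI;
obtaining the location_value value of the location block, obtaining the server_name information and port information of the server block corresponding to the location block, and splicing the location_value value, the server_name information, the port information, and the second URI in a second splicing manner to generate the corresponding URL.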
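The definitions section gives the second splicing format as http://server_name:port/location_value/index.php/second URI. A minimal sketch follows; the function name is hypothetical, and the slash normalisation is an assumption added so that a location value such as `/` does not produce doubled slashes.

```python
def splice_second(server_name: str, port: int,
                  location_value: str, second_uri: str) -> str:
    """Second splicing format:
    http://server_name:port/location_value/index.php/second URI."""
    # Trim surrounding slashes and skip empty segments (assumed behaviour).
    parts = [location_value.strip("/"), "index.php", second_uri.strip("/")]
    path = "/".join(p for p in parts if p)
    return f"http://{server_name}:{port}/{path}"
```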
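In one embodiment, the step of obtaining the routing configuration file of the Web framework and obtaining a URI according to the routing configuration file, recorded as the second URI, includes:
calling and executing the index.php entry file through the php executable of a preset PHP (Hypertext Preprocessor) extension plug-in to obtain the syntax tree of the index.php entry file's execution;
obtaining the routing configuration file of the Web framework from the syntax tree according to the type of the Web framework;
parsing the routing configuration file to obtain the routing rules of the Web framework, and obtaining a URI according to the routing rules, which is recorded as the second URI.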
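To make the last step concrete, the sketch below pulls routing rules out of a CodeIgniter-style routes.php. Note the substitution: the application obtains the file via a PHP syntax tree, whereas this sketch simply applies a regular expression to the file's text. The sample route entries in the usage note are invented for illustration (the source does not reproduce the file's content), and treating the route key as the second URI is an assumption.

```python
import re
from typing import List, Tuple

# Matches lines such as:  $route['news/list'] = 'news/index';
ROUTE_RE = re.compile(
    r"""\$route\[\s*(['"])(?P<pattern>.+?)\1\s*\]\s*=\s*(['"])(?P<target>.*?)\3\s*;"""
)

# CodeIgniter's reserved route keys, which are settings rather than URIs.
RESERVED_KEYS = {"default_controller", "404_override", "translate_uri_dashes"}


def parse_routes(php_source: str) -> List[Tuple[str, str]]:
    """Return (uri_pattern, controller_target) pairs found in routes.php."""
    routes = []
    for match in ROUTE_RE.finditer(php_source):
        pattern = match.group("pattern")
        if pattern in RESERVED_KEYS:
            continue
        routes.append((pattern, match.group("target")))
    return routes
```

Given a hypothetical file containing `$route['default_controller'] = 'welcome';` and `$route['news/list'] = 'news/index';`, only the second entry is returned, and `news/list` would serve as a candidate second URI.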
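In one embodiment, the step of obtaining the root directory and the Common Gateway Interface (CGI) path from the location block by matching includes:
matching the root directory and the CGI path from the location block based on a preset matching rule, wherein the preset matching rule includes one or more of exact matching, prefix matching, regular matching, normal matching, and full matching.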
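The five matching modes line up with nginx's standard `location` modifiers, and the definitions section states their priority as exact > prefix > regular > normal > full. The sketch below orders candidate locations accordingly. The names are hypothetical, and mapping "full matching" to a bare `location /` is an assumption; the text only says that full matching has no modifier.

```python
from typing import List, Tuple

# Priority order as stated in the text, highest first.
PRIORITY = ["exact", "prefix", "regex", "normal", "full"]


def classify_location(modifier: str, value: str) -> str:
    """Map an nginx location modifier to one of the five matching modes."""
    if modifier == "=":
        return "exact"
    if modifier == "^~":
        return "prefix"
    if modifier in ("~", "~*"):  # case-sensitive / case-insensitive regex
        return "regex"
    if modifier == "":
        # Assumption: a bare "/" is treated as the catch-all "full" match.
        return "full" if value == "/" else "normal"
    raise ValueError(f"unknown location modifier: {modifier!r}")


def order_by_priority(locations: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (modifier, value) pairs from highest to lowest priority."""
    return sorted(locations,
                  key=lambda loc: PRIORITY.index(classify_location(*loc)))
```

Because Python's sort is stable, locations of the same mode keep their configuration order, which leaves room for a separate "highest degree of match" tie-break within one priority level.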
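In addition, to achieve the above purpose, this application also provides a URL extraction apparatus, which includes:
an obtaining module, configured to obtain the context pointer object ctx of a preset module in a preset array;
a matching module, configured to traverse each server block in the ctx, locate the location block in the server block, and obtain the root directory and the Common Gateway Interface (CGI) path from the location block by matching;
a generating module, configured to detect whether the root directory matches the prefix string of the CGI path to obtain a detection result, and to generate the corresponding Uniform Resource Locator (URL) according to the detection result and a preset rule.
In one embodiment, the generating module includes:
a first obtaining unit, configured to, after determining that the root directory matches the prefix string of the CGI path, obtain a first Uniform Resource Identifier (URI) according to the root directory and the CGI path, and obtain the server_name (name) information and port information of the server block corresponding to the location block;
a first generating unit, configured to splice the first URI, the server_name information, and the port information in a first splicing manner to generate the corresponding URL.
In one embodiment, the generating module further includes:
a detection unit, configured to, after determining that the root directory does not match the prefix string of the CGI path, obtain the index.php entry file in the nginx.conf file according to the root directory and the CGI path, and detect whether the index.php entry file meets preset conditions, so as to detect whether the Web framework corresponding to the CGI path is an MVC framework;
a second obtaining unit, configured to, if the Web framework corresponding to the CGI path is determined to be an MVC framework, obtain the routing configuration file of the Web framework, and obtain a URI according to the routing configuration file, which is recorded as the second URI;
a second generating unit, configured to obtain the location_value value of the location block, obtain the server_name information and port information of the server block corresponding to the location block, and splice the location_value value, the server_name information, the port information, and the second URI in a second splicing manner to generate the corresponding URL.
In addition, to achieve the above purpose, this application also provides a URL extraction device, which includes: a memory, a processor, and a URL extraction program stored on the memory and runnable on the processor, wherein the steps of the URL extraction method described above are implemented when the URL extraction program is executed by the processor.
In addition, to achieve the above purpose, this application also provides a computer-readable storage medium on which a URL extraction program is stored, wherein the steps of the URL extraction method described above are implemented when the URL extraction program is executed by a processor.
This application provides a URL extraction method, apparatus, device, and computer-readable storage medium. First, the context pointer object ctx of the preset module in the preset array is obtained; each server block in ctx is traversed and the location block in the server block is located, and the root directory and the CGI path are then obtained from the location block by matching; finally, it is detected whether the root directory matches the prefix string of the CGI path, that is, whether the root directory is a prefix string of the CGI path, to obtain a detection result, and the corresponding URL is generated according to the detection result and the preset rule. Since the result of nginx parsing its configuration is stored in a specific data structure, this application uses the nginx.conf configuration file on the server host, combined with the CGI path, to reversely parse out the URLs. The extraction result is very accurate and no URLs are omitted; compared with the URL acquisition methods in the prior art, this application can improve the completeness of URL extraction.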
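Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a device in the hardware operating environment involved in the embodiments of the present application;
FIG. 2 is a schematic flowchart of a first embodiment of the URL extraction method of the present application;
FIG. 3 is a schematic diagram of the functional modules of a first embodiment of the URL extraction apparatus of the present application.
The realization of the objectives, the functional features, and the advantages of the present application are further described below with reference to the embodiments and the accompanying drawings.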
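Embodiments of the Present Application
It should be understood that the specific embodiments described here only explain the present application and do not limit it.
Refer to FIG. 1, which is a schematic structural diagram of a device in the hardware operating environment involved in the embodiments of the present application.
The URL extraction device of the embodiments of the present application may be a smartphone, or a terminal device such as a PC (Personal Computer), a tablet computer, or a portable computer.
As shown in FIG. 1, the URL extraction device may include a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 connects these components and carries the communication between them. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, it may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the structure shown in FIG. 1 does not limit the URL extraction device, which may include more or fewer components than shown, combine some components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may contain an operating system, a network communication module, a user interface module, and a URL extraction program.
In the terminal shown in FIG. 1, the network interface 1004 is mainly used to connect to a back-end server and exchange data with it; the user interface 1003 is mainly used to connect to a client and exchange data with it; and the processor 1001 may be used to call the URL extraction program stored in the memory 1005 and perform the following operations: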
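obtaining the context pointer object ctx of a preset module in a preset array;
traversing each server block in the ctx, locating the location block in the server block, and matching the root directory and the Common Gateway Interface (CGI) path from the location block;
detecting whether the root directory matches the prefix string of the CGI path to obtain a detection result, and generating the corresponding Uniform Resource Locator (URL) according to the detection result and a preset rule.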
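Further, the processor 1001 may call the URL extraction program stored in the memory 1005 and also perform the following operations:
after determining that the root directory matches the prefix string of the CGI path, obtaining a first Uniform Resource Identifier (URI) according to the root directory and the CGI path, and obtaining the server_name information and the port information of the server block corresponding to the location block;
splicing the first URI, the server_name information, and the port information according to a first splicing scheme to generate the corresponding URL.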
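Further, the processor 1001 may call the URL extraction program stored in the memory 1005 and also perform the following operations:
after determining that the root directory does not match the prefix string of the CGI path, obtaining the index.php entry file in the nginx.conf file according to the root directory and the CGI path, and detecting whether the index.php entry file satisfies a preset condition, so as to detect whether the Web framework corresponding to the CGI path is an MVC framework;
if the Web framework corresponding to the CGI path is determined to be an MVC framework, obtaining the routing configuration file of the Web framework, and obtaining a URI according to the routing configuration file, denoted as the second URI;
obtaining the location_value of the location block, obtaining the server_name information and the port information of the server block corresponding to the location block, and splicing the location_value, the server_name information, the port information, and the second URI according to a second splicing scheme to generate the corresponding URL.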
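Further, the processor 1001 may call the URL extraction program stored in the memory 1005 and also perform the following operations:
invoking and executing the index.php entry file through the php executable of a preset Hypertext Preprocessor (PHP) extension plug-in, to obtain the syntax tree of the executed index.php entry file;
obtaining the routing configuration file of the Web framework from the syntax tree according to the type of the Web framework;
parsing the routing configuration file to obtain the routing rules of the Web framework, and obtaining a URI according to the routing rules, denoted as the second URI.
Further, the processor 1001 may call the URL extraction program stored in the memory 1005 and also perform the following operations:
matching the root directory and the CGI path from the location block based on a preset matching rule, where the preset matching rule includes one or more of exact matching, prefix matching, regular-expression matching, normal matching, and full matching.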
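Based on the above hardware structure, the embodiments of the URL extraction method of the present application are proposed.
The present application provides a URL extraction method.
Refer to FIG. 2, which is a schematic flowchart of a first embodiment of the URL extraction method of the present application.
In this embodiment, the URL extraction method includes:
Step S10: obtaining the context pointer object ctx of a preset module in a preset array;
The URL extraction method of this embodiment is implemented by a URL extraction device; a server is used here as an example of that device. A Web service and an nginx service are deployed on the server, together with a modified nginx executable. Starting the modified nginx reverse-parses the nginx.conf configuration file and performs the URL extraction method described below. The modified nginx is derived from the existing nginx, mainly by adding code that writes URLs of the form server name:port/uri or server name:port/trv file value/index.php to storage on the server host. Studying how nginx resolves a URI in the forward direction, that is, how a URI is mapped to a CGI file on the host, shows that nginx stores the result of parsing its configuration in specific data structures. In principle, traversing those data structures yields the complete and exact configuration; the location can then be looked up in reverse from the CGI path in the parsed configuration, mapped to a URI, and used to generate the corresponding URL. In other words, URLs are reverse-parsed from the nginx.conf configuration file on the server host in combination with the CGI paths on the host. The extraction result is precise and no URL is missed, so the present application improves the completeness of URL extraction compared with the URL acquisition approaches of the prior art.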
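Specifically, the context pointer object ctx of the preset module in the preset array is obtained first. The preset array may be cycle->conf_ctx, an array in which each element corresponds to the configuration context of one module; here, a module refers to the server block and the http block of the nginx.conf configuration file. The preset module may be ngx_http_core_module, which implements the HTTP protocol in nginx, most of the core directives of the http framework, and the processing flow for the entire life cycle of an HTTP request (in other words, the HTTP request-processing framework). ctx points to a module context structure; modules of different types usually point to structures of different types, and such a structure usually contains several function pointers.
Step S20: traversing each server block in the ctx, locating the location block in the server block, and matching the root directory and the Common Gateway Interface (CGI) path from the location block;
Next, each server block in ctx is traversed, the location block in each server block is located, and the root directory and the CGI (Common Gateway Interface) path are matched from the location block.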
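The server-to-location walk of step S20 can be pictured with a small model. This is only a sketch in Python: the real traversal operates on nginx's internal C structures, and the class names `ServerBlock` and `LocationBlock`, their fields, and the sample values are hypothetical stand-ins introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class LocationBlock:
    value: str      # the location string, e.g. "/app" (hypothetical field)
    root: str       # root directory configured for this location
    cgi_path: str   # path of a CGI file on the host


@dataclass
class ServerBlock:
    server_name: str
    port: int
    locations: List[LocationBlock] = field(default_factory=list)


def iter_locations(
    servers: List[ServerBlock],
) -> Iterator[Tuple[ServerBlock, LocationBlock]]:
    """Yield every (server, location) pair: outer loop over server blocks,
    inner loop over the location blocks each server contains."""
    for server in servers:
        for location in server.locations:
            yield server, location


# Hypothetical parsed configuration with one server and two locations.
servers = [
    ServerBlock("example.com", 80, [
        LocationBlock("/", "/data/htdocs", "/data/htdocs/a.php"),
        LocationBlock("/app", "/data/app", "/data/app/index.php"),
    ]),
]
pairs = [(s.server_name, loc.value) for s, loc in iter_locations(servers)]
```

Keeping the owning server alongside each location is a deliberate choice here, since the later steps need that server's server_name and port to build the URL.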
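The step of "matching the root directory and the Common Gateway Interface (CGI) path from the location block" includes: matching the root directory and the CGI path from the location block based on a preset matching rule, where the preset matching rule includes one or more of exact matching, prefix matching, regular-expression matching, normal matching, and full matching.
The root directory and the CGI path may be matched by any one of exact matching, prefix matching, regular-expression matching, normal matching, and full matching. Exact matching succeeds only when the two strings are literally identical; it uses the = modifier, does not accept regular expressions, and is case-sensitive. Prefix matching uses the ^~ modifier; like = exact matching, it matches literal characters, does not accept regular expressions, and is case-sensitive. Regular-expression matching uses the ~ and ~* modifiers: the former is a case-sensitive regular expression, the latter a case-insensitive one. Normal matching has no modifier; any location without a matching modifier is a normal match. Full matching, like normal matching, has no modifier. The matching itself works as in the prior art and is not detailed here. Note that when several matching modes are used together, one general principle and two details apply. The general principle is the priority of the matching modes: exact matching > prefix matching > regular-expression matching > normal matching > full matching. The details concern matches within the same priority: first, matching stops once a regular-expression match succeeds, whereas matching continues after a non-regular-expression match succeeds; second, among all successfully matched root directories and CGI paths, the one with the greatest degree of match is selected.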
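The priority order and the two details above can be illustrated with a simplified selector. This is a sketch in Python, not the modified nginx code: the `(modifier, pattern)` tuple representation and the function name `select_location` are assumptions, and the logic follows nginx's documented location selection order in reduced form (named and nested locations are ignored).

```python
import re
from typing import List, Optional, Tuple

# Each rule is (modifier, pattern); modifier is "=", "^~", "~", "~*" or "" (none).
Rule = Tuple[str, str]


def select_location(uri: str, rules: List[Rule]) -> Optional[Rule]:
    # 1. An exact ("=") match wins immediately.
    for rule in rules:
        if rule[0] == "=" and uri == rule[1]:
            return rule

    # 2. Among prefix rules ("^~" or no modifier), keep the longest match;
    #    a successful prefix match does not stop the search.
    best_prefix: Optional[Rule] = None
    for rule in rules:
        if rule[0] in ("^~", "") and uri.startswith(rule[1]):
            if best_prefix is None or len(rule[1]) > len(best_prefix[1]):
                best_prefix = rule

    # 3. If the longest prefix carries "^~", regular expressions are skipped.
    if best_prefix is not None and best_prefix[0] == "^~":
        return best_prefix

    # 4. Regular expressions are tried in order; the first match stops the search.
    for rule in rules:
        if rule[0] == "~" and re.search(rule[1], uri):
            return rule
        if rule[0] == "~*" and re.search(rule[1], uri, re.IGNORECASE):
            return rule

    # 5. Otherwise fall back to the longest plain prefix (for example "/").
    return best_prefix
```

With a rule such as `("", "/")` present, every URI has a fallback, which corresponds to the lowest-priority case in the ordering above.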
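Step S30: detecting whether the root directory matches the prefix string of the CGI path to obtain a detection result, and generating the corresponding Uniform Resource Locator (URL) according to the detection result and a preset rule.
Next, it is detected whether the root directory matches the prefix string of the CGI path, that is, whether the root directory is a prefix string of the CGI path, and the corresponding URL is generated according to the detection result and the preset rule. Specifically, if the root directory matches the prefix string of the CGI path, a first URI (Uniform Resource Identifier) is obtained according to the root directory and the CGI path, the server_name information and the port information of the server block corresponding to the location block are obtained, and the first URI, the server_name information, and the port information are spliced according to the first splicing scheme to generate the corresponding URL (Uniform Resource Locator). If the root directory does not match the prefix string of the CGI path, the index.php entry file in the nginx.conf file is obtained according to the root directory and the CGI path, and it is detected whether the index.php entry file satisfies a preset condition, so as to detect whether the Web framework corresponding to the CGI path is an MVC framework. If the index.php entry file satisfies the preset condition, the Web framework corresponding to the CGI path is an MVC framework; in that case, the routing configuration file of the Web framework is obtained, a URI is obtained according to the routing configuration file and denoted as the second URI, the location_value of the location block and the server_name and port information of the corresponding server block are obtained, and the location_value, the server_name information, the port information, and the second URI are spliced according to the second splicing scheme to generate the corresponding URL. The detailed procedure is given in the embodiments below and is not repeated here.
An embodiment of the present application provides a URL extraction method. The context pointer object ctx of a preset module in a preset array is obtained first; each server block in ctx is traversed and the location block in the server block is located; the root directory and the CGI path are then matched from the location block; finally, it is detected whether the root directory matches the prefix string of the CGI path, that is, whether the root directory is a prefix string of the CGI path, and the corresponding URL is generated according to the detection result and a preset rule. Because nginx stores the result of parsing its configuration in specific data structures, this embodiment reverse-parses URLs from the nginx.conf configuration file on the server host in combination with the CGI paths. The extraction result is precise and no URL is missed, so the embodiment improves the completeness of URL extraction compared with the URL acquisition approaches of the prior art.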
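As one way of generating the URL, step S30 may include:
Step a1: if the root directory is determined to match the prefix string of the CGI path, obtaining a first Uniform Resource Identifier (URI) according to the root directory and the CGI path, and obtaining the server_name information and the port information of the server block corresponding to the location block;
When the root directory matches the prefix string of the CGI path, that is, when the root directory is a prefix string of the CGI path, the CGI path corresponds to a URI of the location block. The first URI (Uniform Resource Identifier) is then obtained from the root directory and the CGI path: removing the prefix string (the root directory) from the CGI path yields the first URI. The server_name information and the port information of the server block corresponding to the location block are then obtained, so that the domain-name part of the URL can be formed from them.
Step a2: splicing the first URI, the server_name information, and the port information according to the first splicing scheme to generate the corresponding URL.
The first URI, the server_name information, and the port information are then spliced according to the first splicing scheme to generate the corresponding URL. The first splicing scheme is: http://server_name:port/first URI.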
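Steps a1 and a2 amount to a prefix check, a prefix removal, and a string splice. The following Python sketch shows one way to express that; the function name `build_first_url` and the sample paths are hypothetical, and normalizing the trailing slash of the root directory is an added assumption, not something stated in the text.

```python
from typing import Optional


def build_first_url(root: str, cgi_path: str, server_name: str, port: int) -> Optional[str]:
    """Return http://server_name:port/<first URI>, or None when the root
    directory is not a prefix of the CGI path (the MVC branch applies then)."""
    # Normalize so that "/data/htdocs" and "/data/htdocs/" behave the same and
    # "/data/htdocs2/x.php" is not mistaken for a child of "/data/htdocs".
    prefix = root.rstrip("/") + "/"
    if not cgi_path.startswith(prefix):
        return None
    first_uri = cgi_path[len(prefix):]  # CGI path with the root prefix removed
    return f"http://{server_name}:{port}/{first_uri}"
```

For example, a root of `/data/htdocs` and a CGI path of `/data/htdocs/user/login.php` on `example.com:8080` would give `http://example.com:8080/user/login.php`.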
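It should be noted that the first URL generation mode above applies to reverse extraction and generation of URLs for non-MVC (Model-View-Controller) frameworks, that is, the case in which the root directory matches the prefix string of the CGI path.
As another way of generating the URL, step S30 may further include:
Step a3: after determining that the root directory does not match the prefix string of the CGI path, obtaining the index.php entry file in the nginx.conf file according to the root directory and the CGI path, and detecting whether the index.php entry file satisfies a preset condition, so as to detect whether the Web framework corresponding to the CGI path is an MVC framework;
When the root directory does not match the prefix string of the CGI path, that is, when the root directory is not a prefix string of the CGI path, the URL is extracted and generated as follows. First, the index.php entry file in the nginx.conf file is obtained according to the root directory and the CGI path: the location of the index.php entry file on the server is determined from the root directory and the CGI path, and the file is then read. After the index.php entry file is obtained, it is checked against the preset condition to detect whether the Web framework corresponding to the CGI path is an MVC framework.
The preset condition can be set according to the characteristics of each MVC framework. Because there are many MVC frameworks, there may be one or more preset conditions; if the index.php entry file satisfies any one of them, the Web framework corresponding to the CGI path can be judged to be an MVC framework. For example, for the CodeIgniter framework (a simple and fast PHP MVC framework), one preset condition can be that core/CodeIgniter.php appears in the index.php entry file. If forward matching detects core/CodeIgniter.php in the index.php entry file, the Web framework corresponding to the CGI path is an MVC framework.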
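The preset-condition check can be sketched as a lookup of marker strings in the entry file's source. In this Python sketch, only the CodeIgniter marker `core/CodeIgniter.php` comes from the text; the table name `FRAMEWORK_MARKERS`, the function name, and the idea of storing one marker per framework are illustrative assumptions.

```python
from typing import Dict, Optional

# Marker string per framework. Only the CodeIgniter entry is taken from the
# description; further frameworks would need their own (unspecified) markers.
FRAMEWORK_MARKERS: Dict[str, str] = {
    "CodeIgniter": "core/CodeIgniter.php",
}


def detect_mvc_framework(index_php_source: str) -> Optional[str]:
    """Return the name of the first framework whose marker appears in the
    index.php source, or None if no preset condition is satisfied."""
    for framework, marker in FRAMEWORK_MARKERS.items():
        if marker in index_php_source:
            return framework
    return None
```

Returning the framework name rather than a plain boolean also records which preset condition matched, which the later route-file lookup depends on.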
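Step a4: if the Web framework corresponding to the CGI path is determined to be an MVC framework, obtaining the routing configuration file of the Web framework, and obtaining a URI according to the routing configuration file, denoted as the second URI;
If the Web framework corresponding to the CGI path is determined to be an MVC framework, the routing configuration file of the Web framework is obtained, and a URI is obtained according to the routing configuration file and denoted as the second URI. Specifically, step a4 includes:
Step a41: invoking and executing the index.php entry file through the php executable of a preset Hypertext Preprocessor (PHP) extension plug-in, to obtain the syntax tree of the executed index.php entry file;
Once the Web framework is judged to be an MVC framework, the index.php entry file is first invoked and executed through the php executable of the preset PHP (PHP: Hypertext Preprocessor) extension plug-in, which yields the syntax tree of the executed index.php entry file. The preset PHP extension plug-in is obtained by modifying the source code of an existing PHP syntax-tree extension; it parses PHP code into an Abstract Syntax Tree (AST). The AST defines the structure of the code; by manipulating this tree, declaration statements, assignment statements, operation statements, and so on can be located precisely, which enables analysis, optimization, and modification of the code.
Step a42: obtaining the routing configuration file of the Web framework from the syntax tree according to the type of the Web framework;
The routing configuration file of the Web framework is then obtained from the syntax tree according to the type of the Web framework. The type of the Web framework can be determined from the preset condition that the index.php entry file satisfies; in the example above, if core/CodeIgniter.php is detected in the index.php entry file, the Web framework is of the CodeIgniter type. In addition, the routing configuration file of each framework has fixed characteristics and a fixed path. For the CodeIgniter framework, for example, the characteristic config/routes.php can be matched with a regular expression, so that the information about the routing configuration file can be located in the syntax tree, such as "/data/htdocs/a_ci_application_4De/config/routes.php". This gives the path of the routes.php file, from which the routing configuration file of the Web framework is obtained.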
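Locating the routing configuration file by its characteristic path can be sketched as a regular-expression search. This Python sketch assumes that the file paths referenced in the syntax tree have already been collected into a plain list of strings; that flattening step, the table `ROUTE_FILE_PATTERNS`, and the function name are hypothetical, and only the CodeIgniter pattern `config/routes.php` comes from the text.

```python
import re
from typing import Dict, List, Optional, Pattern

# Characteristic route-file path per framework type (CodeIgniter from the text).
ROUTE_FILE_PATTERNS: Dict[str, Pattern[str]] = {
    "CodeIgniter": re.compile(r"config/routes\.php$"),
}


def find_route_file(framework: str, referenced_paths: List[str]) -> Optional[str]:
    """Return the first referenced path that matches the framework's
    characteristic route-file pattern, or None if nothing matches."""
    pattern = ROUTE_FILE_PATTERNS.get(framework)
    if pattern is None:
        return None
    for path in referenced_paths:
        if pattern.search(path):
            return path
    return None
```

Called with the path from the example, `find_route_file("CodeIgniter", [...])` would return `/data/htdocs/a_ci_application_4De/config/routes.php` when that path is in the list.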
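Step a43: parsing the routing configuration file to obtain the routing rules of the Web framework, and obtaining a URI according to the routing rules, denoted as the second URI.
The routing configuration file is then parsed to obtain the routing rules of the Web framework, and a URI is obtained according to the routing rules and denoted as the second URI. Continuing the example above, suppose the routes.php file reads as follows:
$route['default_controller'] = "welcome";
$route['404_override'] = '';
$route['admin/detail_(:num)'] = 'admin/detail?user_id=$1';
$route['admin/(:num)'] = 'admin/detail/$1';
The URIs obtained by parsing are welcome, detail?.htm?user_id=1, and admin/detail/1.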
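A rough way to read such `$route[...] = ...;` assignments is a regular expression over the file text. The Python sketch below is an assumption-laden illustration, not the parsing used by the method (which works from the syntax tree): the names `ROUTE_LINE`, `parse_routes`, and `sample_uris` are hypothetical, and substituting a sample value for `$1` in the route targets is a guess at how the example URIs are formed, so it is not expected to reproduce the listed values exactly.

```python
import re
from typing import Dict, List

# Matches assignments of the form  $route['key'] = 'target';  with either
# single or double quotes around the key and the target.
ROUTE_LINE = re.compile(
    r"""\$route\[\s*(['"])(?P<key>.*?)\1\s*\]\s*=\s*(['"])(?P<target>.*?)\3\s*;"""
)


def parse_routes(php_source: str) -> Dict[str, str]:
    """Collect route key -> route target pairs from routes.php source text."""
    return {m.group("key"): m.group("target") for m in ROUTE_LINE.finditer(php_source)}


def sample_uris(routes: Dict[str, str], sample: str = "1") -> List[str]:
    """Build example URIs from the non-empty targets, replacing back-references
    such as $1 with a sample value."""
    return [re.sub(r"\$\d+", sample, target) for target in routes.values() if target]
```

Routes with an empty target, such as the `404_override` entry, are skipped because they do not name a reachable URI.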
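Step a5: obtaining the location_value of the location block, obtaining the server_name information and the port information of the server block corresponding to the location block, and splicing the location_value, the server_name information, the port information, and the second URI according to the second splicing scheme to generate the corresponding URL.
After the second URI is obtained, the location_value of the location block and the server_name and port information of the server block corresponding to that location block are obtained, and the location_value, the server_name information, the port information, and the second URI are spliced according to the second splicing scheme to generate the corresponding URL. The second splicing scheme is: http://server_name:port/location_value/index.php/second URI.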
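The second splicing scheme can be sketched as a join of path segments. In this Python sketch, the function name `build_second_url` is hypothetical, and trimming leading and trailing slashes so that a location_value of `/` does not produce a double slash is an added assumption about edge cases the text does not discuss.

```python
def build_second_url(server_name: str, port: int, location_value: str, second_uri: str) -> str:
    """Splice http://server_name:port/location_value/index.php/<second URI>."""
    location = location_value.strip("/")   # "/app" -> "app", "/" -> ""
    uri = second_uri.lstrip("/")
    # Drop empty segments so that no "//" appears in the path.
    parts = [part for part in (location, "index.php", uri) if part]
    return f"http://{server_name}:{port}/" + "/".join(parts)
```

For instance, a location_value of `/app` and a second URI of `admin/detail/1` on `example.com:80` would give `http://example.com:80/app/index.php/admin/detail/1`.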
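The present application further provides a URL extraction apparatus.
Refer to FIG. 3, which is a schematic diagram of the functional modules of a first embodiment of the URL extraction apparatus of the present application.
As shown in FIG. 3, the URL extraction apparatus includes:
an obtaining module 10, configured to obtain the context pointer object ctx of a preset module in a preset array;
a matching module 20, configured to traverse each server block in the ctx, locate the location block in the server block, and match the root directory and the Common Gateway Interface (CGI) path from the location block;
a generating module 30, configured to detect whether the root directory matches the prefix string of the CGI path to obtain a detection result, and to generate the corresponding Uniform Resource Locator (URL) according to the detection result and a preset rule.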
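Further, the generating module 30 includes:
a first obtaining unit, configured to, if the root directory is determined to match the prefix string of the CGI path, obtain a first Uniform Resource Identifier (URI) according to the root directory and the CGI path, and obtain the server_name information and the port information of the server block corresponding to the location block;
a first generating unit, configured to splice the first URI, the server_name information, and the port information according to a first splicing scheme to generate the corresponding URL.
Further, the generating module 30 also includes:
a detecting unit, configured to, if the root directory is determined not to match the prefix string of the CGI path, obtain the index.php entry file in the nginx.conf file according to the root directory and the CGI path, and detect whether the index.php entry file satisfies a preset condition, so as to detect whether the Web framework corresponding to the CGI path is an MVC framework;
a second obtaining unit, configured to, after the Web framework corresponding to the CGI path is determined to be an MVC framework, obtain the routing configuration file of the Web framework and obtain a URI according to the routing configuration file, denoted as the second URI;
a second generating unit, configured to obtain the location_value of the location block, obtain the server_name information and the port information of the server block corresponding to the location block, and splice the location_value, the server_name information, the port information, and the second URI according to a second splicing scheme to generate the corresponding URL.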
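Further, the second obtaining unit includes:
an executing subunit, configured to invoke and execute the index.php entry file through the php executable of a preset Hypertext Preprocessor (PHP) extension plug-in, to obtain the syntax tree of the executed index.php entry file;
an obtaining subunit, configured to obtain the routing configuration file of the Web framework from the syntax tree according to the type of the Web framework;
a parsing subunit, configured to parse the routing configuration file to obtain the routing rules of the Web framework, and to obtain a URI according to the routing rules, denoted as the second URI.
Further, the matching module 20 is specifically configured to match the root directory and the CGI path from the location block based on a preset matching rule, where the preset matching rule includes one or more of exact matching, prefix matching, regular-expression matching, normal matching, and full matching.
The functions of the modules of the URL extraction apparatus correspond to the steps of the URL extraction method embodiments above; their functions and implementation are not repeated here.
The present application further provides a computer-readable storage medium storing a URL extraction program; when executed by a processor, the URL extraction program implements the steps of the URL extraction method of any one of the above embodiments.
The specific embodiments of the computer-readable storage medium of the present application are substantially the same as the embodiments of the URL extraction method above and are not repeated here.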
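It should be noted that, in this document, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that includes the element.
The serial numbers of the embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present application.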

Claims (10)

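  1. A URL extraction method, comprising:
    obtaining the context pointer object ctx of a preset module in a preset array;
    traversing each server block in the ctx, locating the location block in the server block, and matching the root directory and the Common Gateway Interface (CGI) path from the location block;
    detecting whether the root directory matches the prefix string of the CGI path to obtain a detection result, and generating the corresponding Uniform Resource Locator (URL) according to the detection result and a preset rule.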
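  2. The URL extraction method according to claim 1, wherein the step of generating the corresponding Uniform Resource Locator (URL) according to the detection result and the preset rule comprises:
    after determining that the root directory matches the prefix string of the CGI path, obtaining a first Uniform Resource Identifier (URI) according to the root directory and the CGI path, and obtaining the server_name information and the port information of the server block corresponding to the location block;
    splicing the first URI, the server_name information, and the port information according to a first splicing scheme to generate the corresponding URL.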
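  3. The URL extraction method according to claim 1, wherein the step of generating the corresponding Uniform Resource Locator (URL) according to the detection result and the preset rule comprises:
    after determining that the root directory does not match the prefix string of the CGI path, obtaining the index.php entry file in the nginx.conf file according to the root directory and the CGI path, and detecting whether the index.php entry file satisfies a preset condition, so as to detect whether the Web framework corresponding to the CGI path is an MVC framework;
    if the Web framework corresponding to the CGI path is determined to be an MVC framework, obtaining the routing configuration file of the Web framework, and obtaining a URI according to the routing configuration file, denoted as the second URI;
    obtaining the location_value of the location block, obtaining the server_name information and the port information of the server block corresponding to the location block, and splicing the location_value, the server_name information, the port information, and the second URI according to a second splicing scheme to generate the corresponding URL.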
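  4. The URL extraction method according to claim 3, wherein the step of obtaining the routing configuration file of the Web framework and obtaining a URI according to the routing configuration file, denoted as the second URI, comprises:
    invoking and executing the index.php entry file through the php executable of a preset Hypertext Preprocessor (PHP) extension plug-in, to obtain the syntax tree of the executed index.php entry file;
    obtaining the routing configuration file of the Web framework from the syntax tree according to the type of the Web framework;
    parsing the routing configuration file to obtain the routing rules of the Web framework, and obtaining a URI according to the routing rules, denoted as the second URI.
  5. The URL extraction method according to any one of claims 1 to 4, wherein the step of matching the root directory and the Common Gateway Interface (CGI) path from the location block comprises:
    matching the root directory and the CGI path from the location block based on a preset matching rule, wherein the preset matching rule includes one or more of exact matching, prefix matching, regular-expression matching, normal matching, and full matching.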
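  6. A URL extraction apparatus, comprising:
    an obtaining module, configured to obtain the context pointer object ctx of a preset module in a preset array;
    a matching module, configured to traverse each server block in the ctx, locate the location block in the server block, and match the root directory and the Common Gateway Interface (CGI) path from the location block;
    a generating module, configured to detect whether the root directory matches the prefix string of the CGI path to obtain a detection result, and to generate the corresponding Uniform Resource Locator (URL) according to the detection result and a preset rule.
  7. The URL extraction apparatus according to claim 6, wherein the generating module comprises:
    a first obtaining unit, configured to, after determining that the root directory matches the prefix string of the CGI path, obtain a first Uniform Resource Identifier (URI) according to the root directory and the CGI path, and obtain the server_name information and the port information of the server block corresponding to the location block;
    a first generating unit, configured to splice the first URI, the server_name information, and the port information according to a first splicing scheme to generate the corresponding URL.
  8. The URL extraction apparatus according to claim 6, wherein the generating module further comprises:
    a detecting unit, configured to, after determining that the root directory does not match the prefix string of the CGI path, obtain the index.php entry file in the nginx.conf file according to the root directory and the CGI path, and detect whether the index.php entry file satisfies a preset condition, so as to detect whether the Web framework corresponding to the CGI path is an MVC framework;
    a second obtaining unit, configured to, if the Web framework corresponding to the CGI path is determined to be an MVC framework, obtain the routing configuration file of the Web framework and obtain a URI according to the routing configuration file, denoted as the second URI;
    a second generating unit, configured to obtain the location_value of the location block, obtain the server_name information and the port information of the server block corresponding to the location block, and splice the location_value, the server_name information, the port information, and the second URI according to a second splicing scheme to generate the corresponding URL.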
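  9. A URL extraction device, comprising a memory, a processor, and a URL extraction program stored in the memory and executable on the processor, wherein the URL extraction program, when executed by the processor, implements the steps of the URL extraction method according to any one of claims 1 to 5.
  10. A computer-readable storage medium storing a URL extraction program, wherein the URL extraction program, when executed by a processor, implements the steps of the URL extraction method according to any one of claims 1 to 5.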
PCT/CN2020/108187 2019-08-20 2020-08-10 URL extraction method, apparatus, device, and computer-readable storage medium WO2021031902A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910776693.6 2019-08-20
CN201910776693.6A CN110472165B (zh) 2019-08-20 2019-08-20 URL extraction method, apparatus, device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021031902A1 true WO2021031902A1 (zh) 2021-02-25

Family

ID=68512733

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/108187 WO2021031902A1 (zh) 2019-08-20 2020-08-10 URL extraction method, apparatus, device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN110472165B (zh)
WO (1) WO2021031902A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
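CN110472165B (zh) * 2019-08-20 2024-01-16 深圳前海微众银行股份有限公司 URL extraction method, apparatus, device, and computer-readable storage medium
CN111078140B (zh) * 2019-11-20 2023-05-23 岭澳核电有限公司 Nuclear power plant file upload management method, apparatus, terminal device, and medium
CN112632423B (zh) * 2021-03-10 2021-06-29 北京邮电大学 URL extraction method and apparatus
CN115499274B (zh) * 2022-09-30 2024-03-22 中国银行股份有限公司 Gateway routing method and system with spliced parameters, electronic device, and storage medium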

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7512665B1 (en) * 2000-08-17 2009-03-31 International Business Machines Corporation Chained uniform resource locators
CN107040504A (zh) * 2016-02-04 2017-08-11 北京京东尚科信息技术有限公司 Test method and apparatus
CN108809890A (zh) * 2017-04-26 2018-11-13 腾讯科技(深圳)有限公司 Vulnerability detection method, test server, and client
CN109710861A (zh) * 2018-12-26 2019-05-03 贵阳朗玛信息技术股份有限公司 System and method for generating URLs
CN110472165A (zh) * 2019-08-20 2019-11-19 深圳前海微众银行股份有限公司 URL extraction method, apparatus, device, and computer-readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6411952B1 (en) * 1998-06-24 2002-06-25 Compaq Information Technologies Group, Lp Method for learning character patterns to interactively control the scope of a web crawler
US6519626B1 (en) * 1999-07-26 2003-02-11 Microsoft Corporation System and method for converting a file system path into a uniform resource locator
US20030208472A1 (en) * 2000-04-11 2003-11-06 Pham Peter Manh Method and apparatus for transparent keyword-based hyperlink
KR101739415B1 (ko) * 2015-10-28 2017-05-24 주식회사 엘지유플러스 Apparatus and method for controlling access to information via the Internet
CN106815248B (zh) * 2015-11-30 2020-07-03 北京国双科技有限公司 Website analysis method and apparatus

Also Published As

Publication number Publication date
CN110472165B (zh) 2024-01-16
CN110472165A (zh) 2019-11-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20854059

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20854059

Country of ref document: EP

Kind code of ref document: A1