CN111723400A - JS sensitive information leakage detection method, device, equipment and medium

Info

Publication number: CN111723400A
Application number: CN202010548330.XA
Authority: CN
Inventors: 廖喜君; 范渊; 黄进
Original assignee: Hangzhou Dbappsecurity Technology Co Ltd
Current assignee: DBAPPSecurity Co Ltd; Hangzhou Dbappsecurity Technology Co Ltd
Priority date: 2020-06-16
Filing date: 2020-06-16
Publication date: 2020-09-29

Abstract

The application discloses a JS sensitive information leakage detection method, device, equipment and medium. The method comprises the following steps: acquiring a URL to be detected of a target website; accessing the URL to be detected based on a crawler technology to obtain a corresponding first JS file; performing JS file scanning on the target website by using a file dictionary to obtain a corresponding second JS file; matching the file contents of the first JS file and the second JS file by using a preset rule base to obtain corresponding first sensitive information; if the first sensitive information is a sensitive URL, sending a request to the sensitive URL to obtain corresponding response data; and matching the response data by using the preset rule base to obtain corresponding second sensitive information. In this way, both the efficiency and the comprehensiveness of JS sensitive information leakage detection can be improved.

Description

JS sensitive information leakage detection method, device, equipment and medium

Technical Field

The application relates to the technical field of network security, in particular to a JS sensitive information leakage detection method, device, equipment and medium.

Background

JavaScript is a fairly simple but powerful client-side scripting language. It is an interpreted language, so it is executed as it is interpreted. Because of this, unlike server-side scripting languages (such as ASP and PHP) and compiled languages (such as C and C++), its source code can easily be obtained by anyone. During website development, developers often write sensitive information such as account names and passwords, cookies, and API keys into JS (JavaScript) files for debugging. If this sensitive information is not removed before the program goes online, and the JS files are neither obfuscated nor encrypted, an attacker can easily collect the information, which poses various threats to WEB services and user privacy.

At present, the conventional detection method for JS sensitive information leakage is generally as follows: a packet capture tool is used while browsing the test site to obtain the JS files, and the JS file content is then searched, by keyword or by regular expression, for keywords related to sensitive leakage or for information that conforms to a given format. However, this approach has low detection efficiency and incomplete detection coverage, so leaks go unreported.

Disclosure of Invention

In view of this, an object of the present application is to provide a JS sensitive information leakage detection method, apparatus, device, and medium, which can improve detection efficiency of JS sensitive information leakage and detection comprehensiveness. The specific scheme is as follows:

In a first aspect, the application discloses a JS sensitive information leakage detection method, which includes:

acquiring a URL to be detected of a target website;

the URL to be detected is accessed based on a crawler technology, and a corresponding first JS file is obtained;

performing JS file scanning on the target website by using a file dictionary to obtain a corresponding second JS file;

matching the file contents of the first JS file and the second JS file by using a preset rule base to obtain corresponding first sensitive information;

if the first sensitive information is a sensitive URL, sending a request to the sensitive URL to obtain corresponding response data;

and matching the response data by using the preset rule base to obtain corresponding second sensitive information.

Optionally, accessing the URL to be detected based on a crawler technology to obtain the corresponding first JS file includes:

crawling the website page corresponding to the URL to be detected in an asynchronous crawler mode to obtain the corresponding first JS file.

Optionally, the JS sensitive information leakage detection method further includes:

stopping the crawler when every JS file crawled from a website page corresponding to the URL to be detected is already present among the previously crawled JS files.

Optionally, before matching the first JS file and the second JS file by using the preset rule base, the method further includes:

filtering the first JS file and the second JS file by using the JS file names.

Optionally, before performing JS file scanning on the target website by using the file dictionary to obtain the corresponding second JS file, the method further includes:

removing, from the file dictionary, the JS file name corresponding to the first JS file.

Optionally, matching the first JS file and the second JS file by using the preset rule base includes:

matching the first JS file and the second JS file by using a regular expression in the preset rule base;

and matching the response data by using the preset rule base includes:

matching the response data by using the regular expression in the preset rule base.

Optionally, the JS sensitive information leakage detection method further includes:

classifying the first sensitive information and the second sensitive information by using the preset rule base;

generating a corresponding detection report; the detection report comprises the JS file matched with the preset rule base, the first sensitive information, the second sensitive information and the sensitive information type.

In a second aspect, the application discloses a JS sensitive information leakage detection device, including:

the website URL acquisition module is used for acquiring a URL to be detected of a target website;

the JS file crawling module is used for visiting the URL to be detected based on a crawler technology to obtain a corresponding first JS file;

the file dictionary scanning module is used for scanning the JS files of the target website by using the file dictionary to obtain corresponding second JS files;

the JS file matching module is used for matching the file contents of the first JS file and the second JS file by using a preset rule base to obtain corresponding first sensitive information;

the response data acquisition module is used for sending a request to the sensitive URL to obtain corresponding response data if the first sensitive information is the sensitive URL;

and the response data matching module is used for matching the response data by utilizing the preset rule base to obtain corresponding second sensitive information.

In a third aspect, the application discloses a JS sensitive information leakage detection device, which comprises a processor and a memory, wherein:

the memory is used for storing a computer program;

the processor is used for executing the computer program to implement the aforementioned JS sensitive information leakage detection method.

In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the aforementioned JS-sensitive information leakage detection method.

It can be seen that the application first acquires a URL to be detected of a target website, then accesses the URL to be detected based on a crawler technology to obtain a corresponding first JS file, and performs JS file scanning on the target website by using a file dictionary to obtain a corresponding second JS file. It then matches the file contents of the first JS file and the second JS file by using a preset rule base to obtain corresponding first sensitive information; if the first sensitive information is a sensitive URL, it sends a request to the sensitive URL to obtain corresponding response data, and finally matches the response data by using the preset rule base to obtain corresponding second sensitive information. In this way, a more complete set of JS files is obtained through crawling combined with file dictionary scanning; after the JS file contents are matched, requests are sent to the matched sensitive URLs and the response data is matched as well, so that both the efficiency and the comprehensiveness of JS sensitive information leakage detection can be improved.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.

Fig. 1 is a flowchart of a JS sensitive information leakage detection method disclosed in the present application;

Fig. 2 is a flowchart of a specific JS sensitive information leakage detection method disclosed in the present application;

Fig. 3 is a flowchart of a specific JS sensitive information leakage detection method disclosed in the present application;

Fig. 4 is a schematic structural view of a JS sensitive information leakage detection device disclosed in the present application;

Fig. 5 is a structural diagram of a JS sensitive information leakage detection apparatus disclosed in the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

At present, a traditional detection method for JS sensitive information leakage generally uses a packet capture tool while browsing a test site to obtain the JS files, and then searches the JS file content for keywords related to sensitive leakage. However, this approach has low detection efficiency and incomplete detection coverage, so leaks go unreported. Therefore, the application provides a JS sensitive information leakage detection scheme, which can improve both the efficiency and the comprehensiveness of JS sensitive information leakage detection.

Referring to fig. 1, an embodiment of the application discloses a JS sensitive information leakage detection method, including:

Step S11: acquiring a URL (uniform resource locator) to be detected of the target website.

In a specific implementation manner, the obtained URLs to be detected may be mass-imported URLs or URLs which are input one by one.

That is, in this embodiment, the URL to be detected may be imported in batch by the terminal device, or the URL to be detected may be entered singly, and the corresponding detection task is issued after the URL is imported.

Step S12: accessing the URL to be detected based on a crawler technology to obtain a corresponding first JS file.

In a specific implementation, this embodiment may crawl the website page corresponding to the URL to be detected in an asynchronous crawler manner to obtain the corresponding first JS file.

When every JS file crawled from a website page corresponding to the URL to be detected is already present among the previously crawled JS files, the crawler is stopped.

It should be noted that this embodiment first obtains the linked JS files by crawling: the crawler visits each page and obtains the JS files referenced there. To improve crawling speed, this embodiment uses an asynchronous crawler. Because the same JS files are included in many places across a website, the crawler is stopped immediately, for the sake of efficiency, once a crawled page yields only JS files that have already been obtained.
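The crawling step and its stop condition can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: `fetch_page` is a hypothetical coroutine supplied by the caller that returns the HTML of a page, `batch_size` is an illustrative parameter, and a page that references no JS files at all is assumed not to trigger the stop.

```python
import asyncio
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcParser(HTMLParser):
    """Collects the src attribute of every <script> tag on a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_js_urls(page_url, html):
    """Return the absolute URLs of the JS files referenced by one page."""
    parser = ScriptSrcParser()
    parser.feed(html)
    return {urljoin(page_url, src) for src in parser.sources}


async def crawl_js(page_urls, fetch_page, batch_size=4):
    """Fetch pages concurrently in batches and collect their JS file URLs.

    fetch_page is a hypothetical coroutine (url -> HTML text).
    Crawling stops as soon as a page yields only JS files seen before.
    """
    seen = set()
    for start in range(0, len(page_urls), batch_size):
        batch = page_urls[start:start + batch_size]
        pages = await asyncio.gather(*(fetch_page(u) for u in batch))
        for url, html in zip(batch, pages):
            found = extract_js_urls(url, html)
            # Assumption: an empty page does not count as "all files seen".
            if found and found <= seen:
                return seen
            seen |= found
    return seen
```

Injecting `fetch_page` keeps the sketch independent of any particular HTTP client; a real crawler would pass a coroutine backed by an asynchronous HTTP library.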

Step S13: scanning the JS files of the target website by using the file dictionary to obtain a corresponding second JS file.

In a specific embodiment, since the JS files generally all exist in one web directory folder, the embodiment may scan the directory folder of the target website by using the file dictionary to find the corresponding JS folder, and then perform file dictionary scan on the directory content in the JS folder.

In addition, in order to improve the detection speed, the JS file name corresponding to the first JS file can be removed from the file dictionary before the JS file is scanned.

It should be noted that some JS files do not appear in the responses seen by the crawler. In order to acquire as many JS files to be detected as possible, this embodiment may fuzz the JS directory of the website with a dictionary containing common JS file names; if the HTTP status code 200 is returned, the corresponding JS file exists. To improve efficiency, the JS file names already acquired by the crawler can first be excluded from the dictionary, which reduces the number of requests sent. The website paths of the JS files acquired by the crawler are then spliced with the JS file names in the dictionary before fuzzing, which reduces invalid requests that return status code 404.
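The dictionary scan described above can be sketched as follows. The function names are illustrative, and `get_status` is a hypothetical callable (URL in, HTTP status code out) standing in for whatever HTTP client is used; the sketch only shows how crawled names are excluded and how crawled directories are spliced with the remaining dictionary names.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def build_candidates(crawled_js_urls, dictionary_names):
    """Splice each crawled JS directory with dictionary names not yet found."""
    # File names the crawler already obtained are removed from the dictionary.
    known_names = {posixpath.basename(urlsplit(u).path) for u in crawled_js_urls}
    remaining = [n for n in dictionary_names if n not in known_names]
    # Directory part of each crawled JS URL, e.g. http://host/static/js/
    directories = sorted({urljoin(u, ".") for u in crawled_js_urls})
    return [urljoin(d, name) for d in directories for name in remaining]


def scan_dictionary(crawled_js_urls, dictionary_names, get_status):
    """Return the candidate URLs for which get_status reports HTTP 200.

    get_status is a hypothetical callable (url -> int status code).
    """
    return [
        url
        for url in build_candidates(crawled_js_urls, dictionary_names)
        if get_status(url) == 200
    ]
```

Restricting candidates to directories where the crawler already found JS files is what keeps the number of 404 responses low.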

Step S14: matching the file contents of the first JS file and the second JS file by using a preset rule base to obtain corresponding first sensitive information.

In a specific implementation, this embodiment may match the first JS file and the second JS file by using the regular expressions in the preset rule base.

In addition, in this embodiment, the file contents of the first JS file and the second JS file may be obtained first. For each obtained JS file, the content can be obtained by sending an HTTP GET request; the body of the response is the content of the JS file. To prevent garbled characters, this embodiment decodes the JS content according to the character encoding used by the website.
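One way to read the decoding step is to decode the response body with the character encoding that the website declares. The sketch below assumes the encoding is announced in the `charset` parameter of the `Content-Type` header and falls back to UTF-8 otherwise; the function name and the fallback behaviour are assumptions made for illustration.

```python
import re


def decode_js_body(body, content_type, default="utf-8"):
    """Decode raw response bytes using the charset declared by the website.

    body is the raw response body; content_type is the Content-Type header
    value (may be None). Falls back to `default` when no usable charset exists.
    """
    match = re.search(r'charset="?([\w.-]+)', content_type or "", re.IGNORECASE)
    encoding = match.group(1) if match else default
    try:
        return body.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or undecodable bytes: degrade gracefully.
        return body.decode(default, errors="replace")
```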

Further, the present embodiment may classify the first sensitive information and the second sensitive information by using the preset rule base.

Specifically, in this embodiment, an http request is sent to the URL of the JS file to be tested, and corresponding response content is acquired, and then rule base matching and classification are performed on the response content.

The preset rule base comprises regular expressions for common types of sensitive information together with the corresponding sensitive information types, such as URL, email address, token or password leakage, file path, intranet IP, cloud leakage, mobile phone number, domain name, identity card number, username and password, and the like.

That is, sensitive information such as email addresses, telephone numbers, identity card numbers, intranet IPs, usernames and passwords can be collected in advance and expressed as regular expressions, thereby forming the rule base. The obtained JS file content is then quickly matched by using the regular expressions in the rule base.

For example: matching a mobile phone number in the content of the JS file, extracting information satisfying a regular expression (.
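A rule base of this kind can be sketched as a mapping from sensitive information type to a compiled regular expression. The patterns below are illustrative assumptions written for this example; they are not the expressions used in the application and are far from exhaustive.

```python
import re

# Hypothetical rule base: sensitive information type -> regular expression.
RULE_BASE = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Assumed pattern for an 11-digit mainland China mobile number.
    "mobile_phone": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    # Private IPv4 ranges 10/8, 172.16/12 and 192.168/16.
    "intranet_ip": re.compile(
        r"(?<![\d.])(?:10(?:\.\d{1,3}){3}"
        r"|192\.168(?:\.\d{1,3}){2}"
        r"|172\.(?:1[6-9]|2\d|3[01])(?:\.\d{1,3}){2})(?![\d.])"
    ),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
}


def match_rules(text, rule_base=RULE_BASE):
    """Return (type, value) pairs for every distinct match in the text."""
    findings = []
    for info_type, pattern in rule_base.items():
        for value in sorted(set(pattern.findall(text))):
            findings.append((info_type, value))
    return findings
```

Because each finding carries the rule's type, classification falls out of the matching step: the type key of the rule that matched is the category of the finding.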

Step S15: if the first sensitive information is a sensitive URL, sending a request to the sensitive URL to obtain corresponding response data.

Step S16: matching the response data by using the preset rule base to obtain corresponding second sensitive information.

That is, if the URL is matched, the request packet is sent to the URL, and then the rule matching is performed on the response packet. And if the matching is successful, extracting the matching information, and classifying according to the rule type.

In a specific implementation manner, the response data is matched by using a regular expression in the preset rule base.

For example: if the URL http://www.xxxx.com/api.php is matched in the content of the JS file, a request is sent to that API and the response content is matched. If, for example, the intranet IP address 192.168.1.1 is matched, the result is marked as intranet IP according to the rule type.

It should be noted that some JS files remain on the WEB server even though they are no longer used after a version update, or exist only as backups. Such files do not appear in the URL list of a packet capture tool, so any sensitive information they contain cannot be detected by that approach. In addition, sensitive information often appears in the responses to AJAX (Asynchronous JavaScript And XML) requests made from JS files; if those URLs are not requested, that part of the sensitive information leakage is missed.

It can be seen that this embodiment first acquires a URL to be detected of a target website, then accesses the URL to be detected based on a crawler technology to obtain a corresponding first JS file, and performs JS file scanning on the target website by using a file dictionary to obtain a corresponding second JS file. It then matches the file contents of the first JS file and the second JS file by using a preset rule base to obtain corresponding first sensitive information; if the first sensitive information is a sensitive URL, it sends a request to the sensitive URL to obtain corresponding response data, and finally matches the response data by using the preset rule base to obtain corresponding second sensitive information. In this way, a more complete set of JS files is obtained through crawling combined with file dictionary scanning; after the JS file contents are matched, requests are sent to the matched sensitive URLs and the response data is matched as well, so that both the efficiency and the comprehensiveness of JS sensitive information leakage detection can be improved.
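The two-stage matching in this example can be sketched as follows. The two rules and the names `detect` and `fetch_text` are assumptions made for illustration; `fetch_text` is a hypothetical callable that returns the response body for a URL, or `None` when the request fails.

```python
import re

# Illustrative rules only; a real rule base would cover many more types.
RULES = {
    "url": re.compile(r"https?://[^\s\"'<>]+"),
    "intranet_ip": re.compile(r"(?<![\d.])192\.168(?:\.\d{1,3}){2}(?![\d.])"),
}


def match_rules(text):
    """Return (type, value) pairs for every rule match in the text."""
    return [
        (kind, m.group(0))
        for kind, pattern in RULES.items()
        for m in pattern.finditer(text)
    ]


def detect(js_content, fetch_text):
    """Match the JS content, then request each matched URL and match again.

    fetch_text is a hypothetical callable (url -> response text or None).
    Returns (first_sensitive_information, second_sensitive_information).
    """
    first = match_rules(js_content)
    second = []
    for kind, value in first:
        if kind == "url":
            response = fetch_text(value)
            if response is not None:
                second.extend(match_rules(response))
    return first, second
```

With a JS file containing `http://www.xxxx.com/api.php` and a response body containing `192.168.1.1`, the first stage reports the URL and the second stage reports the address under the `intranet_ip` type.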

Referring to fig. 2, the embodiment of the application discloses a specific JS sensitive information leakage detection method, which includes:

Step S21: acquiring the URL to be detected of the target website.

Step S22: accessing the URL to be detected based on a crawler technology to obtain a corresponding first JS file.

Step S23: scanning the JS files of the target website by using the file dictionary to obtain a corresponding second JS file.

Step S24: filtering the first JS file and the second JS file by using the names of the JS files.

In a specific implementation, this embodiment may first deduplicate the first JS file and the second JS file by using the JS file names, and then filter the JS files by file name so as to filter out the JS files that do not need to be detected.

That is, the JS files acquired by the crawler and by dictionary scanning are first deduplicated. Further, some of the acquired JS files belong to third-party components. JS files of third-party components usually contain no sensitive information written by the site's developers, and if they are not filtered out, there are too many files to detect and detection efficiency is low. Such files can be filtered out according to characteristics of their file names, for example jquery, by using a white list. After filtering is completed, the remaining JS files are the JS file URLs to be tested for sensitive information leakage.
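The deduplication and file-name filtering can be sketched as follows. The prefix list is a hypothetical white list of third-party libraries chosen for this example, and the sketch deduplicates on the full URL, which is an assumption; the text above only states that file names are used.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical white list of third-party file-name prefixes to skip.
THIRD_PARTY_PREFIXES = ("jquery", "bootstrap", "vue", "react")


def file_name(url):
    """Return the lower-cased file name component of a URL path."""
    return posixpath.basename(urlsplit(url).path).lower()


def filter_js_urls(first_js, second_js, skip_prefixes=THIRD_PARTY_PREFIXES):
    """Merge both JS lists, drop duplicates, and drop third-party files."""
    kept, seen = [], set()
    for url in list(first_js) + list(second_js):
        if url in seen:
            continue  # already collected by the crawler or the scan
        seen.add(url)
        if file_name(url).startswith(skip_prefixes):
            continue  # file name looks like a third-party library
        kept.append(url)
    return kept
```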

Step S25: matching the file contents of the filtered first JS file and second JS file by using a preset rule base to obtain corresponding first sensitive information.

Step S26: if the first sensitive information is a sensitive URL, sending a request to the sensitive URL to obtain corresponding response data.

Step S27: matching the response data by using the preset rule base to obtain corresponding second sensitive information.

Step S28: generating a corresponding detection report; the detection report comprises the JS file matched with the preset rule base, the first sensitive information, the second sensitive information and the sensitive information type.

That is, this embodiment can generate a detection report of JS sensitive information leakage, whose content includes the URL of the requested JS file, the leaked sensitive information, and its type, and output it as a Word report.
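The structure of such a report can be sketched as follows. To stay within the standard library, this sketch renders JSON rather than the Word document mentioned above, which is a deliberate substitution; the tuple layout and the `stage` field (first or second sensitive information) are assumptions made for illustration.

```python
import json
from collections import defaultdict


def build_report(findings):
    """Group findings by the JS file in which they were detected.

    findings is an iterable of (js_url, info_type, value, stage) tuples,
    where stage is "first" or "second" (an assumed labelling).
    """
    grouped = defaultdict(list)
    for js_url, info_type, value, stage in findings:
        grouped[js_url].append({"type": info_type, "value": value, "stage": stage})
    return [{"js_file": url, "findings": items} for url, items in grouped.items()]


def render_report(findings):
    """Serialize the grouped report as indented JSON text."""
    return json.dumps(build_report(findings), indent=2, ensure_ascii=False)
```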

For example, referring to fig. 3, an embodiment of the present application discloses a flowchart of a specific JS-sensitive information leakage detection method.

That is, the application can automatically detect and verify JS sensitive information leakage, improve the accuracy and efficiency of detecting JS sensitive information leakage vulnerabilities, and help website administrators and operation and maintenance personnel find and correct sensitive information leakage problems in JS files in time, so as to prevent attackers from exploiting the sensitive information.

Referring to fig. 4, an embodiment of the present application discloses a JS sensitive information leakage detection device, including:

a website URL obtaining module 11, configured to obtain a to-be-detected URL of a target website;

the JS file crawling module 12 is used for accessing the URL to be detected based on a crawler technology to obtain a corresponding first JS file;

the file dictionary scanning module 13 is configured to perform JS file scanning on the target website by using a file dictionary to obtain a corresponding second JS file;

the JS file matching module 14 is configured to match file contents of the first JS file and the second JS file by using a preset rule base, so as to obtain corresponding first sensitive information;

a response data obtaining module 15, configured to send a request to a sensitive URL if the first sensitive information is the sensitive URL, so as to obtain corresponding response data;

and the response data matching module 16 is configured to match the response data by using the preset rule base to obtain corresponding second sensitive information.

The JS file crawling module 12 is specifically configured to crawl a website page corresponding to the URL to be detected in an asynchronous crawler manner to obtain the corresponding first JS file.

The JS sensitive information leakage detection device further comprises a crawler stopping control module, configured to stop the crawler when every JS file crawled from a website page corresponding to the URL to be detected is already present among the previously crawled JS files.

The JS sensitive information leakage detection device further comprises a JS file filtering module, configured to filter the first JS file and the second JS file by using the JS file names.

The JS sensitive information leakage detection device further comprises a dictionary file name removal module, configured to remove, from the file dictionary, the JS file name corresponding to the first JS file.

The JS file matching module 14 is specifically configured to match the first JS file with the second JS file by using the regular expression in the preset rule base;

the response data matching module 16 is specifically configured to match the response data by using a regular expression in the preset rule base.

The JS sensitive information leakage detection device also comprises a sensitive information classification module which classifies the first sensitive information and the second sensitive information by utilizing the preset rule base;

the JS sensitive information leakage detection device also comprises a detection report generation module for generating a corresponding detection report; the detection report comprises the JS file matched with the preset rule base, the first sensitive information, the second sensitive information and the sensitive information type.

Referring to fig. 5, the embodiment of the present application discloses a JS-sensitive information leakage detection device, which includes a processor 21 and a memory 22; wherein, the memory 22 is used for saving computer programs; the processor 21 is configured to execute the computer program to implement the JS sensitive information leakage detection method disclosed in the foregoing embodiment.

For a specific process of the JS sensitive information leakage detection method, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not described here.

Further, an embodiment of the present application further discloses a computer-readable storage medium for storing a computer program, where the computer program is executed by a processor to implement the JS sensitive information leakage detection method disclosed in the foregoing embodiment.

The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The method, the device, the equipment and the medium for detecting the JS sensitive information leakage provided by the application are described in detail, a specific example is applied in the text to explain the principle and the implementation mode of the application, and the description of the embodiment is only used for helping to understand the method and the core idea of the application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

1. A JS sensitive information leakage detection method is characterized by comprising the following steps:

acquiring a URL to be detected of a target website;

2. The JS sensitive information leakage detection method as claimed in claim 1, wherein the accessing the URL to be detected based on crawler technology to obtain a corresponding first JS file comprises:

3. The JS-sensitive information leakage detection method according to claim 1, further comprising:

4. The JS-sensitive information leakage detection method according to claim 1, wherein before the matching of the first JS file and the second JS file is performed by using the preset rule base, the method further includes:

5. The JS-sensitive information leakage detection method according to claim 1, wherein the JS file scanning is performed on the target website by using the file dictionary, and before the corresponding second JS file is obtained, the method further includes:

6. The JS-sensitive information leakage detection method as recited in claim 1,

the matching the first JS file and the second JS file by using the preset rule base comprises:

the matching the response data by using the preset rule base comprises:

7. The JS-sensitive information leakage detection method according to any one of claims 1 to 6, characterized by further comprising:

8. A JS sensitive information leakage detection device, characterized by comprising:

9. A JS sensitive information leakage detection device, characterized by comprising a processor and a memory, wherein:

the memory is used for storing a computer program;

the processor is configured to execute the computer program to implement the JS-sensitive information leakage detecting method according to any one of claims 1 to 7.

10. A computer-readable storage medium characterized by holding a computer program, wherein the computer program when executed by a processor implements the JS-sensitive information leakage detecting method according to any one of claims 1 to 7.