CN114996714A - Vulnerability detection method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN114996714A
Authority
CN
China
Prior art keywords
webpage
login
information
condition
blasting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210660234.3A
Other languages
Chinese (zh)
Inventor
陈松
何灿
吴鹤意
闫凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN202210660234.3A
Publication of CN114996714A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a vulnerability detection method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a Uniform Resource Locator (URL) of each webpage of a target website; determining a login webpage of the target website based on the URL of each webpage; detecting whether the login webpage has a verification code to obtain a first detection result; and detecting whether the target website has an identity authentication vulnerability based on the first detection result and a weak password blasting mode (that is, brute-force login attempts using weak passwords) for the login webpage.

Description

Vulnerability detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of network security technologies, and in particular, to a vulnerability detection method and apparatus, an electronic device, and a storage medium.
Background
Web browsing is an important part of modern life, and websites usually have an identity authentication mechanism, that is, a user needs to log in to a website before browsing it. If a website has an identity authentication vulnerability, the network security of its users may be affected.
However, the related art offers no effective solution for automatically detecting the identity authentication vulnerabilities of various websites.
Disclosure of Invention
In order to solve the related technical problems, embodiments of the present application provide a vulnerability detection method, apparatus, electronic device, and storage medium.
The technical solutions of the embodiments of the present application are implemented as follows:
An embodiment of the present application provides a vulnerability detection method, which includes the following steps:
acquiring a Uniform Resource Locator (URL) of each webpage of a target website;
determining a login webpage of the target website based on the URL of each webpage;
detecting whether the login webpage has a verification code or not to obtain a first detection result;
and detecting whether the target website has an identity authentication vulnerability or not based on the first detection result and the weak password blasting mode aiming at the login webpage.
In the foregoing solution, the determining the login webpage of the target website based on the URL of each webpage includes:
acquiring first information and/or second information of each webpage based on the URL of each webpage; the first information comprises response information returned by the target website when an access request for the corresponding webpage is sent to the target website; the second information comprises text presented by the corresponding webpage;
and determining the login webpage of the target website according to the first information and/or the second information of each webpage.
In the foregoing solution, the determining the login webpage of the target website according to the first information and the second information of each webpage includes:
for each webpage, judging whether the first information of the corresponding webpage contains a preset first field;
and under the condition that the first information of the corresponding webpage does not contain the first field, judging whether the corresponding webpage is a login webpage or not by utilizing the second information of the corresponding webpage.
In the foregoing solution, the determining whether the corresponding web page is a login web page by using the second information of the corresponding web page includes:
performing regular matching on second information of the corresponding webpage by using keywords in a preset first keyword set to obtain a first matching result; the first matching result represents the matching degree between the second information of the corresponding webpage and the first keyword set;
determining that the corresponding webpage is a login webpage under the condition that the first matching result meets a first condition; determining that the corresponding webpage is not a login webpage under the condition that the first matching result meets a second condition;
under the condition that the first matching result does not meet a first condition and the first matching result does not meet a second condition, determining the text feature vector of the corresponding webpage by using second information of the corresponding webpage; and inputting the text feature vector of the corresponding webpage into a pre-trained first model to judge whether the corresponding webpage is a login webpage or not.
In the above solution, the detecting whether the verification code exists in the login webpage includes:
acquiring third information of the login webpage, wherein the third information comprises a first label in a source code of the login webpage;
matching the third information with keywords in a preset second keyword set;
under the condition that keywords matched with the third information exist in the second keyword set, the first detection result represents that a verification code exists in the login webpage; and under the condition that the keyword matched with the third information does not exist in the second keyword set, the first detection result represents that the verification code does not exist in the login webpage.
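By way of illustration only, the verification code detection described above can be sketched as follows. The sketch assumes that the "first label" is the `<img>` tag of the page source, which the text does not state, and the keyword list is hypothetical because the contents of the second keyword set are not given.

```python
from html.parser import HTMLParser

# Hypothetical second keyword set; the source does not list its contents.
CAPTCHA_KEYWORDS = ("captcha", "verifycode", "validatecode", "验证码")


class ImgTagCollector(HTMLParser):
    """Collects the attributes of every <img> tag as one lowercase string."""

    def __init__(self):
        super().__init__()
        self.img_tags = []

    def handle_starttag(self, tag, attrs):
        # Assumption: the "first label" is the <img> tag.
        if tag == "img":
            text = " ".join(f"{k}={v}" for k, v in attrs if v is not None)
            self.img_tags.append(text.lower())


def has_verification_code(page_source, keywords=CAPTCHA_KEYWORDS):
    """Return the first detection result: True if any <img> tag matches a keyword."""
    collector = ImgTagCollector()
    collector.feed(page_source)
    return any(kw in tag for tag in collector.img_tags for kw in keywords)
```

For example, a page whose form contains `<img src="/captcha.php?id=1">` would be reported as having a verification code, while a page that only contains a logo image would not.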
In the above scheme, the verification code includes a picture verification code; the detecting whether the target website has an identity authentication vulnerability or not based on the first detection result and the weak password blasting mode for the login webpage comprises the following steps:
under the condition that the first detection result represents that the login webpage has the verification code, acquiring a picture corresponding to the verification code;
identifying the verification code by inputting a picture corresponding to the verification code into a pre-trained second model;
blasting the login webpage by using the identified verification code and a preset weak password dictionary;
under the condition that blasting succeeds, the target website has an identity authentication vulnerability; and under the condition that blasting fails, the target website has no identity authentication vulnerability.
In the above scheme, the detecting whether the target website has an identity authentication vulnerability based on the first detection result and the weak password blasting mode for the login webpage includes:
blasting the login webpage by using a preset weak password dictionary under the condition that the first detection result represents that the login webpage does not have the verification code;
under the condition that blasting succeeds, the target website has an identity authentication vulnerability; and under the condition that blasting fails, the target website has no identity authentication vulnerability.
In the above scheme, when blasting the login webpage, the method further includes:
determining at least one of fourth information, fifth information, and sixth information; the fourth information comprises response information returned by the target website when the login webpage is blasted; the fifth information represents the change in the URL of the login webpage when the login webpage is blasted; the sixth information represents the change in the text presented by the login webpage when the login webpage is blasted;
and judging whether the blasting is successful or not by using at least one of the fourth information, the fifth information and the sixth information.
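By way of illustration only, one possible way of combining the fourth, fifth, and sixth information is sketched below. The text does not specify how the three signals are combined; the redirect status codes, the failure markers, and the rule "any change without a failure message counts as success" are all assumptions made for this sketch.

```python
def blast_succeeded(status_code, url_before, url_after, text_before, text_after,
                    failure_markers=("incorrect", "invalid", "failed")):
    """Heuristically judge whether one login attempt succeeded.

    status_code            -> fourth information (response from the website)
    url_before/url_after   -> fifth information (change in the URL)
    text_before/text_after -> sixth information (change in the presented text)
    failure_markers is a hypothetical list of phrases shown on failed logins.
    """
    redirected = status_code in (301, 302, 303)
    url_changed = url_before != url_after
    text_changed = text_before != text_after
    shows_failure = any(m in text_after.lower() for m in failure_markers)
    return (redirected or url_changed or text_changed) and not shows_failure
```

A redirect from the login page to another page is treated as success here, whereas a page that stays on the login URL and displays an error phrase is treated as failure.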
In the above scheme, when blasting the login webpage, the method further includes:
blasting the login webpage in a mode in which the login passwords contained in the weak password dictionary are traversed preferentially.
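By way of illustration only, one reading of the password-first traversal is that the password forms the outer loop, so that each password is tried against every user name before the next password is used. This interpretation, and the function name below, are assumptions and are not confirmed by the text.

```python
from itertools import product


def password_first_candidates(usernames, passwords):
    """Yield (username, password) pairs with the password as the outer loop."""
    for password, username in product(passwords, usernames):
        yield username, password
```

With user names `["admin", "root"]` and passwords `["123456", "password"]`, both user names are paired with `"123456"` before either is paired with `"password"`.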
In the above solution, when blasting the login webpage, the method further includes:
according to a preset period, periodically acquiring second information of the login webpage, and matching the second information with keywords in a preset third keyword set; the second information comprises a text presented by the login webpage; keywords in the third keyword set are related to login limiting conditions;
stopping blasting when the keywords matched with the second information exist in the third keyword set; and under the condition that the keywords matched with the second information do not exist in the third keyword set, continuing blasting.
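By way of illustration only, the periodic check against the third keyword set can be sketched as follows. The sketch measures the preset period in login attempts rather than in time, uses hypothetical keywords, and takes the login and page-text functions as injected callables (`try_login`, `get_page_text`); all of these are assumptions.

```python
# Hypothetical third keyword set (keywords related to login limiting conditions).
LOCKOUT_KEYWORDS = ("too many attempts", "account locked", "try again later")


def blast_with_lockout_check(candidates, try_login, get_page_text,
                             check_every=5, keywords=LOCKOUT_KEYWORDS):
    """Try candidates in order and stop early if a login limit is detected.

    Returns (credentials_or_None, stopped_by_limit).
    """
    for attempt, (username, password) in enumerate(candidates, start=1):
        if try_login(username, password):
            return (username, password), False
        # Assumption: the "preset period" is counted in attempts.
        if attempt % check_every == 0:
            text = get_page_text().lower()
            if any(kw in text for kw in keywords):
                return None, True  # stop blasting
    return None, False
```

Passing the two callables in keeps the sketch independent of any particular HTTP client and makes the stop condition easy to exercise with stand-in functions.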
The embodiment of the present application further provides a vulnerability detection apparatus, including:
an acquisition unit configured to acquire a URL of each web page of a target website;
the first processing unit is used for determining a login webpage of the target website based on the URL of each webpage;
the second processing unit is used for detecting whether the login webpage has the verification code or not to obtain a first detection result;
and the third processing unit is used for detecting whether the target website has an identity authentication vulnerability or not based on the first detection result and the weak password blasting mode aiming at the login webpage.
An embodiment of the present application further provides an electronic device, including: a processor and a memory for storing a computer program capable of running on the processor,
wherein the processor is configured to perform the steps of any of the above methods when running the computer program.
Embodiments of the present application further provide a storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of any one of the above methods.
According to the vulnerability detection method and apparatus, the electronic device, and the storage medium provided by the embodiments of the present application, the URL of each webpage of the target website is acquired; a login webpage of the target website is determined based on the URL of each webpage; whether the login webpage has a verification code is detected to obtain a first detection result; and whether the target website has an identity authentication vulnerability is detected based on the first detection result and a weak password blasting mode for the login webpage. In this way, automated black box scanning for website identity authentication vulnerabilities can be achieved: given only the target website, the login webpage of the target website, and whether that login webpage has a verification code, can be identified intelligently (that is, automatically), and whether the target website has an identity authentication vulnerability can then be detected, which improves the detection capability and the scanning efficiency for website identity authentication vulnerabilities.
Drawings
FIG. 1 is a schematic flowchart of a vulnerability detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the overall detection flow of an Artificial Intelligence (AI) based automated black box scanning method for Web identity authentication vulnerabilities according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a Web page and a verification code according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a login page identification process according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a dynamic verification code identification process according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a weak password blasting process according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a vulnerability detection apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples.
In the related art, vulnerability scanning can be performed by using a scanning tool.
However, the scanning tool in the related art cannot automatically scan the identity authentication vulnerabilities of various websites, that is, cannot automatically detect the identity authentication vulnerabilities existing in the various websites; specifically, the scanning tool in the related art has the following problems:
Problem 1: The first task in identity authentication vulnerability scanning is to find the user back-end login page. A scanning tool in the related art either relies on crawler technology and then performs secondary identification on each crawled Web page (that is, judges whether the crawled webpage is a login webpage) to determine the login page, which makes scanning inefficient, or requires the login page to be specified manually in advance. In other words, the scanning tool in the related art cannot automatically identify the login page.
Problem 2: the scanning tool in the related art cannot process the login page with the limitation of the dynamic verification code, namely, the content identification of the dynamic verification code is not supported, and the automatic scanning cannot be realized aiming at the Web page with the verification code.
Based on this, in various embodiments of the present application, a login webpage of a target website is determined based on the URL of each webpage of the target website, whether the login webpage has a verification code is detected to obtain a first detection result, and whether the target website has an identity authentication vulnerability is detected based on the first detection result and a weak password blasting mode for the login webpage. Automated black box scanning for website identity authentication vulnerabilities can thus be achieved: given only the target website, the login webpage can be identified intelligently (that is, automatically), which solves Problem 1; whether the login webpage has a verification code can be identified automatically, which solves Problem 2; and whether the target website has an identity authentication vulnerability can be detected automatically, which improves the detection capability and the scanning efficiency for website identity authentication vulnerabilities.
The embodiment of the present application provides a vulnerability detection method, which is applied to an electronic device (such as a server), and as shown in fig. 1, the method includes:
step 101: acquiring the URL of each webpage of a target website;
step 102: determining a login webpage of the target website based on the URL of each webpage;
step 103: detecting whether the login webpage has a verification code or not to obtain a first detection result;
step 104: and detecting whether the target website has an identity authentication vulnerability or not based on the first detection result and the weak password blasting mode aiming at the login webpage.
Here, it is understood that the login web page is used for a user to log in the target website.
In practical application, a webpage can also be called a Web page; the login webpage can also be called a login page; the verification code refers to a dynamic verification code, that is, a verification code that changes dynamically each time the webpage is refreshed.
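By way of illustration only, the flow of steps 101 to 104 can be sketched as a small driver function. The three callables (`is_login_page`, `has_captcha`, `blast`) are hypothetical placeholders for steps 102, 103, and 104, and the per-page result dictionary is an assumption of this sketch; the website could, for instance, be treated as vulnerable if any value in the dictionary is True.

```python
def detect_auth_vulnerability(urls, is_login_page, has_captcha, blast):
    """Run steps 102 to 104 over URLs that were already collected in step 101.

    Returns a dict mapping each login page URL to True (blasting succeeded,
    so a vulnerability was found) or False (blasting failed).
    """
    results = {}
    for url in urls:
        if not is_login_page(url):                   # step 102
            continue
        captcha_present = has_captcha(url)           # step 103: first detection result
        results[url] = blast(url, captcha_present)   # step 104
    return results
```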
In step 101, in actual application, the electronic device may obtain, from a local or other electronic device (e.g., a User Equipment (UE), another server, etc.), related information of the target website, such as a URL of a home page of the target website; acquiring the URL of each webpage of the target website based on the related information of the target website; for example, the electronic device may crawl (i.e., acquire) the URL of each web page of the target website by means of a crawler based on the relevant information of the target website. Alternatively, the electronic device may directly obtain the URL of each web page of the target website from a local or other electronic device. The specific manner of obtaining the URL of each web page of the target website may be set according to requirements, which is not limited in the embodiment of the present application.
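By way of illustration only, one crawler step for collecting URLs can be sketched as follows: links are extracted from the HTML of an already downloaded page and restricted to the host of the target website. The same-host restriction and the names `LinkCollector` and `same_site_urls` are assumptions of this sketch, and downloading the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_urls(page_url, page_html):
    """Return de-duplicated absolute URLs linked from page_html on page_url's host."""
    collector = LinkCollector()
    collector.feed(page_html)
    host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found
```

Repeating this step on each newly found URL, starting from the home page, would yield the URL of each webpage reachable by links.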
In step 102, in actual application, for each webpage of the target website, the response information returned by the target website when an access request for the corresponding webpage is sent to it may contain a field (hereinafter referred to as a first field) indicating that the corresponding webpage is a login webpage. Therefore, the electronic device can traverse the URL of each webpage, that is, send an access request for each webpage to the target website, and determine whether the corresponding webpage is a login webpage according to the response information returned by the target website.
Based on this, in an embodiment, the specific implementation of step 102 may include:
acquiring first information of each webpage based on the URL of each webpage, wherein the first information comprises response information returned by the target website when an access request of the corresponding webpage is sent to the target website;
and determining the login webpage of the target website according to the first information of each webpage.
In practical application, the acquiring the first information of each webpage based on the URL of each webpage may include: for each webpage, sending an access request for the corresponding webpage to the target website based on the URL of the corresponding webpage, and receiving the response information returned by the target website in response to the access request.
In an embodiment, the determining the login webpage of the target website according to the first information of each webpage may include:
for each webpage, judging whether the first information of the corresponding webpage contains a preset first field;
and determining that the corresponding webpage is a login webpage under the condition that the first information of the corresponding webpage contains the first field.
Here, the first field may be set according to requirements, which is not limited in this embodiment of the application. Illustratively, the first field may contain "www-author".
In actual application, the login webpage of the target website can be quickly identified by judging whether the first information of the corresponding webpage contains the first field, so that the detection capability and the scanning efficiency of the website identity authentication vulnerability are improved.
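By way of illustration only, the first-field check can be sketched as a case-insensitive substring search over the response headers. Treating the response headers as the place where the first field appears is an assumption; the default value below is the example given in the text, and the field is a parameter so that another value can be supplied.

```python
def contains_first_field(response_headers, first_field="www-author"):
    """Return True if any header name or value contains first_field.

    response_headers is a mapping of header names to values. The comparison
    is case-insensitive. Searching the headers is an assumption of this sketch.
    """
    needle = first_field.lower()
    return any(needle in f"{name}: {value}".lower()
               for name, value in response_headers.items())
```

For example, passing `first_field="www-authenticate"` would flag a response that carries the standard HTTP `WWW-Authenticate` header, which servers send when a resource requires authentication.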
In practical application, the absence of the first field from the first information of the corresponding webpage does not necessarily mean that the corresponding webpage is not a login webpage. Meanwhile, the text presented by a login webpage usually contains specific symbolic keywords, such as "login" and "password" (in Chinese or in English). Therefore, when acquiring the first information of each webpage, the electronic device may also acquire the text presented by the corresponding webpage (denoted as second information in the following description), and determine whether the corresponding webpage is a login webpage by combining the first information and the second information.
Based on this, in an embodiment, the specific implementation of step 102 may include:
acquiring first information and second information of each webpage based on the URL of each webpage; the first information comprises response information returned by the target website when an access request for the corresponding webpage is sent to the target website; the second information comprises text presented by the corresponding webpage;
and determining the login webpage of the target website according to the first information and the second information of each webpage.
In an embodiment, the determining the login webpage of the target website according to the first information and the second information of each webpage may include:
for each webpage, judging whether the first information of the corresponding webpage contains a preset first field;
and under the condition that the first information of the corresponding webpage does not contain the first field, judging whether the corresponding webpage is a login webpage or not by utilizing the second information of the corresponding webpage.
In practical application, judging whether the corresponding webpage is a login webpage by combining the first information and the second information balances the efficiency and the accuracy of login webpage identification, and therefore balances the efficiency and the accuracy of website identity authentication vulnerability detection.
In actual application, a set of symbolic keywords (hereinafter referred to as a first keyword set) that may be included in a text presented by a login webpage may be preset, and whether the corresponding webpage is the login webpage or not may be determined by matching second information of the corresponding webpage with keywords in the first keyword set.
Based on this, in an embodiment, the determining whether the corresponding web page is the login web page by using the second information of the corresponding web page may include:
and judging whether the corresponding webpage is a login webpage or not by utilizing the second information of the corresponding webpage and a preset first keyword set.
In practical application, the keywords in the first keyword set may be set according to requirements, such as "login" and "password" (in Chinese or in English), which is not limited in this embodiment of the present application.
In an embodiment, the determining whether the corresponding web page is a login web page by using the second information of the corresponding web page and the first keyword set may include:
performing regular matching on second information of the corresponding webpage by using the keywords in the first keyword set to obtain a first matching result; the first matching result represents the matching degree between the second information of the corresponding webpage and the first keyword set;
determining that the corresponding webpage is a login webpage under the condition that the first matching result meets a first condition; and determining that the corresponding webpage is not the login webpage under the condition that the first matching result meets a second condition.
In practical application, the specific manner of calculating the first matching result, the first condition, and the second condition may be set according to requirements, which is not limited in the embodiment of the present application. Each keyword in the first keyword set may correspond to a preset weight value, and the weight value may reflect a probability that the corresponding web page is the login web page in a case that the second information of the corresponding web page includes the corresponding keyword. When the keywords in the first keyword set are used to perform regular matching on the second information of the corresponding webpage, the weight values corresponding to the keywords in the first keyword set included in the second information of the corresponding webpage may be added to obtain the first matching result. In the case that the first matching result is greater than a first threshold (i.e., the first condition), it may be determined that the corresponding web page is a login web page; in the case that the first matching result is smaller than a second threshold (i.e., the second condition), it may be determined that the corresponding web page is not a login web page; the second threshold is smaller than the first threshold, and values of the first threshold and the second threshold can be set according to requirements.
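By way of illustration only, the weighted keyword matching with two thresholds can be sketched as follows. The keywords, the integer weights, and the threshold values are hypothetical, since the text leaves them to be set according to requirements; the function names are likewise introduced only for this sketch.

```python
import re

# Hypothetical first keyword set with preset weight values.
KEYWORD_WEIGHTS = {"login": 4, "password": 4, "sign in": 3, "username": 2}
FIRST_THRESHOLD = 7    # above this: login webpage (first condition)
SECOND_THRESHOLD = 2   # below this: not a login webpage (second condition)


def first_matching_result(page_text, weights=KEYWORD_WEIGHTS):
    """Sum the weights of all keywords found in page_text by regular matching."""
    total = 0
    for keyword, weight in weights.items():
        if re.search(re.escape(keyword), page_text, flags=re.IGNORECASE):
            total += weight
    return total


def classify_by_keywords(page_text):
    """Return True, False, or None when neither condition is satisfied."""
    score = first_matching_result(page_text)
    if score > FIRST_THRESHOLD:
        return True
    if score < SECOND_THRESHOLD:
        return False
    return None  # undecided by keywords alone
```

Returning `None` for scores between the two thresholds keeps the undecided case explicit, so that a caller can apply a different method to those pages.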
In practical application, whether the corresponding webpage is a login webpage or not is judged by matching the second information of the corresponding webpage with the keywords in the first keyword set, and the login webpage of the target website can be quickly identified, so that the detection capability and the scanning efficiency of website identity authentication vulnerabilities are improved.
In practical application, in consideration of the fact that the first matching result may not satisfy the first condition nor the second condition, the electronic device may determine whether the corresponding web page is a login web page by using a pre-trained AI classification model (hereinafter, referred to as a first model) when the first matching result does not satisfy the first condition and the second condition.
Based on this, in an embodiment, the determining whether the corresponding web page is the login web page by using the second information of the corresponding web page may include:
and judging whether the corresponding webpage is a login webpage or not by utilizing the second information of the corresponding webpage, a preset first keyword set and a pre-trained first model.
In an embodiment, the determining whether the corresponding web page is a login web page by using the second information of the corresponding web page, the first keyword set, and the first model may include:
performing regular matching on second information of the corresponding webpage by using keywords in the first keyword set to obtain a first matching result; the first matching result represents the matching degree between the second information of the corresponding webpage and the first keyword set;
under the condition that the first matching result does not satisfy a first condition and a second condition, namely under the condition that the first matching result does not satisfy the first condition and the first matching result does not satisfy the second condition, determining the text feature vector of the corresponding webpage by using second information of the corresponding webpage;
judging whether the corresponding webpage is a login webpage or not by utilizing the text characteristic vector of the corresponding webpage and the first model; in other words, it is determined whether the corresponding web page is a login web page by inputting the text feature vector of the corresponding web page into the first model.
Here, it is understood that the first matching result satisfies the first condition, which indicates (i.e., characterizes) that the corresponding web page is a login web page; the first matching result meets the second condition, and the corresponding webpage is not a login webpage; the first matching result does not satisfy the first condition and the second condition, which indicates that whether the corresponding webpage is a login webpage cannot be judged by using the keywords in the first keyword set.
In practical application, the text presented by the corresponding webpage may contain Chinese text and English text, so that when the text feature vector of the corresponding webpage is determined, the Chinese text feature and the English text feature of the corresponding webpage can be respectively extracted, and then the Chinese text feature and the English text feature of the corresponding webpage are utilized to determine the text feature vector of the corresponding webpage. Illustratively, the chinese text features of the corresponding web page may be extracted using Natural Language Processing (NLP) technology, such as extracting the chinese text features of the corresponding web page using NLP tool jieba; extracting English text features of the corresponding webpage by using a preset English keyword set (which is marked as a fourth keyword set in subsequent description); and finally, determining the text characteristic vector of the corresponding webpage by using the Chinese text characteristic and the English text characteristic of the corresponding webpage.
In practical application, the step of judging whether the corresponding webpage is the login webpage or not by using the text feature vector of the corresponding webpage and the first model means that the text feature vector of the corresponding webpage is input into the first model to obtain an output result of the first model, and the output result represents whether the corresponding webpage is the login webpage or not.
In practical application, judging whether the corresponding webpage is a login webpage by using the second information of the corresponding webpage, the first keyword set, and the first model realizes a dual identification mode in which keyword regular matching takes priority and the AI classification model serves as a fallback. This makes full use of the speed of keyword matching and the accuracy of AI identification, and balances the final detection effect (namely accuracy) against the scanning performance (namely efficiency); in other words, both the efficiency and the accuracy of login webpage identification are taken into account, and therefore so are the efficiency and the accuracy of website identity authentication vulnerability detection.
In practical application, the first model needs to be trained in advance.
Based on this, in an embodiment, the method may further include:
determining a first sample set;
and training the first model by using the first sample set and a first AI algorithm, wherein the first model is used for determining whether the corresponding webpage is a login webpage or a non-login webpage according to the text characteristic vector of the webpage, namely judging whether the corresponding webpage is the login webpage or not.
In practical application, the specific type and number of first AI algorithms used for training the first model may be set according to requirements, for example, an algorithm related to NLP technology, a Text Convolutional Neural Network (TextCNN) algorithm, and the like, which is not limited in the embodiment of the present application. Illustratively, a large number of login webpages and non-login webpages may be collected and labeled to determine the first sample set (the number of samples in the first sample set and the ratio of positive samples to negative samples may be set according to requirements, which is not limited in this embodiment of the present application), and a text feature vector of each sample (i.e., each webpage) in the first sample set is determined: for each sample, word segmentation is performed and Chinese text features are extracted by using the NLP tool jieba, specified text contents (namely texts corresponding to keywords in the fourth keyword set) are extracted by using the fourth keyword set to obtain English text features, and the text feature vector of the corresponding sample is determined by using the Chinese text features and the English text features of that sample. A TextCNN model (namely the first model) is then trained based on the text feature vector of each sample to obtain the optimal network model parameters, which completes the training of the first model.
In actual application, the first keyword set may not be used, and the first model may be directly used to determine whether the corresponding web page is a login web page.
Based on this, in an embodiment, the determining whether the corresponding web page is the login web page by using the second information of the corresponding web page may include:
and judging whether the corresponding webpage is a login webpage or not by utilizing the second information of the corresponding webpage and the pre-trained first model.
In an embodiment, the determining whether the corresponding web page is a login web page by using the second information of the corresponding web page and the first model may include:
determining a text feature vector of the corresponding webpage by using the second information of the corresponding webpage;
and judging whether the corresponding webpage is a login webpage or not by utilizing the text feature vector of the corresponding webpage and the first model.
Here, the determining whether the corresponding web page is the login web page by using the text feature vector of the corresponding web page and the first model means that the text feature vector of the corresponding web page is input into the first model to obtain an output result of the first model, and the output result represents whether the corresponding web page is the login web page. In addition, it should be noted that: the specific manner of determining the text feature vector of the corresponding web page and the training manner of the first model are described in detail above, and are not repeated here.
In practical application, the first model is directly used for judging whether the corresponding webpage is a login webpage or not, and the login webpage of the target website can be accurately identified, so that the detection capability and the scanning efficiency of website identity authentication vulnerabilities are improved.
In practical application, the first information of the corresponding webpage is not used, and whether the corresponding webpage is the login webpage or not can be directly judged according to the second information of the corresponding webpage.
Based on this, in an embodiment, the specific implementation of step 102 may include:
acquiring second information of each webpage based on the URL of each webpage, wherein the second information comprises texts presented by the corresponding webpage;
and determining the login webpage of the target website according to the second information of each webpage.
Specifically, in an embodiment, the determining the login webpage of the target website according to the second information of each webpage may include:
and for each webpage, judging whether the corresponding webpage is a login webpage or not by using the second information of the corresponding webpage and a preset first keyword set and/or a pre-trained first model.
Here, it should be noted that: the specific manner for determining whether the corresponding web page is the login web page by using the second information of the corresponding web page and the first keyword set and/or the first model has been described in detail above, and is not repeated here.
In actual application, whether the corresponding webpage is the login webpage or not is directly judged according to the second information of the corresponding webpage, and the login webpage of the target website can be accurately identified, so that the detection capability and the scanning efficiency of website identity authentication vulnerabilities are improved.
In step 103, in actual application, it is considered that the login webpage usually presents the dynamic verification code in the form of a picture, and therefore, whether the login webpage has the verification code can be detected by detecting whether a tag related to the picture (hereinafter, referred to as a first tag) in a source code of the login webpage has a keyword in a specific keyword set (hereinafter, referred to as a second keyword set).
Based on the above, in an embodiment, the specific implementation of step 103 may include:
acquiring third information of the login webpage, wherein the third information comprises a first label in a source code of the login webpage;
matching the third information with keywords in a preset second keyword set;
under the condition that keywords matched with the third information exist in the second keyword set, the first detection result represents that a verification code exists in the login webpage; and under the condition that the keyword matched with the third information does not exist in the second keyword set, the first detection result represents that the verification code does not exist in the login webpage.
Here, the authentication code includes a picture authentication code, i.e., an authentication code presented in a picture form.
In practice, the third information may include HyperText Markup Language (HTML) code, and the first tag may include an <img> tag.
In practical application, the keywords in the second keyword set may be set according to requirements, such as "yanzhengma", "capture", "verification code", "verify code", and the like, which is not limited in this embodiment of the present application.
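Illustratively, detecting a verification code by matching the <img> tags in the source code of the login webpage against the second keyword set may be sketched as follows. This is a minimal sketch using the Python standard library; the keyword tuple simply repeats the examples listed above, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Example keywords as listed in the text; the actual set is configurable.
SECOND_KEYWORD_SET = ("yanzhengma", "capture", "verification code", "verify code")


class ImgTagCollector(HTMLParser):
    """Collect the attributes of every <img> tag (the first tag) as text."""

    def __init__(self):
        super().__init__()
        self.img_attr_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # Join attribute names and values into one lowercase string.
            parts = [f"{name} {value or ''}" for name, value in attrs]
            self.img_attr_text.append(" ".join(parts).lower())


def has_verification_code(html_source, keywords=SECOND_KEYWORD_SET):
    """Return True if any <img> tag contains a keyword of the second set."""
    collector = ImgTagCollector()
    collector.feed(html_source)
    return any(kw in text for text in collector.img_attr_text for kw in keywords)
```

The return value plays the role of the first detection result: True represents that a verification code exists in the login webpage, and False represents that it does not.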
In step 104, in actual application, under the condition that the first detection result indicates that the verification code exists in the login webpage, the verification code in the verification code picture may be recognized by using a pre-trained AI model (which is written as a second model in the subsequent description), and then the login webpage is blasted by using the recognized verification code and a preset weak password dictionary.
Based on this, in an embodiment, in a case that the verification code includes a picture verification code, the specific implementation of step 104 may include:
under the condition that the first detection result represents that the login webpage has the verification code, acquiring a picture corresponding to the verification code;
identifying the verification code by using the picture corresponding to the verification code and a pre-trained second model; in other words, the verification code is identified by inputting the picture corresponding to the verification code into the second model;
blasting the login webpage by using the identified verification code and a preset weak password dictionary;
under the condition that blasting is successful, determining that the target website has an identity authentication vulnerability; and under the condition of blasting failure, determining that the target website does not have an identity authentication vulnerability.
In practical application, the second model needs to be trained in advance.
Based on this, in an embodiment, the method may further include:
determining a second set of samples;
and training the second model by utilizing the second sample set and a second AI algorithm, wherein the second model is used for identifying the verification code in the picture.
In practical applications, the specific type and number of second AI algorithms used for training the second model may be set according to requirements, such as a Convolutional Neural Network (CNN) algorithm, a Long Short-Term Memory (LSTM) algorithm, a Connectionist Temporal Classification (CTC) algorithm, and the like, which is not limited in this embodiment of the present application. Exemplarily, a large number of verification code pictures can be generated or collected, and the verification code corresponding to each verification code picture is labeled to obtain the second sample set (the number of samples in the second sample set can be set according to requirements, which is not limited in the embodiment of the present application); picture features of each sample (namely each verification code picture) in the second sample set are extracted, and a CNN + LSTM + CTC neural network model (namely the second model) is iteratively trained based on the extracted picture features until the cross entropy loss is reduced to the minimum. In the CNN + LSTM + CTC neural network model, the CNN is used for extracting the features of the verification code picture, the LSTM is used for identifying single characters, and the CTC is responsible for filtering out redundant and blank contents.
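Illustratively, the filtering of redundant and blank contents attributed to CTC above may be sketched with standard greedy CTC decoding: consecutive repeated labels are collapsed, and blank labels are then dropped. This is a minimal sketch under assumptions: the per-frame label indices are assumed to be already available (for example, the most probable label of each time step), and the blank index and the alphabet are hypothetical.

```python
def ctc_greedy_decode(frame_labels, blank=0, alphabet="_0123456789"):
    """Collapse consecutive repeated labels, then drop blanks.

    frame_labels: one label index per time step.
    blank:        index reserved for the CTC blank symbol (assumed to be 0).
    alphabet:     hypothetical index-to-character table; position 0 is the blank.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)
```

Because a blank separates genuinely repeated characters, a verification code such as "22" survives decoding as long as a blank frame lies between the two runs of the same label, which is what allows verification codes of variable length to be recognized.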
In practical application, the identifying the verification code by using the picture corresponding to the verification code and the pre-trained second model means that the picture corresponding to the verification code is input into the second model to obtain an output result of the second model, and the output result includes the verification code identified by the second model.
In practical application, the weak password dictionary may include a plurality of weak credentials that are easily guessed or cracked by a cracking tool; the credentials may include user names (username in English) and passwords (password in English), such as "admin", "root", "123456", and the like. The specific contents of the weak password dictionary may be set according to requirements, which is not limited in the embodiment of the present application.
In practical application, the blasting of the login webpage by using the identified verification code and the preset weak password dictionary means that the login webpage is attacked by using the identified verification code and the weak password in the weak password dictionary, in other words, the target website is logged in on the login webpage by using the identified verification code and the user name and the password in the weak password dictionary. It can be understood that when the target website is successfully logged in on the login webpage by using the identified verification code and any user name and password in the weak password dictionary, the blasting is successful; and blasting failure occurs when the target website cannot be successfully logged in on the login webpage by using the identified verification code and all user names and passwords in the weak password dictionary.
In practical application, when the first detection result represents that the login webpage does not have the verification code, the login webpage can be blasted by directly using a preset weak password dictionary.
Based on this, in an embodiment, the specific implementation of step 104 may include:
blasting the login webpage by using a preset weak password dictionary under the condition that the first detection result represents that the login webpage does not have the verification code;
under the condition that blasting is successful, determining that the target website has an identity authentication vulnerability; and under the condition of blasting failure, determining that the target website does not have an identity authentication vulnerability.
In practical application, the blasting of the login webpage by using the preset weak password dictionary means that the weak password in the weak password dictionary is used for attacking the login webpage, in other words, the target website is logged in the login webpage by using the user name and the password in the weak password dictionary. It can be understood that when the target website is successfully logged in on the login webpage by using the user name and the password in the weak password dictionary, the blasting is successful; and blasting failure occurs when the target website cannot be successfully logged in on the login webpage by using all user names and passwords in the weak password dictionary.
In practical application, because the weak password dictionary may include a plurality of weak credentials that are easily guessed or cracked by a cracking tool, when blasting the login webpage by using the identified verification code and the weak password dictionary, or when blasting the login webpage by using the weak password dictionary, the credentials in the weak password dictionary need to be traversed, that is, the user names and the passwords in the weak password dictionary are traversed. For example, when the login webpage is blasted by using the identified verification code and the weak password dictionary, the electronic device may locate the labels and text box positions of the user name, the password, and the verification code on the login webpage through a built-in browser control (e.g., driver), load the user names and passwords in the weak password dictionary in a traversing manner together with the identified verification code, fill them into the text boxes of the login webpage in sequence, and submit a POST request through the built-in browser control to blast the login webpage, that is, send a login request carrying a user name and a password from the weak password dictionary and the identified verification code to the target website.
In practical application, when the identified verification code and the weak password dictionary are used for blasting the login webpage, or when the weak password dictionary is used for blasting the login webpage, whether blasting is successful or not can be determined according to at least one of response information returned by the target website, the change condition of the URL of the login webpage and the change condition of the text presented by the login webpage, namely whether the target website is successfully logged in is determined.
Based on this, in an embodiment, when blasting the login webpage, the method may further include:
determining at least one of fourth information, fifth information, and sixth information; the fourth information comprises response information returned by the target website when the login webpage is blasted; the fifth information represents the change in the URL of the login webpage when the login webpage is blasted; the sixth information represents the change in the text presented by the login webpage when the login webpage is blasted;
and judging whether the blasting is successful or not by using at least one of the fourth information, the fifth information and the sixth information.
In actual application, after the electronic device attempts to log in to the target website on the login webpage by using the identified verification code and a user name and password in the weak password dictionary, or by using a user name and password in the weak password dictionary alone, the target website may return response information including a page response status code (Status Code in English); in other words, the fourth information may include a page response status code. Illustratively, in the case that the page response status code included in the fourth information is 200, the login is successful, that is, the blasting is successful; in the case that the page response status code included in the fourth information is 404 or 304, the login fails, that is, the blasting fails.
In practical applications, before blasting the login webpage, the URL of the login webpage may include a preset second field, such as "login.jsp". In the case that blasting is successfully performed on the login webpage, the URL of the login webpage may include a preset third field, such as "index.jsp". In the case that the fifth information represents that the URL of the login webpage has changed from containing the second field to containing the third field, the login is successful, namely the blasting is successful; and in the case that the fifth information represents that the URL of the login webpage is not changed, the login fails, namely the blasting fails.
In practical application, when the sixth information satisfies a third condition, it may be determined that the login is successful, that is, the blasting is successful; when the sixth information satisfies a fourth condition, it may be determined that the login fails, that is, the blasting fails. The specific manner of determining the sixth information and the specific contents of the third condition and the fourth condition may be set according to requirements, which is not limited in the embodiment of the present application. For example, a preset Python library for determining text similarity may be used to determine the change in the text presented by the login webpage when the login webpage is blasted, that is, the sixth information output by the Python library is obtained; when the sixth information is 0 (i.e., the third condition), the similarity of the text presented by the login webpage before and after blasting is small (i.e., the change is large), and it may be determined that the login is successful; when the sixth information is 1 (i.e., the fourth condition), the similarity of the text presented by the login webpage before and after blasting is large (i.e., the change is small), and it may be determined that the login has failed.
Based on this, in an embodiment, the determining whether the blasting is successful by using at least one of the fourth information, the fifth information, and the sixth information may include:
determining that the blasting was successful when at least one of the following conditions is satisfied:
the fourth information includes a preset first page response status code (such as 200);
the fifth information represents that the URL of the login webpage has changed from containing a preset second field (such as "login.jsp") to containing a preset third field (such as "index.jsp");
the sixth information satisfies a third condition (for example, the sixth information is 0).
In an embodiment, the determining whether blasting is successful by using at least one of the fourth information, the fifth information, and the sixth information may further include:
determining a blasting failure when at least one of the following conditions is satisfied:
the fourth information comprises a preset second page response status code (such as 404 or 304);
the fifth information represents that the URL of the login webpage is not changed;
the sixth information satisfies a fourth condition (for example, the sixth information is 1).
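Illustratively, the three success conditions listed above may be sketched as follows. This is a minimal sketch under assumptions: the text does not name the text-similarity library, so difflib.SequenceMatcher from the Python standard library is used as a stand-in, and the similarity threshold of 0.5 that maps the ratio to "changed" or "unchanged" is hypothetical. The sketch applies only the rule that blasting is successful when at least one success condition is satisfied; how a conflict with the failure conditions is resolved is not specified in the text and is not modeled here.

```python
from difflib import SequenceMatcher


def blasting_succeeded(status_code, url_before, url_after,
                       text_before, text_after,
                       success_status=200,
                       second_field="login.jsp",
                       third_field="index.jsp",
                       similarity_threshold=0.5):
    """Return True when at least one of the three success conditions holds."""
    # Fourth information: the page response status code.
    status_ok = status_code == success_status

    # Fifth information: the URL changed from containing the second field
    # to containing the third field.
    url_changed = second_field in url_before and third_field in url_after

    # Sixth information: the presented text changed substantially.
    # The threshold is an assumed value, not taken from the text.
    similarity = SequenceMatcher(None, text_before, text_after).ratio()
    text_changed = similarity < similarity_threshold

    return status_ok or url_changed or text_changed
```

The field names and the status code are passed as parameters so that they can be set according to requirements, as the embodiment describes.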
In practical application, the target website may have a login attempt limiting mechanism (i.e., a limit on the number of times a wrong password may be entered). To avoid invalid blasting (i.e., the number of wrong password entries reaching the upper limit set by the target website), login blasting may be implemented in a password-first traversal manner, so that the detection capability and the scanning efficiency for website identity authentication vulnerabilities are improved.
Based on this, in an embodiment, when blasting the login webpage, the method may further include:
blasting the login webpage in a mode of preferentially traversing the login password contained in the weak password dictionary.
Here, preferentially traversing the login passwords included in the weak password dictionary means that, for each login password in the weak password dictionary, all user names are traversed before moving on to the next password. Illustratively, assume that the weak password dictionary contains three user names, "admin", "root", and "user", and three passwords, "123456", "654321", and "root1234". When login blasting is implemented in the password-first traversal manner, the user names "admin", "root", and "user" are first traversed with the password "123456", that is, the login webpage is blasted with ("admin", "123456"), ("root", "123456"), and ("user", "123456") in turn; then the user names "admin", "root", and "user" are traversed with the password "654321"; and finally the user names "admin", "root", and "user" are traversed with the password "root1234".
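Illustratively, the password-first traversal order may be sketched as follows. This is a minimal sketch that only generates the order of (user name, password) pairs; the function name is hypothetical, and no login request is sent.

```python
from itertools import product


def password_first_pairs(usernames, passwords):
    """Yield (username, password) pairs with the password as the outer loop.

    Every user name is tried with the first password before the second
    password is used, so consecutive attempts hit different user names.
    """
    for password, username in product(passwords, usernames):
        yield username, password


if __name__ == "__main__":
    users = ["admin", "root", "user"]
    pwds = ["123456", "654321", "root1234"]
    for pair in password_first_pairs(users, pwds):
        print(pair)
```

With the example dictionary above, the first three pairs produced all use the password "123456", followed by three pairs with "654321" and three pairs with "root1234".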
In practical application, in order to avoid invalid blasting, each time blasting has been performed M times, it may further be detected whether a keyword related to login restriction (written as a keyword in a third keyword set in the following description) exists in the text presented by the login webpage, where M is an integer greater than 1 whose value may be set according to requirements, for example, 5; the value of M is not limited in the embodiment of the present application. If a keyword related to login restriction exists, blasting is stopped; if no such keyword exists, blasting is continued and monitoring is maintained (namely, whether a keyword related to login restriction exists in the text presented by the login webpage is detected again after every M blasting attempts).
Based on this, in an embodiment, when blasting the login webpage, the method may further include:
according to a preset period (namely M), periodically acquiring second information of the login webpage, and matching the second information with keywords in a preset third keyword set; the second information comprises a text presented by the login webpage; keywords in the third keyword set are related to login limiting conditions;
stopping blasting when the keywords matched with the second information exist in the third keyword set; and under the condition that the keywords matched with the second information do not exist in the third keyword set, continuing blasting.
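Illustratively, the periodic check against the third keyword set may be sketched as follows. This is a minimal sketch under assumptions: try_login and get_page_text are hypothetical callables supplied by the caller (they stand for submitting one login attempt and reading the text currently presented by the login webpage), and the keyword tuple repeats the examples given in this embodiment.

```python
# Example keywords related to login restriction; the actual set is configurable.
THIRD_KEYWORD_SET = ("frequently", "lock", "try again later")


def blast_with_restriction_check(credentials, try_login, get_page_text,
                                 m=5, keywords=THIRD_KEYWORD_SET):
    """Try credentials in order and check for login restriction every m attempts.

    credentials:   iterable of (username, password) pairs.
    try_login:     hypothetical callable(username, password) -> bool.
    get_page_text: hypothetical callable() -> str, the text the page presents.
    Returns the first successful pair, or None if blasting fails or is stopped.
    """
    for attempt_number, (username, password) in enumerate(credentials, start=1):
        if try_login(username, password):
            return username, password
        if attempt_number % m == 0:
            page_text = get_page_text().lower()
            if any(keyword in page_text for keyword in keywords):
                return None  # a login restriction was detected: stop blasting
    return None
```

Keeping the page access behind callables leaves open how the electronic device drives the login webpage, for example through the built-in browser control mentioned earlier.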
In practical applications, the keywords in the third keyword set may be set according to requirements, such as "frequently", "lock", and "try again later", which is not limited in this embodiment of the present application.
In practical application, after blasting is successful, detailed information of identity authentication loopholes can be recorded, namely at least one group of user name and password which are successfully logged in, or at least one group of user name, password and verification code which are successfully logged in; and the detailed information of the identity authentication loophole can be provided for the user, so that the user can maintain and/or upgrade the target website.
According to the vulnerability detection method provided by the embodiment of the application, the URL of each webpage of a target website is obtained; a login webpage of the target website is determined based on the URL of each webpage; whether the login webpage has a verification code is detected to obtain a first detection result; and whether the target website has an identity authentication vulnerability is detected based on the first detection result and in a weak password blasting manner for the login webpage. With the scheme provided by the embodiment of the application, automatic black box scanning for website identity authentication vulnerabilities can be achieved: only the target website needs to be provided, the login webpage of the target website and whether a verification code exists in the login webpage can be identified intelligently (namely automatically), and whether the target website has an identity authentication vulnerability is then detected, so that the detection capability and the scanning efficiency for website identity authentication vulnerabilities can be improved.
The present application will be described in further detail with reference to the following application examples.
The application embodiment provides an AI-based automatic black box scanning method for Web identity authentication vulnerabilities, aimed at the problems (namely problems 1 and 2) of the scanning tool in the related technology. It overcomes the problem that the scanning tool cannot automatically identify login pages and dynamic verification codes, and realizes identity authentication vulnerability scanning of various complex Web pages. As shown in fig. 2, the overall detection process of the AI-based automatic black box scanning method for Web identity authentication vulnerabilities according to the embodiment of the present application may include three sub-processes: login page identification, variable-length dynamic verification code identification, and weak password blasting. Fig. 3 shows some common Web pages (including login pages and non-login pages), and common picture verification codes containing digits and/or English letters.
The following describes the login page identification process, the variable-length dynamic authentication code identification process, and the weak password blasting process with reference to fig. 4 to 6, respectively.
First, the login page identification flow is described in detail with reference to fig. 4.
In practical application, the first task of identity authentication vulnerability scanning is to find the user background login page; manually specifying the page, or crawling pages and then identifying them in a second pass, is inefficient. Therefore, in the embodiment of the application, the background login page (i.e., the login webpage) is searched for intelligently based on a fusion framework of keyword rule matching and an AI classification model, so that the performance of the scanning tool can be greatly improved, i.e., the scanning efficiency and the scanning accuracy are improved.
In the embodiment of the present application, as shown in fig. 4, the login page identification process includes two sub-processes: offline classification model (i.e., the first model) training and online Web page identification. When offline classification model training is performed, a large number of login pages and non-login pages (for example, 13000 login pages and 19000 non-login pages) can be collected and labeled, and the ratio of positive samples to negative samples can be about 1:1.5. Then, the Chinese and English text features in each page can be extracted separately: for the Chinese text features, the NLP tool jieba can be used to perform word segmentation before unified extraction; for the English text features, the specified text content (i.e., the English text features) can be extracted by using a preset English keyword set (i.e., the fourth keyword set). Finally, the TextCNN model (i.e., the first model) may be trained based on the extracted Chinese and English feature vectors to obtain the optimal network model parameters.
In the embodiment of the present application, during online Web page identification, a login page usually contains characteristic keywords, such as the Chinese and English words for "login" and "password". To balance the final detection effect (i.e., the accuracy of the detection result) against the scanning performance (i.e., the scanning efficiency), the embodiment provides a dual identification mode in which keyword regular-expression matching is applied first and the AI classification model serves as a fallback, thereby combining the high speed of keyword matching with the accuracy of AI identification.
Specifically, as shown in fig. 4, the URL of each page may be accessed, that is, an access request for each page is sent, and whether the current page uses the "WWW-Authenticate" login authentication mode is determined from the header information of the response, that is, whether the response header contains the "WWW-Authenticate" field (i.e., the first field). If the field is present, the current page is identified as a Web login page. If the field is absent, a keyword set (i.e., the first keyword set) may be loaded to perform regular-expression matching on the Chinese and/or English text at different positions in the page, where each hit keyword may correspond to its own weighted matching score (i.e., the weight value). If the final score S (i.e., the sum of the weighted matching scores of the hit keywords) is greater than T0 (i.e., the first threshold), the current Web page may be determined to be a login page; if S is less than T1 (i.e., the second threshold), the current Web page may be determined not to be a login page (i.e., it is a non-login page). If S is greater than or equal to T1 and less than or equal to T0 (i.e., the first matching result satisfies neither the first condition nor the second condition), the text feature vector of the current Web page may be extracted and input into the AI classification model (i.e., the first model) for prediction, and the model output is taken as the final decision (i.e., whether the current Web page is a login page).
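The three-stage decision just described (header check, weighted keyword score, model fallback) can be sketched as below. The keyword weights, the values of `T0` and `T1`, and the `model` callable are all hypothetical placeholders, because the patent gives no concrete values; the header is assumed to be the standard HTTP `WWW-Authenticate` challenge header.

```python
import re
from typing import Callable, Mapping, Optional

# Hypothetical weights for the first keyword set; real values are not disclosed.
KEYWORD_WEIGHTS = {"login": 2.0, "password": 2.0, "username": 1.0, "sign in": 1.5}
T0 = 4.0  # first threshold: a score above this means "login page"
T1 = 1.0  # second threshold: a score below this means "not a login page"


def keyword_score(text: str, weights: Mapping[str, float] = KEYWORD_WEIGHTS) -> float:
    """Sum the weights of all keywords that occur at least once in the text."""
    lowered = text.lower()
    return sum(w for kw, w in weights.items() if re.search(re.escape(kw), lowered))


def is_login_page(
    headers: Mapping[str, str],
    text: str,
    model: Optional[Callable[[str], bool]] = None,
) -> bool:
    # Stage 1: an authentication challenge header identifies a login page directly.
    if any(name.lower() == "www-authenticate" for name in headers):
        return True
    # Stage 2: the weighted keyword score decides the clear-cut cases.
    score = keyword_score(text)
    if score > T0:
        return True
    if score < T1:
        return False
    # Stage 3: T1 <= score <= T0 is ambiguous, so defer to the classifier.
    if model is None:
        raise ValueError("ambiguous score and no fallback model supplied")
    return model(text)
```

Only pages whose score falls in the closed interval [T1, T0] reach the model, which is what keeps the slower classifier off the common path.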
Next, the variable-length dynamic verification code identification flow is described in detail with reference to fig. 5.
In the embodiment of the present application, as shown in fig. 5, after the login page is identified, it is necessary to determine whether the current Web page requires dynamic verification code identification. The dynamic verification code identification process includes two sub-processes, verification code search and verification code content identification; the content identification process in turn includes two sub-processes, offline model training and online verification code identification.
In the embodiment of the present application, the verification code search determines whether a verification code mechanism exists on the current page. To this end, the Web login page may be scanned to obtain the set of all <img> tags (i.e., the first tags), and the <img> tags may be traversed and matched against preset verification code keywords (i.e., keywords in the second keyword set) to determine whether a verification code exists. If a verification code exists, its specific position is located and recorded, and the verification code picture is saved locally for subsequent identification; if no verification code exists, the verification code content identification process is skipped, and blasting of the login page without a verification code is executed directly.
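The verification code search can be illustrated with Python's standard `html.parser` module. This is a sketch under stated assumptions: the entries in `CAPTCHA_KEYWORDS` are hypothetical examples of the second keyword set, and matching against the tag's attribute values is one plausible reading of "matching traversal".

```python
from html.parser import HTMLParser
from typing import Dict, List, Sequence

# Hypothetical examples of the second keyword set.
CAPTCHA_KEYWORDS = ("captcha", "verifycode", "checkcode", "validatecode")


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.images: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append({k: (v or "") for k, v in attrs})


def find_captcha_images(
    html: str, keywords: Sequence[str] = CAPTCHA_KEYWORDS
) -> List[Dict[str, str]]:
    """Return the attributes of <img> tags whose attribute values mention a keyword."""
    parser = ImgCollector()
    parser.feed(html)
    hits = []
    for attrs in parser.images:
        haystack = " ".join(attrs.values()).lower()
        if any(kw in haystack for kw in keywords):
            hits.append(attrs)
    return hits
```

An empty result corresponds to the "no verification code" branch, in which content identification is skipped.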
In the embodiment of the present application, during offline model training, a large amount of labeled verification code data (i.e., verification code pictures) may be generated or collected, and the CNN + LSTM + CTC neural network model (i.e., the second model) is trained iteratively on the extracted picture features until the cross-entropy loss is minimized; the CNN extracts features from the verification code picture, the LSTM recognizes individual characters, and the CTC filters out redundant and blank outputs.
In the embodiment of the present application, after the verification code mechanism of the Web login page is confirmed, that is, after the verification code picture of the Web login page is saved locally, the content of the picture may be identified using the trained AI model (i.e., the second model). In other words, the saved verification code picture may be input into the AI model, the model output (which includes the identification result for the content of the picture) may be recorded, and the login page weak password blasting process may then be started.
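The role of the CTC stage, removing repeated and blank outputs so that codes of different lengths can be read from a fixed number of frames, can be shown with a greedy decoder. This is a generic illustration of CTC decoding rather than the patent's implementation; the alphabet and the convention that index 0 is the blank symbol are assumptions.

```python
from typing import Sequence

BLANK = 0  # assumed index of the CTC blank symbol
# Assumed alphabet: position 0 is a placeholder for the blank.
ALPHABET = "-0123456789abcdefghijklmnopqrstuvwxyz"


def ctc_collapse(indices: Sequence[int], alphabet: str = ALPHABET, blank: int = BLANK) -> str:
    """Merge consecutive duplicate indices, then drop blanks."""
    out = []
    prev = None
    for idx in indices:
        if idx != prev and idx != blank:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str = ALPHABET) -> str:
    """Pick the best-scoring class for each frame, then collapse the sequence."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    return ctc_collapse(best, alphabet)
```

A blank between two identical indices is what lets a code contain a doubled character: the frame sequence `a a - a b b` collapses to `aab`, not `ab`.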
Finally, the weak password blasting process is described in detail with reference to fig. 6.
In the embodiment of the present application, a local weak password dictionary (user names and passwords) may be loaded to blast the current Web login page, so as to determine whether an identity authentication vulnerability exists. Specifically, as shown in fig. 6, when a verification code mechanism exists on the current Web login page, the tags and positions of the user name, password, and verification code fields may be located on the Web page through a built-in browser driver; the user names and passwords in the weak password dictionary are then traversed and, together with the verification code identification result, filled into the text boxes in sequence; a POST request may then be submitted through the built-in browser, and whether blasting succeeded is judged from the page response, until the dictionary has been fully traversed.
In this embodiment, whether blasting succeeded may be determined comprehensively from at least one of three pieces of information: the page response status code (i.e., the fourth information), whether the page URL changed (i.e., the fifth information), and the similarity to a reference page (i.e., the sixth information).
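One way to combine the three signals is sketched below with the standard `difflib` module. The combination rule, the similarity threshold, and the interpretation of the reference page as the text shown after a known failed login are all assumptions made for illustration; the patent only states that at least one of the three signals is used.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9  # hypothetical cut-off for "page is essentially unchanged"


def page_similarity(reference_text: str, response_text: str) -> float:
    """Similarity in [0, 1] between the reference page text and the response text."""
    return SequenceMatcher(None, reference_text, response_text).ratio()


def blast_succeeded(
    status_code: int,
    url_before: str,
    url_after: str,
    reference_text: str,
    response_text: str,
    threshold: float = SIMILARITY_THRESHOLD,
) -> bool:
    # Fourth information: an error status is treated as a failed attempt.
    if status_code >= 400:
        return False
    # Fifth information: a redirect away from the login URL suggests success.
    url_changed = url_before != url_after
    # Sixth information: a page unlike the failed-login reference suggests success.
    page_changed = page_similarity(reference_text, response_text) < threshold
    return url_changed or page_changed
```

A stricter scanner could require both the URL change and the page change, which would trade missed detections for fewer false positives.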
In the embodiment of the present application, to prevent invalid blasting caused by a website's login frequency limiting mechanism, login blasting may be implemented with a password-first traversal algorithm. In addition, after every M blasting attempts (M may be equal to 5), the page may be checked for keywords related to login restriction (i.e., keywords in the third keyword set), such as "frequent", "locked", and "retry later". If such keywords exist, blasting is stopped; if no such keyword exists, blasting continues and monitoring is maintained (i.e., the page continues to be monitored for keywords related to login restriction).
In practical applications, for a Web login page without a verification code mechanism, the blasting process is the same as that shown in fig. 6 except that no verification code identification result is filled in, and is not described again here.
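The password-first traversal and the periodic restriction check can be sketched as follows. The callables `try_login` and `get_page_text` are hypothetical stand-ins for the browser-driver operations, and stopping at the first successful pair is an assumption; the restriction keywords and the interval of 5 come from the examples in the text.

```python
from typing import Callable, Iterator, Optional, Sequence, Tuple

RESTRICTION_KEYWORDS = ("frequent", "locked", "retry later")  # examples from the text
CHECK_EVERY = 5  # M in the text


def password_first_pairs(
    usernames: Sequence[str], passwords: Sequence[str]
) -> Iterator[Tuple[str, str]]:
    """Outer loop over passwords, inner loop over user names, so consecutive
    attempts hit different accounts instead of hammering one account."""
    for password in passwords:
        for username in usernames:
            yield username, password


def run_blasting(
    usernames: Sequence[str],
    passwords: Sequence[str],
    try_login: Callable[[str, str], bool],
    get_page_text: Callable[[], str],
    check_every: int = CHECK_EVERY,
) -> Optional[Tuple[str, str]]:
    """Return the first working (username, password) pair, or None."""
    pairs = password_first_pairs(usernames, passwords)
    for attempt, (username, password) in enumerate(pairs, start=1):
        if try_login(username, password):
            return username, password
        # Every `check_every` attempts, look for signs of a login restriction.
        if attempt % check_every == 0:
            text = get_page_text().lower()
            if any(kw in text for kw in RESTRICTION_KEYWORDS):
                return None  # stop blasting once a restriction is detected
    return None
```

Because each account sees the same password only once per outer iteration, per-account failure counters on the target grow more slowly than with a username-first order.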
The scheme provided by the application embodiment has the following advantages:
1) through a technical architecture that fuses keyword regular-expression matching with an AI TextCNN model (i.e., the first model), the background login page (i.e., the login webpage) can be identified automatically;
2) through a CNN + LSTM + CTC deep neural network model (i.e., the second model) trained on a large amount of data, the content of variable-length dynamic picture verification codes can be identified intelligently, so that vulnerability scanning can cover both login pages that contain a verification code and login pages that do not;
3) the positions of the user name, password, and verification code tags can be located automatically through a built-in browser driver, so that a weak password dictionary can be loaded for login blasting to determine whether the target website has an identity authentication vulnerability;
4) through the overall detection process of the AI-based automatic black-box scanning method for Web identity authentication vulnerabilities, background login page search, verification code content identification, and weak password blasting can be carried out intelligently, improving vulnerability detection capability and scanning efficiency; by combining keyword regular-expression matching with multiple AI techniques such as login page identification and variable-length verification code identification, the login page can be searched for (i.e., identified) efficiently and accurately, the restriction imposed by dynamic verification code pictures can be bypassed, and Web identity authentication vulnerability scanning can be performed automatically, improving both the performance of the vulnerability scanning tool and the efficiency of subsequent operation work.
In order to implement the method according to the embodiment of the present application, an embodiment of the present application further provides a vulnerability detection apparatus, as shown in fig. 7, the apparatus includes:
an acquisition unit 701, configured to acquire a URL of each webpage of a target website;
a first processing unit 702, configured to determine a login webpage of the target website based on the URL of each webpage;
a second processing unit 703, configured to detect whether a verification code exists in the login webpage, to obtain a first detection result;
a third processing unit 704, configured to detect whether the target website has an identity authentication vulnerability based on the first detection result and by means of weak password blasting for the login webpage.
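The division into four units can be pictured as four pluggable callables chained in order. The sketch below is purely illustrative: the class name, the callable signatures, and the choice to report "no vulnerability" when no login webpage is found are assumptions, not details given in the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VulnerabilityDetector:
    acquire_urls: Callable[[str], List[str]]               # acquisition unit 701
    find_login_page: Callable[[List[str]], Optional[str]]  # first processing unit 702
    has_captcha: Callable[[str], bool]                     # second processing unit 703
    blast: Callable[[str, bool], bool]                     # third processing unit 704

    def detect(self, site: str) -> bool:
        """Return True if an identity authentication vulnerability is found."""
        urls = self.acquire_urls(site)
        login_url = self.find_login_page(urls)
        if login_url is None:
            return False  # assumption: nothing to blast without a login webpage
        captcha_present = self.has_captcha(login_url)  # the first detection result
        return self.blast(login_url, captcha_present)
```

Keeping each unit behind a plain callable mirrors the note that the processing may be reallocated among different program modules as needed.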
In an embodiment, the first processing unit 702 is further configured to:
acquiring first information and/or second information of each webpage based on the URL of each webpage; the first information comprises response information returned by the target website when an access request for the corresponding webpage is sent to the target website; the second information comprises text presented by the corresponding webpage;
and determining the login webpage of the target website according to the first information and/or the second information of each webpage.
In an embodiment, the first processing unit 702 is further configured to:
judging whether first information of the corresponding webpage contains a preset first field or not aiming at each webpage;
and under the condition that the first information of the corresponding webpage does not contain the first field, judging whether the corresponding webpage is a login webpage or not by utilizing the second information of the corresponding webpage.
In an embodiment, the first processing unit 702 is further configured to:
performing regular matching on second information of the corresponding webpage by using keywords in a preset first keyword set to obtain a first matching result; the first matching result represents the matching degree between the second information of the corresponding webpage and the first keyword set;
determining that the corresponding webpage is a login webpage under the condition that the first matching result meets a first condition; determining that the corresponding webpage is not a login webpage under the condition that the first matching result meets a second condition;
under the condition that the first matching result does not meet a first condition and the first matching result does not meet a second condition, determining the text feature vector of the corresponding webpage by using second information of the corresponding webpage; and inputting the text feature vector of the corresponding webpage into a pre-trained first model to judge whether the corresponding webpage is a login webpage or not.
In an embodiment, the second processing unit 703 is further configured to:
acquiring third information of the login webpage, wherein the third information comprises a first label in a source code of the login webpage;
matching the third information with keywords in a preset second keyword set;
under the condition that keywords matched with the third information exist in the second keyword set, the first detection result represents that a verification code exists in the login webpage; and under the condition that the keyword matched with the third information does not exist in the second keyword set, the first detection result represents that the verification code does not exist in the login webpage.
In an embodiment, the verification code comprises a picture verification code; the third processing unit 704 is further configured to:
under the condition that the first detection result represents that the login webpage has the verification code, acquiring a picture corresponding to the verification code;
identifying the verification code by inputting a picture corresponding to the verification code into a pre-trained second model;
blasting the login webpage by using the identified verification code and a preset weak password dictionary;
under the condition that blasting is successful, the target website has an identity authentication vulnerability; and under the condition that blasting fails, the target website has no identity authentication vulnerability.
In an embodiment, the third processing unit 704 is further configured to:
blasting the login webpage by using a preset weak password dictionary under the condition that the first detection result represents that the login webpage does not have the verification code;
under the condition that blasting is successful, the target website has an identity authentication vulnerability; and under the condition that blasting fails, the target website has no identity authentication vulnerability.
In an embodiment, when blasting the login webpage, the third processing unit 704 is further configured to:
determining at least one of fourth information, fifth information, and sixth information; the fourth information comprises response information returned by the target website when the login webpage is blasted; the fifth information represents the change in the URL of the login webpage when the login webpage is blasted; the sixth information represents the change in the text presented by the login webpage when the login webpage is blasted;
and judging whether the blasting is successful or not by using at least one of the fourth information, the fifth information and the sixth information.
In an embodiment, when blasting the login webpage, the third processing unit 704 is further configured to blast the login webpage by traversing the login passwords contained in the weak password dictionary first.
In an embodiment, when blasting the login webpage, the third processing unit 704 is further configured to:
according to a preset period, periodically acquiring second information of the login webpage, and matching the second information with keywords in a preset third keyword set; the second information comprises a text presented by the login webpage; keywords in the third keyword set are related to login limiting conditions;
stopping blasting when the keywords matched with the second information exist in the third keyword set; and under the condition that the keywords matched with the second information do not exist in the third keyword set, continuing blasting.
In practical applications, the acquisition unit 701, the first processing unit 702, the second processing unit 703, and the third processing unit 704 may be implemented by a processor in the vulnerability detection apparatus in combination with a communication interface.
It should be noted that the vulnerability detection apparatus provided in the foregoing embodiment is described using the above division of program modules only as an example; in practical applications, the processing may be allocated to different program modules as needed, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the vulnerability detection apparatus and the vulnerability detection method provided by the embodiments belong to the same concept; the specific implementation process is described in detail in the method embodiments and is not repeated here.
Based on the hardware implementation of the program module, and in order to implement the method according to the embodiment of the present application, an embodiment of the present application further provides an electronic device, as shown in fig. 8, where the electronic device 800 includes:
a communication interface 801 capable of performing information interaction with other electronic devices;
the processor 802 is connected with the communication interface 801 to realize information interaction with other electronic devices, and is used for executing the method provided by one or more technical schemes when running a computer program;
a memory 803 storing a computer program capable of running on the processor 802.
Specifically, the processor 802 is configured to:
acquiring the URL of each web page of the target website through the communication interface 801;
determining a login webpage of the target website based on the URL of each webpage;
detecting whether the login webpage has a verification code or not to obtain a first detection result;
and detecting whether the target website has an identity authentication vulnerability or not based on the first detection result and the weak password blasting mode aiming at the login webpage.
In an embodiment, the processor 802 is further configured to:
acquiring first information and/or second information of each webpage through the communication interface 801 based on the URL of each webpage; the first information comprises response information returned by the target website when an access request for the corresponding webpage is sent to the target website; the second information comprises text presented by the corresponding webpage;
and determining the login webpage of the target website according to the first information and/or the second information of each webpage.
In an embodiment, the processor 802 is further configured to:
judging whether first information of the corresponding webpage contains a preset first field or not aiming at each webpage;
and under the condition that the first information of the corresponding webpage does not contain the first field, judging whether the corresponding webpage is a login webpage or not by utilizing the second information of the corresponding webpage.
In an embodiment, the processor 802 is further configured to:
performing regular matching on second information of the corresponding webpage by using keywords in a preset first keyword set to obtain a first matching result; the first matching result represents the matching degree between the second information of the corresponding webpage and the first keyword set;
determining that the corresponding webpage is a login webpage under the condition that the first matching result meets a first condition; determining that the corresponding webpage is not a login webpage under the condition that the first matching result meets a second condition;
under the condition that the first matching result does not meet a first condition and the first matching result does not meet a second condition, determining the text feature vector of the corresponding webpage by using second information of the corresponding webpage; and inputting the text feature vector of the corresponding webpage into a pre-trained first model to judge whether the corresponding webpage is a login webpage or not.
In an embodiment, the processor 802 is further configured to:
acquiring third information of the login webpage through the communication interface 801, wherein the third information comprises a first tag in a source code of the login webpage;
matching the third information with keywords in a preset second keyword set;
under the condition that keywords matched with the third information exist in the second keyword set, the first detection result represents that a verification code exists in the login webpage; and under the condition that the keyword matched with the third information does not exist in the second keyword set, the first detection result represents that the verification code does not exist in the login webpage.
In an embodiment, the verification code comprises a picture verification code; the processor 802 is further configured to:
under the condition that the first detection result represents that the verification code exists in the login webpage, acquiring a picture corresponding to the verification code through the communication interface 801;
identifying the verification code by inputting a picture corresponding to the verification code into a pre-trained second model;
blasting the login webpage by using the identified verification code and a preset weak password dictionary;
under the condition that blasting is successful, the target website has an identity authentication vulnerability; and under the condition that blasting fails, the target website has no identity authentication vulnerability.
In an embodiment, the processor 802 is further configured to:
blasting the login webpage by using a preset weak password dictionary under the condition that the first detection result represents that the login webpage does not have the verification code;
under the condition that blasting is successful, the target website has an identity authentication vulnerability; and under the condition that blasting fails, the target website has no identity authentication vulnerability.
In an embodiment, when blasting the login page, the processor 802 is further configured to:
determining at least one of fourth information, fifth information, and sixth information; the fourth information comprises response information returned by the target website when the login webpage is blasted; the fifth information represents the change in the URL of the login webpage when the login webpage is blasted; the sixth information represents the change in the text presented by the login webpage when the login webpage is blasted;
and judging whether the blasting is successful or not by using at least one of the fourth information, the fifth information and the sixth information.
In an embodiment, when blasting the login webpage, the processor 802 is further configured to blast the login webpage by traversing the login passwords contained in the weak password dictionary first.
In an embodiment, when blasting the login webpage, the processor 802 is further configured to:
according to a preset period, periodically acquiring second information of the login webpage through the communication interface 801, and matching the second information with keywords in a preset third keyword set; the second information comprises a text presented by the login webpage; keywords in the third keyword set are related to login limiting conditions;
stopping blasting when the keywords matched with the second information exist in the third keyword set; and under the condition that the keywords matched with the second information do not exist in the third keyword set, continuing blasting.
It should be noted that: the specific processing procedures of the communication interface 801 and the processor 802 can be understood by referring to the above method, and are not described herein again.
Of course, in practice, the various components in the electronic device 800 are coupled together by a bus system 804. It is understood that the bus system 804 is used to enable communications among the components. The bus system 804 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 804 in FIG. 8.
The memory 803 in the present embodiment is used to store various types of data to support the operation of the electronic device 800. Examples of such data include: any computer program for operating on the electronic device 800.
The method disclosed in the embodiments of the present application can be applied to the processor 802, or implemented by the processor 802. The processor 802 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 802. The Processor 802 may be a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor 802 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in the memory 803, and the processor 802 reads the information in the memory 803 and performs the steps of the aforementioned methods in conjunction with its hardware.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, Micro Controller Units (MCUs), microprocessors, or other electronic components for performing the foregoing methods.
It is to be appreciated that the memory 803 of the present embodiment can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memories described in the embodiments of the present application are intended to comprise, without being limited to, these and any other suitable types of memory.
In an exemplary embodiment, the present application further provides a storage medium, specifically a computer storage medium, for example, a memory 803 storing a computer program, which can be executed by the processor 802 of the electronic device 800 to perform the steps of the foregoing method. The computer readable storage medium may be Memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface Memory, optical disk, or CD-ROM.
It should be noted that: "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The technical means described in the embodiments of the present application may be arbitrarily combined without conflict.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.

Claims (10)

1. A vulnerability detection method is characterized by comprising the following steps:
acquiring the URL of each webpage of a target website;
determining a login webpage of the target website based on the URL of each webpage;
detecting whether the login webpage has a verification code or not to obtain a first detection result;
and detecting whether the target website has an identity authentication vulnerability or not based on the first detection result and the weak password blasting mode aiming at the login webpage.
2. The method of claim 1, wherein determining the login page of the target website based on the URL of each web page comprises:
acquiring first information and/or second information of each webpage based on the URL of each webpage; the first information comprises response information returned by the target website when an access request for the corresponding webpage is sent to the target website; the second information comprises text presented by the corresponding webpage;
and determining the login webpage of the target website according to the first information and/or the second information of each webpage.
3. The method according to claim 2, wherein the determining the login webpage of the target website according to the first information and the second information of each webpage comprises:
judging whether first information of the corresponding webpage contains a preset first field or not aiming at each webpage;
and under the condition that the first information of the corresponding webpage does not contain the first field, judging whether the corresponding webpage is a login webpage or not by utilizing the second information of the corresponding webpage.
4. The method of claim 3, wherein the determining whether the corresponding web page is a login web page by using the second information of the corresponding web page comprises:
performing regular matching on second information of the corresponding webpage by using keywords in a preset first keyword set to obtain a first matching result; the first matching result represents the matching degree between the second information of the corresponding webpage and the first keyword set;
determining that the corresponding webpage is a login webpage under the condition that the first matching result meets a first condition; determining that the corresponding webpage is not a login webpage under the condition that the first matching result meets a second condition;
under the condition that the first matching result does not meet a first condition and the first matching result does not meet a second condition, determining the text feature vector of the corresponding webpage by using second information of the corresponding webpage; and inputting the text feature vector of the corresponding webpage into a pre-trained first model to judge whether the corresponding webpage is a login webpage or not.
5. The method of claim 1, wherein the detecting whether the verification code exists in the login webpage comprises:
acquiring third information of the login webpage, wherein the third information comprises a first label in a source code of the login webpage;
matching the third information with keywords in a preset second keyword set;
under the condition that keywords matched with the third information exist in the second keyword set, the first detection result represents that a verification code exists in the login webpage; and under the condition that the keyword matched with the third information does not exist in the second keyword set, the first detection result represents that the verification code does not exist in the login webpage.
6. The method of claim 1, wherein the verification code comprises a picture verification code; and the detecting whether the target website has an identity authentication vulnerability based on the first detection result and a weak password blasting manner for the login webpage comprises:
in the case that the first detection result represents that a verification code exists in the login webpage, acquiring a picture corresponding to the verification code;
identifying the verification code by inputting the picture corresponding to the verification code into a pre-trained second model;
blasting the login webpage by using the identified verification code and a preset weak password dictionary;
in the case that the blasting succeeds, the target website has an identity authentication vulnerability; and in the case that the blasting fails, the target website has no identity authentication vulnerability.
7. The method according to claim 1, wherein the detecting whether the target website has an identity authentication vulnerability based on the first detection result and a weak password blasting manner for the login webpage comprises:
blasting the login webpage by using a preset weak password dictionary in the case that the first detection result represents that no verification code exists in the login webpage;
in the case that the blasting succeeds, the target website has an identity authentication vulnerability; and in the case that the blasting fails, the target website has no identity authentication vulnerability.
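The two blasting branches of claims 6 and 7 share one loop, which can be sketched as below. This is a hedged illustration only: `try_login` and `solve_captcha` are hypothetical callables standing in for the form submission and the pre-trained second model, and the dictionary entries are placeholders rather than the preset weak password dictionary of the claims.

```python
from typing import Callable, Iterable, Optional, Tuple

# Placeholder weak password dictionary (username, password pairs).
WEAK_CREDENTIALS = [("admin", "admin"), ("admin", "123456"), ("root", "password")]


def has_auth_vulnerability(
    try_login: Callable[[str, str, Optional[str]], bool],
    dictionary: Iterable[Tuple[str, str]],
    captcha_present: bool,
    solve_captcha: Optional[Callable[[], str]] = None,
) -> bool:
    """Return True if any dictionary entry logs in (blasting succeeds)."""
    if captcha_present and solve_captcha is None:
        raise ValueError("a verification code exists but no solver was supplied")
    for username, password in dictionary:
        # Claim 6 branch: recognise a fresh verification code for each attempt.
        # Claim 7 branch: no verification code, so None is submitted.
        code = solve_captcha() if captcha_present else None
        if try_login(username, password, code):
            return True   # blasting succeeded: vulnerability present
    return False          # blasting failed: no vulnerability found
```

Passing the first detection result as `captcha_present` selects the branch, so the same function covers login webpages with and without a verification code.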
8. A vulnerability detection apparatus, comprising:
an acquisition unit configured to acquire a URL of each webpage of a target website;
a first processing unit configured to determine a login webpage of the target website based on the URL of each webpage;
a second processing unit configured to detect whether a verification code exists in the login webpage to obtain a first detection result; and
a third processing unit configured to detect whether the target website has an identity authentication vulnerability based on the first detection result and a weak password blasting manner for the login webpage.
9. An electronic device, comprising: a processor and a memory for storing a computer program capable of running on the processor,
wherein the processor is configured to perform the steps of the method of any one of claims 1 to 7 when running the computer program.
10. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the method of any one of claims 1 to 7.
CN202210660234.3A 2022-06-13 2022-06-13 Vulnerability detection method and device, electronic equipment and storage medium Pending CN114996714A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210660234.3A CN114996714A (en) 2022-06-13 2022-06-13 Vulnerability detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210660234.3A CN114996714A (en) 2022-06-13 2022-06-13 Vulnerability detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114996714A true CN114996714A (en) 2022-09-02

Family

ID=83033869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210660234.3A Pending CN114996714A (en) 2022-06-13 2022-06-13 Vulnerability detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114996714A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115905767A (en) * 2023-01-07 2023-04-04 珠海金智维信息科技有限公司 Webpage login method and system based on fixed candidate box target detection algorithm

Similar Documents

Publication Publication Date Title
CN107145481B (en) Electronic equipment, storage medium, and method and device for filling webpage form
US11256912B2 (en) Electronic form identification using spatial information
US10873618B1 (en) System and method to dynamically generate a set of API endpoints
Haruta et al. Visual similarity-based phishing detection scheme using image and CSS with target website finder
CN111931935B (en) Network security knowledge extraction method and device based on One-shot learning
CN108768982B (en) Phishing website detection method and device, computing equipment and computer storage medium
CN107862039B (en) Webpage data acquisition method and system and data matching and pushing method
CN111522708B (en) Log recording method, computer equipment and storage medium
RU2652451C2 (en) Methods for anomalous elements detection on web pages
CN116917894A (en) Detecting phishing URLs using a converter
US20220237240A1 (en) Method and apparatus for collecting information regarding dark web
CN114996714A (en) Vulnerability detection method and device, electronic equipment and storage medium
US11379527B2 (en) Sibling search queries
EP3550789A1 (en) Method for protecting web applications by automatically generating application models
JP2022016303A (en) Automated API access using machine learning
CN113449816A (en) Website classification model training method, website classification method, device, equipment and medium
CN113312258A (en) Interface testing method, device, equipment and storage medium
CN117294510A (en) WEB injection attack classification detection method and detection system
KR102483004B1 (en) Method for detecting harmful url
CN116756382A (en) Method, device, setting and storage medium for detecting sensitive character string
CN115238124A (en) Video character retrieval method, device, equipment and storage medium
RU2659741C1 (en) Methods of detecting the anomalous elements of web pages on basis of statistical significance
CN112711574A (en) Database security detection method and device, electronic equipment and medium
CN116248375B (en) Webpage login entity identification method, device, equipment and storage medium
KR102561918B1 (en) Method for machine learning-based harmful web site classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination