CN110968875B - Method and device for detecting permission vulnerability of webpage - Google Patents

Info

Publication number
CN110968875B
Authority
CN
China
Prior art keywords
webpage
classification
management function
target
web page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911220029.XA
Other languages
Chinese (zh)
Other versions
CN110968875A (en)
Inventor
徐文浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN201911220029.XA
Publication of CN110968875A
Application granted
Publication of CN110968875B
Legal status: Active
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57: Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577: Assessing vulnerabilities and evaluating computer system security
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00: Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21: Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2119: Authenticating web pages, e.g. with suspicious links
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00: Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21: Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2141: Access rights, e.g. capability lists, access control lists, access tables, access matrices

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and a device for detecting permission vulnerabilities of a web page are disclosed. Permission vulnerability detection is performed on the web page by using a text classification model and/or a picture classification model constructed by a supervised learning method. The text classification model classifies whether the web page should be configured with a rights management function from the perspective of the web page text, and the picture classification model classifies whether the web page should be configured with a rights management function from the perspective of the web page's visual appearance. A web page that should be configured with a rights management function is typically one that contains sensitive information, such as information relating to the privacy of the enterprise or its users.

Description

Method and device for detecting permission vulnerability of webpage
Technical Field
The embodiments of this specification relate to the field of information technology, and in particular to a method and a device for detecting permission vulnerabilities of a web page.
Background
Currently, in order to provide business services to external users, internet enterprises usually expose a batch of web pages to the external network for those users to access.
Some web pages contain sensitive information (e.g., information related to enterprise or user privacy) and therefore need to be configured with a rights management function before being exposed to the external network. A web page that should be configured with a rights management function but is not is a web page with a permission vulnerability.
However, in practice, the number of web pages that an internet enterprise needs to expose to the external network is huge, and manually checking each of them for permission vulnerabilities is impractical.
Disclosure of Invention
To address the excessive cost of manually detecting permission vulnerabilities across a large number of web pages, embodiments of this specification provide a method and an apparatus for detecting permission vulnerabilities of web pages. The technical solution is as follows:
According to a first aspect of the embodiments of this specification, there is provided a method for detecting permission vulnerabilities of a web page, including:
acquiring a target web page to be detected;
performing a first classification and/or a second classification on the target web page;
judging whether the target web page has a permission vulnerability according to the classification result of the first classification and/or the classification result of the second classification;
wherein performing the first classification on the target web page includes:
acquiring web page text features of the target web page;
inputting the web page text features into a text classification model, and outputting a classification result representing whether the target web page should be configured with a rights management function;
and performing the second classification on the target web page includes:
converting the rendered target web page into a picture, and acquiring web page picture features;
inputting the web page picture features into a picture classification model, and outputting a classification result representing whether the target web page should be configured with a rights management function.
According to a second aspect of the embodiments of this specification, there is provided an apparatus for detecting permission vulnerabilities of a web page, including:
an acquisition module, which acquires a target web page to be detected;
a classification module, which includes a first classification submodule and a second classification submodule;
a judging module, which judges whether the target web page has a permission vulnerability according to the classification result of the first classification and/or the classification result of the second classification;
wherein the first classification submodule acquires web page text features of the target web page, inputs the web page text features into a text classification model, and outputs a classification result representing whether the target web page should be configured with a rights management function;
and the second classification submodule converts the rendered target web page into a picture, acquires web page picture features, inputs the web page picture features into a picture classification model, and outputs a classification result representing whether the target web page should be configured with a rights management function.
According to the technical solution provided by the embodiments of this specification, permission vulnerability detection is performed on a web page by using a text classification model and/or a picture classification model. The text classification model classifies whether the web page should be configured with a rights management function from the perspective of the web page text, and the picture classification model classifies whether the web page should be configured with a rights management function from the perspective of the web page's visual appearance. The two classification models can therefore be used flexibly, alone or in combination, according to the specific requirements on accuracy and recall in engineering practice.
The embodiments of this specification can replace manual permission vulnerability detection of web pages, which saves cost and also yields higher accuracy.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the invention.
In addition, no single embodiment of this specification is required to achieve all of the effects described above.
Drawings
To illustrate the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below cover only some of the embodiments of this specification, and those skilled in the art can obtain other drawings from them.
Fig. 1 is a schematic flowchart of a method for detecting permission vulnerabilities of a web page according to an embodiment of this specification;
Fig. 2 is a schematic flowchart of a method for training a text classification model according to an embodiment of this specification;
Fig. 3 is a schematic structural diagram of an apparatus for detecting permission vulnerabilities of a web page according to an embodiment of this specification;
Fig. 4 is a schematic structural diagram of a device configured to carry out the method according to an embodiment of this specification.
Detailed Description
A web page with a permission vulnerability is an external web page that should be configured with a rights management function because it contains sensitive information, but is not actually configured with one. An external web page generally refers to a web page that an internet enterprise provides to external users.
It should be noted that the rights management function generally refers to a function for managing the rights of users who access web page content. For example, one form of rights management requires the user to enter an account and a password before accessing the content of a web page; the content becomes accessible only after the account and password are verified.
In the embodiments of this specification, permission vulnerability detection can be performed on all external web pages of an internet enterprise that are not actually configured with a rights management function, so as to find the external web pages with permission vulnerabilities.
Alternatively, detection can be performed on all external web pages of the internet enterprise (both those actually configured with a rights management function and those not). This finds the external web pages with permission vulnerabilities and, as a by-product, can also find web pages that do not need a rights management function but are actually configured with one.
In one or more embodiments of this specification, permission vulnerability detection may be performed on a web page by using a pre-trained text classification model, a pre-trained picture classification model, or both models in combination.
The text classification model can be trained by supervised learning: obtain a batch of web page samples labeled with whether a rights management function should be configured, take the web page text features of the samples as model input and the labels of the samples as model output, and train the model.
The picture classification model can likewise be trained by supervised learning: obtain a batch of web page samples labeled with whether a rights management function should be configured, take the web page picture features of the samples (i.e., visual features of the rendered web pages) as model input and the labels of the samples as model output, and train the model.
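As an illustration of the supervised setup described above, the following minimal sketch trains a binary classifier on feature vectors paired with 0/1 labels. The specification does not name a model family, so logistic regression trained by gradient descent is an assumption made here, as are the names `train_classifier` and `classify` and the toy feature vectors.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_classifier(features, labels, epochs=200, lr=0.5):
    """Fit weights and a bias by batch gradient descent on the log loss.

    features: list of equal-length float vectors (model input).
    labels:   list of 0/1 marks (model output); 1 = should be configured.
    """
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(score) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def classify(model, x):
    """Return 1 if the page should be configured with rights management, else 0."""
    weights, bias = model
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy, made-up feature vectors: the first two samples are labeled 1, the rest 0.
samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.8]]
marks = [1, 1, 0, 0]
model = train_classifier(samples, marks)
```

The same `train_classifier` call would apply to picture features, provided they are expressed as numeric vectors of a fixed length.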
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present specification, the technical solutions in the embodiments of the present specification will be described in detail below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of protection.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a method for detecting permission vulnerabilities of a web page according to an embodiment of this specification, including the following steps:
S100: Acquire a target web page to be detected.
For convenience of description, detection of whether a single target web page has a permission vulnerability is taken as the example here. The target web page can be any external web page of the internet enterprise, any external web page that is not actually configured with a rights management function, or a specific external web page of the internet enterprise.
S102: Perform a first classification and/or a second classification on the target web page.
In this specification, the first classification refers to classifying the target web page using the text classification model; the second classification refers to classifying the target web page using the picture classification model.
S104: Judge whether the target web page has a permission vulnerability according to the classification result of the first classification and/or the classification result of the second classification.
In the embodiments of this specification, if a specified condition is met, the target web page is determined to have a permission vulnerability; otherwise, it is determined to have none.
The specified condition depends on the situation, as described below.
1. The target web page is known not to be actually configured with a rights management function.
1.1 If detection emphasizes accuracy, that is, avoiding treating a web page that should not be configured with a rights management function as one that should, the specified condition can be set as: the classification result of the first classification and the classification result of the second classification both indicate that the target web page should be configured with a rights management function.
1.2 If detection emphasizes comprehensiveness, that is, finding as many web pages that may need a rights management function as possible to ensure a high recall rate, the specified condition can be set as: at least one of the classification result of the first classification and the classification result of the second classification indicates that the target web page should be configured with a rights management function.
2. The target web page may or may not be actually configured with a rights management function.
2.1 If detection emphasizes accuracy, as in 1.1, the specified condition can be set as: the classification result of the first classification and the classification result of the second classification both indicate that the target web page should be configured with a rights management function, and the target web page is not actually configured with one.
2.2 If detection emphasizes comprehensiveness, as in 1.2, the specified condition can be set as: at least one of the classification result of the first classification and the classification result of the second classification indicates that the target web page should be configured with a rights management function, and the target web page is not actually configured with one.
In practice, the result of each classification can be recorded as 0 or 1. Taking the first classification as an example, a result of 1 indicates that the target web page should be configured with a rights management function, and a result of 0 indicates that it should not.
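The four specified conditions above can be collapsed into one small decision function. This is a sketch under stated assumptions: the function name, the `favor` parameter, and its values `"accuracy"` and `"recall"` are hypothetical, and the 0/1 encoding follows the convention just described.

```python
def has_permission_vulnerability(text_result, picture_result,
                                 actually_configured=False,
                                 favor="accuracy"):
    """Apply the specified condition to two 0/1 classification results.

    text_result, picture_result: 1 = should be configured, 0 = should not.
    actually_configured: whether the page really has rights management;
        leave it False for situation 1, where every page is known to lack it.
    favor: "accuracy" requires both results to be 1 (conditions 1.1 / 2.1);
           "recall" requires at least one to be 1 (conditions 1.2 / 2.2).
    """
    if favor == "accuracy":
        should_configure = text_result == 1 and picture_result == 1
    elif favor == "recall":
        should_configure = text_result == 1 or picture_result == 1
    else:
        raise ValueError("favor must be 'accuracy' or 'recall'")
    # A vulnerability exists only if the page should be protected but is not.
    return should_configure and not actually_configured


# The models disagree: only the recall-oriented condition flags the page.
print(has_permission_vulnerability(1, 0, favor="accuracy"))  # False
print(has_permission_vulnerability(1, 0, favor="recall"))    # True
```

Passing `actually_configured=True` covers situation 2, where a page that already has rights management is never reported regardless of the classification results.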
With the method shown in Fig. 1, permission vulnerability detection is performed on a web page by using a text classification model and/or a picture classification model. The text classification model classifies whether the web page should be configured with a rights management function from the perspective of the web page text, and the picture classification model classifies whether the web page should be configured with a rights management function from the perspective of the web page's visual appearance. The two classification models can therefore be used flexibly, alone or in combination, according to the specific requirements on accuracy and recall in engineering practice.
The embodiments of this specification can replace manual permission vulnerability detection of web pages, which saves cost. Moreover, because the interference of subjective factors in manual detection is eliminated, detection accuracy is also improved.
In addition, on the premise that the target web page is not actually configured with a rights management function, S102 to S104 can be implemented in the following ways:
1. First perform the first classification on the target web page. If its result indicates that the target web page should be configured with a rights management function, the target web page is detected as having a permission vulnerability. If its result indicates that the target web page should not be configured with a rights management function, perform the second classification on the target web page: if the result of the second classification indicates that the target web page should be configured with a rights management function, the target web page is detected as having a permission vulnerability; otherwise, the target web page is detected as having no permission vulnerability.
It is worth emphasizing that in case 1, steps S102 and S104 are not executed strictly one after the other. Instead, the first classification of S102 is performed first, followed by the judgment of S104 based on its result; only if that result indicates that the target web page should not be configured with a rights management function is the second classification of S102 performed, after which the judgment continues based on the result of the second classification.
2. As in case 1, but with the execution order of the first classification and the second classification swapped.
3. Perform both the first classification and the second classification on the target web page. If both results indicate that the target web page should be configured with a rights management function, the target web page is detected as having a permission vulnerability; if either result indicates that the target web page should not be configured with a rights management function, the target web page is detected as having no permission vulnerability.
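The three cases above differ mainly in whether the second classifier is always run. A minimal sketch follows, assuming each classifier is a callable that takes the page and returns 0 or 1; the stub classifiers `text_model` and `picture_model` and the `calls` log are hypothetical stand-ins used only to show the order of invocation.

```python
from typing import Callable

Classifier = Callable[[str], int]  # returns 1 = should be configured, 0 = should not


def detect_sequential(page: str, first: Classifier, second: Classifier) -> bool:
    """Cases 1 and 2: run `second` only when `first` does not flag the page."""
    if first(page) == 1:
        return True
    return second(page) == 1


def detect_both(page: str, first: Classifier, second: Classifier) -> bool:
    """Case 3: flag the page only when both classifiers agree it needs protection."""
    return first(page) == 1 and second(page) == 1


calls = []  # records which stub classifier ran, in order


def text_model(page: str) -> int:
    # Hypothetical stand-in for the text classification model.
    calls.append("text")
    return 1 if "password" in page else 0


def picture_model(page: str) -> int:
    # Hypothetical stand-in for the picture classification model.
    calls.append("picture")
    return 0
```

Calling `detect_sequential(page, text_model, picture_model)` corresponds to case 1, and swapping the two classifier arguments gives case 2. The sequential form can skip the second (possibly more expensive) classification, at the cost of behaving like the recall-oriented condition rather than the accuracy-oriented one.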
In addition, an embodiment of this specification further provides a method for training the text classification model, as shown in Fig. 2, including the following steps:
S200: Obtain a web page sample set.
In practical applications, the number of known web pages that should be configured with a rights management function is often not large enough. To train a qualified text classification model, the set of web page samples that should be configured with a rights management function therefore needs to be expanded.
Specifically, the web page sample set may be obtained as follows:
Acquire a plurality of external web pages that should be configured with a rights management function and a plurality of internal web pages that should be configured with a rights management function as the web page samples that should be configured with a rights management function; and acquire a plurality of external web pages that should not be configured with a rights management function as the web page samples that should not be configured with a rights management function. An internal web page generally refers to a web page of an internet enterprise that is not exposed to the external network.
Each web page sample carries one of two labels: one indicates that the sample should be configured with a rights management function, and the other indicates that it should not. The labels can be assigned by business experts based on experience.
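The sample set assembly described above might be sketched as follows. The `WebPageSample` type, the `build_sample_set` function, and the example URLs are all hypothetical; the only point illustrated is that internal pages are added to the positive class to enlarge it, while negatives come from external pages only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Page = Tuple[str, str]  # (url, web page code)


@dataclass(frozen=True)
class WebPageSample:
    url: str
    html: str
    should_configure: int  # 1 = should have rights management, 0 = should not


def build_sample_set(external_should: Iterable[Page],
                     internal_should: Iterable[Page],
                     external_should_not: Iterable[Page]) -> List[WebPageSample]:
    """Merge the three labeled sources into one training set."""
    samples = []
    # Positive class: external pages plus internal pages used to expand it.
    for url, html in list(external_should) + list(internal_should):
        samples.append(WebPageSample(url, html, 1))
    # Negative class: external pages that need no rights management.
    for url, html in external_should_not:
        samples.append(WebPageSample(url, html, 0))
    return samples


# Made-up example pages; real labels would come from business experts.
sample_set = build_sample_set(
    [("https://pay.example.com/orders", "<html>...</html>")],
    [("https://intranet.example.com/hr", "<html>...</html>")],
    [("https://www.example.com/about", "<html>...</html>")],
)
```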
S202: For each web page sample in the web page sample set, extract the title and the body text from the web page code of the sample.
S204: Perform word segmentation on the extracted title and body text, and map each resulting token to a word vector.
During word segmentation, words that are irrelevant to whether the web page should be configured with a rights management function (that is, whether it involves sensitive information), such as company names, department names, and common generic website identifiers, can be filtered out. In addition, error codes in the web page code (such as 404 and 500) can be retained and merged, and English words in the web page code can be retained, since such data is often associated with whether the web page should be configured with a rights management function.
S206: Determine at least one key tag type appearing in the web page code of the sample and, for each key tag type that appears, determine the ratio of the number of tags of that type to the number of all tags in the web page code of the sample.
S208: Construct the web page text features of the sample from the word vectors corresponding to the sample and the ratio corresponding to each key tag type appearing in its web page code.
S210: Take the web page text features of each web page sample as model input and whether each sample should be configured with a rights management function as model output, and train to obtain the text classification model.
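Steps S202 to S208 can be sketched with the Python standard library as follows. Two substitutions are made for the sake of a self-contained example and are not the specification's method: a regular expression that keeps English words and digit runs stands in for a real word segmenter, and feature hashing into a fixed-length count vector stands in for word vectors. The names `PageParser`, `tokenize`, `text_features`, and the `STOP_WORDS` entry are hypothetical.

```python
import re
import zlib
from collections import Counter
from html.parser import HTMLParser

STOP_WORDS = {"examplecorp"}  # hypothetical company name to filter out


class PageParser(HTMLParser):
    """Collects the title, visible text, and a count of every start tag."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.title_parts, self.text_parts = [], []
        self._stack = []  # currently open tags

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Pop back to the matching tag, discarding unclosed tags above it.
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or "script" in self._stack or "style" in self._stack:
            return
        target = self.title_parts if "title" in self._stack else self.text_parts
        target.append(text)


def tokenize(text):
    """Keep English words and digit runs (e.g. 404), then drop stop words."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def text_features(html, key_tags, dim=16):
    """Hashed token counts followed by one tag ratio per key tag type."""
    parser = PageParser()
    parser.feed(html)
    hashed = [0.0] * dim
    for token in tokenize(" ".join(parser.title_parts + parser.text_parts)):
        hashed[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    total_tags = sum(parser.tag_counts.values()) or 1
    ratios = [parser.tag_counts[tag] / total_tags for tag in key_tags]
    return hashed + ratios


page = ("<html><head><title>Login</title></head><body><form><input><input>"
        "</form><p>ExampleCorp account 404</p></body></html>")
features = text_features(page, key_tags=["form", "input"])
```

For Chinese page text, a dedicated word segmenter would be needed in place of the regular expression; the resulting vector has `dim` hashed entries plus one ratio per key tag type, so its length is fixed across pages.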
It should be noted that a key tag type is a tag type that clearly distinguishes web pages that should be configured with a rights management function from web pages that should not. That is, the probability that a web page should be configured with a rights management function is positively correlated with the probability that the key tag type appears in the web page code of the web page.
In practical applications, some tag types can be selected as key tag types according to business experience. The key tag types may also be determined as follows:
Acquire $M$ web page samples that should be configured with a rights management function and $N$ web page samples that should not. For each tag type appearing in the web page code of the samples, count the number $M^*$ of the $M$ samples in which the tag type appears and the number $N^*$ of the $N$ samples in which it appears. Calculate the difference characterization value corresponding to the tag type, where the larger the difference between $M^*/M$ and $N^*/N$, the larger the difference characterization value. Sort the tag types in descending order of their difference characterization values and take the first $L$ as the key tag types.
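The key tag selection procedure can be sketched as below. The specification only requires that the difference characterization value grow with the gap between M*/M and N*/N; using the absolute difference |M*/M - N*/N| is an assumption made here, and a signed difference M*/M - N*/N would be an alternative reading. The function name and the alphabetical tie-break are likewise hypothetical.

```python
def select_key_tag_types(positive_pages, negative_pages, top_l):
    """Rank tag types by how differently often they occur in the two classes.

    positive_pages: one set of tag types per sample that should be configured (M sets).
    negative_pages: one set of tag types per sample that should not (N sets).
    top_l: number of key tag types to keep (L).
    """
    m, n = len(positive_pages), len(negative_pages)
    if m == 0 or n == 0:
        raise ValueError("both sample groups must be non-empty")
    all_tags = set().union(*positive_pages, *negative_pages)
    scores = {}
    for tag in all_tags:
        m_star = sum(tag in page for page in positive_pages)  # M*
        n_star = sum(tag in page for page in negative_pages)  # N*
        # Assumed difference characterization value: |M*/M - N*/N|.
        scores[tag] = abs(m_star / m - n_star / n)
    # Descending by score; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_l]


# Made-up tag sets for four samples per class.
should = [{"form", "input", "div"}, {"form", "input", "div"},
          {"form", "input", "div", "p"}, {"form", "div"}]
should_not = [{"div", "p"}, {"div", "p", "input"}, {"div", "p"}, {"div"}]
key_tags = select_key_tag_types(should, should_not, top_l=2)
```

In this toy data `form` appears in all four positive samples and no negative sample, so it scores 1.0, while `div` appears everywhere and scores 0.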
It should also be noted that, for supervised machine learning algorithms, the model input used in the prediction phase must be constructed in the same way as the model input used in the training phase. That is, in the method shown in Fig. 1, the web page text features of the target web page can be constructed using the same procedure described above for web page samples and then input into the text classification model for classification.
Fig. 3 is a schematic structural diagram of an apparatus for detecting permission vulnerabilities of a web page provided in an embodiment of this specification, including:
an acquisition module 301, which acquires a target web page to be detected;
a classification module 302, which includes a first classification submodule and a second classification submodule;
a judging module 303, which judges whether the target web page has a permission vulnerability according to the classification result of the first classification and/or the classification result of the second classification;
wherein the first classification submodule 3021 acquires web page text features of the target web page, inputs the web page text features into a text classification model, and outputs a classification result representing whether the target web page should be configured with a rights management function;
and the second classification submodule 3022 converts the rendered target web page into a picture, acquires web page picture features, inputs the web page picture features into a picture classification model, and outputs a classification result representing whether the target web page should be configured with a rights management function.
The first classification submodule 3021 extracts text from the web page code of the target web page, performs word segmentation on the extracted text, and maps each resulting token to a word vector; determines at least one key tag type appearing in the web page code of the target web page and, for each key tag type that appears, determines the ratio of the number of tags of that type to the number of all tags in the web page code of the target web page; and constructs the web page text features of the target web page from the word vectors corresponding to the target web page and the ratio corresponding to each key tag type appearing in its web page code;
wherein the probability that a web page should be configured with a rights management function is positively correlated with the probability that the key tag type appears in the web page code of the web page.
The key tag types are determined as follows:
acquiring $M$ web page samples that should be configured with a rights management function and $N$ web page samples that should not;
for each tag type appearing in the web page code of the samples, counting the number $M^*$ of the $M$ samples in which the tag type appears and the number $N^*$ of the $N$ samples in which it appears;
calculating the difference characterization value corresponding to the tag type, where the larger the difference between $M^*/M$ and $N^*/N$, the larger the difference characterization value corresponding to the tag type;
sorting the tag types in descending order of their difference characterization values and taking the first $L$ tag types as the key tag types.
The webpage sample set used for training the text classification model and/or the image classification model is obtained by the following method:
acquiring a plurality of external web pages to which the authority management function is to be configured and a plurality of internal web pages to which the authority management function is to be configured as web page samples to which the authority management function is to be configured;
and acquiring a plurality of external web pages which should not be configured with the authority management function as web page samples which should not be configured with the authority management function.
The judging module 303 determines that the target web page has a permission vulnerability if a specified condition is met, and determines that the target web page has no permission vulnerability if the specified condition is not met;
wherein the specified condition is as follows: at least one of the classification result of the first classification submodule and the classification result of the second classification submodule indicates that the target web page should be configured with the authority management function, and the target web page is not actually configured with the authority management function.
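The judging logic reduces to a small boolean rule, sketched below. The function and parameter names are hypothetical; `None` is used here, as an assumption, to represent a classification that was not performed, since the method allows the first classification and/or the second classification. How the actual configuration state of the page is obtained is outside this sketch.

```python
def has_permission_vulnerability(text_says_should, picture_says_should, actually_configured):
    """Return True when the specified condition is met.

    text_says_should / picture_says_should: True, False, or None
        (None meaning that classification was not run).
    actually_configured: whether the page really has the authority
        management function configured.
    """
    # At least one classifier indicates the page should be protected.
    should_be_configured = any(
        result is True for result in (text_says_should, picture_says_should)
    )
    # A vulnerability exists only if protection is expected but absent.
    return should_be_configured and not actually_configured
```

For example, a page that the text classifier flags as needing permission management but that is served without any such function would be reported as vulnerable, while the same page with the function configured would not.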
Embodiments of the present specification also provide a computer device, which at least includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method shown in fig. 1 when executing the program.
Fig. 4 is a schematic diagram illustrating a more specific hardware structure of a computing device according to an embodiment of the present disclosure, where the computing device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Embodiments of the present description also provide a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the method shown in fig. 1.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
From the above description of the embodiments, it is clear to those skilled in the art that the embodiments of the present disclosure can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments or some parts of the embodiments of the present specification.
The systems, methods, modules or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in the present specification are described in a progressive manner; for the parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiment is substantially similar to the method embodiment, it is described relatively briefly, and for relevant points reference may be made to the corresponding description of the method embodiment. The above-described apparatus embodiments are merely illustrative: the modules described as separate components may or may not be physically separate, and when the embodiments of the present specification are implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The foregoing is only a specific implementation of the embodiments of the present disclosure. It should be noted that those skilled in the art can make a number of improvements and refinements without departing from the principle of the embodiments of the present disclosure, and these improvements and refinements should also be regarded as falling within the protection scope of the embodiments of the present disclosure.

Claims (11)

1. A method for detecting authority vulnerability of a webpage comprises the following steps:
acquiring a target webpage to be detected;
performing first classification and/or second classification on the target webpage;
judging whether the target webpage has an authority vulnerability or not according to the classification result of the first classification and/or the classification result of the second classification;
performing a first classification for the target web page, comprising:
acquiring webpage text characteristics of the target webpage;
inputting the webpage text features into a text classification model, and outputting a classification result for representing whether the target webpage should be configured with a permission management function;
performing a second classification on the target web page, including:
converting the rendered target webpage into a picture, and acquiring webpage picture characteristics;
and inputting the webpage picture characteristics into a picture classification model, and outputting a classification result for representing whether the target webpage is configured with the authority management function.
2. The method of claim 1, wherein the obtaining of the web page text features of the target web page specifically comprises:
extracting a title and text from the web page code of the target web page, performing a word segmentation operation on the extracted title and text, and mapping each resulting word segment to a word vector;
determining at least one key tag type appearing in the web page code of the target web page and, for each key tag type that appears, determining the ratio of the number of tags of that key tag type in the web page code of the target web page to the number of all tags in the web page code of the target web page;
constructing the web page text features of the target web page from the word vectors corresponding to the target web page and the ratio corresponding to each key tag type appearing in the web page code of the target web page;
wherein the probability that the authority management function is configured on a web page is positively correlated with the probability that the key tag type appears in the web page code of that web page.
3. The method of claim 2, wherein the key tag type is determined by:
acquiring M web page samples that should be configured with the authority management function and N web page samples that should not be configured with the authority management function;
for each tag type appearing in the web page code of the web page samples, counting the number M* of web page samples among the M web page samples in which the tag type appears and the number N* of web page samples among the N web page samples in which the tag type appears;
calculating a difference characterization value corresponding to the tag type, wherein the larger the difference between M*/M and N*/N, the larger the difference characterization value corresponding to the tag type;
and taking the first L tag types, in descending order of the difference characterization values respectively corresponding to the tag types, as the key tag types.
4. A method according to any one of claims 1 to 3, wherein the sample set of web pages for training the text classification model and/or the picture classification model is obtained by:
acquiring a plurality of external web pages to which the authority management function is to be configured and a plurality of internal web pages to which the authority management function is to be configured as web page samples to which the authority management function is to be configured;
and acquiring a plurality of external web pages which should not be configured with the authority management function as web page samples which should not be configured with the authority management function.
5. The method according to claim 1, wherein the step of judging whether the target webpage has the permission vulnerability according to the classification result of the first classification and/or the classification result of the second classification specifically comprises:
if the specified conditions are met, determining that the target webpage has an authority vulnerability;
if the specified condition is not met, determining that the target webpage has no permission vulnerability;
wherein the specified conditions are as follows: at least one of the classification result of the first classification and the classification result of the second classification represents that the target webpage should be configured with the authority management function, and the target webpage is not configured with the authority management function actually.
6. An apparatus for performing permission vulnerability detection on a web page, comprising:
the acquisition module acquires a target webpage to be detected;
the classification module comprises a first classification submodule and a second classification submodule;
the judging module is used for judging whether the target webpage has the permission vulnerability or not according to the classification result of the first classification and/or the classification result of the second classification;
the first classification submodule acquires webpage text characteristics of the target webpage; inputting the webpage text features into a text classification model, and outputting a classification result for representing whether the target webpage should be configured with a permission management function;
the second classification submodule converts the rendered target webpage into a picture and acquires webpage picture characteristics; and inputting the webpage picture characteristics into a picture classification model, and outputting a classification result for representing whether the target webpage is configured with the authority management function.
7. The apparatus of claim 6, wherein the first classification submodule extracts text from the web page code of the target web page, performs a word segmentation operation on the extracted text, and maps each resulting word segment to a word vector; determines at least one key tag type appearing in the web page code of the target web page and, for each key tag type that appears, determines the ratio of the number of tags of that key tag type in the web page code of the target web page to the number of all tags in the web page code of the target web page; and constructs the web page text features of the target web page from the word vectors corresponding to the target web page and the ratio corresponding to each key tag type appearing in the web page code of the target web page;
wherein the probability that the authority management function is configured on a web page is positively correlated with the probability that the key tag type appears in the web page code of that web page.
8. The apparatus of claim 7, wherein the key tag type is determined by:
acquiring M web page samples that should be configured with the authority management function and N web page samples that should not be configured with the authority management function;
for each tag type appearing in the web page code of the web page samples, counting the number M* of web page samples among the M web page samples in which the tag type appears and the number N* of web page samples among the N web page samples in which the tag type appears;
calculating a difference characterization value corresponding to the tag type, wherein the larger the difference between M*/M and N*/N, the larger the difference characterization value corresponding to the tag type;
and taking the first L tag types, in descending order of the difference characterization values respectively corresponding to the tag types, as the key tag types.
9. The apparatus according to any one of claims 6 to 8, wherein the web page sample set for training the text classification model and/or the picture classification model is obtained by:
acquiring a plurality of external web pages to which the authority management function is to be configured and a plurality of internal web pages to which the authority management function is to be configured as web page samples to which the authority management function is to be configured;
and acquiring a plurality of external web pages which should not be configured with the authority management function as web page samples which should not be configured with the authority management function.
10. The device of claim 6, wherein the determining module determines that the target webpage has a permission vulnerability if a specified condition is met; if the specified condition is not met, determining that the target webpage has no permission vulnerability;
wherein the specified condition is as follows: at least one of the classification result of the first classification submodule and the classification result of the second classification submodule indicates that the target web page should be configured with the authority management function, and the target web page is not actually configured with the authority management function.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the method of any one of claims 1 to 5.
CN201911220029.XA 2019-12-03 2019-12-03 Method and device for detecting permission vulnerability of webpage Active CN110968875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911220029.XA CN110968875B (en) 2019-12-03 2019-12-03 Method and device for detecting permission vulnerability of webpage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911220029.XA CN110968875B (en) 2019-12-03 2019-12-03 Method and device for detecting permission vulnerability of webpage

Publications (2)

Publication Number Publication Date
CN110968875A CN110968875A (en) 2020-04-07
CN110968875B true CN110968875B (en) 2022-01-28

Family

ID=70032646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911220029.XA Active CN110968875B (en) 2019-12-03 2019-12-03 Method and device for detecting permission vulnerability of webpage

Country Status (1)

Country Link
CN (1) CN110968875B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108768931A (en) * 2018-04-09 2018-11-06 卓望数码技术(深圳)有限公司 A kind of multimedia file tampering detection System and method for
CN109033838A (en) * 2018-07-27 2018-12-18 平安科技(深圳)有限公司 Website security detection method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609507A (en) * 2012-02-03 2012-07-25 浙江工业大学 Data visualization system based on Web
CN103605926A (en) * 2013-11-29 2014-02-26 北京奇虎科技有限公司 Webpage tampering detecting method and device
CN103605925A (en) * 2013-11-29 2014-02-26 北京奇虎科技有限公司 Webpage tampering detecting method and device
US9547854B2 (en) * 2014-12-02 2017-01-17 Paypal, Inc. User-friendly transaction interface
CN106257886B (en) * 2015-06-17 2020-06-23 腾讯科技(深圳)有限公司 Information processing method and device, terminal and server
CN107547555B (en) * 2017-09-11 2021-04-16 北京匠数科技有限公司 Website security monitoring method and device

Also Published As

Publication number Publication date
CN110968875A (en) 2020-04-07

Similar Documents

Publication Publication Date Title
CN109145238B (en) Card display method and device and mobile device
CN109214421B (en) Model training method and device and computer equipment
CN108734304B (en) Training method and device of data model and computer equipment
CN111291374B (en) Application program detection method, device and equipment
CN104268472B (en) Reduction is by the method and apparatus of third party's dynamic base Modification growth function address
US20210142121A1 (en) Image classification masking
CN111142851A (en) Abnormal request processing method and device, electronic equipment and storage medium
CN111428162A (en) Page screenshot method and device
CN113076961B (en) Image feature library updating method, image detection method and device
CN111460448B (en) Malicious software family detection method and device
CN110968875B (en) Method and device for detecting permission vulnerability of webpage
CN110020264B (en) Method and device for determining invalid hyperlinks
CN113220949B (en) Construction method and device of private data identification system
CN110880023A (en) Method and device for detecting certificate picture
CN105739717A (en) Information input method and device
CN115544982A (en) Document access method, device, equipment, medium and program product
CN115168575A (en) Subject supplement method applied to audit field and related equipment
CN111125605B (en) Page element acquisition method and device
CN109656805B (en) Method and device for generating code link for business analysis and business server
CN114329495A (en) Endogenous security based asset vulnerability static analysis method and device
CN108734149B (en) Text data scanning method and device
CN109190352B (en) Method and device for verifying accuracy of authorization text
CN110032624B (en) Sample screening method and device
CN107608947B (en) HTML file processing method and device and electronic equipment
JP2017045106A (en) Information processing device and information processing program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant