CN112084439A - Method, device, equipment and storage medium for identifying variable in URL - Google Patents

Info

Publication number
CN112084439A
Authority
CN
China
Prior art keywords
data
node
path
variable
quantitative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010909457.XA
Other languages
Chinese (zh)
Other versions
CN112084439B (en)
Inventor
尚侠
张雪松
罗清篮
陈宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Mule Network Technology Co ltd
Original Assignee
Shanghai Mule Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Mule Network Technology Co ltd filed Critical Shanghai Mule Network Technology Co ltd
Priority to CN202010909457.XA priority Critical patent/CN112084439B/en
Publication of CN112084439A publication Critical patent/CN112084439A/en
Application granted granted Critical
Publication of CN112084439B publication Critical patent/CN112084439B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method, a device, equipment and a storage medium for identifying variables in a URL. The method comprises the following steps: acquiring access path data of a website to be identified; preprocessing the access path data to obtain a path relation data set and a level threshold value set; identifying quantitative data and suspected variable data in the access path data according to the path relation data set and the level threshold value set; checking the suspected variable data to obtain variable data; and integrating and outputting the quantitative data and the variable data. By adopting the method, the variable identification can be automatically carried out on the access path data, and the variable identification efficiency is greatly improved.

Description

Method, device, equipment and storage medium for identifying variable in URL
Technical Field
The invention relates to the technical field of website testing and protection, in particular to a method, a device, equipment and a storage medium for identifying variables in a URL (uniform resource locator).
Background
As websites have become widespread, they are used in more and more industries. Before a website is put into use, a penetration test needs to be performed on it to ensure that it operates as expected. Submitting attack code through parameters is a common technique in penetration testing and scanning of websites.
Parameters are typically delivered in three forms. The first, known as the query string, is delivered via the URL: in http://www.host.com?a=1&b=2, the part "a=1&b=2" indicates that the value of parameter a is 1 and the value of parameter b is 2. This approach is common when acquiring data, for example when fetching the details of an article. The second is delivery through a form: the content filled in by the user is assembled by the front end as required and placed in the payload of the request packet. This approach is often used to send data to the back end, for example sending the article content when an article is created. The third is a part contained in the URL path, such as the "5f0ea827cf1361002210387f" part of http://www.host.com/project/5f0ea827cf1361002210387f/tasks, which the back end uses as a parameter, in this case an object id. This approach is common with RESTful APIs or pseudo-static routes, and may occur both when data is obtained from the back end and when data is sent to it.
Existing approaches usually rely on manual tagging, or treat the entire URL as one complete variable. For example, when a website uses a RESTful API or pseudo-static addresses, parameters are sometimes submitted as part of the access path and, unlike with a form, it is not known exactly which part is a parameter. Meanwhile, when identifying request parameters, the prior art usually either abandons the analysis of the variable parts of addresses or marks them through manually defined configuration. The existing penetration methods therefore affect both the learning method and the accuracy of machine learning.
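The difference between the first and third forms can be illustrated with a minimal Python sketch based on the standard urllib.parse module. The function name describe_parameters is hypothetical and is used only for illustration; the two URLs are the examples given above.

```python
from urllib.parse import urlsplit, parse_qs


def describe_parameters(url):
    """Return (query parameters, path segments) for a URL."""
    parts = urlsplit(url)
    # Query-string parameters are explicitly named, so they are easy to extract.
    query_params = parse_qs(parts.query)
    # Path segments carry no names; any one of them might be a parameter.
    path_segments = [seg for seg in parts.path.split("/") if seg]
    return query_params, path_segments


query_url = "http://www.host.com?a=1&b=2"
path_url = "http://www.host.com/project/5f0ea827cf1361002210387f/tasks"

print(describe_parameters(query_url))
print(describe_parameters(path_url))
```

In the second URL nothing in the syntax marks the middle segment as an object id, which is the gap the method described here is meant to close.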
Disclosure of Invention
In view of the foregoing, it is an object of the present invention to overcome the deficiencies of the prior art and to provide a method, an apparatus, a device and a storage medium for identifying a variable in a URL.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method of identifying a variable in a URL, comprising:
acquiring access path data of a website to be identified;
preprocessing the access path data to obtain a path relation data set and a level threshold value set;
identifying quantitative data and suspected variable data in the access path data according to the path relation data set and the level threshold set;
checking the suspected variable data to obtain variable data;
and integrating and outputting the quantitative data and the variable data.
Optionally, the preprocessing the access path data to obtain a path relationship data set and a level threshold set includes:
dividing the access path data according to a preset rule to obtain a plurality of path segment nodes;
generating a node data structure according to the path segment nodes and the path relation; the path relation is the relation among the path segment nodes as given by the access path data;
and obtaining the path relation data set and the level threshold value set according to the node data structure.
Optionally, the path relation data set includes: a child node number, a referenced number, and a sibling node number;
the set of hierarchical thresholds comprises: a child node number threshold, a referenced number threshold, and a back-referenced coefficient threshold;
the obtaining the path relation data set and the level threshold value set according to the node data structure includes:
counting the child node number, the parent node number and the sibling node number of each path segment node;
calculating the referenced number of each path segment node and the referenced number mean according to the parent node number;
calculating a child node number weighting coefficient for each level in the node data structure according to the child node number;
calculating the child node number threshold according to the child node number weighting coefficient;
calculating a referenced number weighting coefficient for the nodes of each level according to the referenced number mean;
calculating the referenced number threshold according to the referenced number weighting coefficient;
calculating a back-reference coefficient according to the parent node number and the sibling node number;
calculating a back-reference coefficient mean according to the back-reference coefficient;
calculating the back-reference coefficient threshold using the back-reference coefficient mean.
Optionally, the identifying quantitative data and suspected variable data in the access path data according to the path relation data set and the level threshold set includes:
judging whether the child node number of any path segment node is greater than the child node number threshold;
if yes, judging the path segment node to be quantitative data;
otherwise, judging whether the referenced number is greater than the referenced number threshold;
if yes, judging the path segment node to be quantitative data;
otherwise, judging whether the referenced number is greater than the sibling node number;
if yes, judging the path segment node to be quantitative data;
otherwise, judging whether the back-reference coefficient threshold divided by the child node number is greater than the child node number threshold;
if yes, judging the path segment node to be quantitative data;
otherwise, judging the path segment node to be the suspected variable data.
Optionally, the method further includes:
and marking the suspected variable data by using a preset wildcard character, and generating a tree-shaped path structure by combining the quantitative data.
Optionally, the verifying the suspected variable data to obtain variable data includes:
traversing the lower nodes of the suspected variable data and the lower nodes of the quantitative data in the tree path structure;
judging whether a quantitative node meeting a first preset condition exists among the lower nodes of the quantitative data; the first preset condition is that the quantitative node is the same as a lower node of the suspected variable data;
if yes, judging the suspected variable data to be quantitative data;
otherwise, judging whether the suspected variable data has a sibling node meeting a second preset condition; the second preset condition is that the sibling node is a terminal node and the sibling node is quantitative data;
if yes, judging the suspected variable data to be quantitative data;
otherwise, judging the suspected variable data to be variable data.
Optionally, the preset rule is: with "/" as the segmentation point.
An apparatus for identifying variables in a URL, comprising:
the access path acquisition module is used for acquiring access path data of the website to be identified;
the preprocessing module is used for preprocessing the access path data to obtain a path relation data set and a level threshold set;
the quantitative identification module is used for identifying quantitative data and suspected variable data in the access path data according to the path relation data set and the level threshold value set;
the suspected variable checking module is used for checking the suspected variable data to obtain variable data;
and the result integration output module is used for integrating and outputting the quantitative data and the variable data.
An apparatus for identifying variables in a URL, comprising:
a processor, and a memory coupled to the processor;
said memory being adapted to store a computer program adapted to at least perform said method of identifying a variable in a URL;
the processor is used for calling and executing the computer program in the memory.
A storage medium storing a computer program which, when executed by a processor, performs the steps of the method of identifying a variable in a URL as described above.
The technical scheme provided by the application can comprise the following beneficial effects:
the application discloses a method for identifying variables in a URL, which comprises the following steps: acquiring access path data of a website to be identified; preprocessing the access path data to obtain a path relation data set and a level threshold value set; identifying quantitative data and suspected variable data in the access path data according to the path relation data set and the level threshold value set; checking the suspected variable data to obtain variable data; and integrating and outputting the quantitative data and the variable data. According to the method, the access path data of the website is preprocessed, then the quantitative data and the suspected variable data in the path are identified, then the suspected variable data are verified, the final variable data are determined, and then the quantitative data and the variable data are integrated to obtain the final identification result. According to the method, the variable data part in the access path can be automatically analyzed and calculated through the access data, and the variable identification efficiency is greatly improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for identifying variables in a URL according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of pre-processing provided by an embodiment of the present invention;
FIG. 3 is a flow chart of a method of quantitative data identification provided by an embodiment of the present invention;
FIG. 4 is a flowchart of a method for checking suspected variables according to an embodiment of the present invention;
FIG. 5 is a block diagram of an apparatus for identifying variables in a URL according to one embodiment of the present invention;
FIG. 6 is a diagram illustrating an apparatus for identifying variables in a URL according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
FIG. 1 is a flowchart of a method for identifying variables in a URL according to an embodiment of the present invention. Referring to fig. 1, a method of identifying a variable in a URL, comprising:
Step 101: acquiring access path data of the website to be identified. The access path data in the present application is log data.
Step 102: preprocessing the access path data to obtain a path relation data set and a level threshold set. In this step, after the access path data to be analyzed is loaded into the system corresponding to this embodiment, the data is preprocessed to obtain data on which the subsequent identification operations can be performed.
Step 103: identifying quantitative data and suspected variable data in the access path data according to the path relation data set and the level threshold set.
Step 104: checking the suspected variable data to obtain variable data.
Step 105: integrating and outputting the quantitative data and the variable data.
According to the method, the variable data part and the quantitative data part in the path are identified by analyzing the access path data, so that the automatic identification function of the variable part in the network path is realized, the variable identification efficiency is greatly improved, and the efficiency of the penetration test is further improved.
More specifically, on the basis of the above embodiment, the present application further discloses a specific implementation of step 102, preprocessing the access path data to obtain a path relation data set and a level threshold set, as follows:
FIG. 2 is a flow chart of a method of preprocessing provided by an embodiment of the present invention. Referring to fig. 2, preprocessing the access path data to obtain a path relation data set and a level threshold set, includes:
Step 201: segmenting the access path data according to a preset rule to obtain a plurality of path segment nodes. In the present application, following the characteristics of the URL, a path is divided into a plurality of path segments using "/", and the individual path segments are collectively referred to as path segment nodes.
Step 202: generating a node data structure according to the path segment nodes and the path relation; the path relation is the relation among the path segment nodes as given by the access path data. Each path segment node is placed in the node data structure according to the original path relation. When the paths of the original data in the log are /a/b/c, /a/b/e and /a/d/c, the node data structure is as follows:
[Figure: node data structure built from the paths /a/b/c, /a/b/e and /a/d/c]
Step 203: obtaining the path relation data set and the level threshold set according to the node data structure. The path relation data set comprises: a child node number, a referenced number, and a sibling node number; the level threshold set comprises: a child node number threshold, a referenced number threshold, and a back-reference coefficient threshold.
The specific process of step 203 is as follows. The child node number, the parent node number and the sibling node number of each path segment node are counted. For example, in the above node data structure, the child nodes of b are c and e, so the child node number of b is 2. Node c has two parent nodes, b and d, so the parent node number of c is 2. The sibling node of e is c, so the sibling node number of e is 1; likewise, the sibling node of c is e, so the sibling node number of c is 1.
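The splitting and counting described above can be sketched in Python as follows. This is only an illustrative sketch: the function names are hypothetical, and identifying a node by the pair (level, segment) is an assumption inferred from the example, in which c is counted once with the two parents b and d.

```python
from collections import defaultdict


def build_node_structure(paths):
    """Split each path on "/" and record parent/child links between segments.

    Assumption: a node is identified by (level, segment), so the same segment
    name at the same level is one node even under different parents.
    """
    children = defaultdict(set)
    parents = defaultdict(set)
    for path in paths:
        previous = None
        segments = [seg for seg in path.split("/") if seg]
        for level, seg in enumerate(segments, start=1):
            node = (level, seg)
            children[node]  # touch the entry so terminal nodes are registered
            parents[node]   # touch the entry so root nodes are registered
            if previous is not None:
                children[previous].add(node)
                parents[node].add(previous)
            previous = node
    return children, parents


def sibling_nodes(node, children, parents):
    """Siblings are the other children of any parent of the node."""
    siblings = set()
    for parent in parents[node]:
        siblings |= children[parent]
    siblings.discard(node)
    return siblings


children, parents = build_node_structure(["/a/b/c", "/a/b/e", "/a/d/c"])
print(len(children[(2, "b")]), len(parents[(3, "c")]))
```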
The referenced number of each path segment node and the referenced number mean are calculated according to the parent node number. Taking the above node data structure as an example, e is referenced once, by b, so its referenced number is 1; c is referenced twice, by b and d respectively, so its referenced number is 2. At depth level 3, the referenced number mean of the child nodes of b is (1 + 2)/2 = 1.5, and the referenced number mean of the child nodes of d is 2/1 = 2.
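The arithmetic in this example can be reproduced with a short Python sketch. The function names and the (level, segment) node identity are assumptions made for illustration; the referenced number of a node is taken to be its parent node number, as in the example above.

```python
from collections import defaultdict


def build_links(paths):
    """Record child and parent sets for nodes keyed by (level, segment)."""
    children, parents = defaultdict(set), defaultdict(set)
    for path in paths:
        previous = None
        for level, seg in enumerate([s for s in path.split("/") if s], start=1):
            node = (level, seg)
            if previous is not None:
                children[previous].add(node)
                parents[node].add(previous)
            previous = node
    return children, parents


def referenced_number_mean(parent, children, parents):
    """Mean referenced number over the child nodes of one parent."""
    kids = children[parent]
    # The referenced number of a child is the number of parents that reference it.
    return sum(len(parents[kid]) for kid in kids) / len(kids)


children, parents = build_links(["/a/b/c", "/a/b/e", "/a/d/c"])
print(referenced_number_mean((2, "b"), children, parents))
print(referenced_number_mean((2, "d"), children, parents))
```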
A child node number weighting coefficient is calculated for each level in the node data structure according to the child node number, and the child node number threshold is calculated according to the child node number weighting coefficient. The child node number threshold is obtained by combining the child node number weighting coefficient with the degree of dispersion among the data.
A referenced number weighting coefficient is calculated for the nodes of each level according to the referenced number mean, and the referenced number threshold is calculated according to the referenced number weighting coefficient. The referenced number weighting coefficient is calculated from the referenced number of each path segment node of the level and the dispersion among the data.
A back-reference coefficient is calculated according to the parent node number and the sibling node number; a back-reference coefficient mean is calculated according to the back-reference coefficient; and the back-reference coefficient threshold is calculated using the back-reference coefficient mean.
It should be noted that the thresholds mentioned in the above embodiments may be replaced by any value computed to partition the full data set, such as the mean, the median, or (max + min)/2; the specific form is not limited. The weighting coefficients may be replaced by any value expressing the dispersion among the data of the full set, such as the standard deviation or the variance; the specific form is likewise not limited.
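The alternative statistics named above can be computed with Python's standard statistics module, as in the following sketch. The function names are hypothetical, and the use of the population forms of the standard deviation and variance is an assumption, since the text does not say whether population or sample statistics are intended. The exact formulas that combine these values into the thresholds are not given in the text and are not reproduced here.

```python
import statistics


def threshold_candidates(values):
    """Values that may serve as a threshold partitioning the full data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "midrange": (max(values) + min(values)) / 2,
    }


def dispersion_candidates(values):
    """Values that may serve as a weighting coefficient (dispersion measure)."""
    return {
        "stdev": statistics.pstdev(values),       # population standard deviation
        "variance": statistics.pvariance(values),  # population variance
    }


# Hypothetical child node numbers observed at one level.
child_node_numbers = [1, 2, 3, 6]
print(threshold_candidates(child_node_numbers))
print(dispersion_candidates(child_node_numbers))
```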
In more detail, on the basis of the above embodiment, the present application further discloses an implementation of step 103, identifying quantitative data and suspected variable data in the access path data according to the path relation data set and the level threshold set, as follows:
FIG. 3 is a flow chart of a method for quantitative data identification according to an embodiment of the present invention. Identifying quantitative data and suspected variable data within the access path data according to the path relationship data set and the level threshold set, including:
Step 301: judging whether the child node number of any path segment node is greater than the child node number threshold;
Step 302: if the child node number is greater than the child node number threshold, judging the path segment node to be quantitative data;
Step 303: if the child node number is not greater than the child node number threshold, judging whether the referenced number is greater than the referenced number threshold; if yes, go to step 302;
Step 304: if the referenced number is not greater than the referenced number threshold, judging whether the referenced number is greater than the sibling node number; if yes, go to step 302;
Step 305: if the referenced number is not greater than the sibling node number, judging whether the back-reference coefficient threshold divided by the child node number is greater than the child node number threshold; if yes, go to step 302;
Step 306: if the back-reference coefficient threshold divided by the child node number is not greater than the child node number threshold, judging the path segment node to be the suspected variable data.
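Steps 301 to 306 form a cascade of four tests, which can be sketched in Python as follows. The class and function names are hypothetical. The text does not say how the division in step 305 is handled for a terminal node whose child node number is 0; this sketch assumes that the test is simply not satisfied in that case.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    child_number: int
    referenced_number: int
    sibling_number: int


@dataclass
class Thresholds:
    child_number: float
    referenced_number: float
    back_reference: float


def classify(stats, th):
    """Return "quantitative" or "suspected_variable" for one path segment node."""
    # Steps 301/302: many children suggests a fixed path segment.
    if stats.child_number > th.child_number:
        return "quantitative"
    # Step 303: referenced more often than the threshold.
    if stats.referenced_number > th.referenced_number:
        return "quantitative"
    # Step 304: referenced more often than it has siblings.
    if stats.referenced_number > stats.sibling_number:
        return "quantitative"
    # Step 305: assumption, skipped when the node has no children.
    if stats.child_number > 0 and th.back_reference / stats.child_number > th.child_number:
        return "quantitative"
    # Step 306: none of the tests matched.
    return "suspected_variable"


print(classify(NodeStats(1, 1, 9), Thresholds(3, 4, 2)))
```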
Further, on the basis of the above embodiment, the method further includes: marking the suspected variable data with a preset wildcard character, and generating a tree path structure in combination with the quantitative data. According to the judgment result, the variable nodes are generalized to "&", merged, and output to a data structure. If, in the example, b and d are judged to be suspected variables and all other nodes are judged to be quantitative, the output tree path structure is as follows:
[Figure: tree path structure with b and d generalized to the wildcard]
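The wildcard marking and merging can be sketched in Python with a nested dictionary standing in for the tree path structure. The function name, the nested-dictionary representation and the (level, segment) node identity are assumptions made for illustration; the wildcard is passed in as a parameter rather than fixed.

```python
def generalize_paths(paths, suspected, wildcard):
    """Build a nested-dict tree in which suspected variable nodes are replaced
    by the wildcard, so that branches differing only in a variable merge."""
    tree = {}
    for path in paths:
        branch = tree
        segments = [seg for seg in path.split("/") if seg]
        for level, seg in enumerate(segments, start=1):
            key = wildcard if (level, seg) in suspected else seg
            branch = branch.setdefault(key, {})
    return tree


paths = ["/a/b/c", "/a/b/e", "/a/d/c"]
suspected = {(2, "b"), (2, "d")}  # b and d judged to be suspected variables
print(generalize_paths(paths, suspected, "&"))
```

With b and d generalized, the three original paths collapse into two branches under a single wildcard node.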
Meanwhile, on the basis of the above embodiment, the present application further discloses an implementation of step 104, as follows:
FIG. 4 is a flowchart of a method for checking a suspected variable according to an embodiment of the present invention. Referring to FIG. 4, checking the suspected variable data to obtain variable data includes:
Step 401: traversing the lower nodes of the suspected variable data and the lower nodes of the quantitative data in the tree path structure.
Step 402: judging whether quantitative nodes meeting a first preset condition exist in subordinate nodes of the quantitative data or not; the quantitative node is the same as a lower node of the suspected variable data. The following data structure is taken as an example:
[Figure: example tree path structure]
The original parent node of the e node is b.
Step 403: if so, judging the suspected variable data to be quantitative data. F, which is regarded as a quantitative node, and b have the same child node e; at this stage, the suspected variable b is restored to quantitative data, and the corrected output result is:
[Figure: corrected tree path structure]
Step 404: otherwise, judging whether the suspected variable data has a sibling node meeting a second preset condition; the second preset condition is that the sibling node is a terminal node and the sibling node is quantitative data. If yes, go to step 403.
For example, consider the following case:
[Figure: example node data structure]
After the quantitative data identification processing, g is identified as a suspected variable, and the data structure is:
[Figure: tree path structure with g marked as a suspected variable]
Since g is a terminal node and its sibling h is also a terminal node and is considered quantitative, g is restored to quantitative data at this stage.
Step 405: otherwise, judging the suspected variable data to be variable data.
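The two checks of steps 402 to 405 can be sketched in Python as follows. This is one possible reading only: the function name and the dictionary inputs are hypothetical, and restricting the first check to quantitative siblings of the suspected node is an assumption, since the text speaks more generally of the lower nodes of the quantitative data.

```python
def check_suspected(node, children, siblings, quantitative):
    """Return "quantitative" or "variable" for one suspected variable node.

    children: dict mapping a node to the set of its lower nodes
    siblings: dict mapping a node to the set of its sibling nodes
    quantitative: set of nodes already judged to be quantitative data
    """
    own_children = children.get(node, set())
    quantitative_siblings = [s for s in siblings.get(node, set()) if s in quantitative]
    # First condition: a quantitative node has a lower node that is also
    # a lower node of the suspected variable.
    for sibling in quantitative_siblings:
        if own_children & children.get(sibling, set()):
            return "quantitative"
    # Second condition: a sibling is a terminal node and is quantitative.
    for sibling in quantitative_siblings:
        if not children.get(sibling):
            return "quantitative"
    return "variable"


children = {"b": {"e"}, "f": {"e"}, "g": set(), "h": set(), "x": {"y"}, "z": {"w"}}
siblings = {"b": {"f"}, "g": {"h"}, "x": {"z"}}
quantitative = {"f", "h", "z"}
for name in ("b", "g", "x"):
    print(name, check_suspected(name, children, siblings, quantitative))
```

Here b and g follow the two examples in the text, while x is a hypothetical node added to show the case in which neither condition holds.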
In this embodiment, the variable parts of a path can be analyzed and calculated automatically from the log data. Compared with manual marking, this reduces the workload and requires neither knowledge of the website nor the assistance of its developers; the automatic calculation is updated as the analyzed target is updated, which improves timeliness. The method also solves the problem that variables in a URL cannot be identified effectively, and provides accurate learning features for machine learning. Once the variables are identified, paths such as /a/1/c and /a/2/c can be merged into /a/?/c, so that originally dispersed weights are combined, which helps improve the precision of machine learning.
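The merging of /a/1/c and /a/2/c can be illustrated with a short Python sketch that counts accesses per generalized path. The function name and the (level, segment) node identity are assumptions made for illustration, and the access counts stand in for the weights mentioned above.

```python
from collections import Counter


def merge_paths(paths, variable_nodes, wildcard="?"):
    """Replace identified variable segments by the wildcard and count how many
    original accesses fall on each generalized path."""
    merged = Counter()
    for path in paths:
        segments = [seg for seg in path.split("/") if seg]
        generalized = [
            wildcard if (level, seg) in variable_nodes else seg
            for level, seg in enumerate(segments, start=1)
        ]
        merged["/" + "/".join(generalized)] += 1
    return merged


log_paths = ["/a/1/c", "/a/2/c", "/a/1/c"]
variable_nodes = {(2, "1"), (2, "2")}  # segments identified as variable data
print(merge_paths(log_paths, variable_nodes))
```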
The embodiment of the invention also provides a device for identifying the variable in the URL. Please see the examples below.
FIG. 5 is a block diagram of an apparatus for identifying variables in a URL according to an embodiment of the present invention. An apparatus for identifying variables in a URL, comprising:
an access path obtaining module 501, configured to obtain access path data of a website to be identified;
a preprocessing module 502, configured to preprocess the access path data to obtain a path relation data set and a level threshold set;
a quantitative identification module 503, configured to identify quantitative data and suspected variable data in the access path data according to the path relationship data set and the level threshold set;
a suspected variable checking module 504, configured to check the suspected variable data to obtain variable data;
and a result integration output module 505, configured to integrate and output the quantitative data and the variable data.
In more detail, the preprocessing module 502 is specifically configured to: divide the access path data according to a preset rule to obtain a plurality of path segment nodes; generate a node data structure according to the path segment nodes and the path relation, the path relation being the relation among the path segment nodes as given by the access path data; and obtain the path relation data set and the level threshold set according to the node data structure.
The quantitative identification module 503 is specifically configured to: judge whether the child node number of any path segment node is greater than the child node number threshold; if yes, judge the path segment node to be quantitative data; otherwise, judge whether the referenced number is greater than the referenced number threshold; if yes, judge the path segment node to be quantitative data; otherwise, judge whether the referenced number is greater than the sibling node number; if yes, judge the path segment node to be quantitative data; otherwise, judge whether the back-reference coefficient threshold divided by the child node number is greater than the child node number threshold; if yes, judge the path segment node to be quantitative data; otherwise, judge the path segment node to be the suspected variable data.
The suspected variable checking module 504 is specifically configured to: traverse the lower nodes of the suspected variable data and the lower nodes of the quantitative data in the tree path structure; judge whether a quantitative node meeting a first preset condition exists among the lower nodes of the quantitative data, the first preset condition being that the quantitative node is the same as a lower node of the suspected variable data; if yes, judge the suspected variable data to be quantitative data; otherwise, judge whether the suspected variable data has a sibling node meeting a second preset condition, the second preset condition being that the sibling node is a terminal node and is quantitative data; if yes, judge the suspected variable data to be quantitative data; otherwise, judge the suspected variable data to be variable data.
Further, on the basis of the above embodiments, the apparatus in the present application further includes:
and the wildcard character marking module is used for marking the suspected variable data by using a preset wildcard character and generating a tree-shaped path structure by combining the quantitative data.
The variable identification device can be used for automatically identifying the variable in the access path, and the variable identification efficiency is greatly improved. Meanwhile, the identified scattered variables are integrated, and help is provided for improving the precision of machine learning.
In order to more clearly introduce a hardware system implementing the embodiment of the present invention, in correspondence to the method for identifying a variable in a URL provided in the embodiment of the present invention, an embodiment of the present invention further provides a device for identifying a variable in a URL. Please see the examples below.
Fig. 6 is a diagram illustrating an apparatus for identifying variables in a URL according to an embodiment of the present invention. Referring to fig. 6, an apparatus for identifying a variable in a URL, includes:
a processor 601, and a memory 602 connected to the processor 601;
the memory 602 is used for storing a computer program for performing at least the above-mentioned method of identifying a variable in a URL;
the processor 601 is used for calling and executing the computer program in the memory 602.
On the basis of the above embodiment, a storage medium is also disclosed, which stores a computer program that, when executed by a processor, implements the steps of the method for identifying variables in URLs as described above.
It is understood that the same or similar parts of the above embodiments may be cross-referenced, and that content not described in detail in one embodiment may be found in the same or similar parts of other embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A method of identifying a variable in a URL, comprising:
acquiring access path data of a website to be identified;
preprocessing the access path data to obtain a path relation data set and a level threshold value set;
identifying quantitative data and suspected variable data in the access path data according to the path relation data set and the level threshold set;
checking the suspected variable data to obtain variable data;
and integrating and outputting the quantitative data and the variable data.
2. The method of claim 1, wherein preprocessing the access path data to obtain a path relation data set and a level threshold set comprises:
dividing the access path data according to a preset rule to obtain a plurality of path segment nodes;
generating a node data structure according to the path segment nodes and a path relation, the path relation being obtained for the path segment nodes from the access path data;
and obtaining the path relation data set and the level threshold set according to the node data structure.
3. The method of claim 2, wherein the path relation data set comprises: a child node number, a referenced number, and a sibling node number;
the level threshold set comprises: a child node number threshold, a referenced number threshold, and a reverse reference coefficient threshold;
the obtaining the path relation data set and the level threshold set according to the node data structure comprises:
counting the child node number, the parent node number, and the sibling node number of each path segment node;
calculating the referenced number of each path segment node and the mean of the referenced number according to the parent node number;
calculating a child node number weighting coefficient for each level in the node data structure according to the child node number;
calculating the child node number threshold according to the child node number weighting coefficient;
calculating a referenced number weighting coefficient for the nodes of each level according to the mean of the referenced number;
calculating the referenced number threshold according to the referenced number weighting coefficient;
calculating a reverse reference coefficient according to the parent node number and the sibling node number;
calculating a mean of the reverse reference coefficient according to the reverse reference coefficient;
calculating the reverse reference coefficient threshold using the mean of the reverse reference coefficient.
4. The method of claim 3, wherein identifying quantitative data and suspected variable data in the access path data according to the path relation data set and the level threshold set comprises:
judging whether the child node number of any path segment node is greater than the child node number threshold;
if so, judging the path segment node to be quantitative data;
otherwise, judging whether the referenced number is greater than the referenced number threshold;
if so, judging the path segment node to be quantitative data;
otherwise, judging whether the referenced number is greater than the sibling node number;
if so, judging the path segment node to be quantitative data;
otherwise, judging whether the reverse reference coefficient threshold value is divided by the child node number threshold value or not;
if so, judging the path segment node to be quantitative data;
otherwise, judging the path segment node to be the suspected variable data.
5. The method of claim 1, further comprising:
marking the suspected variable data with a preset wildcard, and generating a tree path structure in combination with the quantitative data.
6. The method of claim 5, wherein the checking the suspected variable data to obtain variable data comprises:
traversing the lower nodes of the suspected variable data and the lower nodes of the quantitative data in the tree path structure;
judging whether, among the lower nodes of the quantitative data, there is a quantitative node that meets a first preset condition, the first preset condition being that the quantitative node is the same as a lower node of the suspected variable data;
if so, judging the suspected variable data to be quantitative data;
otherwise, judging whether the suspected variable data has a sibling node that meets a second preset condition, the second preset condition being that the sibling node is a terminal node and is quantitative data;
if so, judging the suspected variable data to be quantitative data;
otherwise, judging the suspected variable data to be variable data.
7. The method according to claim 2, wherein the preset rule is to use "/" as the segmentation point.
8. An apparatus for identifying a variable in a URL, comprising:
the access path acquisition module is used for acquiring access path data of the website to be identified;
the preprocessing module is used for preprocessing the access path data to obtain a path relation data set and a level threshold set;
the quantitative identification module is used for identifying quantitative data and suspected variable data in the access path data according to the path relation data set and the level threshold value set;
the suspected variable checking module is used for checking the suspected variable data to obtain variable data;
and the result integration output module is used for integrating and outputting the quantitative data and the variable data.
9. An apparatus for identifying variables in a URL, comprising:
a processor, and a memory coupled to the processor;
the memory for storing a computer program for performing at least the method of identifying a variable in a URL of any one of claims 1-7;
the processor is used for calling and executing the computer program in the memory.
10. A storage medium, characterized in that it stores a computer program which, when executed by a processor, carries out the steps of the method of identifying variables in URLs according to any one of claims 1 to 7.
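By way of illustration only, the path segmentation of claims 2 and 7 and the counting step of claim 3 can be sketched as follows. The sketch forms no part of the claims. The function name `path_relations`, the identification of a node by its `(level, segment)` pair, and the use of an empty string as a virtual root are assumptions; the weighting coefficients and thresholds are not computed because their formulas are not reproduced in this section.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

NodeKey = Tuple[int, str]  # (level, segment); an assumed node identity


def path_relations(paths: Iterable[str]) -> Dict[NodeKey, Dict[str, int]]:
    """Count children, parents and siblings of every path segment node."""
    children: Dict[NodeKey, Set[str]] = defaultdict(set)
    parents: Dict[NodeKey, Set[str]] = defaultdict(set)
    groups: Dict[NodeKey, Set[str]] = defaultdict(set)  # (level, parent)

    for path in paths:
        previous = ""  # virtual root above the first segment (assumption)
        # Preset rule of claim 7: split on "/", ignoring empty segments.
        for level, segment in enumerate(s for s in path.split("/") if s):
            key = (level, segment)
            children[key]  # touching the key registers terminal nodes too
            parents[key].add(previous)
            groups[(level, previous)].add(segment)
            if level > 0:
                children[(level - 1, previous)].add(segment)
            previous = segment

    stats: Dict[NodeKey, Dict[str, int]] = {}
    for key, parent_set in parents.items():
        level, segment = key
        siblings: Set[str] = set()
        for parent in parent_set:
            siblings |= groups[(level, parent)]
        siblings.discard(segment)
        stats[key] = {
            "children": len(children[key]),
            "parents": len(parent_set),
            "siblings": len(siblings),
        }
    return stats


stats = path_relations(
    ["/user/1001/profile", "/user/1002/profile", "/user/list", "/shop/1001"]
)
print(stats[(0, "user")])  # {'children': 3, 'parents': 1, 'siblings': 1}
print(stats[(1, "1001")])  # {'children': 1, 'parents': 2, 'siblings': 2}
```

Under these assumptions, a segment such as `user` with many distinct children would tend toward quantitative data, while its numerous children become candidates for suspected variable data once per-level thresholds are applied.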
CN202010909457.XA 2020-09-02 2020-09-02 Method, device, equipment and storage medium for identifying variable in URL Active CN112084439B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010909457.XA CN112084439B (en) 2020-09-02 2020-09-02 Method, device, equipment and storage medium for identifying variable in URL

Publications (2)

Publication Number Publication Date
CN112084439A (en) 2020-12-15
CN112084439B (en) 2023-12-19

Family

ID=73731800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010909457.XA Active CN112084439B (en) 2020-09-02 2020-09-02 Method, device, equipment and storage medium for identifying variable in URL

Country Status (1)

Country Link
CN (1) CN112084439B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101088256A (en) * 2004-12-21 2007-12-12 艾利森电话股份有限公司 Arrangement and method relating to flow of packets in communication systems
CN101692639A (en) * 2009-09-15 2010-04-07 西安交通大学 Bad webpage recognition method based on URL
CN102360367A (en) * 2011-09-29 2012-02-22 广州中浩控制技术有限公司 XBRL (Extensible Business Reporting Language) data search method and search engine
WO2012105967A1 (en) * 2011-02-01 2012-08-09 Limelight Networks, Inc. Asset management architecture for content delivery networks
WO2015000342A1 (en) * 2013-07-02 2015-01-08 Tencent Technology (Shenzhen) Company Limited Method and client device for accessing webpage
CN107426132A (en) * 2016-05-23 2017-12-01 腾讯科技(深圳)有限公司 The detection method and device of network attack
CN108304410A (en) * 2017-01-13 2018-07-20 阿里巴巴集团控股有限公司 A kind of detection method, device and the data analysing method of the abnormal access page
CN109391584A (en) * 2017-08-03 2019-02-26 武汉安天信息技术有限责任公司 A kind of recognition methods of doubtful malicious websites and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
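Hu Zhongyi et al.: "Research on phishing website identification technology fusing multi-source network evaluation data and URL features", Data Analysis and Knowledge Discovery *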

Also Published As

Publication number Publication date
CN112084439B (en) 2023-12-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method, device, equipment and storage medium for identifying variable in URL

Granted publication date: 20231219

Pledgee: Shanghai Rural Commercial Bank Co.,Ltd. Songjiang sub branch

Pledgor: Shanghai Mule Network Technology Co.,Ltd.

Registration number: Y2024310000373