CN112966194A

CN112966194A - Method and system for checking two-dimensional code

Info

Publication number: CN112966194A
Application number: CN202110200750.3A
Authority: CN
Inventors: 骆信智; 范渊; 杨勃
Original assignee: Hangzhou Dbappsecurity Technology Co Ltd
Current assignee: DBAPPSecurity Co Ltd; Hangzhou Dbappsecurity Technology Co Ltd
Priority date: 2021-02-23
Filing date: 2021-02-23
Publication date: 2021-06-15

Abstract

The application relates to a method, a device, a system, an electronic device and a storage medium for checking a two-dimensional code. The method comprises: acquiring a target URL link corresponding to a two-dimensional code to be detected; when the target URL link is matched in neither the black list nor the white list, performing feature extraction on the target URL link to obtain keywords of the target URL link, and inputting the keywords into a trained phishing website identification model for risk prediction to obtain a model prediction risk value; opening the target URL link and performing a logic dynamic test on the target website, and obtaining a dynamic test risk value according to the result of the logic dynamic test; and integrating the model prediction risk value and the dynamic test risk value into a comprehensive risk value according to a preset method, and, if the comprehensive risk value exceeds a preset threshold value, confirming that the two-dimensional code to be detected corresponding to the target URL link points to a phishing website. The application addresses the problems of low accuracy, incomplete coverage and low response speed in checking whether a two-dimensional code points to a phishing website.

Description

Method and system for checking two-dimensional code

Technical Field

The present application relates to the field of network security, and in particular, to a method, an apparatus, a system, an electronic apparatus, and a storage medium for checking a two-dimensional code.

Background

In recent years, as the advantages of two-dimensional codes have become apparent in the Internet era, more and more websites are accessed directly by scanning a two-dimensional code. However, two-dimensional codes can be neither authenticated nor traced, and because they lack a unified management system, they have become a new carrier for phishing websites. At present, users take no precautions against the two-dimensional codes that appear everywhere, and the many convenient code-scanning applications have no targeted detection flow, so a single scan can lead a user to a dangerous phishing website and leak the user's private information.

When an existing mobile terminal scans a two-dimensional code, most terminals perform no phishing-link detection and open the link directly, so the user cannot be prevented from entering the website and revealing personal information. Some mobile terminals provide phishing website detection in the browser, but most of these schemes rely on black lists and white lists. Because phishing websites have a short survival time, the accuracy of list-based detection is low, and maintaining the black list and the white list consumes considerable labor. Moreover, phishing websites are set up in complex ways and take a long time to analyze, and analysis errors easily cause missed reports and false reports, so the coverage of phishing websites is incomplete and the response speed is low.

At present, the related art offers no effective solution to the problems of low accuracy, incomplete coverage and low response speed in checking whether a two-dimensional code points to a phishing website.

Disclosure of Invention

The embodiments of the application provide a method, a device, a system, an electronic device and a storage medium for checking a two-dimensional code, which at least solve the problems in the related art that the accuracy of checking whether a two-dimensional code points to a phishing website is low, the coverage is incomplete, and the response speed is low.

In a first aspect, an embodiment of the present application provides a method for checking a two-dimensional code, including:

acquiring a target URL link corresponding to a two-dimensional code to be detected;

matching the target URL link with a preset white list and a preset black list;

under the condition that the target URL link is not matched in the blacklist and the white list, performing feature extraction on the target URL link to obtain a keyword of the target URL link, and inputting the keyword into a well-trained phishing website identification model for risk prediction to obtain a model prediction risk value;

opening the target URL link and performing a logic dynamic test on the target website, and obtaining a dynamic test risk value according to the result of the logic dynamic test;

and integrating the model prediction risk value and the dynamic test risk value into a comprehensive risk value according to a preset method, and if the comprehensive risk value exceeds a preset threshold value, confirming that the two-dimensional code to be detected corresponding to the target URL link points to a phishing website.

In some embodiments, the matching the target URL link with a preset white list and a preset black list includes:

matching the target URL link with the white list;

if the target URL link is matched in the white list, confirming that the two-dimensional code to be detected corresponding to the target URL link points to a safe website;

if the target URL link is not matched in the white list, matching the target URL link with the black list;

and if the target URL link is matched in the blacklist, confirming that the two-dimensional code to be detected corresponding to the target URL link points to a phishing website.

In some embodiments, the well-trained phishing website identification model comprises a first risk prediction model and a second risk prediction model; the first risk prediction model is obtained by training based on a logistic regression model, and the second risk prediction model is obtained by training based on a decision tree model; the characteristic extraction of the target URL link is carried out to obtain a keyword of the target URL link, and the keyword is input into a phishing website identification model which is well trained to carry out risk prediction to obtain a model prediction risk value, and the method comprises the following steps:

performing feature extraction on the target URL link to obtain a keyword of the target URL link;

inputting the keywords into the first risk prediction model to perform first risk prediction to obtain a first risk value;

inputting the keywords into the second risk prediction model to perform second risk prediction to obtain a second risk value;

and summing the first risk value and the second risk value to obtain the model prediction risk value.
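The summation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the two models are passed in as plain callables, and `stub_logistic_model` and `stub_tree_model` are hypothetical stand-ins with made-up outputs.

```python
from typing import Callable, Sequence

# A risk model maps a feature vector to a risk value.
RiskModel = Callable[[Sequence[float]], float]


def model_prediction_risk(features: Sequence[float],
                          first_model: RiskModel,
                          second_model: RiskModel) -> float:
    """Sum the first and second risk values into the model prediction risk value."""
    first_risk = first_model(features)    # e.g. from the logistic regression model
    second_risk = second_model(features)  # e.g. from the decision tree model
    return first_risk + second_risk


# Hypothetical stand-ins for trained models (assumptions, not from the source).
def stub_logistic_model(features: Sequence[float]) -> float:
    return 0.75 if sum(features) > 1 else 0.25


def stub_tree_model(features: Sequence[float]) -> float:
    return 1.0 if features[0] > 0 else 0.0
```

With these stand-ins, `model_prediction_risk([1.0, 0.5], stub_logistic_model, stub_tree_model)` adds the two outputs into a single value that later feeds the comprehensive risk value.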

In some embodiments, when the target URL link is not matched in both the black list and the white list, performing feature extraction on the target URL link to obtain a keyword of the target URL link, and inputting the keyword into a well-trained phishing website recognition model for risk prediction to obtain a model prediction risk value, the method further includes:

acquiring characteristics of a sample URL link in blacklist sample data and white list sample data, and extracting the characteristics in the sample URL link to obtain keywords in the sample URL link;

inputting keywords in the sample URL link into the logistic regression model for training to obtain the first risk prediction model;

and inputting the keywords in the sample URL link into the decision tree model for training to obtain the second risk prediction model.
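As a rough sketch of the training step for the first risk prediction model only, the following fits a logistic regression with plain batch gradient descent. The optimizer, the learning rate, the epoch count and the single numeric feature used in the usage note are all assumptions for illustration; the source does not specify them, and the decision tree training is not shown.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(theta: Sequence[float], x: Sequence[float]) -> float:
    """Risk in (0, 1); theta[0] is the bias, theta[1:] are feature weights."""
    z = theta[0] + sum(t * v for t, v in zip(theta[1:], x))
    return sigmoid(z)


def train_logistic(samples: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   lr: float = 0.5,
                   epochs: int = 500) -> List[float]:
    """Fit theta by batch gradient descent on the log-loss.

    Label 1 marks a black list sample, label 0 a white list sample.
    """
    n_features = len(samples[0])
    theta = [0.0] * (n_features + 1)
    m = len(samples)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(samples, labels):
            error = predict(theta, x) - y
            grad[0] += error
            for j, v in enumerate(x):
                grad[j + 1] += error * v
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta
```

For example, training on a hypothetical "suspicious keyword count" feature with `train_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])` yields parameters for which `predict` is below 0.5 at a count of 0 and above 0.5 at a count of 3.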

In some of these embodiments, the dynamic test risk values include a third risk value, a fourth risk value, a fifth risk value, and a sixth risk value; the opening of the target URL link and the target website for logic dynamic test, and obtaining a dynamic test risk value according to the result of the logic dynamic test comprise:

accessing the target website corresponding to the URL link, and judging whether the target website is configured with a network security firewall;

setting a preset risk value as a third risk value under the condition that the target website is not configured with a network security firewall;

accessing the target website corresponding to the URL link, receiving a return message of the target website, and judging whether the second return message of the target website is the same as the first return message;

under the condition that the second return message of the target website is the same as the first return message, setting a preset risk value as a fourth risk value;

accessing the target website corresponding to the URL link, and checking whether a set-cookie field exists in the header field that carries the cookie value generated by the target website;

setting a preset risk value as a fifth risk value if the set-cookie field does not exist in the header field;

accessing the target website corresponding to the URL link, attempting to submit a login state to the target website, and judging whether the login state of the target website is abnormal;

setting a preset risk value as a sixth risk value under the condition that the login state of the target website is abnormal;

and summing the third risk value, the fourth risk value, the fifth risk value and the sixth risk value to obtain the dynamic test risk value.
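The four checks and their summation can be sketched as a pure scoring function over already-collected observations. This is only an illustration under stated assumptions: how each probe is actually performed is out of scope, the preset values in `PRESET` are hypothetical, and a check that does not trigger is assumed to contribute 0.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DynamicObservations:
    """Outcome of the four probes against the target website."""
    has_firewall: bool                # a network security firewall is configured
    second_reply_same_as_first: bool  # the second return message equals the first
    has_set_cookie: bool              # a set-cookie field exists in the header
    login_state_abnormal: bool        # the submitted login state behaved abnormally


# Hypothetical preset risk values; the source gives no concrete numbers.
PRESET: Mapping[str, float] = {
    "third": 1.0, "fourth": 1.0, "fifth": 1.0, "sixth": 1.0,
}


def dynamic_test_risk(obs: DynamicObservations,
                      preset: Mapping[str, float] = PRESET) -> float:
    """Sum the third to sixth risk values into the dynamic test risk value."""
    third = preset["third"] if not obs.has_firewall else 0.0
    fourth = preset["fourth"] if obs.second_reply_same_as_first else 0.0
    fifth = preset["fifth"] if not obs.has_set_cookie else 0.0
    sixth = preset["sixth"] if obs.login_state_abnormal else 0.0
    return third + fourth + fifth + sixth
```

Keeping the scoring separate from the network probes makes the preset values easy to tune and the logic easy to test without contacting a suspicious website.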

In some of these embodiments, the method further comprises: if the comprehensive risk value does not exceed the preset threshold value, opening, copying or sharing the target URL link according to the selection of the user.

In a second aspect, an embodiment of the present application provides a device for checking a two-dimensional code, including:

the device comprises an acquisition module, a matching module, a model prediction module, a dynamic test module and a judgment module;

the acquisition module is used for acquiring a target URL link corresponding to the two-dimensional code to be detected;

the matching module is used for matching the target URL link with a preset white list and a preset black list;

the model prediction module is used for extracting the characteristics of the target URL link to obtain a keyword of the target URL link under the condition that the target URL link is not matched in the blacklist and the white list, and inputting the keyword into a well-trained phishing website identification model for risk prediction to obtain a model prediction risk value;

the dynamic test module is used for opening the target URL link and carrying out logic dynamic test on the target website, and obtaining a dynamic test risk value according to the result of the logic dynamic test;

and the judging module is used for integrating the model prediction risk value and the dynamic test risk value into a comprehensive risk value according to a preset method, and if the comprehensive risk value exceeds a preset threshold value, confirming that the two-dimensional code to be detected corresponding to the target URL link points to a phishing website.

In a third aspect, an embodiment of the present application provides a system, including: mobile terminal equipment, transmission equipment and server equipment; the terminal equipment is connected with the server equipment through the transmission equipment;

the mobile terminal equipment is used for scanning and analyzing the current two-dimensional code to obtain a target URL link;

the transmission equipment is used for transmitting the target URL link obtained by analyzing the two-dimensional code to the server equipment in a wired or wireless mode;

the server equipment is used for acquiring a target URL link corresponding to the two-dimensional code to be detected;

matching the target URL link with a preset white list and a preset black list;

and integrating the model prediction risk value and the dynamic test risk value into a comprehensive risk value according to a preset method, and if the comprehensive risk value exceeds a preset threshold value, confirming that the two-dimensional code to be detected corresponding to the target URL link points to a phishing website.

In a fourth aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor executes the computer program to implement the method for verifying a two-dimensional code according to the first aspect.

In a fifth aspect, the present application provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the method for verifying a two-dimensional code as described in the first aspect.

Compared with the prior art, the embodiment of the application provides a method, a device, a system, an electronic device and a storage medium for checking a two-dimensional code, wherein a target URL link corresponding to the two-dimensional code to be detected is obtained; matching the target URL link with a preset white list and a preset black list; under the condition that the target URL link is not matched in the blacklist and the white list, performing feature extraction on the target URL link to obtain a keyword of the target URL link, and inputting the keyword into a well-trained phishing website identification model for risk prediction to obtain a model prediction risk value; opening a target URL link and carrying out logic dynamic test on a target website, and obtaining a dynamic test risk value according to a logic dynamic test result; the model prediction risk value and the dynamic test risk value are integrated into a comprehensive risk value according to a preset method, if the comprehensive risk value exceeds a preset threshold value, the two-dimensional code to be detected corresponding to the target URL link is confirmed to point to the phishing website, the problems that the accuracy rate of detecting whether the two-dimensional code is the phishing website is low, the coverage rate is not comprehensive, the response speed is low are solved, and the accuracy rate of detecting whether the two-dimensional code is the phishing website is improved.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a schematic diagram of a system for verifying two-dimensional codes according to an embodiment of the present application;

FIG. 2 is a flowchart of a method of verifying a two-dimensional code according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a logistic function according to an embodiment of the present application;

FIG. 4 is a flowchart of another method of verifying a two-dimensional code according to an embodiment of the present application;

FIG. 5 is a block diagram of a two-dimensional code verifying apparatus according to an embodiment of the present application;

FIG. 6 is a block diagram of a computer-readable storage medium according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.

Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.

Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference herein to "a plurality" means greater than or equal to two. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.

The present embodiment provides a system for checking a two-dimensional code. FIG. 1 is a schematic diagram of a system for checking a two-dimensional code according to an embodiment of the present application. As shown in FIG. 1, the system includes: a mobile terminal device 10, a transmission device 12, and a server device 14; wherein the mobile terminal device 10 is connected with the server device 14 through the transmission device 12;

the mobile terminal device 10 is used for scanning and analyzing the current two-dimensional code to obtain a target URL link;

the transmission device 12 is configured to transmit the target URL link obtained by analyzing the two-dimensional code to the server device 14 in a wired or wireless manner;

the server device 14 is configured to obtain a target URL link corresponding to the two-dimensional code to be detected; matching the target URL link with a preset white list and a preset black list; under the condition that the target URL link is not matched in the blacklist and the white list, performing feature extraction on the target URL link to obtain a keyword of the target URL link, and inputting the keyword into a well-trained phishing website identification model for risk prediction to obtain a model prediction risk value; opening a target URL link and carrying out logic dynamic test on a target website, and obtaining a dynamic test risk value according to a logic dynamic test result; and integrating the model prediction risk value and the dynamic test risk value into a comprehensive risk value according to a preset method, and if the comprehensive risk value exceeds a preset threshold value, confirming that the two-dimensional code to be detected corresponding to the target URL link points to the phishing website.

Through the above steps, the mobile terminal device 10 scans the two-dimensional code currently presented, and after scanning, the two-dimensional code picture is transmitted to the server device 14 in a wireless or wired manner. It should be noted that the present application may also be implemented entirely in the mobile terminal device, but parsing the two-dimensional code on the mobile terminal device occupies its resources, so in this scheme the parsing of the two-dimensional code picture is executed at the server device 14. First, the server device 14 acquires the captured two-dimensional code picture and parses the two-dimensional code. In general, the parsing result falls into one of two cases: specific content such as pictures and text, or a URL link. Many phishing websites spread through URL links, so directly accessing the target URL link after the two-dimensional code is parsed may expose the user to a phishing website. Therefore, when the parsing result is a target URL link, the target URL link is matched, through an interface, against a white list and a black list held in an external server. If the target URL link is matched in the white list, it poses no threat and points to a normal website; if it is matched in the black list, the website it points to is very likely a phishing website, so the matching result is sent to the mobile terminal device 10 as a risk prompt. If the URL link is matched in neither the black list nor the white list, a second round of testing is carried out. The second round runs two checks in parallel and asynchronously, and the risk values produced by the two checks are added for the final judgment.

On the one hand, feature extraction is performed on the URL link. Because phishing websites generally have characteristic features, the identified features are fed into a preset model for prediction to obtain a first risk value (the model prediction risk value), which serves as one basis for judging the website that the target URL link points to. On the other hand, the target URL link is opened and a logic dynamic test is performed on the website. Because messages sent to a phishing website produce characteristic return values, a second risk value (the dynamic test risk value) is obtained from the results of these logic dynamic tests and serves as the other basis for the judgment. Finally, the first risk value and the second risk value are integrated into a comprehensive risk value, and whether the two-dimensional code to be detected corresponding to the target URL link points to a phishing website is judged according to whether the comprehensive risk value exceeds a preset comprehensive risk value threshold. By combining the black and white lists, trained-model prediction and the logic dynamic test, the invention checks phishing websites with higher precision, a faster response and more comprehensive coverage, raises user alertness, and keeps false alarms from affecting normal functions.
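The parallel, asynchronous second round can be sketched with `asyncio`. The two coroutines below are hypothetical placeholders that return fixed values; in a real system they would perform the model prediction and the logic dynamic test. Adding the two results follows the description above, while the function names and numbers are assumptions.

```python
import asyncio


async def predict_model_risk(url: str) -> float:
    """Placeholder for feature extraction plus model prediction."""
    await asyncio.sleep(0)  # stands in for real work
    return 0.5              # hypothetical model prediction risk value


async def run_dynamic_tests(url: str) -> float:
    """Placeholder for the logic dynamic test against the target website."""
    await asyncio.sleep(0)  # stands in for network probes
    return 2.0              # hypothetical dynamic test risk value


async def second_round(url: str, threshold: float) -> bool:
    """Run both checks concurrently, add their risk values, compare to threshold."""
    model_risk, dynamic_risk = await asyncio.gather(
        predict_model_risk(url),
        run_dynamic_tests(url),
    )
    return (model_risk + dynamic_risk) > threshold
```

Calling `asyncio.run(second_round("http://example.test/", 2.0))` runs both checks concurrently and reports whether the summed risk exceeds the threshold.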

The present embodiment provides a method for checking a two-dimensional code, and fig. 2 is a flowchart of a method for checking a two-dimensional code according to an embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:

step S201, obtaining a target URL link corresponding to the two-dimensional code to be detected.

The two-dimensional code here is a matrix two-dimensional code symbol that was invented in 1994 by an employee of Denso Wave, a company of the Toyota group in Japan. It was first used in industries such as automobile manufacturing and later became widely used in industries such as electronics and telecommunications with the rise of the Internet. In the present application, the mobile terminal device 10 scans and then parses the code to obtain either a non-URL result or a URL link, and transmits the URL link to the server device 14 through the transmission device 12; the server device 14 obtains the URL link and then performs detection.
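Distinguishing a URL link from a non-URL parsing result can be sketched as below. Treating only `http` and `https` payloads with a non-empty host as URL links is an assumption made for illustration; the source does not define the criterion.

```python
from urllib.parse import urlsplit


def is_url_payload(decoded_text: str) -> bool:
    """Return True if the decoded two-dimensional code content looks like a web URL."""
    try:
        parts = urlsplit(decoded_text.strip())
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. a broken IPv6 host.
        return False
    return parts.scheme in {"http", "https"} and bool(parts.netloc)
```

Only payloads for which this returns True would be forwarded for detection; plain text or other content would be handled as a non-URL result.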

Step S202, matching the target URL link with a preset white list and a preset black list.

Here, the black list and the white list are lists of content known to be dangerous or known to be safe, respectively. The black list may be obtained from the Puish Link website, which provides a large number of phishing websites. The Puish Link website also provides a number of interfaces through which its black list, which is updated regularly, can be acquired. The white list is obtained from the Alexa website; Alexa currently covers the largest number of URLs and publishes the most detailed ranking information, so it is used as the white list.

It should be noted that if the target URL link is not matched in the black list, it cannot be concluded that the target URL link does not point to a phishing website, and a further round of detection is required.

And step S203, under the condition that the target URL link is not matched in the blacklist and the white list, performing feature extraction on the target URL link to obtain a keyword of the target URL link, and inputting the keyword into a phishing website identification model which is well trained to perform risk prediction to obtain a model prediction risk value.

Combining a machine learning prediction model with heuristic phishing detection is currently the most widely used approach. First, feature vectors are extracted: features that distinguish phishing websites from normal webpages, and the ways of extracting them, are designed from observation of the target URL link and a summary of the structure and content of the HTML page. The performance of a number of different methods is then analyzed, tested and compared on the current data. The two algorithms with the highest precision, logistic regression and decision trees, are integrated and put into practical detection.
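A minimal sketch of URL feature extraction is given below. The specific features and the keyword list in `SUSPICIOUS_KEYWORDS` are illustrative assumptions; the source does not enumerate the features or keywords it uses, and HTML page features are not covered here.

```python
import re
from urllib.parse import urlsplit

# Hypothetical keyword list; the source does not enumerate its keywords.
SUSPICIOUS_KEYWORDS = ("login", "verify", "account", "secure", "update")


def extract_features(url: str) -> dict:
    """Derive a few simple lexical features from a URL string."""
    host = (urlsplit(url).hostname or "").lower()
    # Split the whole URL into lower-case alphanumeric tokens.
    tokens = [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]
    return {
        "url_length": len(url),
        "dot_count": host.count("."),
        "has_at_sign": "@" in url,
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "keyword_hits": sum(t in SUSPICIOUS_KEYWORDS for t in tokens),
    }
```

The resulting dictionary can be turned into a numeric vector in a fixed key order before it is passed to a prediction model.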

Step S204, opening the target URL link, performing a logic dynamic test on the target website, and obtaining a dynamic test risk value according to the result of the logic dynamic test.

Specifically, after the target URL link is opened, a response is returned to the server device 14, and the logic dynamic test determines, by analyzing the response, whether the target URL link points to a phishing website.

And S205, integrating the model prediction risk value and the dynamic test risk value into a comprehensive risk value according to a preset method, and if the comprehensive risk value exceeds a preset threshold value, confirming that the two-dimensional code to be detected corresponding to the target URL link points to the phishing website.
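Step S205 can be sketched as below. The source leaves the "preset method" open; a weighted sum with default weights of 1.0 (that is, a plain sum) is assumed here, and the function and parameter names are hypothetical.

```python
def comprehensive_risk(model_risk: float,
                       dynamic_risk: float,
                       model_weight: float = 1.0,
                       dynamic_weight: float = 1.0) -> float:
    """Integrate the two risk values; a weighted sum is an assumed preset method."""
    return model_weight * model_risk + dynamic_weight * dynamic_risk


def points_to_phishing(model_risk: float,
                       dynamic_risk: float,
                       threshold: float) -> bool:
    """True only if the comprehensive risk value strictly exceeds the threshold."""
    return comprehensive_risk(model_risk, dynamic_risk) > threshold
```

A value exactly equal to the threshold is not treated as phishing here, matching the wording "exceeds a preset threshold value".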

A phishing website is a fake website with which an attacker imitates the appearance of another website's login page and induces a user to enter information such as a user name and password in order to steal the user's credentials.

Through the above steps, a captured two-dimensional code picture is first obtained and the two-dimensional code is then parsed. In general, the parsing result falls into one of two cases: specific content such as pictures and text, or a URL link. Many phishing websites spread through URL links, so directly accessing the target URL link after the two-dimensional code is parsed may expose the user to a phishing website. Therefore, when the parsing result is a target URL link, the target URL link is matched, through an interface, against a white list and a black list held in an external server. If the target URL link is matched in the white list, it poses no threat and points to a normal website; if it is matched in the black list, the website it points to is very likely a phishing website, so the matching result is sent to the mobile terminal device 10 as a risk prompt. If the URL link is matched in neither the black list nor the white list, a second round of testing is carried out. The second round runs two checks in parallel and asynchronously, and the risk values produced by the two checks are added for the final judgment.

On the one hand, feature extraction is performed on the URL link. Because phishing websites generally have characteristic features, the identified features are fed into a preset model for prediction to obtain a first risk value (the model prediction risk value), which serves as one basis for judging the website that the target URL link points to. On the other hand, the target URL link is opened and a logic dynamic test is performed on the website. Because messages sent to a phishing website produce characteristic return values, a second risk value (the dynamic test risk value) is obtained from the results of these logic dynamic tests and serves as the other basis for the judgment. Finally, the first risk value and the second risk value are integrated into a comprehensive risk value, and whether the two-dimensional code to be detected corresponding to the target URL link points to a phishing website is judged according to whether the comprehensive risk value exceeds a preset comprehensive risk value threshold. By combining the black and white lists, trained-model prediction and the logic dynamic test, the invention checks phishing websites with higher precision, a faster response and more comprehensive coverage, raises user alertness, and keeps false alarms from affecting normal functions.

In some embodiments, matching the target URL link with a preset white list and a preset black list includes:

matching the target URL link with a white list;

if the target URL link is matched in the white list, confirming that the two-dimensional code to be detected corresponding to the target URL link points to a safe website;

however, if the target URL link is not matched in the white list, it cannot be determined whether the current target URL link is a safe website or a phishing website, so that matching in the black list is required.

And if the target URL link is matched in the blacklist, confirming that the two-dimensional code to be detected corresponding to the target URL link points to the phishing website. Where blacklist and whitelist refer to lists of content known to be dangerous or safe. The blacklist may be obtained through puish website, which provides a large number of phishing websites. Meanwhile, the Puish Link website also provides a plurality of interfaces, the blacklist of the Puish Link website can be acquired through the interfaces, and the blacklist of the Puish Link website is updated regularly. The white list is obtained through an Alexa website, the Alexa website has the largest number of URLs and is the website with the most detailed ranking information issue, and therefore the Alexa is used as the white list, and the matching accuracy can be higher.

It should be noted that matching the target URL link against the black list and the white list is the most authoritative and fastest-responding scheme, but false negatives are possible. Black list and white list matching is therefore used as the first step in judging whether the target URL link is a safe link or a phishing website, which reduces response time and avoids unnecessary resource consumption in the subsequent steps.
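The matching order described above (white list first, then black list, otherwise continue) can be sketched as follows. This is a minimal illustration, not the patented implementation: matching on the host name rather than the full link, the function names, and the sample list entries are all assumptions.

```python
from urllib.parse import urlsplit

def host_of(url: str) -> str:
    """Return the lower-cased host of a URL, or an empty string if there is none."""
    return urlsplit(url).hostname or ""

def match_lists(url: str, whitelist: set, blacklist: set) -> str:
    """Check the white list first, then the black list."""
    host = host_of(url)
    if host in whitelist:
        return "safe"
    if host in blacklist:
        return "phishing"
    # Neither list matched: model prediction and dynamic testing are still needed.
    return "unknown"

# Hypothetical sample entries, for illustration only.
WHITELIST = {"example.com"}
BLACKLIST = {"login-example.test"}

print(match_lists("https://example.com/pay", WHITELIST, BLACKLIST))
```

Only the "unknown" result triggers the slower model prediction and dynamic tests, which is why list matching comes first.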

In some embodiments, the well-trained phishing website identification model comprises a first risk prediction model and a second risk prediction model; the first risk prediction model is obtained by training based on a logistic regression model, and the second risk prediction model is obtained by training based on a decision tree model;

the logistic regression model is used to solve the binary classification problem of judging whether a website is a phishing website, i.e. to predict the probability of "yes" or "no". Logistic regression is a generalized linear model that assumes the dependent variable obeys a Bernoulli distribution and introduces a nonlinear factor through the logistic function.

The hypothesis function of logistic regression has the following form:

$$h_\theta(x) = g(\theta^{T}x) = \frac{1}{1 + e^{-\theta^{T}x}}$$

where $h_\theta(x)$ is the logistic function, namely the Sigmoid function. Specifically, x is the vector of feature values of each dimension extracted from the target URL link, such as x1, x2, x3, and so on. By substituting x1, x2, x3, and so on from the training set, the parameters $\theta$ of the function are iteratively fitted, finally yielding the logistic function $h_\theta(x)$. As shown in fig. 3, the function takes values between 0 and 1, its curve is S-shaped, and its value quickly approaches 0 or 1 as the input moves away from 0.

The decision function is fixed under a given set of conditions, i.e. the hypothesis space determined by the algorithm. The hypothesis made by the logistic regression model is:

$$P(y = 1 \mid x; \theta) = h_\theta(x), \qquad y^{*} = \begin{cases} 1, & h_\theta(x) > \text{threshold} \\ 0, & \text{otherwise} \end{cases}$$

The threshold is determined by actual conditions: a large threshold is chosen when high precision on positive cases is required, whereas a small threshold is chosen when high recall of positive cases is required; typically 0.5 may be selected. That is, when the feature values of the URL to be predicted are substituted in and the resulting piecewise function is evaluated, the prediction is judged to be a suspected phishing website if the result exceeds the threshold.

A decision boundary, also called a decision surface, separates samples of different classes in an N-dimensional space and is in fact an equation. The decision boundary is determined by the parameters of the hypothesis function and can be regarded as a property of that function.
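As a minimal sketch of the prediction step described above, the following evaluates the Sigmoid hypothesis on a feature vector and applies a threshold. The parameter values and the feature vector are hypothetical; in practice the parameters would come from training.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (Sigmoid) function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x) -> float:
    """Sigmoid of the dot product theta . x; x[0] is a constant 1.0 for the bias."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def classify(theta, x, threshold: float = 0.5) -> int:
    """Return 1 (suspected phishing) if the predicted probability exceeds the threshold."""
    return 1 if hypothesis(theta, x) > threshold else 0

# Hypothetical fitted parameters and one feature vector (bias term first).
theta = [-1.0, 2.0, 0.5]
x = [1.0, 1.0, 0.0]

print(hypothesis(theta, x), classify(theta, x), classify(theta, x, threshold=0.8))
```

Raising the threshold from 0.5 to 0.8 flips this sample from positive to negative, which illustrates the precision/recall trade-off mentioned above.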

Decision tree:

the tree model differs from the linear model in that it processes the features one by one instead of substituting all feature parameters into a single formula, and it can classify and split the samples as each feature is processed, so that a nonlinear partitioning is easily found.

Compared with other algorithms, the tree model is closer to the way humans classify and makes it easy to produce visual classification logic. It uses information entropy as the measure and constructs, by top-down recursive learning, the tree along which the entropy decreases fastest, i.e. a piecewise function over the partitions.

The decision tree is composed of several elements. Root node: contains the full set of samples. Internal node: tests the corresponding feature attribute. Leaf node: represents the outcome of the decision. During training, a tree is constructed from the feature values in a given training set, with different features serving as internal nodes. When a target URL link passes through a node, it is sent left or right according to its value for the feature at that node, until it finally reaches a leaf node holding the judgment result, namely phishing link or non-phishing link.

During prediction, an attribute value is tested at each internal node of the tree, and the branch to follow is chosen according to the test result, until a leaf node is reached and the classification result is obtained.

In essence, a decision tree searches, through tree building and pruning, for the partitioning with the highest mathematical purity, so as to separate the target variable to the greatest extent.
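The entropy-based choice of a split can be illustrated with a short sketch. It computes the information entropy of a label set and the information gain of splitting on a binary feature, then picks the feature with the largest gain. The toy data and feature meanings are hypothetical, and a full tree builder (recursion and pruning) is deliberately left out.

```python
import math
from collections import Counter

def entropy(labels) -> float:
    """Information entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature: int) -> float:
    """Entropy reduction obtained by splitting on one binary (0/1) feature."""
    left = [y for r, y in zip(rows, labels) if r[feature] == 0]
    right = [y for r, y in zip(rows, labels) if r[feature] == 1]
    weighted = sum(len(p) / len(labels) * entropy(p) for p in (left, right) if p)
    return entropy(labels) - weighted

def best_feature(rows, labels) -> int:
    """Index of the feature whose split lowers entropy the most."""
    return max(range(len(rows[0])), key=lambda f: information_gain(rows, labels, f))

# Hypothetical toy data: columns are [host_is_ip, has_at_symbol]; label 1 = phishing.
rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = [1, 1, 0, 0]

print(best_feature(rows, labels))
```

In this toy set the first feature separates the two classes perfectly, so it is selected as the root test.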

Carrying out feature extraction on the target URL link to obtain a keyword of the target URL link, inputting the keyword into a phishing website identification model with complete training for risk prediction to obtain a model prediction risk value, and the method comprises the following steps:

extracting the characteristics of the target URL link to obtain a keyword of the target URL link;

features are extracted from the target URL link because comparative analysis of normal URL links and phishing-website URL links shows that the two differ in a number of characteristic ways.

First, normal URL links rarely contain symbols such as "@" and "-", whereas phishing links often use them in a fake domain name that deliberately imitates an official website in order to confuse users.

Second, a phishing website generally does not purchase a domain name but is deployed directly on a host, so that the victim accesses the fake website through an IP address. Whether the link contains an IP address is therefore an important criterion for judging whether the current URL link is a phishing website.

Third, a normal host domain name has at most 5 levels separated by dots ("."); the last label is the top-level domain, and a common website uses a third-level domain name. If "." occurs too many times in the URL link, the current URL link may be judged to point to a phishing website.

Fourth, registering a shorter domain name usually costs more, so a phishing website may choose a longer domain name to save expense. Conversely, if a short-link generation tool is used to hide the real domain name, the domain name is very short, which may also indicate a phishing website. A domain name that is too long or too short is therefore also an important criterion for judging whether the current URL link is a phishing website.

Fifth, numbers do not usually appear in a URL link, except in the paths of some static pages. The length of the longest run of digits in a URL link can therefore serve as a criterion: if it exceeds a certain limit, the URL link is suspect.

Sixth, websites holding valuable information, such as banks, are often counterfeited by phishing websites. To appear similar, the URL link may contain certain sensitive keywords; a dictionary of sensitive words is therefore used for matching, and if a keyword is present in the URL link the feature is set to 1. In addition, because closely imitating a domain name is difficult, some malicious links place the domain name of the target website in the path to confuse the victim, and a corresponding detection vector needs to be set to detect this.
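The six kinds of features listed above might be extracted along the following lines. This is a hedged sketch: the feature names, the sensitive-word dictionary, and the small set of top-level domains used to spot a domain name inside the path are all illustrative assumptions, not values given in the application.

```python
import re
from urllib.parse import urlsplit

SENSITIVE_WORDS = ("login", "bank", "account", "verify")  # hypothetical dictionary
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")
# Hypothetical pattern for a domain name embedded in the path.
DOMAIN_IN_PATH = re.compile(r"[a-z0-9-]+\.(?:com|net|org|cn)\b")

def extract_features(url: str) -> dict:
    """Turn a URL into numeric features following the six points above."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    digit_runs = re.findall(r"\d+", url)
    return {
        "has_special_symbol": int("@" in url or "-" in host),        # point 1
        "host_is_ip": int(bool(IPV4.match(host))),                   # point 2
        "dot_count": host.count("."),                                # point 3
        "host_length": len(host),                                    # point 4
        "longest_digit_run": max((len(d) for d in digit_runs), default=0),  # point 5
        "has_sensitive_word": int(any(w in url.lower() for w in SENSITIVE_WORDS)),  # point 6
        "domain_in_path": int(bool(DOMAIN_IN_PATH.search(parts.path.lower()))),     # point 6
    }

print(extract_features("http://192.168.10.5/bank/login?id=12345"))
```

The resulting dictionary can be flattened into the input vector that the two risk prediction models consume.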

Inputting the keywords into a first risk prediction model to perform first risk prediction to obtain a first risk value;

inputting the keywords into a second risk prediction model to perform second risk prediction to obtain a second risk value;

the features extracted in the previous part are used as input vectors, and prediction outputs are obtained from the models trained by the two algorithms.

And summing the first risk value and the second risk value to obtain a model prediction risk value.

Through the above steps, the keywords of the target URL link are obtained by feature extraction and input into the well-trained phishing website identification model for risk prediction, giving a model prediction risk value that is used to judge whether the current URL link is a phishing website. This largely avoids the omissions that occur when matching relies on the black and white lists alone, and increases the precision of phishing website detection.

In some embodiments, before performing feature extraction on the target URL link to obtain a keyword of the target URL link and inputting the keyword into a well-trained phishing website identification model for risk prediction to obtain a model prediction risk value, in the case that the target URL link is matched in neither the black list nor the white list, the method further includes:

inputting keywords in a sample URL link into a logistic regression model for training to obtain a first risk prediction model;

and inputting the keywords in the sample URL link into the decision tree model for training to obtain a second risk prediction model.

Sample URL links are obtained from black list sample data and white list sample data, and features are extracted from them. The keywords of the sample URL links are input into the training models, which learn the general rules in the black list and white list sample data. Different risk prediction models are obtained after training; these allow a target URL link to be assessed more quickly and improve the efficiency of detecting whether the target URL link corresponding to a two-dimensional code to be detected points to a phishing website.
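To make the training step concrete, the sketch below fits a logistic regression model by plain batch gradient descent on a handful of labelled feature vectors. It covers only the first (logistic regression) model; the learning rate, epoch count, feature meanings and sample data are hypothetical, and a production system would more likely rely on an established machine-learning library.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, lr: float = 0.5, epochs: int = 2000):
    """Batch gradient descent on the log-loss; returns weights with the bias at index 0."""
    theta = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in zip(samples, labels):
            xb = [1.0] + list(x)                      # prepend the bias input
            err = sigmoid(sum(t * v for t, v in zip(theta, xb))) - y
            for j, v in enumerate(xb):
                grad[j] += err * v
        theta = [t - lr * g / len(samples) for t, g in zip(theta, grad)]
    return theta

def predict(theta, x) -> float:
    """Predicted probability that the feature vector x belongs to a phishing link."""
    return sigmoid(sum(t * v for t, v in zip(theta, [1.0] + list(x))))

# Hypothetical samples: [host_is_ip, has_sensitive_word]; 1 = black list, 0 = white list.
samples = [[1, 1], [1, 0], [0, 1], [0, 0], [0, 0], [1, 1]]
labels = [1, 1, 0, 0, 0, 1]

theta = train_logistic(samples, labels)
print(predict(theta, [1, 0]), predict(theta, [0, 1]))
```

The predicted probability can be used directly as the first risk value, or mapped to a preset value once it crosses the chosen threshold.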

In some of these embodiments, the dynamic test risk values include a third risk value, a fourth risk value, a fifth risk value, and a sixth risk value; opening the target URL link, performing a logic dynamic test on the target website, and obtaining a dynamic test risk value according to the result of the logic dynamic test includes:

accessing the target website corresponding to the URL link, and judging whether the target website is configured with a network security firewall;

setting the preset risk value as a third risk value under the condition that the target website is not configured with the network security firewall;

accessing the target website corresponding to the URL link twice, receiving a first return message and a second return message of the target website, and judging whether the second return message is the same as the first return message;

under the condition that the second return message of the target website is the same as the first return message, setting the preset risk value as a fourth risk value;

accessing the target website corresponding to the URL link, and checking whether a set-cookie field exists in the header of the response returned by the target website;

setting the preset risk value as a fifth risk value under the condition that the set-cookie field does not exist in the header;

accessing the target website corresponding to the URL link, attempting to submit a login to the target website, and judging whether the login state of the target website is abnormal;

setting the preset risk value as a sixth risk value under the condition that the login state of the website is abnormal;

and summing the third risk value, the fourth risk value, the fifth risk value and the sixth risk value to obtain a dynamic test risk value.
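The summation of the four dynamic test risk values can be sketched as below. The preset risk value of 1.0, the assumption that a test contributes 0 when its suspicious condition is not met, and all names are illustrative; the application does not state them.

```python
from dataclasses import dataclass

@dataclass
class DynamicObservations:
    """Outcome of the four logic dynamic tests for one target website."""
    waf_detected: bool         # a network security firewall intercepted the probe
    responses_identical: bool  # two identical requests returned the same message
    set_cookie_present: bool   # the response header carried a set-cookie field
    login_abnormal: bool       # the attempted login behaved abnormally

PRESET_RISK = 1.0  # hypothetical preset risk value

def dynamic_test_risk(obs: DynamicObservations, preset: float = PRESET_RISK) -> float:
    """Sum the third to sixth risk values; a test adds 0 when it finds nothing suspicious."""
    third = preset if not obs.waf_detected else 0.0
    fourth = preset if obs.responses_identical else 0.0
    fifth = preset if not obs.set_cookie_present else 0.0
    sixth = preset if obs.login_abnormal else 0.0
    return third + fourth + fifth + sixth

suspicious = DynamicObservations(False, True, False, True)
normal = DynamicObservations(True, False, True, False)
print(dynamic_test_risk(suspicious), dynamic_test_risk(normal))
```

Using one shared preset value keeps the four tests equally weighted; different presets per test would be an equally plausible reading.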

A Cookie is a plain text file, such as a txt file, stored on the client, i.e. the user's local computer. When a user accesses a web page through a browser on the local computer, the server generates a credential, returns it to the browser and has it written to the local computer; this credential is the Cookie.

Specifically, in addition to improving on the various existing detection methods, the applicant found through analysis and testing that, because the intent of a phishing website is to steal the user names and passwords of the websites it imitates, its essential function differs from that of a normal website. Phishing websites and normal websites therefore differ not only in information such as URL features, HTML interface features and DNS, but also in other respects. Since these differences can only be judged by simulating a user submitting information and comparing the content returned by the web application, they can be called dynamic features of website business logic. Because the ultimate goal of a phishing website is to induce the user to submit information, other necessary functions may be incompletely developed or absent altogether, and tests can be directed at these areas.

First, in terms of investment and configuration, a normal website invests funds and equipment or strengthens its functional configuration. An example is a network security firewall (WAF) that defends against hacker attacks and can filter dangerous traffic through big-data learning and rules. Through a series of security policies, the network security firewall protects the various Web applications exposed on the Internet and prevents attackers from intruding into the system by means such as SQL injection, brute-force cracking of login user names and passwords, directory traversal, arbitrary file upload and cross-site scripting, and thereby from obtaining privileges, harming the host and database, or extracting sensitive information for illegal gain.

Because SQL injection is a key protection target, the website can be probed by submitting an SQL injection string such as admin' or '1'='1 in the input box and checking whether the returned content contains the characteristics of a website firewall, i.e. whether the request was identified and intercepted. This detects whether the target website is configured with security equipment; if such a configuration exists, the probability that the website is a phishing website is reduced.

Besides the network firewall, the pages of well-known sites often contain some random content, such as random pictures or related promotions. Phishing websites, in contrast, are developed crudely, rarely return functional random content, and usually aim only at superficial similarity. If the responses to two identical requests differ in length, the likelihood that the website is a malicious phishing website is therefore also reduced.

Another function-related feature is the configuration of cookie information. Although the HTTP protocol is stateless, the back-end business logic needs to identify which user a request came from, so cookies are used to store state information on the web.

Normal sites that contain user login functionality usually need cookie technology to identify and mark a user, whereas a phishing website merely imitates the front end and lacks the functionality for interaction between the user and the back end, let alone the setting of cookie fields. So if, during an attempted login, the response header contains a set-cookie field assigning an identity to the user, the website is very likely a normal website.

Given the purpose of a phishing website, whatever the user inputs, its subsequent logic does not behave like that of a normal website, which would build a database query statement, look up the corresponding user information in the data table storing user names and passwords, judge whether the credentials are correct, and continue with the login logic. More likely, nothing further happens after the user's input has been captured and stored: after the form is submitted no response is returned, the connection is lost, or the returned response content is empty; or, with slightly more effort, an input error is returned or the user is redirected to the real login website. When such situations occur, it can be inferred that the back-end logic of the login website is incomplete and that it is most likely a phishing website built only to capture the user name and password entered by the user, with website construction and related logic simplified.
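The four checks described above can be expressed as small predicates over already-received responses. This sketch performs no network access; the `Response` type, the firewall signature strings and the exact rule for an "abnormal" login are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Response:
    """Minimal stand-in for an HTTP response (hypothetical helper type)."""
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""

SQLI_PROBE = "admin' or '1'='1"  # probe string submitted in the input box
WAF_SIGNATURES = ("request blocked", "web application firewall")  # hypothetical markers

def waf_detected(resp: Response) -> bool:
    """True if the reply to the SQL injection probe shows firewall characteristics."""
    text = resp.body.lower()
    return any(sig in text for sig in WAF_SIGNATURES)

def same_length(first: Response, second: Response) -> bool:
    """True if two identical requests returned bodies of equal length."""
    return len(first.body) == len(second.body)

def has_set_cookie(resp: Response) -> bool:
    """True if the response header carries a set-cookie field (case-insensitive)."""
    return any(name.lower() == "set-cookie" for name in resp.headers)

def login_abnormal(resp: Response) -> bool:
    """Assumed rule: an empty reply or a redirect after a login attempt is abnormal."""
    return resp.body.strip() == "" or 300 <= resp.status < 400

page = Response(200, {}, "<html>login</html>")
print(waf_detected(page), same_length(page, page), has_set_cookie(page))
```

Each predicate yields one boolean per test, which is the form needed to decide whether the corresponding preset risk value is added.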

Wherein the above information can be represented by the following table:

TABLE 1

The embodiment also provides a method for checking the two-dimensional code. Fig. 4 is a flowchart of another method for checking a two-dimensional code according to an embodiment of the present application, where the flowchart includes the following steps, as shown in fig. 4:

Step S401, acquiring a target URL link corresponding to the two-dimensional code to be detected.

Step S402, matching the target URL link with a preset white list and a preset black list.

Step S403, in the case that the target URL link is matched in neither the black list nor the white list, performing feature extraction on the target URL link to obtain a keyword of the target URL link, and inputting the keyword into a well-trained phishing website identification model for risk prediction to obtain a model prediction risk value.

Step S404, opening the target URL link, performing a logic dynamic test on the target website, and obtaining a dynamic test risk value according to the result of the logic dynamic test.

Step S405, integrating the model prediction risk value and the dynamic test risk value into a comprehensive risk value according to a preset method, and if the comprehensive risk value exceeds a preset threshold value, confirming that the two-dimensional code to be detected corresponding to the target URL link points to a phishing website.

Step S406, if the comprehensive risk value does not exceed the comprehensive risk value threshold, opening, copying, or sharing the target URL link according to the selection of the user.

Through the above steps, a photographed two-dimensional code picture is first obtained, and the two-dimensional code is then parsed. In general, a parsed two-dimensional code falls into one of two cases. One is specific content such as pictures and text. The other is a URL link; many phishing websites attack through URL links, so directly accessing the target URL link after parsing the two-dimensional code may expose the user to a phishing website. Therefore, when the parsing result is a target URL link, the current target URL link is matched, through an interface, against the white list and the black list on an external server. A match in the white list indicates that the current target URL link poses no threat and points to a normal website; a match in the black list indicates that the website accessed through the current target URL link is very likely a phishing website, so the matching result is sent to the mobile terminal device 10 as a risk prompt.

If the URL link is matched in neither the black list nor the white list, a second round of testing is carried out. The second round runs in parallel, with the two judgments made asynchronously at the same time, and the risk values produced by the two aspects are added for the final judgment. On the one hand, features are extracted from the URL link. Phishing websites generally have characteristic features, so once these features are identified they are input into a preset model for prediction to obtain a first risk value, which serves as one part of the basis for judging the website accessed through the current target URL link. On the other hand, the target URL link is opened and a logic dynamic test is performed on the website. Because a phishing website returns specific values in response to the messages sent to it, a second risk value is obtained from the results returned by the logic dynamic tests and serves as the other part of the basis for judging the website. Finally, the first risk value and the second risk value are integrated into a comprehensive risk value, and whether the two-dimensional code to be detected corresponding to the target URL link indeed points to a phishing website is judged according to whether the comprehensive risk value exceeds a preset comprehensive risk value threshold. If the comprehensive risk value does not exceed the threshold, a message is sent to the mobile terminal device 10, and the mobile terminal device 10 opens, copies or shares the target URL link to access the target website according to the selection of the user.
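The overall flow, including the parallel second round, can be sketched with a thread pool. The two risk functions are passed in as callables and stubbed here with fixed values; the function names, exact-string list matching and the threshold of 3.0 are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

def check_url(url, whitelist, blacklist, model_risk, dynamic_risk, threshold):
    """Return 'safe' or 'phishing' for a URL parsed from a two-dimensional code."""
    # First round: list matching gives an immediate answer when it hits.
    if url in whitelist:
        return "safe"
    if url in blacklist:
        return "phishing"
    # Second round: run model prediction and dynamic testing in parallel,
    # then add the two risk values to form the comprehensive risk value.
    with ThreadPoolExecutor(max_workers=2) as pool:
        model_future = pool.submit(model_risk, url)
        dynamic_future = pool.submit(dynamic_risk, url)
        comprehensive = model_future.result() + dynamic_future.result()
    return "phishing" if comprehensive > threshold else "safe"

# Stub risk functions standing in for the trained models and the dynamic tests.
verdict = check_url(
    "http://unknown.test/login",
    whitelist=set(),
    blacklist=set(),
    model_risk=lambda u: 1.2,
    dynamic_risk=lambda u: 2.0,
    threshold=3.0,
)
print(verdict)
```

Threads suit this step because the dynamic tests are dominated by network waiting, so the model prediction can finish while responses are still pending.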

By combining black and white list matching, trained-model prediction and logic dynamic testing, the invention checks phishing websites with higher precision, faster response and more comprehensive coverage, raises user alertness, and avoids false alarms that would affect normal functions.

The present embodiment further provides a device for checking a two-dimensional code. The device is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated. As used hereinafter, the terms "module," "unit," "subunit," and the like may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Fig. 5 is a block diagram illustrating a structure of an apparatus for checking a two-dimensional code according to an embodiment of the present application, as shown in fig. 5, the apparatus including: an obtaining module 51, a matching module 52, a model predicting module 53, a dynamic testing module 54 and a judging module 55;

and an obtaining module 51, configured to acquire a target URL link corresponding to the two-dimensional code to be detected.

And a matching module 52, configured to match the target URL link with a preset white list and a preset black list.

And the model prediction module 53 is configured to, in the case that the target URL link is matched in neither the black list nor the white list, perform feature extraction on the target URL link to obtain a keyword of the target URL link, and input the keyword into a well-trained phishing website identification model for risk prediction to obtain a model prediction risk value.

And the dynamic test module 54 is configured to open the target URL link, perform a logic dynamic test on the target website, and obtain a dynamic test risk value according to the result of the logic dynamic test.

And the judging module 55 is configured to integrate the model prediction risk value and the dynamic test risk value into a comprehensive risk value according to a preset method, and if the comprehensive risk value exceeds a preset threshold value, confirm that the two-dimensional code to be detected corresponding to the target URL link points to the phishing website.

The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.

The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.

Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

Step S1, acquiring a target URL link corresponding to the two-dimensional code to be detected.

Step S2, matching the target URL link with a preset white list and a preset black list.

Step S3, in the case that the target URL link is matched in neither the black list nor the white list, performing feature extraction on the target URL link to obtain a keyword of the target URL link, and inputting the keyword into a well-trained phishing website identification model for risk prediction to obtain a model prediction risk value.

Step S4, integrating the model prediction risk value and the dynamic test risk value into a comprehensive risk value according to a preset method, and if the comprehensive risk value exceeds a preset threshold value, confirming that the two-dimensional code to be detected corresponding to the target URL link points to a phishing website.

It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.

In an embodiment, a computer-readable storage medium is provided. Fig. 6 is a block diagram of a computer-readable storage medium according to an embodiment of the present application. As shown in fig. 6, a computer program is stored on the storage medium, and when the computer program is executed by a processor, the steps of the method for checking a two-dimensional code provided by the embodiments described above are implemented.

Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to serve as a limitation on the computer-readable storage media on which the disclosed aspects may be implemented, as a particular computing device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of the present specification.

The above examples express only several embodiments of the present application, and although their description is relatively specific and detailed, they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method for checking a two-dimensional code is characterized by comprising the following steps:

acquiring a target URL link corresponding to a two-dimensional code to be detected;

matching the target URL link with a preset white list and a preset black list;

2. The method of claim 1, wherein matching the target URL link to a pre-defined whitelist and blacklist comprises:

matching the target URL link with the white list;

3. The method of claim 1, wherein the well-trained phishing website identification model comprises a first risk prediction model and a second risk prediction model; the first risk prediction model is obtained by training based on a logistic regression model, and the second risk prediction model is obtained by training based on a decision tree model; the characteristic extraction of the target URL link is carried out to obtain a keyword of the target URL link, and the keyword is input into a phishing website identification model which is well trained to carry out risk prediction to obtain a model prediction risk value, and the method comprises the following steps:

4. The method of claim 3, wherein in a case that the target URL link is not matched in the blacklist and the whitelist, performing feature extraction on the target URL link to obtain a keyword of the target URL link, and inputting the keyword into a well-trained phishing website recognition model for risk prediction to obtain a model prediction risk value, the method further comprises:

5. The method of claim 1, wherein the dynamic test risk values comprise a third risk value, a fourth risk value, a fifth risk value, and a sixth risk value; the opening of the target URL link and the target website for logic dynamic test, and obtaining a dynamic test risk value according to the result of the logic dynamic test comprise:

6. The method of claim 1, further comprising:

and if the comprehensive risk value does not exceed the comprehensive risk value threshold value, opening, copying or sharing the target URL link according to the selection of the user.

7. A device for checking a two-dimensional code, characterized by comprising: an acquisition module, a matching module, a model prediction module, a dynamic test module and a judgment module;

8. A system for checking a two-dimensional code, comprising: mobile terminal equipment, transmission equipment and server equipment; the mobile terminal equipment is connected with the server equipment through the transmission equipment;

the server equipment is used for acquiring a target URL link corresponding to the two-dimensional code to be detected; matching the target URL link with a preset white list and a preset black list; under the condition that the target URL link is matched in neither the black list nor the white list, performing feature extraction on the target URL link to obtain a keyword of the target URL link, and inputting the keyword into a well-trained phishing website identification model for risk prediction to obtain a model prediction risk value; opening the target URL link, performing a logic dynamic test on the target website, and obtaining a dynamic test risk value according to the result of the logic dynamic test; and integrating the model prediction risk value and the dynamic test risk value into a comprehensive risk value according to a preset method, and if the comprehensive risk value exceeds a preset threshold value, confirming that the two-dimensional code to be detected corresponding to the target URL link points to a phishing website.

9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for verifying a two-dimensional code according to any one of claims 1 to 6.

10. A storage medium, in which a computer program is stored, wherein the computer program is configured to execute the method for checking a two-dimensional code according to any one of claims 1 to 6 when running.