CN111104618A - Webpage skipping method and device - Google Patents


Info

Publication number
CN111104618A
Authority
CN
China
Prior art keywords
webpage
target
sample
skipping
characteristic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911320994.4A
Other languages
Chinese (zh)
Inventor
郭镔
高雅
刘远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Miaozhen Information Technology Co Ltd
Original Assignee
Miaozhen Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Miaozhen Information Technology Co Ltd filed Critical Miaozhen Information Technology Co Ltd
Priority to CN201911320994.4A priority Critical patent/CN111104618A/en
Publication of CN111104618A publication Critical patent/CN111104618A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application provides a webpage skipping method and a webpage skipping device, which are applied to a webpage skipping system and comprise the following steps: the webpage skipping system receives a webpage clicking instruction of a user, and acquires webpage content of a target webpage according to a resource address of the target webpage carried in the webpage clicking instruction; the webpage skipping system confirms the characteristic information of the target webpage according to the resource address and the webpage content of the target webpage; the webpage skipping system inputs the characteristic information of the target webpage into a webpage identification model to obtain an illegal probability value of the target webpage, and judges whether the illegal probability value exceeds a preset threshold value or not; and if the illegal probability value exceeds a preset threshold value, interrupting webpage skipping.

Description

Webpage skipping method and device
Technical Field
The application relates to the technical field of computers, in particular to a webpage skipping method and device.
Background
At present, the internet becomes an indispensable part of people's life, and a lot of advertisement information exists in web pages, generally existing in the form of text links or picture links, and some illegal web page links are mixed in the web pages.
In the prior art, when the linked target webpage is detected to be an illegal website, the user is redirected to a temporary page that warns that the linked target webpage may be risky, so that the user can choose whether to continue accessing it. This approach cannot prevent the user from choosing to continue access out of curiosity or for other purposes, and it also does not address false positives on links whose target webpages are in fact legitimate.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method and an apparatus for webpage jump, which are used to solve the problem of how to improve the security of webpage jump in the prior art.
In a first aspect, an embodiment of the present application provides a webpage skipping method, which is applied to a webpage skipping system, and the method includes:
the webpage skipping system receives a webpage clicking instruction of a user, and acquires webpage content of a target webpage according to a resource address of the target webpage carried in the webpage clicking instruction;
the webpage skipping system confirms the characteristic information of the target webpage according to the resource address and the webpage content of the target webpage;
the webpage skipping system inputs the characteristic information of the target webpage into a webpage identification model to obtain an illegal probability value of the target webpage, and judges whether the illegal probability value exceeds a preset threshold value or not;
and if the illegal probability value exceeds a preset threshold value, interrupting webpage skipping.
According to the first aspect, embodiments of the present application provide a first possible implementation manner of the first aspect, where the method further includes:
and if the illegal probability value does not exceed a preset threshold value, carrying out the jump operation of the target webpage according to the webpage click command.
According to the first aspect, an embodiment of the present application provides a second possible implementation manner of the first aspect, where constructing the webpage recognition model includes:
acquiring a plurality of sample web pages, labeling each sample web page as legal or illegal, and dividing the sample web pages into training sample web pages and test sample web pages;
confirming the characteristic information of the sample webpage according to the resource address and the webpage content of the sample webpage;
training a training model by taking the characteristic information of the training sample webpage as input and taking the legality of the training sample webpage as output to obtain a to-be-trained model;
and verifying the model to be trained by using the characteristic information of the test sample webpage, and obtaining the webpage identification model after verification.
According to the first aspect, an embodiment of the present application provides a third possible implementation manner of the first aspect, wherein the determining the feature information of the target webpage includes:
confirming the redirection times of the resource address of the target webpage, and judging whether the redirection times exceed preset times;
and if the number of times exceeds the preset number of times, adding the redirection number into the feature information of the target webpage.
According to the first aspect, an embodiment of the present application provides a fourth possible implementation manner of the first aspect, where after the web page jump is interrupted, the method further includes:
and the webpage skipping system stores the resource address of the target webpage into an illegal webpage library.
In a second aspect, an embodiment of the present application provides a web page jumping device, which is applied to a web page jumping system, and the device includes:
the acquisition module is used for receiving a webpage click instruction of a user and acquiring webpage content of a target webpage according to a resource address of the target webpage carried in the webpage click instruction;
the processing module is used for confirming the characteristic information of the target webpage according to the resource address and the webpage content of the target webpage;
the analysis module is used for inputting the characteristic information of the target webpage into a webpage identification model to obtain an illegal probability value of the target webpage and judging whether the illegal probability value exceeds a preset threshold value or not; and if the illegal probability value exceeds a preset threshold value, interrupting webpage skipping.
According to the second aspect, an embodiment of the present application provides a first possible implementation manner of the second aspect, wherein the analysis module is further configured to:
carry out the jump operation to the target webpage according to the webpage click instruction if the illegal probability value does not exceed the preset threshold value.
According to the second aspect, an embodiment of the present application provides a second possible implementation manner of the second aspect, wherein the analysis module includes a model building unit, which is used for acquiring a plurality of sample web pages, labeling each sample web page as legal or illegal, and dividing the sample web pages into training sample web pages and test sample web pages;
confirming the characteristic information of the sample webpage according to the resource address and the webpage content of the sample webpage;
training a training model by taking the characteristic information of the training sample webpage as input and taking the legality of the training sample webpage as output to obtain a to-be-trained model;
and verifying the model to be trained by using the characteristic information of the test sample webpage, and obtaining the webpage identification model after verification.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the steps of the method according to any one of the first aspect and possible implementation manners when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, performs the steps of the method of any one of the above first aspect and possible implementations thereof.
According to the webpage skipping method and device provided by the embodiments of the present application, the resource address of the target webpage carried in the webpage click instruction, together with the webpage content obtained from that resource address, is analyzed to confirm the characteristic information of the target webpage. The characteristic information is input into the webpage identification model to obtain the illegal probability value of the target webpage, and the target webpage is judged to be an illegal webpage if that value exceeds a preset threshold value. If the target webpage is judged to be illegal, the webpage skipping is interrupted directly and no jump is performed, so that the risk of accessing illegal webpages is effectively reduced and the security of webpage skipping is improved.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic flowchart of a web page jumping method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a web page jumping method according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a web page jumping device according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a webpage skipping method, which is applied to a webpage skipping system and comprises the following steps as shown in fig. 1:
step S101, the webpage skipping system receives a webpage clicking instruction of a user, and webpage content of a target webpage is obtained according to a resource address of the target webpage carried in the webpage clicking instruction;
step S102, the webpage skipping system confirms the characteristic information of the target webpage according to the resource address and the webpage content of the target webpage;
step S103, the webpage skipping system inputs the characteristic information of the target webpage into a webpage identification model to obtain an illegal probability value of the target webpage, and judges whether the illegal probability value exceeds a preset threshold value or not; and if the illegal probability value exceeds a preset threshold value, interrupting webpage skipping.
Specifically, after receiving a webpage click instruction from a user, the webpage skipping system extracts the URL (Uniform Resource Locator) of the target webpage corresponding to the webpage click instruction, that is, the resource address. A crawler tool then simulates a browser to crawl the webpage content corresponding to the URL of the target webpage. The webpage content comprises both the content before rendering and the content after rendering of the target webpage.
Characteristic information is then extracted from the URL and the webpage content of the target webpage. The characteristic information comprises: the hierarchy depth and the parameter quantity of the URL; the meta node quantity of the webpage; the maximum level and the node quantity of the DOM (Document Object Model) tree before rendering; the maximum level and the node quantity of the DOM tree after rendering; the maximum node quantity within a single level of the DOM tree, before rendering and after rendering; the same-domain link quantity; the cross-domain link quantity; the picture quantity; the quantity of non-repeated classes; and the quantity of non-repeated ids. The characteristic information includes, but is not limited to, the above items, and the present application does not limit it.
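As an illustration only, the feature list above could be computed roughly as follows. This is a minimal sketch using Python's standard library; the names `DomStats`, `dom_stats` and `extract_features`, the feature keys, and the choice to count meta nodes, links, pictures, classes and ids on the rendered content are all assumptions, since the application does not specify an implementation. The depth tracking also assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qsl

# Elements that never have a closing tag, so they must not deepen the tree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DomStats(HTMLParser):
    """Collects simple structural counts from one HTML document (hypothetical helper)."""

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host
        self.depth = 0          # depth of the currently open element
        self.max_depth = 0      # maximum level of the DOM tree
        self.nodes = 0          # total element nodes
        self.per_level = {}     # level -> number of nodes on that level
        self.meta = self.images = 0
        self.same_domain = self.cross_domain = 0
        self.classes, self.ids = set(), set()

    def handle_starttag(self, tag, attrs):
        level = self.depth + 1
        self.nodes += 1
        self.per_level[level] = self.per_level.get(level, 0) + 1
        self.max_depth = max(self.max_depth, level)
        attrs = dict(attrs)
        if tag == "meta":
            self.meta += 1
        if tag == "img":
            self.images += 1
        if tag == "a" and attrs.get("href"):
            host = urlparse(attrs["href"]).netloc
            # Relative links have no host and are treated as same-domain.
            if host and host != self.page_host:
                self.cross_domain += 1
            else:
                self.same_domain += 1
        self.classes.update((attrs.get("class") or "").split())
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        if tag not in VOID_TAGS:
            self.depth = level

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def dom_stats(html, page_host):
    stats = DomStats(page_host)
    stats.feed(html)
    stats.close()
    return stats


def extract_features(url, html_before, html_after):
    """Builds the feature dictionary from the URL and the pre/post-rendering HTML."""
    parsed = urlparse(url)
    pre = dom_stats(html_before, parsed.netloc)
    post = dom_stats(html_after, parsed.netloc)
    return {
        "url_depth": len([p for p in parsed.path.split("/") if p]),
        "url_params": len(parse_qsl(parsed.query, keep_blank_values=True)),
        "meta_nodes": post.meta,
        "pre_dom_max_level": pre.max_depth,
        "pre_dom_nodes": pre.nodes,
        "post_dom_max_level": post.max_depth,
        "post_dom_nodes": post.nodes,
        "pre_dom_max_nodes_per_level": max(pre.per_level.values(), default=0),
        "post_dom_max_nodes_per_level": max(post.per_level.values(), default=0),
        "same_domain_links": post.same_domain,
        "cross_domain_links": post.cross_domain,
        "pictures": post.images,
        "distinct_classes": len(post.classes),
        "distinct_ids": len(post.ids),
    }
```

In practice the pre-rendering HTML would come from the raw HTTP response and the post-rendering HTML from the simulated browser; the sketch takes both as plain strings so that it stays independent of any particular crawler tool.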
The extracted characteristic information is input into the webpage identification model for analysis, and the illegal probability value of the target webpage is calculated. The illegal probability value lies between 0 and 1; when it is greater than the preset threshold value, the target webpage is judged to be an illegal webpage. The preset threshold value may be set to any value between 0 and 1 and is preferably 0.5; the present application does not limit its specific value.
When the target webpage is judged to be an illegal webpage, the user's jump to the target webpage is interrupted. From the user's point of view, the webpage jump instruction issued for the target webpage simply has no effect.
For example, the webpage skipping system receives a webpage skipping instruction of a user and uses a crawler tool to simulate a browser and crawl the webpage content of a target webpage A from the URL carried by the instruction. It obtains the characteristic information of target webpage A by analyzing the URL and the webpage content, and inputs that characteristic information into the webpage identification model. The model outputs an illegal probability value of 0.73; with a preset threshold value of 0.5, target webpage A is judged to be an illegal webpage and the webpage skipping is stopped.
In an optional embodiment, further comprising:
and if the illegal probability value does not exceed a preset threshold value, carrying out the jump operation of the target webpage according to the webpage click command.
Specifically, when the illegal probability value does not exceed the preset threshold value, the target webpage is judged to be a legal webpage, and the webpage skipping system responds to the webpage jump instruction of the user by performing the corresponding jump operation.
For example, the webpage skipping system receives a webpage skipping instruction of a user and uses a crawler tool to simulate a browser and crawl the webpage content of a target webpage B from the URL carried by the instruction. It obtains the characteristic information of target webpage B by analyzing the URL and the webpage content, and inputs that characteristic information into the webpage identification model. The model outputs an illegal probability value of 0.1; with a preset threshold value of 0.5, target webpage B is judged to be a legal webpage, and the system performs the webpage skipping operation, jumping from the webpage on which the user issued the instruction to target webpage B.
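The two worked examples above reduce to a single comparison. A minimal sketch follows; the function name `decide_jump` and the string return values are hypothetical, and only the strict "exceeds the threshold" comparison and the preferred threshold of 0.5 come from the description.

```python
def decide_jump(illegal_probability, threshold=0.5):
    """Returns "interrupt" when the probability exceeds the threshold, else "jump"."""
    if not 0.0 <= illegal_probability <= 1.0:
        raise ValueError("illegal probability must lie between 0 and 1")
    # A value equal to the threshold does not exceed it, so the jump proceeds.
    return "interrupt" if illegal_probability > threshold else "jump"


print(decide_jump(0.73))  # target webpage A: interrupt
print(decide_jump(0.1))   # target webpage B: jump
```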
In an alternative embodiment, the constructing the web page identification model in step S103 includes, as shown in fig. 2:
step S201, obtaining a plurality of sample web pages, labeling each sample web page as legal or illegal, and dividing the sample web pages into training sample web pages and test sample web pages;
step S202, confirming the characteristic information of the sample webpage according to the resource address and the webpage content of the sample webpage;
step S203, training a training model by taking the characteristic information of the training sample webpage as input and the legality of the training sample webpage as output to obtain a model to be trained;
and step S204, verifying the model to be trained by using the characteristic information of the test sample webpage, and obtaining the webpage identification model after verification.
Specifically, the sample webpages include both known legal webpages and known illegal webpages, and are divided into training sample webpages and test sample webpages. The numbers of legal webpages and illegal webpages in the training sample webpages may be the same; alternatively, the number of illegal webpages may be increased to further strengthen the trained model's ability to recognize illegal webpages.
Before training, all sample web pages need to be subjected to characteristic information extraction according to resource addresses and web page contents. The characteristic information of the sample webpage also comprises the hierarchy depth and parameter quantity of the URL, the quantity of meta nodes of the webpage, the maximum hierarchy of a DOM tree before rendering, the quantity of nodes of the DOM tree before rendering, the maximum hierarchy of the DOM tree after rendering, the quantity of nodes of the DOM tree after rendering, the quantity of maximum nodes in the same level of the DOM tree before rendering, the quantity of maximum nodes in the same level of the DOM tree after rendering, the quantity of links in the same domain, the quantity of cross-domain links, the quantity of pictures, the quantity of non-repeated classes and the quantity of non-repeated ids.
The characteristic information of each training sample webpage is input into the training model together with the legality of that webpage. The training target is an output of 0 when the training sample webpage is a legal webpage and an output of 1 when it is an illegal webpage; training the training model toward this target yields the model to be trained.
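The application does not name the type of training model. Purely as an assumption, the sketch below uses logistic regression fitted by stochastic gradient descent, because its output is a value between 0 and 1 that can be read as an illegal probability. The function names, hyperparameters and the toy two-feature data are hypothetical, and the features are assumed to have been scaled to a comparable range beforehand.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features):
    """Illegal probability for one feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(samples, labels, epochs=2000, lr=0.1):
    """Fits weights so that legal pages map toward 0 and illegal pages toward 1."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y  # gradient of the log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Toy data: two scaled features per page; label 0 = legal, 1 = illegal.
samples = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels = [0, 0, 1, 1]
weights, bias = train(samples, labels)
```

Any classifier that outputs a probability could be substituted here; the rest of the flow depends only on obtaining a value between 0 and 1.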
Since the accuracy of the model to be trained cannot be known in advance, a test run is needed to confirm whether it meets the requirements of use. The characteristic information of each test sample webpage is input into the model to be trained to obtain a test illegal probability value for that webpage, a test result is obtained by comparing this value with the preset threshold value, and the test result is compared with the legality label of the test sample webpage. If the test results are consistent with the legality labels, the model to be trained meets the requirements of use and is determined to be the webpage identification model.
Specifically, the model to be trained is determined to be the webpage identification model when, with a preset threshold value of 0.5, the test illegal probability value it outputs is smaller than 0.5 for the characteristic information of each legal webpage in the test sample webpages and larger than 0.5 for the characteristic information of each illegal webpage in the test sample webpages.
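The verification step can be sketched as a check that every thresholded test prediction matches its label. The name `verify_model` and the `predict_fn` callable are hypothetical. The sketch requires all test pages to match, which is one reading of "consistent"; a real system might instead accept a minimum accuracy.

```python
def verify_model(predict_fn, test_samples, test_labels, threshold=0.5):
    """Returns True when each thresholded prediction agrees with its label.

    predict_fn maps a feature vector to an illegal probability in [0, 1];
    labels are 0 for legal pages and 1 for illegal pages.
    """
    for features, label in zip(test_samples, test_labels):
        predicted = 1 if predict_fn(features) > threshold else 0
        if predicted != label:
            return False
    return True
```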
It should be noted that the test sample webpages and the training sample webpages must not contain any webpage in common; otherwise, the confidence of the test verification is reduced.
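One way to guarantee that the two sets share no webpage is to deduplicate by URL before splitting. The sketch below assumes each sample is a `(url, label)` pair; the name `split_samples`, the 20% test ratio and the fixed seed are illustrative choices, not values from the application.

```python
import random


def split_samples(samples, test_ratio=0.2, seed=0):
    """Splits (url, label) pairs into disjoint training and test lists."""
    # Keep one entry per URL so the same page cannot land in both sets.
    unique = list({url: (url, label) for url, label in samples}.values())
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, int(len(unique) * test_ratio))
    return unique[n_test:], unique[:n_test]
```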
In an optional embodiment, the step S102 of confirming the feature information of the target webpage includes:
step 1021, confirming the redirection times of the resource address of the target webpage, and judging whether the redirection times exceed the preset times;
and step 1022, if the preset times are exceeded, adding the redirection times to the feature information of the target webpage.
Specifically, webpage redirection means that, after a link is clicked for a webpage jump, the jump passes through at least one intermediate address in sequence before finally reaching the target webpage.
The webpage skipping system uses a crawler tool to simulate a browser jumping to the target webpage and records the number of redirections that occur on the way. If the redirection times exceed the preset times, they are recorded as characteristic information. The preset times is preferably 1.
Many illegal webpages conceal their final landing page behind repeated redirection jumps; therefore, when repeated redirection occurs, the redirection times also serve as characteristic information for judging whether the target webpage is legal.
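Steps 1021 and 1022 can be sketched as follows. To keep the example independent of any HTTP client, the redirect lookup is passed in as a hypothetical `next_hop` callable that returns the next address or `None`; in a real crawler it would wrap the simulated browser. The hop cap and loop guard are added safeguards, not part of the description.

```python
def count_redirects(start_url, next_hop, max_hops=20):
    """Follows redirects from start_url and returns (final_url, redirect_count)."""
    seen = {start_url}
    url, hops = start_url, 0
    while hops < max_hops:
        nxt = next_hop(url)
        if nxt is None or nxt in seen:  # no further redirect, or a loop
            break
        seen.add(nxt)
        url, hops = nxt, hops + 1
    return url, hops


def add_redirect_feature(features, redirects, preset_times=1):
    """Adds the redirect count to the features only when it exceeds the preset times."""
    if redirects > preset_times:
        return dict(features, redirect_count=redirects)
    return dict(features)


# Hypothetical redirect chain: a -> b -> c (landing page).
chain = {"http://a.example/": "http://b.example/",
         "http://b.example/": "http://c.example/"}
final_url, redirects = count_redirects("http://a.example/", chain.get)
features = add_redirect_feature({"url_depth": 0}, redirects)
```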
In an optional embodiment, after the step S103, interrupting the web page jump, the method further includes:
step 1031, the webpage skipping system stores the resource address of the target webpage into an illegal webpage library.
Specifically, after the target webpage is judged to be an illegal webpage, the webpage skipping system interrupts webpage skipping corresponding to the webpage skipping instruction of the user, and stores the URL of the target webpage into an illegal webpage library, namely, adds the URL into a blacklist.
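Step 1031 can be sketched with an in-memory set standing in for the illegal webpage library. The class `IllegalWebpageLibrary`, the function `handle_click` and the `score_fn` callable are hypothetical; a deployed system would presumably persist the library rather than keep it in memory.

```python
class IllegalWebpageLibrary:
    """In-memory stand-in for the illegal webpage library (blacklist)."""

    def __init__(self):
        self._urls = set()

    def add(self, url):
        self._urls.add(url)

    def __contains__(self, url):
        return url in self._urls


def handle_click(url, score_fn, library, threshold=0.5):
    """Returns True if the jump proceeds; otherwise blacklists the URL and returns False."""
    if score_fn(url) > threshold:
        library.add(url)  # store the resource address after interrupting the jump
        return False
    return True
```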
An embodiment of the present application further provides a webpage jumping device, which is applied to a webpage jumping system, and as shown in fig. 3, the device includes:
an obtaining module 30, configured to receive a webpage click instruction of a user, and obtain webpage content of a target webpage according to a resource address of the target webpage carried in the webpage click instruction;
a processing module 31, configured to determine feature information of the target web page according to the resource address and the web page content of the target web page;
the analysis module 32 is configured to input the feature information of the target web page into the web page identification model, obtain an illegal probability value of the target web page, and determine whether the illegal probability value exceeds a preset threshold; and if the illegal probability value exceeds a preset threshold value, interrupting webpage skipping.
In an optional embodiment, the analysis module 32 is further configured to:
carry out the jump operation to the target webpage according to the webpage click instruction if the illegal probability value does not exceed the preset threshold value.
In an optional embodiment, the analysis module 32 includes a model building unit 321, which is used for acquiring a plurality of sample webpages, labeling each sample webpage as legal or illegal, and dividing the sample webpages into training sample webpages and test sample webpages;
confirming the characteristic information of the sample webpage according to the resource address and the webpage content of the sample webpage;
training a training model by taking the characteristic information of the training sample webpage as input and taking the legality of the training sample webpage as output to obtain a to-be-trained model;
and verifying the model to be trained by using the characteristic information of the test sample webpage, and obtaining the webpage identification model after the verification is passed.
Corresponding to the web page jumping method in fig. 1, an embodiment of the present application further provides a computer device 400, as shown in fig. 4, the device includes a memory 401, a processor 402, and a computer program stored in the memory 401 and executable on the processor 402, wherein the processor 402 implements the web page jumping method when executing the computer program.
Specifically, the memory 401 and the processor 402 can be general memories and processors, which are not limited in particular, and when the processor 402 runs a computer program stored in the memory 401, the above-mentioned web page jumping method can be executed, which solves the problem of how to improve the security of web page jumping in the prior art.
Corresponding to the webpage skipping method in fig. 1, an embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to perform the steps of the webpage skipping method.
Specifically, the storage medium can be a general storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is executed, the above-mentioned webpage skipping method can be performed, which solves the problem of how to improve the security of webpage skipping in the prior art. In the webpage skipping method and device provided by the embodiments of the present application, the resource address of the target webpage carried in the webpage click instruction, together with the webpage content obtained from that resource address, is analyzed to confirm the characteristic information of the target webpage. The characteristic information is input into the webpage identification model to obtain the illegal probability value of the target webpage, and the target webpage is judged to be an illegal webpage if that value exceeds a preset threshold value. If the target webpage is judged to be illegal, the webpage skipping is interrupted directly and no jump is performed, so that the risk of accessing illegal webpages is effectively reduced and the security of webpage skipping is improved.
In the embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present application. They illustrate the technical solutions of the present application and do not limit them, and the protection scope of the present application is not limited to them. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A webpage skipping method, characterized in that the method is applied to a webpage skipping system and comprises the following steps:
the webpage skipping system receives a webpage clicking instruction of a user, and acquires webpage content of a target webpage according to a resource address of the target webpage carried in the webpage clicking instruction;
the webpage skipping system confirms the characteristic information of the target webpage according to the resource address and the webpage content of the target webpage;
the webpage skipping system inputs the characteristic information of the target webpage into a webpage identification model to obtain an illegal probability value of the target webpage, and judges whether the illegal probability value exceeds a preset threshold value or not;
and if the illegal probability value exceeds the preset threshold value, interrupting the webpage skipping.
2. The method of claim 1, further comprising:
if the illegal probability value does not exceed the preset threshold value, performing the jump to the target webpage according to the webpage clicking instruction.
3. The method of claim 1, wherein building the web page recognition model comprises:
acquiring a plurality of sample web pages, identifying the legality of each sample web page, the legality being either legal or illegal, and classifying the sample web pages into training sample web pages and testing sample web pages;
confirming the characteristic information of the sample webpage according to the resource address and the webpage content of the sample webpage;
training a training model by taking the characteristic information of the training sample webpage as input and taking the legality of the training sample webpage as output to obtain a to-be-trained model;
and verifying the model to be trained by using the characteristic information of the test sample webpage, and obtaining the webpage identification model after verification.
4. The method of claim 1, wherein confirming the characteristic information of the target webpage comprises:
confirming the redirection times of the resource address of the target webpage, and judging whether the redirection times exceed preset times;
and if the redirection times exceed the preset times, adding the redirection times to the characteristic information of the target webpage.
5. The method of claim 1, after interrupting web page jumping, further comprising:
and the webpage skipping system stores the resource address of the target webpage into an illegal webpage library.
6. A webpage skipping device, applied to a webpage skipping system, comprising:
the acquisition module is used for receiving a webpage click instruction of a user and acquiring webpage content of a target webpage according to a resource address of the target webpage carried in the webpage click instruction;
the processing module is used for confirming the characteristic information of the target webpage according to the resource address and the webpage content of the target webpage;
the analysis module is used for inputting the characteristic information of the target webpage into a webpage identification model to obtain an illegal probability value of the target webpage and judging whether the illegal probability value exceeds a preset threshold value or not; and if the illegal probability value exceeds a preset threshold value, interrupting webpage skipping.
7. The apparatus of claim 6, wherein the analysis module is further configured to:
perform the jump to the target webpage according to the webpage clicking instruction if the illegal probability value does not exceed the preset threshold value.
8. The apparatus of claim 6, wherein the analysis module comprises a model building unit configured to:
acquire a plurality of sample web pages, identify the legality of each sample web page, the legality being either legal or illegal, and classify the sample web pages into training sample web pages and testing sample web pages;
confirm the characteristic information of the sample web pages according to the resource address and the webpage content of each sample web page;
train a training model by taking the characteristic information of the training sample web pages as input and the legality of the training sample web pages as output, to obtain a to-be-trained model;
and verify the to-be-trained model using the characteristic information of the testing sample web pages, and obtain the webpage identification model after verification.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of the preceding claims 1-5 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of the claims 1-5.
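
The decision flow recited in claims 1, 2, 4 and 5 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the class name `JumpGuard`, the `url_length` and `content_length` features, the default threshold of 0.5, the default preset redirection count of 3, and the placeholder scorer `toy_model` are all assumptions made for illustration. Fetching the webpage content and training the webpage identification model (claim 3) are left out; the content and the redirection count are passed in directly.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

Features = Dict[str, float]


@dataclass
class JumpGuard:
    """Hypothetical stand-in for the webpage skipping system of claim 1."""

    model: Callable[[Features], float]  # returns an illegal probability in [0, 1]
    threshold: float = 0.5              # assumed preset threshold value
    max_redirects: int = 3              # assumed preset redirection times
    illegal_library: Set[str] = field(default_factory=set)

    def extract_features(self, url: str, content: str, redirects: int) -> Features:
        # The two length features are illustrative assumptions only.
        features: Features = {
            "url_length": float(len(url)),
            "content_length": float(len(content)),
        }
        # Claim 4: the redirection count is added to the features
        # only when it exceeds the preset count.
        if redirects > self.max_redirects:
            features["redirect_count"] = float(redirects)
        return features

    def should_jump(self, url: str, content: str, redirects: int) -> bool:
        features = self.extract_features(url, content, redirects)
        probability = self.model(features)
        if probability > self.threshold:
            # Claim 5: record the resource address in the illegal webpage library.
            self.illegal_library.add(url)
            return False  # claim 1: interrupt the jump
        return True       # claim 2: carry out the jump


def toy_model(features: Features) -> float:
    """Placeholder scorer, not a trained model: flags heavily redirected pages."""
    return 0.9 if "redirect_count" in features else 0.1


guard = JumpGuard(model=toy_model)
allowed = guard.should_jump("http://example.com/a", "<html></html>", redirects=0)
blocked = guard.should_jump("http://example.com/b", "<html></html>", redirects=5)
```

With the placeholder scorer, the first call allows the jump and the second interrupts it and records the address in `illegal_library`. In practice the `model` callable would be replaced by whatever classifier results from the training and verification steps of claim 3.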
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911320994.4A CN111104618A (en) 2019-12-19 2019-12-19 Webpage skipping method and device

Publications (1)

Publication Number Publication Date
CN111104618A true CN111104618A (en) 2020-05-05

Family

ID=70423665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911320994.4A Pending CN111104618A (en) 2019-12-19 2019-12-19 Webpage skipping method and device

Country Status (1)

Country Link
CN (1) CN111104618A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016173200A1 (en) * 2015-04-30 2016-11-03 安一恒通(北京)科技有限公司 Malicious website detection method and system
CN110020075A (en) * 2017-10-20 2019-07-16 南京烽火软件科技有限公司 Device is excavated in illegal website automatically
CN108171049A (en) * 2017-12-27 2018-06-15 深圳豪客互联网有限公司 A kind of malicious link clicks control method and control system
CN108683666A (en) * 2018-05-16 2018-10-19 新华三信息安全技术有限公司 A kind of web page identification method and device
CN109981664A (en) * 2019-03-29 2019-07-05 北京致远互联软件股份有限公司 Website logging method, device and the realization device of page end
CN110046310A (en) * 2019-04-03 2019-07-23 北京字节跳动网络技术有限公司 The method and apparatus for analyzing the redirected link in the page
CN110572359A (en) * 2019-08-01 2019-12-13 杭州安恒信息技术股份有限公司 Phishing webpage detection method based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200505