CN107862050A - Website content security detection system and method - Google Patents

Website content security detection system and method

Info

Publication number
CN107862050A
Authority
CN
China
Prior art keywords
module
network address
web site
site contents
picture
Prior art date
Legal status
Pending
Application number
CN201711090519.3A
Other languages
Chinese (zh)
Inventor
王电钢
龚艳
母继元
毛启均
常健
Current Assignee
State Grid Sichuan Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Sichuan Electric Power Co Ltd
Original Assignee
State Grid Sichuan Electric Power Co Ltd
Priority date
2017-11-08
Filing date
2017-11-08
Publication date
2018-03-30
Application filed by State Grid Sichuan Electric Power Co Ltd
Priority to CN201711090519.3A
Publication of CN107862050A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a website content security detection system and method, comprising: a front-end request module, which receives the URL to be checked and submits a request to the crawler module; a crawler module, which crawls the image information of the target URL; a feature extraction module, which extracts the image information from the crawler module and from the sample image module as feature vectors; a model trainer, which generates a classifier from the feature vectors of the sample images by supervised learning; an FPGA hardware accelerator, which provides hardware acceleration for the feature extraction module; and a security arbitration module, which calculates the security coefficient of the target URL from the classifier's classification of the image features. On this principle, the invention uses sample image features as the input of the model trainer to obtain the classifier, and uses the FPGA hardware accelerator to accelerate the feature extraction algorithm and improve system response speed, thereby achieving fast, efficient, and accurate website content security detection.

Description

Website content security detection system and method
Technical field
The present invention relates to the field of network security, and in particular to a website content security detection system and method.
Background art
With the development of Internet technology, web applications have brought great convenience to people's lives and greatly enriched the ways in which information is disseminated. However, some criminals seek profit by building phishing, gambling, and pornographic websites, which poses a serious threat to people's safe and healthy Internet use. The detection of malicious websites has therefore become a serious network security problem.
Current malicious web page detection mainly uses two methods: static feature detection and dynamic feature detection. Static feature detection analyzes a page's DNS information, WHOIS information, URL syntax features, HTML content, JavaScript code, and so on; dynamic feature detection analyzes link redirection relationships, browser behavior, registry changes, and so on. Classifying web pages with machine learning supplements these two approaches. In addition, detecting malicious web pages with honeypot techniques is a relatively mature approach.
In the paper "Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs", Justin Ma et al. identify malicious URLs by machine learning based on DNS information, WHOIS information, and URL syntax features. This approach has the following drawbacks: (1) some malicious URLs show no obvious malicious traits in their syntax or WHOIS registration information and closely resemble benign URLs, so the false-positive rate is high; (2) it does not analyze the page's JavaScript and HTML content, and judging a URL's security only from DNS, WHOIS, and URL information is one-sided.
In the paper "Prophiler: A Fast Filter for the Large-Scale Detection of Malicious Web Pages", Davide Canali et al. build on the work of Ma et al. by adding analysis of page JavaScript and HTML features, improving the accuracy of malicious website identification by examining page content. In the thesis "Design and Implementation of a Trojan Detection System Based on Data Mining and Machine Learning", Shi Yu extracts web page features and classifies pages with machine learning and a BP neural network to identify malicious websites. Both methods improve substantially on the work of Ma et al., but both overlook several important problems: (1) when classifying page content, especially images, SVM models and BP neural networks perform poorly on complex images and tend to produce large deviations; (2) classifying page content with machine learning or deep learning imposes a large computational cost on the system, yet neither work applies the now-popular measure of hardware acceleration to improve system response speed.
Summary of the invention
The technical problem to be solved by the invention is to improve the response speed of website content security detection, analyze web page content, and reduce the false-positive rate. The object of the invention is to provide a website content security detection system and method that uses sample image features as the input of a model trainer to obtain a classifier, and uses an FPGA hardware accelerator to accelerate the feature extraction algorithm, improving system response speed and achieving fast, efficient, and accurate website content security detection.
The present invention is achieved through the following technical solution:
A website content security detection system, comprising:
Front-end request module: receives the URL to be checked and submits a request to the crawler module;
Crawler module: crawls the image information of the target URL;
Feature extraction module: extracts the image information from the crawler module and from the sample image module as feature vectors;
Model trainer: generates a classifier from the feature vectors of the sample images by supervised learning;
FPGA hardware accelerator: provides hardware acceleration for the feature extraction module;
Security arbitration module: calculates the security coefficient of the target URL from the classifier's classification of the image features;
Data storage module: stores the image information crawled by the crawler module and the detection results for the target URL;
Responder: returns the security coefficient of the target URL to the front-end request module.
This scheme performs security detection on website content using machine learning: the feature extraction module extracts image features, the model trainer trains a classifier on the extracted sample image features, and the classifier classifies images according to their features. Because the judgment rests on image classification, malicious URLs that show no obvious malicious traits in their syntax or WHOIS registration information, and that are therefore easily confused with benign URLs, are no longer misjudged. The judgment has small deviation and a low false-positive rate, and the FPGA hardware accelerator accelerates the feature extraction algorithm to improve system response speed, achieving fast, efficient, and accurate website content security detection.
Preferably, the FPGA hardware accelerator uses the Xilinx Reconfigurable Acceleration Stack and is implemented together with the Caffe machine learning framework and the Xilinx deep neural network (DNN) library.
Preferably, the Caffe machine learning framework is an integrated framework for CNN (convolutional neural network) deep learning. When the prior art classifies complex images with SVM models or BP neural networks, it tends to produce large deviations. In this scheme, image feature vectors are extracted from the crawled text and image content by CNN deep learning, and the sample image features serve as the input of the model trainer to obtain the classifier. When analyzing complex images, this approach is less prone to deviation than SVM or BP neural network classifiers, so website screening results are more accurate. The feature extraction module of this scheme uses the Xilinx Reconfigurable Acceleration Stack FPGA hardware accelerator to accelerate the core algorithm, greatly improving the response speed of the system.
Preferably, the security arbitration module calculates the security coefficient of the target website according to whether the number of images labeled non-secure exceeds a set threshold.
Preferably, the sample image module contains normal images and abnormal images, where abnormal images are images with gambling or pornographic features.
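The source does not say how the count of non-secure images becomes a coefficient. A minimal Python sketch, assuming the coefficient is the fraction of crawled images that were not flagged and that the site is marked unsafe once the flagged count exceeds the threshold (both rules are assumptions, as are the names `arbitrate` and `ArbitrationResult`):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ArbitrationResult:
    flagged: int        # images labeled non-secure
    total: int          # images examined
    coefficient: float  # assumed rule: 1.0 = nothing flagged, 0.0 = everything flagged
    unsafe: bool        # True when the flagged count exceeds the threshold


def arbitrate(labels: list[bool], threshold: int) -> ArbitrationResult:
    """labels[i] is True when image i was classified as non-secure."""
    total = len(labels)
    flagged = sum(labels)
    coefficient = 1.0 if total == 0 else 1.0 - flagged / total
    return ArbitrationResult(flagged, total, coefficient, flagged > threshold)
```

A ratio keeps the coefficient comparable between small and large sites, while the threshold preserves the count-based rule the text describes.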
A website content security detection method, comprising the following steps:
S1: the feature extraction module extracts the image information of the sample image module in the form of feature vectors;
S2: taking the sample feature vectors obtained in S1 as input, the model trainer generates a classifier by supervised learning;
S3: the URL to be checked is entered in the front-end request module, its legitimacy is checked, and the URL is submitted to the crawler module;
S4: the crawler module receives the URL sent from the front-end request module, crawls the image information of the target URL, and stores the crawled content in the data storage module;
S5: the feature extraction module extracts the feature vectors of the images crawled in S4;
S6: taking the image feature vectors extracted in S5 as input, the classifier classifies the crawled images;
S7: the security arbitration module calculates the security coefficient of the target URL from the classification results of S6, and stores the target URL, the local path of the saved images of the target website, the detection time, and the security coefficient;
S8: the responder sends the detection result for the target URL to the front-end request module.
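Steps S3 to S8 can be sketched as one Python function into which each module is passed as a callable. This is an illustration only: `detect_site` and its parameters are hypothetical, S1 and S2 are assumed to have run offline to produce `classify`, and the security coefficient uses an assumed flagged-fraction rule.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Callable, Sequence


def detect_site(
    url: str,
    is_legitimate: Callable[[str], bool],        # S3: front-end legitimacy check
    crawl: Callable[[str], Sequence[str]],        # S4: returns local image paths
    extract: Callable[[str], Sequence[float]],    # S5: image path -> feature vector
    classify: Callable[[Sequence[float]], bool],  # S6: True means non-secure
    threshold: int,
    store: Callable[[dict], None],                # S7: persist the result record
) -> dict:
    """Hypothetical orchestration of steps S3-S8; S1/S2 are assumed done offline."""
    if not is_legitimate(url):
        raise ValueError(f"rejected URL: {url!r}")
    image_paths = list(crawl(url))
    flags = [classify(extract(path)) for path in image_paths]
    flagged = sum(flags)
    # Assumed rule: fraction of images not flagged; unsafe above the threshold.
    coefficient = 1.0 if not image_paths else 1.0 - flagged / len(image_paths)
    record = {
        "url": url,
        "image_paths": image_paths,
        "detected_at": datetime.now(timezone.utc).isoformat(),
        "coefficient": coefficient,
        "unsafe": flagged > threshold,
    }
    store(record)   # S7
    return record   # S8: handed to the responder for the front end
```

Passing modules as callables keeps the pipeline testable with stubs and lets the FPGA-accelerated extractor replace a CPU one without touching the orchestration.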
Preferably, the feature extraction module uses the FPGA accelerator to accelerate the image feature extraction algorithm.
Preferably, the FPGA hardware accelerator uses the Xilinx Reconfigurable Acceleration Stack and is implemented together with the Caffe machine learning framework and the Xilinx deep neural network (DNN) library; the Caffe machine learning framework is an integrated framework for CNN (convolutional neural network) deep learning.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The invention uses sample image features as the input of the model trainer to obtain a classifier, performs security detection on website content by machine learning, and accelerates the image feature extraction algorithm with an FPGA accelerator, achieving fast, efficient, and accurate website content security detection.
2. The classifier of the invention extracts image features from the crawled text and image content by CNN deep learning. When analyzing complex images, it is less prone to large deviations than SVM models or BP neural network classifiers, and its extraction performance is better.
3. The extraction module of the invention uses the Xilinx Reconfigurable Acceleration Stack FPGA hardware accelerator to accelerate the core algorithm, greatly improving the response speed of the system.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the embodiments of the invention and form part of this application; they do not limit the embodiments of the invention. In the drawings:
Fig. 1 is a schematic structural diagram of the invention;
Fig. 2 is a schematic diagram of the Xilinx Reconfigurable Acceleration Stack.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to embodiments and the drawings. The exemplary embodiments and their descriptions serve only to explain the invention and do not limit it.
Embodiment 1:
As shown in Figs. 1-2, the invention includes a website content security detection system, comprising:
Front-end request module: receives the URL to be checked and submits a request to the crawler module;
Crawler module: crawls the image information of the target URL;
Feature extraction module: extracts the image information from the crawler module and from the sample image module as feature vectors;
Model trainer: generates a classifier from the feature vectors of the sample images by supervised learning;
FPGA hardware accelerator: provides hardware acceleration for the feature extraction module;
Security arbitration module: calculates the security coefficient of the target URL from the classifier's classification of the image features;
Data storage module: stores the image information crawled by the crawler module and the detection results for the target URL;
Responder: returns the security coefficient of the target URL to the front-end request module.
Existing malicious website detection systems have a high false-positive rate on malicious URLs that show no obvious malicious traits in their syntax or WHOIS registration information and closely resemble benign URLs. They also lack analysis of page JavaScript and HTML content and judge a URL's security only from DNS, WHOIS, and URL information, which is very one-sided. Their classification of page content, especially of complex images, tends to produce large deviations that affect the final result. Finally, classifying page content with machine learning or deep learning slows the system's response and reduces efficiency.
This scheme performs security detection on website content using machine learning: the feature extraction module extracts image features, the model trainer trains a classifier on the extracted sample image features, and the classifier classifies images according to their features. Because the judgment rests on image classification, malicious URLs that show no obvious malicious traits in their syntax or WHOIS registration information, and that are therefore easily confused with benign URLs, are no longer misjudged. The judgment has small deviation and a low false-positive rate, and the FPGA hardware accelerator accelerates the feature extraction algorithm to improve system response speed, achieving fast, efficient, and accurate website content security detection.
Embodiment 2:
On the basis of Embodiment 1, this embodiment preferably provides the following: the FPGA hardware accelerator uses the Xilinx Reconfigurable Acceleration Stack and is implemented together with the Caffe machine learning framework and the Xilinx deep neural network (DNN) library.
The Caffe machine learning framework is an integrated framework for CNN (convolutional neural network) deep learning. When the prior art classifies complex images with SVM models or BP neural networks, it tends to produce large deviations. In this scheme, image feature vectors are extracted from the crawled text and image content by CNN deep learning, and the sample image features serve as the input of the model trainer to obtain the classifier. When analyzing complex images, this approach is less prone to deviation than SVM or BP neural network classifiers, so website screening results are more accurate. The feature extraction module of this scheme uses the Xilinx Reconfigurable Acceleration Stack FPGA hardware accelerator to accelerate the core algorithm, greatly improving the response speed of the system.
The security arbitration module calculates the security coefficient of the target website according to whether the number of images labeled non-secure exceeds a set threshold.
The sample image module contains normal images and abnormal images, where abnormal images are images with features such as gambling or pornography. The classifier generated from the sample image module judges with high accuracy whether the images of a URL are abnormal.
Embodiment 3:
A website content security detection method, comprising the following steps:
S1: the feature extraction module extracts the image information of the sample image module in the form of feature vectors;
S2: taking the sample feature vectors obtained in S1 as input, the model trainer generates a classifier by supervised learning;
S3: the URL to be checked is entered in the front-end request module, its legitimacy is checked, and the URL is submitted to the crawler module;
S4: the crawler module receives the URL sent from the front-end request module, crawls the image information of the target URL, and stores the crawled content in the data storage module;
S5: the feature extraction module extracts the feature vectors of the images crawled in S4;
S6: taking the image feature vectors extracted in S5 as input, the classifier classifies the crawled images;
S7: the security arbitration module calculates the security coefficient of the target URL from the classification results of S6, and stores the target URL, the local path of the saved images of the target website, the detection time, and the security coefficient;
S8: the responder sends the detection result for the target URL to the front-end request module.
The feature extraction module uses the FPGA accelerator to accelerate the image feature extraction algorithm.
The FPGA hardware accelerator uses the Xilinx Reconfigurable Acceleration Stack and is implemented together with the Caffe machine learning framework and the Xilinx deep neural network (DNN) library; the Caffe machine learning framework is an integrated framework for CNN (convolutional neural network) deep learning.
In the first step of this scheme, the convert_imageset tool of the Caffe framework converts the training-set sample images into .leveldb files that Caffe can read. The -resize_width and -resize_height options are used when invoking it so that all training-set sample images have the same size; the resolution after resizing is 256×256. All training-set sample images are labeled beforehand.
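Caffe's convert_imageset tool reads a list file in which each line holds an image path (relative to a root folder) and an integer label. A minimal Python sketch of producing that file, assuming one sub-folder per class; the folder names and class IDs in `CLASS_IDS` and the function name `write_listfile` are hypothetical:

```python
from pathlib import Path

# Hypothetical layout: <root>/<class_name>/<image>; IDs must match the training labels.
CLASS_IDS = {"normal": 0, "gambling": 1, "pornographic": 2}
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def write_listfile(root: Path, listfile: Path) -> int:
    """Write '<relative path> <label>' lines for every image; return the line count."""
    lines = []
    for class_name, class_id in CLASS_IDS.items():
        class_dir = root / class_name
        if not class_dir.is_dir():
            continue  # a class with no sample folder is simply skipped
        for img in sorted(class_dir.iterdir()):
            if img.is_file() and img.suffix.lower() in IMAGE_SUFFIXES:
                lines.append(f"{class_name}/{img.name} {class_id}")
    listfile.write_text("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)
```

The resulting file would be passed as the list-file argument to convert_imageset together with the 256×256 resize options and a leveldb backend (--backend=leveldb). The sketch assumes file names contain no spaces.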
In the second step, the extract_features tool of the Caffe framework extracts sample image features from the .leveldb files generated above in the form of feature vectors, and the DNN library of the Xilinx Reconfigurable Acceleration Stack is called to hardware-accelerate this process and increase the speed of the module.
In the third step, the model trainer is started. By defining name.prototxt and name_solver.prototxt files and using the train command of the Caffe framework with its -solver parameter, a model is trained on the feature vectors obtained in step 2 by supervised learning. The model is repeatedly corrected by fine-tuning, ultimately producing a classifier whose number of classes equals the number of labels and that can identify sensitive (gambling, pornographic, etc.) images.
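The source names name_solver.prototxt but not its contents. A hedged Python sketch that renders a typical Caffe solver file: the field names (`net`, `base_lr`, `lr_policy`, `gamma`, `stepsize`, `momentum`, `weight_decay`, `max_iter`, `snapshot`, `snapshot_prefix`, `solver_mode`) are standard Caffe solver fields, but every value and the function name `render_solver` are illustrative assumptions, not taken from the patent.

```python
def render_solver(
    net: str,
    snapshot_prefix: str,
    *,
    base_lr: float = 0.001,   # assumed small rate, typical when fine-tuning
    max_iter: int = 10000,    # assumed iteration budget
    solver_mode: str = "CPU", # Caffe enum value: CPU or GPU (unquoted)
) -> str:
    """Return the text of a Caffe solver .prototxt (illustrative values only)."""
    fields = [
        ("net", f'"{net}"'),
        ("base_lr", base_lr),
        ("lr_policy", '"step"'),  # drop the learning rate every `stepsize` iterations
        ("gamma", 0.1),
        ("stepsize", 2000),
        ("momentum", 0.9),
        ("weight_decay", 0.0005),
        ("max_iter", max_iter),
        ("snapshot", 1000),
        ("snapshot_prefix", f'"{snapshot_prefix}"'),
        ("solver_mode", solver_mode),
    ]
    return "".join(f"{key}: {value}\n" for key, value in fields)
```

For fine-tuning, the Caffe command-line tool is typically invoked as `caffe train -solver name_solver.prototxt -weights <pretrained>.caffemodel`, so training starts from existing weights rather than from scratch; the pretrained model file is a placeholder.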
In the fourth step, a front-end interface written in HTML, CSS, and JavaScript lets the user enter the target URL to be checked in an input box. The legitimacy of the URL is then checked, for example whether the input could trigger XSS or SQL injection vulnerabilities. If the URL is legitimate, it is sent to the crawler module with the Ajax post() method of the jQuery library.
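The patent performs the legitimacy check in the JavaScript front end. The same idea is sketched here in Python (the crawler's language), for example as a server-side re-check before a URL reaches the crawler. The character and token deny-lists are illustrative assumptions and far from a complete XSS or SQL-injection defence; `is_legitimate_url` is a hypothetical name.

```python
from urllib.parse import urlsplit

# Illustrative deny-lists (assumptions, not exhaustive): raw and percent-encoded
# characters that frequently appear in XSS or SQL-injection payloads.
SUSPICIOUS_CHARS = set("<>\"'`;")
SUSPICIOUS_TOKENS = ("javascript:", "/*", "%27", "%22", "%3c", "%3e")


def is_legitimate_url(raw: str, max_len: int = 2048) -> bool:
    """Return True if `raw` looks like a plain http(s) URL with no obvious payload."""
    url = raw.strip()
    if not url or len(url) > max_len:
        return False
    # Reject embedded whitespace and characters typical of script or SQL injection.
    if any(ch.isspace() or ch in SUSPICIOUS_CHARS for ch in url):
        return False
    lowered = url.lower()
    if any(token in lowered for token in SUSPICIOUS_TOKENS):
        return False
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:  # e.g. a malformed IPv6 literal
        return False
    return parts.scheme in ("http", "https") and bool(host)
```

A deny-list like this can reject some legitimate URLs (for example ones containing a semicolon), which is usually acceptable for a gatekeeper in front of a crawler but would need tuning in practice.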
In the fifth step, the crawler module receives the URL detection request from the front-end request module, crawls the image information of the target URL using the Python Scrapy framework, and saves the crawled images as local files.
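The crawler is built on Scrapy, where a selector such as `response.css("img::attr(src)").getall()` would collect image links inside a spider. To keep the example dependency-free, the sketch below swaps in the standard-library `html.parser` to show the same core step: collecting `<img>` sources from a page and resolving them against the page URL. `ImageSrcCollector` and `image_urls` are hypothetical names, and downloading and saving the files is left out.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcCollector(HTMLParser):
    """Collect absolute, de-duplicated <img src> URLs in document order."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.image_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; self-closing <img/>
        # tags are also routed here by the default handle_startendtag.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            absolute = urljoin(self.base_url, src)
            if absolute not in self.image_urls:
                self.image_urls.append(absolute)


def image_urls(html: str, base_url: str) -> list[str]:
    collector = ImageSrcCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.image_urls
```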
In the sixth step, as in step 1, the images crawled in step 5 are resized and converted into .leveldb files that Caffe can read. Using these images as the test set, the feature extraction module extracts the feature vectors of the crawled images, the classifier generated in step 3 classifies them according to these feature vectors, and sensitive images are labeled as non-secure.
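This step turns classifier output into a secure/non-secure label. A sketch under assumptions: the classifier returns one probability per class in a fixed order (`CLASS_NAMES`, hypothetical, matching the class IDs used at training time), and an image counts as non-secure only when its top class is sensitive and that probability reaches `min_confidence`, a cut-off the source does not mention.

```python
from typing import Sequence

# Hypothetical class order; must match the label IDs used when training.
CLASS_NAMES = ("normal", "gambling", "pornographic")
SENSITIVE = {"gambling", "pornographic"}


def label_image(probabilities: Sequence[float], min_confidence: float = 0.5) -> bool:
    """Return True (non-secure) when the top class is sensitive and confident enough."""
    if len(probabilities) != len(CLASS_NAMES):
        raise ValueError("probability vector does not match CLASS_NAMES")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CLASS_NAMES[best] in SENSITIVE and probabilities[best] >= min_confidence
```

Raising `min_confidence` trades missed detections for fewer false positives, which is the error the patent most wants to reduce.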
In the seventh step, the security arbitration module calculates the security coefficient of the target website according to whether the number of images labeled non-secure exceeds a set threshold, and stores the target URL, the local image path, the detection time, the security coefficient of the target website, and similar data as fields in the data storage module.
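This step stores the target URL, local image path, detection time, and security coefficient as fields. A minimal sketch using Python's built-in `sqlite3`; the table name, column names, the function name `save_result`, and the choice of SQLite are all assumptions, since the patent does not name its storage engine.

```python
from __future__ import annotations

import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS detection_result (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    target_url TEXT NOT NULL,
    image_path TEXT NOT NULL,
    detected_at TEXT NOT NULL,
    security_coefficient REAL NOT NULL
)
"""


def save_result(
    conn: sqlite3.Connection,
    target_url: str,
    image_path: str,
    coefficient: float,
    detected_at: datetime | None = None,
) -> int:
    """Insert one detection record and return its row id."""
    conn.execute(SCHEMA)  # idempotent thanks to IF NOT EXISTS
    when = (detected_at or datetime.now(timezone.utc)).isoformat()
    cur = conn.execute(
        "INSERT INTO detection_result "
        "(target_url, image_path, detected_at, security_coefficient) "
        "VALUES (?, ?, ?, ?)",
        (target_url, image_path, when, coefficient),
    )
    conn.commit()
    return cur.lastrowid
```

Parameterised `?` placeholders keep the crawled URL from being interpreted as SQL, which matters because the stored value originates from user input.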
In the eighth step, the responder sends the security detection data for the target URL to the front-end request module.
This method first captures the image information of the website to be checked, classifies it intelligently with the classifier, computes an accurate security coefficient for the website, and returns it to the front-end request module for display. The scheme performs security detection on website content by machine learning: the feature extraction module extracts image features, the model trainer trains a classifier on the extracted sample image features, and the classifier classifies images according to their features. Judging by image classification gives small deviation and a low false-positive rate, and the FPGA hardware accelerator accelerates the feature extraction algorithm to improve system response speed, achieving fast, efficient, and accurate website content security detection.
The embodiments above further describe the objects, technical solutions, and beneficial effects of the invention in detail. It should be understood that they are only embodiments of the invention and are not intended to limit its scope of protection; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

  1. A website content security detection system, characterized by comprising:
    Front-end request module: receives the URL to be checked and submits a request to the crawler module;
    Crawler module: crawls the image information of the target URL;
    Feature extraction module: extracts the image information from the crawler module and from the sample image module as feature vectors;
    Model trainer: generates a classifier from the feature vectors of the sample images by supervised learning;
    FPGA hardware accelerator: provides hardware acceleration for the feature extraction module;
    Security arbitration module: calculates the security coefficient of the target URL from the classifier's classification of the image features;
    Data storage module: stores the image information crawled by the crawler module and the detection results for the target URL;
    Responder: returns the security coefficient of the target URL to the front-end request module.
  2. The website content security detection system according to claim 1, characterized in that the FPGA hardware accelerator uses the Xilinx Reconfigurable Acceleration Stack and is implemented together with the Caffe machine learning framework and the Xilinx deep neural network (DNN) library.
  3. The website content security detection system according to claim 2, characterized in that the Caffe machine learning framework is an integrated framework for CNN (convolutional neural network) deep learning.
  4. The website content security detection system according to claim 1, characterized in that the security arbitration module calculates the security coefficient of the target website according to whether the number of images labeled non-secure exceeds a set threshold.
  5. The website content security detection system according to claim 1, characterized in that the sample image module contains normal images and abnormal images, where abnormal images are images with gambling or pornographic features.
  6. A website content security detection method, characterized by comprising the following steps:
    S1: the feature extraction module extracts the image information of the sample image module in the form of feature vectors;
    S2: taking the sample feature vectors obtained in S1 as input, the model trainer generates a classifier by supervised learning;
    S3: the URL to be checked is entered in the front-end request module, its legitimacy is checked, and the URL is submitted to the crawler module;
    S4: the crawler module receives the URL sent from the front-end request module, crawls the image information of the target URL, and stores the crawled content in the data storage module;
    S5: the feature extraction module extracts the feature vectors of the images crawled in S4;
    S6: taking the image feature vectors extracted in S5 as input, the classifier classifies the crawled images;
    S7: the security arbitration module calculates the security coefficient of the target URL from the classification results of S6, and stores the target URL, the local path of the saved images of the target website, the detection time, and the security coefficient;
    S8: the responder sends the detection result for the target URL to the front-end request module.
  7. The website content security detection method according to claim 6, characterized in that the feature extraction module uses the FPGA hardware accelerator to accelerate the image feature extraction algorithm.
  8. The website content security detection method according to claim 7, characterized in that the FPGA hardware accelerator uses the Xilinx Reconfigurable Acceleration Stack and is implemented together with the Caffe machine learning framework and the Xilinx deep neural network (DNN) library, and the Caffe machine learning framework is an integrated framework for CNN (convolutional neural network) deep learning.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711090519.3A CN107862050A (en) 2017-11-08 2017-11-08 Website content security detection system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711090519.3A CN107862050A (en) 2017-11-08 2017-11-08 Website content security detection system and method

Publications (1)

Publication Number Publication Date
CN107862050A true CN107862050A (en) 2018-03-30

Family

ID=61701187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711090519.3A Pending CN107862050A (en) 2017-11-08 2017-11-08 Website content security detection system and method

Country Status (1)

Country Link
CN (1) CN107862050A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110275958A * 2019-06-26 2019-09-24 北京市博汇科技股份有限公司 Website information recognition method, device and electronic equipment
CN110633226A (en) * 2018-06-22 2019-12-31 武汉海康存储技术有限公司 Fusion memory, storage system and deep learning calculation method
CN111091019A (en) * 2019-12-23 2020-05-01 支付宝(杭州)信息技术有限公司 Information prompting method, device and equipment
CN111401115A (en) * 2019-08-01 2020-07-10 江苏农林职业技术学院 Strawberry disease and pest hyperspectral data processing method and device based on FPGA
CN111475699A (en) * 2020-03-07 2020-07-31 咪咕文化科技有限公司 Website data crawling method and device, electronic equipment and readable storage medium
CN111626309A (en) * 2020-05-26 2020-09-04 北京墨云科技有限公司 Website fingerprint identification method based on deep learning
CN111651658A (en) * 2020-06-05 2020-09-11 杭州安恒信息技术股份有限公司 Method and computer equipment for automatically identifying website based on deep learning
CN112731305A (en) * 2020-12-17 2021-04-30 国网四川省电力公司信息通信公司 Direct wave suppression method and system based on adaptive Doppler domain beam cancellation
CN113657453A (en) * 2021-07-22 2021-11-16 珠海高凌信息科技股份有限公司 Harmful website detection method based on generation of countermeasure network and deep learning
US11609989B2 (en) 2019-03-26 2023-03-21 Proofpoint, Inc. Uniform resource locator classifier and visual comparison platform for malicious site detection

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101968813A (en) * 2010-10-25 2011-02-09 华北电力大学 Method for detecting counterfeit webpage
US20140270350A1 (en) * 2013-03-14 2014-09-18 Xerox Corporation Data driven localization using task-dependent representations
CN106776946A (en) * 2016-12-02 2017-05-31 重庆大学 A kind of detection method of fraudulent website

Cited By (16)

Publication number Priority date Publication date Assignee Title
CN110633226A (en) * 2018-06-22 2019-12-31 武汉海康存储技术有限公司 Fusion memory, storage system and deep learning calculation method
US11609989B2 (en) 2019-03-26 2023-03-21 Proofpoint, Inc. Uniform resource locator classifier and visual comparison platform for malicious site detection
US11924246B2 (en) 2019-03-26 2024-03-05 Proofpoint, Inc. Uniform resource locator classifier and visual comparison platform for malicious site detection preliminary
US11799905B2 (en) 2019-03-26 2023-10-24 Proofpoint, Inc. Uniform resource locator classifier and visual comparison platform for malicious site detection
CN110275958A (en) * 2019-06-26 2019-09-24 北京市博汇科技股份有限公司 Site information recognition methods, device and electronic equipment
CN110275958B (en) * 2019-06-26 2021-07-27 北京市博汇科技股份有限公司 Website information identification method and device and electronic equipment
CN111401115A (en) * 2019-08-01 2020-07-10 江苏农林职业技术学院 Strawberry disease and pest hyperspectral data processing method and device based on FPGA
CN111401115B (en) * 2019-08-01 2024-02-27 江苏农林职业技术学院 Method and device for processing strawberry disease and pest hyperspectral data based on FPGA
CN111091019A (en) * 2019-12-23 2020-05-01 支付宝(杭州)信息技术有限公司 Information prompting method, device and equipment
CN111475699A (en) * 2020-03-07 2020-07-31 咪咕文化科技有限公司 Website data crawling method and device, electronic equipment and readable storage medium
CN111475699B (en) * 2020-03-07 2023-09-08 咪咕文化科技有限公司 Website data crawling method and device, electronic equipment and readable storage medium
CN111626309A (en) * 2020-05-26 2020-09-04 北京墨云科技有限公司 Website fingerprint identification method based on deep learning
CN111651658A (en) * 2020-06-05 2020-09-11 杭州安恒信息技术股份有限公司 Method and computer equipment for automatically identifying website based on deep learning
CN112731305A (en) * 2020-12-17 2021-04-30 国网四川省电力公司信息通信公司 Direct wave suppression method and system based on adaptive Doppler domain beam cancellation
CN113657453A (en) * 2021-07-22 2021-11-16 珠海高凌信息科技股份有限公司 Harmful website detection method based on generation of countermeasure network and deep learning
CN113657453B (en) * 2021-07-22 2023-08-01 珠海高凌信息科技股份有限公司 Detection method based on harmful website generating countermeasure network and deep learning

Similar Documents

Publication Publication Date Title
CN107862050A (en) A kind of web site contents safety detecting system and method
CN110210617B (en) Confrontation sample generation method and generation device based on feature enhancement
CN104077396B (en) Method and device for detecting phishing website
CN110808968B (en) Network attack detection method and device, electronic equipment and readable storage medium
CN105072089B (en) A kind of WEB malice scanning behavior method for detecting abnormality and system
CN103577755A (en) Malicious script static detection method based on SVM (support vector machine)
CN109005145A (en) A kind of malice URL detection system and its method extracted based on automated characterization
CN107786575A (en) A kind of adaptive malice domain name detection method based on DNS flows
CN109510815A (en) A kind of multistage detection method for phishing site and detection system based on supervised learning
CN107220386A (en) Information-pushing method and device
CN107659570A (en) Webshell detection methods and system based on machine learning and static and dynamic analysis
CN109873810B (en) Network fishing detection method based on goblet sea squirt group algorithm support vector machine
CN107957872A (en) A kind of full web site source code acquisition methods and illegal website detection method, system
Saini et al. A review of bot protection using CAPTCHA for web security
US20230385409A1 (en) Unstructured text classification
CN109858248B (en) Malicious Word document detection method and device
CN104504335B (en) Fishing APP detection methods and system based on page feature and URL features
CN103810425A (en) Method and device for detecting malicious website
EP3703329B1 (en) Webpage request identification
CN110191096A (en) A kind of term vector homepage invasion detection method based on semantic analysis
CN108337255A (en) A kind of detection method for phishing site learnt based on web automatic tests and width
WO2014029318A1 (en) Method and apparatus for identifying webpage type
CN107046586A (en) A kind of algorithm generation domain name detection method based on natural language feature
CN106169050B (en) A kind of PoC Program extraction method based on webpage Knowledge Discovery
CN104158828A (en) Method and system for identifying doubtful phishing webpage on basis of cloud content rule base

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2018-03-30)