CN107862050A - A kind of web site contents safety detecting system and method - Google Patents
Website content security detection system and method
- Publication number
- CN107862050A (application CN201711090519.3A)
- Authority
- CN
- China
- Prior art keywords
- module
- network address
- web site
- site contents
- picture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a website content security detection system and method. The system comprises: a front-end request module, which accepts the URL to be detected and submits the request to the crawler module; a crawler module, which crawls the image information of the target URL; a feature extraction module, which converts the image information from the crawler module and from the sample picture module into feature vectors; a model trainer, which generates a classifier from the feature vectors of the sample pictures by supervised learning; an FPGA hardware accelerator, which provides hardware acceleration to the feature extraction module; and a security judgment module, which calculates the safety coefficient of the target URL from the classifier's classification of the image features. By using sample image features as the input of the model trainer to obtain a classifier, and by accelerating the feature extraction algorithm with the FPGA hardware accelerator to improve system response speed, the invention achieves fast, efficient, and accurate website content security detection.
Description
Technical field
The present invention relates to the technical field of network security, and in particular to a website content security detection system and method.
Background
With the development of Internet technology, web applications have brought great convenience to people's lives and greatly enriched the ways in which information spreads. However, some criminals profit by creating phishing, gambling, and pornographic websites, which pose a serious threat to safe and healthy Internet use. The detection of malicious websites has therefore become a serious network security problem.
Malicious web pages are currently detected mainly by two methods: static feature detection and dynamic feature detection. Static feature detection analyzes the page's DNS information, WHOIS information, URL syntax features, HTML content, JavaScript code, and so on. Dynamic feature detection analyzes link redirection relationships, browser behavior, registry changes, and so on. Classifying web pages with machine learning supplements both approaches. In addition, detecting malicious web pages with honeypot technology is a relatively mature practice.
In the paper "Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs", Justin et al. use machine learning to identify malicious URLs from DNS information, WHOIS information, and URL syntax features. This approach has the following shortcomings: (1) some malicious URLs show no explicit malicious features in their syntax or WHOIS registration information and closely resemble normal URLs, so the false alarm rate is high; (2) it does not analyze the page's JavaScript and HTML content, and judging the security of a URL only from DNS, WHOIS, and URL information is one-sided.
In the paper "Prophiler: A Fast Filter for the Large-Scale Detection of Malicious Web Pages", Davide builds on Justin's research by adding analysis of the page's JavaScript and HTML features, improving the accuracy of malicious website identification by inspecting page content. In the thesis "Design and Implementation of a Trojan Detection System Based on Data Mining and Machine Learning", Shi Yu extracts web page features and classifies pages using machine learning and a BP neural network, thereby identifying malicious websites. Both methods improve considerably on Justin's research, but both overlook several important problems: (1) when classifying web content, especially images, SVM models and BP neural networks perform poorly on complex images and easily produce large deviations; (2) classifying web content with machine learning or deep learning imposes a large overhead on the system, and neither method applies hardware acceleration, which is now a common way to improve system response speed.
Summary of the invention
The technical problem to be solved by the invention is to improve the response speed of website content security detection, analyze web page content, and reduce the false alarm rate. The object of the invention is to provide a website content security detection system and method that uses sample image features as the input of the model trainer to obtain a classifier and accelerates the feature extraction algorithm with an FPGA hardware accelerator to improve system response speed, achieving fast, efficient, and accurate website content security detection.
The invention is realized through the following technical solution:
A website content security detection system, comprising
Front-end request module: accepts the URL to be detected and submits the request to the crawler module;
Crawler module: crawls the image information of the target URL;
Feature extraction module: converts the image information from the crawler module and from the sample picture module into feature vectors;
Model trainer: generates a classifier from the feature vectors of the sample pictures by supervised learning;
FPGA hardware accelerator: provides hardware acceleration to the feature extraction module;
Security judgment module: calculates the safety coefficient of the target URL from the classifier's classification of the image features;
Data storage module: stores the image information crawled by the crawler module and the detection result information for the target URL;
Responder: returns the safety coefficient of the target URL to the front-end request module.
This solution uses machine learning to perform security detection on website content. The feature extraction module extracts image features, the model trainer trains a classifier on the extracted sample image features, and the classifier classifies images according to their features. Judging by image classification avoids the misjudgments that arise when a malicious URL shows no explicit malicious features in its syntax or WHOIS registration information and is confused with a normal URL. The judgment method of this solution has small deviation and a low false alarm rate, and the feature extraction algorithm is accelerated with an FPGA hardware accelerator to improve system response speed, achieving fast, efficient, and accurate website content security detection.
Preferably, the FPGA hardware accelerator uses the Xilinx Reconfigurable Acceleration Stack and is implemented in combination with the Caffe machine learning framework and the Xilinx deep neural network (DNN) library.
Preferably, the Caffe machine learning framework is an integrated framework for CNN (convolutional neural network) deep learning. When the prior art classifies complex images with SVM models or BP neural networks, large deviations easily arise. In this solution, image feature vectors are extracted from the crawled text and image content using CNN deep learning, and the classifier is obtained by using sample image features as the input of the model trainer. Compared with SVM or BP neural network classification algorithms, this is less prone to deviation when analyzing complex images, so the website screening results are more accurate. The feature extraction module of this solution uses the FPGA hardware accelerator of the Xilinx Reconfigurable Acceleration Stack to accelerate the core algorithm, greatly improving the response speed of the system.
Preferably, the security judgment module calculates the safety coefficient of the target website according to whether the number of pictures marked as unsafe exceeds a set threshold.
Preferably, the sample picture module includes normal pictures and abnormal pictures; abnormal pictures are pictures with gambling or pornographic features.
A website content security detection method, comprising the following steps:
S1: The feature extraction module extracts the image information of the sample picture module as feature vectors;
S2: Taking the sample feature vectors obtained in S1 as input, the model trainer generates a classifier by supervised learning;
S3: The URL to be detected is entered in the front-end request module, its legitimacy is checked, and it is submitted to the crawler module;
S4: The crawler module receives the URL sent by the front-end request module, crawls the image information of the target URL, and stores the crawled content in the data storage module;
S5: The feature extraction module extracts the feature vectors of the pictures crawled in S4;
S6: Taking the image feature vectors extracted in S5 as input, the classifier classifies the crawled images;
S7: The security judgment module calculates the safety coefficient of the target URL from the classification results of S6 and stores it together with the target URL, the local path where the target website's pictures are saved, and the detection time;
S8: The response module sends the detection result for the target URL to the front-end request module.
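The online part of the method (S3 to S8) can be sketched as a chain of pluggable stages. The following Python sketch is illustrative only: the class name `DetectionPipeline`, the stage signatures, and the toy stand-in functions are assumptions introduced here, not part of the patent, and the real modules (crawler, CNN feature extractor, trained classifier) would replace the lambdas.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

FeatureVector = Sequence[float]


@dataclass
class DetectionPipeline:
    """Wires stages S3 to S8 together; every stage is injected (hypothetical API)."""
    validate_url: Callable[[str], bool]                  # S3: legitimacy check
    crawl_images: Callable[[str], List[bytes]]           # S4: crawl pictures
    extract_features: Callable[[bytes], FeatureVector]   # S5: feature vectors
    classify: Callable[[FeatureVector], bool]            # S6: True means unsafe
    judge: Callable[[List[bool]], float]                 # S7: safety coefficient

    def run(self, url: str) -> float:
        if not self.validate_url(url):
            raise ValueError(f"rejected URL: {url!r}")
        images = self.crawl_images(url)
        flags = [self.classify(self.extract_features(img)) for img in images]
        # S8: the caller returns this value to the front-end request module.
        return self.judge(flags)


# Toy stand-ins for the real modules, used only to show the data flow.
pipeline = DetectionPipeline(
    validate_url=lambda u: u.startswith(("http://", "https://")),
    crawl_images=lambda u: [b"\x00\x01", b"\xff\xff\xff"],
    extract_features=lambda img: [float(len(img))],
    classify=lambda vec: vec[0] > 2.0,
    judge=lambda flags: 1.0 - sum(flags) / len(flags) if flags else 1.0,
)
```

Injecting each stage as a callable keeps the offline steps S1 and S2 (training) separate from the online flow and makes it possible to swap a CPU feature extractor for an accelerated one without touching the rest of the chain.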
Preferably, the feature extraction module uses the FPGA accelerator to accelerate the image feature extraction algorithm.
Preferably, the FPGA hardware accelerator uses the Xilinx Reconfigurable Acceleration Stack and is implemented in combination with the Caffe machine learning framework and the Xilinx deep neural network (DNN) library; the Caffe machine learning framework is an integrated framework for CNN (convolutional neural network) deep learning.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention obtains a classifier by using sample image features as the input of the model trainer, performs security detection on website content by machine learning, and accelerates the image feature extraction algorithm with an FPGA accelerator, achieving fast, efficient, and accurate website content security detection.
2. The invention extracts image features from the crawled text and image content using CNN deep learning; compared with SVM or BP neural network classification algorithms, it is less prone to large deviations when analyzing complex images, and the extraction results are better.
3. The feature extraction module of the invention uses the FPGA hardware accelerator of the Xilinx Reconfigurable Acceleration Stack to accelerate the core algorithm, greatly improving the response speed of the system.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the embodiments of the invention and form part of this application; they do not limit the embodiments of the invention. In the drawings:
Fig. 1 is a schematic structural diagram of the invention;
Fig. 2 is a schematic diagram of the Xilinx Reconfigurable Acceleration Stack.
Detailed description
To make the object, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the embodiments and drawings. The exemplary embodiments and their descriptions serve only to explain the invention and do not limit it.
Embodiment 1:
As shown in Figs. 1 and 2, the invention includes a website content security detection system, comprising
Front-end request module: accepts the URL to be detected and submits the request to the crawler module;
Crawler module: crawls the image information of the target URL;
Feature extraction module: converts the image information from the crawler module and from the sample picture module into feature vectors;
Model trainer: generates a classifier from the feature vectors of the sample pictures by supervised learning;
FPGA hardware accelerator: provides hardware acceleration to the feature extraction module;
Security judgment module: calculates the safety coefficient of the target URL from the classifier's classification of the image features;
Data storage module: stores the image information crawled by the crawler module and the detection result information for the target URL;
Responder: returns the safety coefficient of the target URL to the front-end request module.
Existing malicious website detection systems have a high false alarm rate for pages whose malicious URLs show no explicit malicious features in syntax or WHOIS registration information and closely resemble normal URLs. They also lack analysis of page JavaScript and HTML content; judging URL security only from DNS, WHOIS, and URL information is very one-sided. When classifying web content, especially complex images, they easily produce large deviations that affect the final judgment. And when web content is classified with machine learning or deep learning, the system responds slowly, which reduces efficiency.
This solution uses machine learning to perform security detection on website content. The feature extraction module extracts image features, the model trainer trains a classifier on the extracted sample image features, and the classifier classifies images according to their features. Judging by image classification avoids the misjudgments that arise when a malicious URL shows no explicit malicious features in its syntax or WHOIS registration information and is confused with a normal URL. The judgment method of this solution has small deviation and a low false alarm rate, and the feature extraction algorithm is accelerated with an FPGA hardware accelerator to improve system response speed, achieving fast, efficient, and accurate website content security detection.
Embodiment 2:
This embodiment builds on Embodiment 1 with the following preferences: the FPGA hardware accelerator uses the Xilinx Reconfigurable Acceleration Stack and is implemented in combination with the Caffe machine learning framework and the Xilinx deep neural network (DNN) library.
The Caffe machine learning framework is an integrated framework for CNN (convolutional neural network) deep learning. When the prior art classifies complex images with SVM models or BP neural networks, large deviations easily arise. In this solution, image feature vectors are extracted from the crawled text and image content using CNN deep learning, and the classifier is obtained by using sample image features as the input of the model trainer. Compared with SVM or BP neural network classification algorithms, this is less prone to deviation when analyzing complex images, so the website screening results are more accurate. The feature extraction module of this solution uses the FPGA hardware accelerator of the Xilinx Reconfigurable Acceleration Stack to accelerate the core algorithm, greatly improving the response speed of the system.
The security judgment module calculates the safety coefficient of the target website according to whether the number of pictures marked as unsafe exceeds a set threshold.
The sample picture module includes normal pictures and abnormal pictures; abnormal pictures are pictures with features such as gambling and pornography. The classifier generated from the sample picture module judges with high accuracy whether a picture from a URL is abnormal.
Embodiment 3:
A website content security detection method, comprising the following steps:
S1: The feature extraction module extracts the image information of the sample picture module as feature vectors;
S2: Taking the sample feature vectors obtained in S1 as input, the model trainer generates a classifier by supervised learning;
S3: The URL to be detected is entered in the front-end request module, its legitimacy is checked, and it is submitted to the crawler module;
S4: The crawler module receives the URL sent by the front-end request module, crawls the image information of the target URL, and stores the crawled content in the data storage module;
S5: The feature extraction module extracts the feature vectors of the pictures crawled in S4;
S6: Taking the image feature vectors extracted in S5 as input, the classifier classifies the crawled images;
S7: The security judgment module calculates the safety coefficient of the target URL from the classification results of S6 and stores it together with the target URL, the local path where the target website's pictures are saved, and the detection time;
S8: The response module sends the detection result for the target URL to the front-end request module.
The feature extraction module uses the FPGA accelerator to accelerate the image feature extraction algorithm.
The FPGA hardware accelerator uses the Xilinx Reconfigurable Acceleration Stack and is implemented in combination with the Caffe machine learning framework and the Xilinx deep neural network (DNN) library; the Caffe machine learning framework is an integrated framework for CNN (convolutional neural network) deep learning.
In the first step, this solution uses the convert_imageset tool of the Caffe framework to convert the training-set sample pictures into a .leveldb file that Caffe can run. The -resize_width and -resize_height options are used when calling the tool so that all training-set sample pictures have the same size; the resolution after resizing is 256*256. All training-set sample pictures are labeled in advance.
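As a rough illustration of the first step, the sketch below only assembles the argument list for Caffe's `convert_imageset` tool; it does not run it. The function name `build_convert_imageset_cmd` and all paths are hypothetical. The flags are written with two leading dashes, as in the example scripts shipped with Caffe, and `--backend=leveldb` is assumed here because the text says a .leveldb database is produced.

```python
from typing import List


def build_convert_imageset_cmd(image_root: str, list_file: str, db_path: str,
                               size: int = 256) -> List[str]:
    """Return argv for convert_imageset: FLAGS ROOTFOLDER/ LISTFILE DB_NAME."""
    # The tool concatenates ROOTFOLDER with each relative path in LISTFILE,
    # so the root folder is normalized to end with a slash.
    root = image_root if image_root.endswith("/") else image_root + "/"
    return [
        "convert_imageset",
        f"--resize_width={size}",    # 256 in the text
        f"--resize_height={size}",   # 256 in the text
        "--backend=leveldb",         # assumption: LevelDB output, as described
        root,
        list_file,                   # each line: "<relative image path> <integer label>"
        db_path,
    ]


# Hypothetical paths, for illustration only.
cmd = build_convert_imageset_cmd("data/train", "data/train.txt", "data/train_leveldb")
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where the Caffe tools are on the `PATH`; the list file carries the labels assigned in the labeling pass mentioned above.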
In the second step, the extract_features tool of the Caffe framework is used to extract the sample image features from the .leveldb file generated above in the form of feature vectors, and the deep neural network (DNN) library of the Xilinx Reconfigurable Acceleration Stack is called to hardware-accelerate the process, improving the running speed of the module.
In the third step, the model trainer is started. With the name.prototxt and name_solver.prototxt files defined, the train command of the Caffe framework and its --solver parameter are used to train a model by supervised learning on the feature vectors obtained in step 2. During training the model is continually corrected by fine-tuning, finally yielding a classifier that has as many classes as there are labels and can identify sensitive (gambling, pornographic, etc.) pictures.
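For the third step, a similarly minimal sketch builds the `caffe train` command line. The helper name `build_caffe_train_cmd` and the weights file name are hypothetical; `--solver` is the parameter named in the text, and `--weights` is the Caffe flag commonly used to start fine-tuning from an existing model, included here as an assumption about how the fine-tuning mentioned above might be started.

```python
from typing import List, Optional


def build_caffe_train_cmd(solver: str, weights: Optional[str] = None) -> List[str]:
    """Return argv for `caffe train`; pass `weights` to fine-tune an existing model."""
    cmd = ["caffe", "train", f"--solver={solver}"]
    if weights is not None:
        # Fine-tuning: initialize layers from a previously trained .caffemodel.
        cmd.append(f"--weights={weights}")
    return cmd


# name_solver.prototxt is the solver file named in the text;
# pretrained.caffemodel is a hypothetical file name.
train_cmd = build_caffe_train_cmd("name_solver.prototxt")
finetune_cmd = build_caffe_train_cmd("name_solver.prototxt", "pretrained.caffemodel")
```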
In the fourth step, the front-end interface is written in HTML, CSS, and JavaScript. The target URL to be detected is entered in the front-end input box and its legitimacy is checked, for example whether the input could cause XSS or SQL injection vulnerabilities. If the URL is legitimate, it is sent to the crawler module using the ajax post() method of the jQuery library.
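The text does not say how the legitimacy check is implemented. As one possible reading, the sketch below shows a conservative check that could be repeated on the receiving side before the URL is handed to the crawler. It is written in Python rather than front-end JavaScript, and the function name `is_acceptable_url`, the forbidden character set, and the length limit are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Characters that commonly appear in XSS or SQL injection payloads and that a
# correctly percent-encoded URL does not need (illustrative, not exhaustive).
_FORBIDDEN = set("<>\"'`;\\")


def is_acceptable_url(url: str, max_length: int = 2048) -> bool:
    """Accept only http/https URLs with a host and no suspicious raw characters."""
    if not url or len(url) > max_length:
        return False
    if any(ch in _FORBIDDEN or ch.isspace() for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:  # raised for malformed netlocs, e.g. an unclosed IPv6 bracket
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)
```

A character filter of this kind only narrows what reaches the crawler; it would normally be combined with parameterized database queries and output encoding when the stored URL is later displayed.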
In the fifth step, the crawler module receives the URL detection request from the front-end request module, crawls the image information of the target URL using Python's Scrapy framework, and saves the crawled pictures as local files.
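The core of the fifth step is finding the picture URLs on the target page. The sketch below is not Scrapy code: it swaps in Python's standard-library `html.parser` so that it stays self-contained, and it only collects absolute image URLs from an already downloaded HTML string. The names `ImageLinkParser` and `extract_image_urls` are hypothetical; downloading and saving the files is left out.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects the absolute URLs of <img> tags, in document order, without duplicates."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.image_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            absolute = urljoin(self.base_url, src)
            if absolute not in self.image_urls:
                self.image_urls.append(absolute)


def extract_image_urls(html: str, base_url: str) -> List[str]:
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls
```

In a Scrapy spider the same selection would typically be done with a selector on the response, with the download and local storage handled by the framework.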
In the sixth step, as in step 1, the pictures crawled in step 5 are resized and converted into a .leveldb file that Caffe can run. Using the pictures crawled in step 5 as the test set, the feature extraction module extracts the feature vectors of the crawled images; the classifier generated in step 3 classifies the crawled images according to these feature vectors, and sensitive images are marked as unsafe images.
In the seventh step, the security judgment module calculates the safety coefficient of the target website according to whether the number of pictures marked as unsafe exceeds the set threshold, and stores it in the data storage module with fields including the target URL, the local path where the target website's pictures are saved, the detection time, and the safety coefficient.
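The patent does not give a formula for the safety coefficient or a value for the threshold. The sketch below therefore assumes one simple possibility: the coefficient is the fraction of crawled pictures not marked unsafe, and the site is flagged when the unsafe count exceeds the threshold. The function names, the default threshold of 3, the table layout, and the use of SQLite for the data storage module are all assumptions made for illustration.

```python
import sqlite3
import time
from typing import List, Tuple


def judge_site(unsafe_flags: List[bool], threshold: int = 3) -> Tuple[float, bool]:
    """Return (safety coefficient in [0, 1], whether the unsafe count exceeds threshold)."""
    total = len(unsafe_flags)
    unsafe = sum(unsafe_flags)
    # Assumed formula: share of pictures that were not marked unsafe.
    coefficient = 1.0 if total == 0 else 1.0 - unsafe / total
    return coefficient, unsafe > threshold


def store_result(conn: sqlite3.Connection, url: str, image_dir: str,
                 coefficient: float) -> None:
    """Persist the fields listed in step 7: URL, picture path, time, coefficient."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS detections "
        "(url TEXT, image_dir TEXT, detected_at REAL, safety_coefficient REAL)"
    )
    conn.execute(
        "INSERT INTO detections VALUES (?, ?, ?, ?)",  # parameterized query
        (url, image_dir, time.time(), coefficient),
    )
    conn.commit()
```

For example, `judge_site([True, False, False, True], threshold=1)` gives a coefficient of 0.5 and flags the site, and the result can be written with `store_result(sqlite3.connect("detections.db"), url, image_dir, 0.5)`, where the database file name is again hypothetical.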
In the eighth step, the responder sends the security detection data for this target URL to the front-end request module.
This method first crawls the image information of the website to be detected; after the classifier classifies the images, an accurate website safety coefficient is calculated and returned to the front-end request module for display. This solution uses machine learning to perform security detection on website content: the feature extraction module extracts image features, the model trainer trains a classifier on the extracted sample image features, and the classifier classifies images according to their features. Judging by image classification gives small deviation and a low false alarm rate, and the feature extraction algorithm is accelerated with an FPGA hardware accelerator to improve system response speed, achieving fast, efficient, and accurate website content security detection.
The specific embodiments above further describe the object, technical solution, and beneficial effects of the invention in detail. It should be understood that the above are only specific embodiments of the invention and do not limit its scope of protection; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (8)
- 1. A website content security detection system, characterized by comprising: a front-end request module, which accepts the URL to be detected and submits the request to the crawler module; a crawler module, which crawls the image information of the target URL; a feature extraction module, which converts the image information from the crawler module and from the sample picture module into feature vectors; a model trainer, which generates a classifier from the feature vectors of the sample pictures by supervised learning; an FPGA hardware accelerator, which provides hardware acceleration to the feature extraction module; a security judgment module, which calculates the safety coefficient of the target URL from the classifier's classification of the image features; a data storage module, which stores the image information crawled by the crawler module and the detection result information for the target URL; and a responder, which returns the safety coefficient of the target URL to the front-end request module.
- 2. The website content security detection system according to claim 1, characterized in that the FPGA hardware accelerator uses the Xilinx Reconfigurable Acceleration Stack and is implemented in combination with the Caffe machine learning framework and the Xilinx deep neural network (DNN) library.
- 3. The website content security detection system according to claim 2, characterized in that the Caffe machine learning framework is an integrated framework for CNN (convolutional neural network) deep learning.
- 4. The website content security detection system according to claim 1, characterized in that the security judgment module calculates the safety coefficient of the target website according to whether the number of pictures marked as unsafe exceeds a set threshold.
- 5. The website content security detection system according to claim 1, characterized in that the sample picture module includes normal pictures and abnormal pictures, abnormal pictures being pictures with gambling or pornographic features.
- 6. A website content security detection method, characterized by comprising the following steps: S1: the feature extraction module extracts the image information of the sample picture module as feature vectors; S2: taking the sample feature vectors obtained in S1 as input, the model trainer generates a classifier by supervised learning; S3: the URL to be detected is entered in the front-end request module, its legitimacy is checked, and it is submitted to the crawler module; S4: the crawler module receives the URL sent by the front-end request module, crawls the image information of the target URL, and stores the crawled content in the data storage module; S5: the feature extraction module extracts the feature vectors of the pictures crawled in S4; S6: taking the image feature vectors extracted in S5 as input, the classifier classifies the crawled images; S7: the security judgment module calculates the safety coefficient of the target URL from the classification results of S6 and stores it together with the target URL, the local path where the target website's pictures are saved, and the detection time; S8: the response module sends the detection result for the target URL to the front-end request module.
- 7. The website content security detection method according to claim 6, characterized in that the feature extraction module uses the FPGA hardware accelerator to accelerate the image feature extraction algorithm.
- 8. The website content security detection method according to claim 7, characterized in that the FPGA hardware accelerator uses the Xilinx Reconfigurable Acceleration Stack and is implemented in combination with the Caffe machine learning framework and the Xilinx deep neural network (DNN) library, the Caffe machine learning framework being an integrated framework for CNN (convolutional neural network) deep learning.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711090519.3A CN107862050A (en) | 2017-11-08 | 2017-11-08 | A kind of web site contents safety detecting system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711090519.3A CN107862050A (en) | 2017-11-08 | 2017-11-08 | A kind of web site contents safety detecting system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107862050A true CN107862050A (en) | 2018-03-30 |
Family
ID=61701187
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711090519.3A Pending CN107862050A (en) | 2017-11-08 | 2017-11-08 | A kind of web site contents safety detecting system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107862050A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110275958A (en) * | 2019-06-26 | 2019-09-24 | 北京市博汇科技股份有限公司 | Site information recognition methods, device and electronic equipment |
CN110633226A (en) * | 2018-06-22 | 2019-12-31 | 武汉海康存储技术有限公司 | Fusion memory, storage system and deep learning calculation method |
CN111091019A (en) * | 2019-12-23 | 2020-05-01 | 支付宝(杭州)信息技术有限公司 | Information prompting method, device and equipment |
CN111401115A (en) * | 2019-08-01 | 2020-07-10 | 江苏农林职业技术学院 | Strawberry disease and pest hyperspectral data processing method and device based on FPGA |
CN111475699A (en) * | 2020-03-07 | 2020-07-31 | 咪咕文化科技有限公司 | Website data crawling method and device, electronic equipment and readable storage medium |
CN111626309A (en) * | 2020-05-26 | 2020-09-04 | 北京墨云科技有限公司 | Website fingerprint identification method based on deep learning |
CN111651658A (en) * | 2020-06-05 | 2020-09-11 | 杭州安恒信息技术股份有限公司 | Method and computer equipment for automatically identifying website based on deep learning |
CN112731305A (en) * | 2020-12-17 | 2021-04-30 | 国网四川省电力公司信息通信公司 | Direct wave suppression method and system based on adaptive Doppler domain beam cancellation |
CN113657453A (en) * | 2021-07-22 | 2021-11-16 | 珠海高凌信息科技股份有限公司 | Harmful website detection method based on generation of countermeasure network and deep learning |
US11609989B2 (en) | 2019-03-26 | 2023-03-21 | Proofpoint, Inc. | Uniform resource locator classifier and visual comparison platform for malicious site detection |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101968813A (en) * | 2010-10-25 | 2011-02-09 | 华北电力大学 | Method for detecting counterfeit webpage |
US20140270350A1 (en) * | 2013-03-14 | 2014-09-18 | Xerox Corporation | Data driven localization using task-dependent representations |
CN106776946A (en) * | 2016-12-02 | 2017-05-31 | 重庆大学 | A kind of detection method of fraudulent website |
-
2017
- 2017-11-08 CN CN201711090519.3A patent/CN107862050A/en active Pending
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110633226A (en) * | 2018-06-22 | 2019-12-31 | 武汉海康存储技术有限公司 | Fusion memory, storage system and deep learning calculation method |
US11609989B2 (en) | 2019-03-26 | 2023-03-21 | Proofpoint, Inc. | Uniform resource locator classifier and visual comparison platform for malicious site detection |
US11924246B2 (en) | 2019-03-26 | 2024-03-05 | Proofpoint, Inc. | Uniform resource locator classifier and visual comparison platform for malicious site detection preliminary |
US11799905B2 (en) | 2019-03-26 | 2023-10-24 | Proofpoint, Inc. | Uniform resource locator classifier and visual comparison platform for malicious site detection |
CN110275958A (en) * | 2019-06-26 | 2019-09-24 | 北京市博汇科技股份有限公司 | Site information recognition methods, device and electronic equipment |
CN110275958B (en) * | 2019-06-26 | 2021-07-27 | 北京市博汇科技股份有限公司 | Website information identification method and device and electronic equipment |
CN111401115A (en) * | 2019-08-01 | 2020-07-10 | 江苏农林职业技术学院 | Strawberry disease and pest hyperspectral data processing method and device based on FPGA |
CN111401115B (en) * | 2019-08-01 | 2024-02-27 | 江苏农林职业技术学院 | Method and device for processing strawberry disease and pest hyperspectral data based on FPGA |
CN111091019A (en) * | 2019-12-23 | 2020-05-01 | 支付宝(杭州)信息技术有限公司 | Information prompting method, device and equipment |
CN111475699A (en) * | 2020-03-07 | 2020-07-31 | 咪咕文化科技有限公司 | Website data crawling method and device, electronic equipment and readable storage medium |
CN111475699B (en) * | 2020-03-07 | 2023-09-08 | 咪咕文化科技有限公司 | Website data crawling method and device, electronic equipment and readable storage medium |
CN111626309A (en) * | 2020-05-26 | 2020-09-04 | 北京墨云科技有限公司 | Website fingerprint identification method based on deep learning |
CN111651658A (en) * | 2020-06-05 | 2020-09-11 | 杭州安恒信息技术股份有限公司 | Method and computer equipment for automatically identifying website based on deep learning |
CN112731305A (en) * | 2020-12-17 | 2021-04-30 | 国网四川省电力公司信息通信公司 | Direct wave suppression method and system based on adaptive Doppler domain beam cancellation |
CN113657453A (en) * | 2021-07-22 | 2021-11-16 | 珠海高凌信息科技股份有限公司 | Harmful website detection method based on generative adversarial network and deep learning |
CN113657453B (en) * | 2021-07-22 | 2023-08-01 | 珠海高凌信息科技股份有限公司 | Harmful website detection method based on generative adversarial network and deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107862050A (en) | A kind of web site contents safety detecting system and method | |
CN110210617B (en) | Adversarial example generation method and generation device based on feature enhancement | |
CN104077396B (en) | Method and device for detecting phishing website | |
CN110808968B (en) | Network attack detection method and device, electronic equipment and readable storage medium | |
CN105072089B (en) | A kind of WEB malice scanning behavior method for detecting abnormality and system | |
CN103577755A (en) | Malicious script static detection method based on SVM (support vector machine) | |
CN109005145A (en) | A kind of malice URL detection system and its method extracted based on automated characterization | |
CN107786575A (en) | A kind of adaptive malice domain name detection method based on DNS flows | |
CN109510815A (en) | A kind of multistage detection method for phishing site and detection system based on supervised learning | |
CN107220386A (en) | Information-pushing method and device | |
CN107659570A (en) | Webshell detection methods and system based on machine learning and static and dynamic analysis | |
CN109873810B (en) | Phishing detection method based on a support vector machine optimized by the salp swarm algorithm | |
CN107957872A (en) | A kind of full web site source code acquisition methods and illegal website detection method, system | |
Saini et al. | A review of bot protection using CAPTCHA for web security | |
US20230385409A1 (en) | Unstructured text classification | |
CN109858248B (en) | Malicious Word document detection method and device | |
CN104504335B (en) | Phishing app detection method and system based on page features and URL features | |
CN103810425A (en) | Method and device for detecting malicious website | |
EP3703329B1 (en) | Webpage request identification | |
CN110191096A (en) | A kind of term vector homepage invasion detection method based on semantic analysis | |
CN108337255A (en) | A phishing site detection method based on web automated testing and broad learning | |
WO2014029318A1 (en) | Method and apparatus for identifying webpage type | |
CN107046586A (en) | A kind of algorithm generation domain name detection method based on natural language feature | |
CN106169050B (en) | A kind of PoC Program extraction method based on webpage Knowledge Discovery | |
CN104158828A (en) | Method and system for identifying doubtful phishing webpage on basis of cloud content rule base |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180330 |