CN106355095B - Method for identifying fraud webpages using fuzzy theory - Google Patents
- Publication number
- CN106355095B (application CN201611046454.8A; earlier publication CN106355095A)
- Authority
- CN
- China
- Prior art keywords
- webpage
- matrix
- fraud
- fraud webpage
- fuzzy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/566—Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Computer And Data Communications (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The present invention discloses a method for identifying fraud webpages using fuzzy theory, and relates to a fraud webpage identification technique that does not depend on webpage features. The identification problem is solved through division of labor and cooperation combined with fuzzy theory: the quality of a webpage is judged by different users, and the data set of user labels is then analyzed by computer. This solves the technical problem that existing fraud webpage identification methods depend heavily on the webpage itself. The technical solution is simple and effective, and has important practical value for future search engines.
Description
Technical field
The present invention discloses a method for identifying fraud webpages using fuzzy theory. It relates to a fraud webpage identification technique that does not depend on webpage features, and belongs to the technical field of Internet security and services.
Background technology
Search engines have become an indispensable tool for Internet users, but, driven by profit, large numbers of fraud webpages have become mixed into the Internet. Fraudsters use improper means to manipulate webpage rankings by exploiting search engines' ranking strategies, so as to obtain a ranking disproportionately higher than the page deserves. This interferes with users' access to information and can even harm their interests. Such pages are called fraud webpages. The techniques fraudsters use fall into four categories: content-based, link-based, cloaking-based, and redirection-based. Previous anti-fraud research identifies these four kinds of deception by relying excessively on the webpage itself, so its recognition results remain valid only briefly. Finding a fraud webpage identification method that does not depend on webpage features is therefore a pressing problem.
Summary of the invention
The present invention provides a method for identifying fraud webpages using fuzzy theory that does not depend on webpage features. It solves the problem that previous fraud webpage identification methods rely excessively on the webpage itself and yield recognition results that remain valid only briefly.
The technical solution of the method comprises the following steps:
Step 1:
Users browse webpages and evaluate each one by assigning a user label: "non-fraud webpage F", "fraud webpage S", "ambiguous B", or "unknown U";
Step 2:
At the end of each month, the search engine downloads the data set of all user labels for that month;
Step 3:
The data set is divided into several matrices $M_i$, where $i = 1, 2, \ldots, n$, according to the number of different user labels each webpage has received;
Step 4:
Each matrix $M_i$, denoted $N$, is converted into a fuzzy similarity matrix $R$ with elements $R_{ij}$, where $i, j = 1, 2, \ldots, n$, $n \in R$; the calculation formulas include:
where $i, j = 1, 2, \ldots, n$ and $n$ is the number of rows of $N$;
where $i, j = 1, 2, \ldots, n$, $n$ is the number of rows of $N$, and $m$ is the number of columns of $N$;
Step 5:
The fuzzy similarity matrix is converted into a fuzzy equivalent matrix using the following formula:
$b = 1, 2, \ldots, n$; $n$ is a natural number; $p$ is the number of rows of $R$;
The computation is repeated until the condition $R^{b} \circ R^{b} = R^{b}$ is met, at which point the matrix has converged;
Step 6:
For the converged matrix, every confidence level $\lambda$ in [0, 1] is taken in turn and the corresponding $\lambda$-cut matrix is computed;
Step 7:
For each $\lambda$-cut matrix, clustering produces multiple sets. The first website in each set is selected in turn and judged manually as a fraud webpage or a non-fraud webpage. If it is a fraud webpage, the set is considered to belong to fraud webpages; if it is a non-fraud webpage, the set is considered to belong to non-fraud webpages.
The beneficial effect of the present invention is as follows: the fraud webpage identification problem is solved through division of labor and cooperation combined with fuzzy theory. The quality of a webpage is judged by different users, and the data set of user labels is analyzed by computer, which solves the technical problem that existing fraud webpage identification methods depend heavily on the webpage itself. The technical solution is simple and effective, and has important practical value for future search engines.
Detailed description of embodiments
To illustrate the technical solution of the present invention more clearly, three embodiments are given below in accordance with the technical solution described above. A person of ordinary skill in the art can apply the technical solution in practical engineering without creative effort.
Embodiment 1
Step 1: After browsing a webpage, a user evaluates it by choosing one of the four preset webpage labels (F, S, B, U). For example, "362 F U" indicates that the website with id 362 has labels from two users, F and U respectively.
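As a minimal sketch of how such a record could be read, the following assumes a whitespace-separated layout (site id first, then one label per user). The layout and the name `parse_record` are assumptions for illustration; the text only gives the example "362 F U".

```python
from typing import List, Tuple

# The four labels named in Step 1.
VALID_LABELS = {"F", "S", "B", "U"}


def parse_record(line: str) -> Tuple[int, List[str]]:
    """Split a record such as '362 F U' into (site id, list of user labels)."""
    fields = line.split()
    site_id = int(fields[0])
    labels = fields[1:]
    for label in labels:
        if label not in VALID_LABELS:
            raise ValueError(f"unknown label {label!r} in record {line!r}")
    return site_id, labels


print(parse_record("362 F U"))
```

The number of labels in a record is what later decides which matrix the record goes into.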
Step 2: To meet the requirements of the embodiment, we use the data set webspam-uk2007 ("WebSpam Collections", http://chato.cl/webspam/datasets/, crawled by the Laboratory of Web Algorithmics, University of Milan, http://law.di.unimi.it/) to verify the recognition rate of the clustering experiment.
Step 3: From the data set, select 50 records that each have labels from 2 users, producing a 50×2 matrix M.
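The grouping in this step can be sketched as follows. The text does not say how the labels F, S, B, U are turned into numbers, so the `LABEL_CODE` mapping below is purely hypothetical, as is the function name `build_matrices`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical numeric codes; the patent text does not state an encoding.
LABEL_CODE = {"S": 0.0, "B": 1.0, "U": 2.0, "F": 3.0}

Record = Tuple[int, List[str]]


def build_matrices(records: List[Record]) -> Dict[int, Tuple[List[int], List[List[float]]]]:
    """Group records by label count; return {count: (site ids, matrix rows)}."""
    groups = defaultdict(lambda: ([], []))
    for site_id, labels in records:
        ids, rows = groups[len(labels)]
        ids.append(site_id)
        rows.append([LABEL_CODE[label] for label in labels])
    return dict(groups)


records = [(362, ["F", "U"]), (7, ["S", "S"]), (9, ["F", "F", "B"])]
matrices = build_matrices(records)
print(matrices[2])  # the sites labeled by exactly two users
```

With 50 two-label records, `matrices[2]` would hold the 50 site ids and the 50×2 matrix M described above.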
Step 4: Compute the fuzzy similarity matrix from this matrix according to the formulas, obtaining a 50×50 matrix R.
The calculation formulas include:
where $i, j = 1, 2, \ldots, n$ and $n$ is the number of rows of $N$;
where $i, j = 1, 2, \ldots, n$, $n$ is the number of rows of $N$, and $m$ is the number of columns of $N$;
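The two formulas are not reproduced in this text, so the sketch below is not the patent's computation. It assumes one common construction from fuzzy clustering: min-max normalization of each column over the n rows, followed by an absolute-difference similarity averaged over the m columns. Both choices, and the names `normalize_columns` and `similarity_matrix`, are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(data: Matrix) -> Matrix:
    """Min-max scale each column to [0, 1]; a constant column becomes all zeros."""
    n, m = len(data), len(data[0])
    out = [[0.0] * m for _ in range(n)]
    for k in range(m):
        column = [row[k] for row in data]
        lo, hi = min(column), max(column)
        if hi > lo:
            for i in range(n):
                out[i][k] = (data[i][k] - lo) / (hi - lo)
    return out


def similarity_matrix(data: Matrix) -> Matrix:
    """Assumed formula: r_ij = 1 - (1/m) * sum_k |x_ik - x_jk| on normalized data."""
    x = normalize_columns(data)
    n, m = len(x), len(x[0])
    return [
        [1.0 - sum(abs(x[i][k] - x[j][k]) for k in range(m)) / m for j in range(n)]
        for i in range(n)
    ]


R = similarity_matrix([[3.0, 2.0], [0.0, 0.0], [3.0, 2.0]])
print(R)
```

Whatever formula is used, the result should be an n×n matrix that is reflexive (ones on the diagonal) and symmetric, which is what the next step relies on.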
Step 5: For the matrix R produced in Step 4, compute the fuzzy equivalent matrix using the formula. The computation converges at power 8, i.e. $R^{8} \circ R^{8} = R^{8}$; R is still a 50×50 matrix.
The formula is as follows:
$n$ is a natural number; $p$ is the number of rows of $R$;
The computation is repeated until the condition $R^{b} \circ R^{b} = R^{b}$ is met, at which point the matrix has converged;
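A sketch of the repeated-squaring procedure, assuming the composition is the standard max-min composition used for fuzzy transitive closure (the formula itself is not reproduced in this text; the index running up to p, the number of rows of R, is consistent with that reading). The names `compose` and `transitive_closure` are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def compose(a: Matrix, b: Matrix) -> Matrix:
    """Max-min composition: c_ij = max over k of min(a_ik, b_kj)."""
    p = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(p)) for j in range(p)]
        for i in range(p)
    ]


def transitive_closure(r: Matrix) -> Tuple[Matrix, int]:
    """Square R repeatedly until R^b composed with R^b equals R^b; return (R^b, b)."""
    power = 1
    while True:
        squared = compose(r, r)
        if squared == r:
            return r, power
        r, power = squared, power * 2


R = [[1.0, 0.8, 0.2],
     [0.8, 1.0, 0.5],
     [0.2, 0.5, 1.0]]
equivalent, b = transitive_closure(R)
print(b, equivalent)
```

Because max-min composition only selects values already present in the matrix, exact equality is a safe stopping test here; no floating-point tolerance is needed.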
Step 6: The distinct elements of the matrix, arranged from largest to smallest and denoted λ, are 1 > 0.9 > 0.8. Taking λ = 1, 0.9, 0.8 in turn, compute the corresponding cut matrices. When λ = 1, all values in the matrix less than 1 are replaced with 0, producing the first cut matrix. When λ = 0.9, all values greater than or equal to 0.9 are replaced with 1 and all values less than 0.9 are replaced with 0, producing the second cut matrix. When λ = 0.8, all values greater than or equal to 0.8 are replaced with 1, producing the third cut matrix.
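The thresholding described in this step can be sketched in a few lines; `lambda_cut` is an illustrative name and the 3×3 matrix is made-up example data, not taken from the embodiment.

```python
from typing import List


def lambda_cut(matrix: List[List[float]], lam: float) -> List[List[int]]:
    """Replace every value >= lam with 1 and every value < lam with 0."""
    return [[1 if value >= lam else 0 for value in row] for row in matrix]


equivalent = [[1.0, 0.9, 0.8],
              [0.9, 1.0, 0.8],
              [0.8, 0.8, 1.0]]
for lam in (1.0, 0.9, 0.8):
    print(lam, lambda_cut(equivalent, lam))
```

Lowering λ can only turn zeros into ones, so the sets produced at a lower level are unions of the sets produced at a higher level.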
Step 7:
When λ = 1,
clustering produces 5 sets. The first website in each set is selected in turn and judged manually as a fraud webpage or a non-fraud webpage. If it is a fraud webpage, the set is considered to belong to fraud webpages; if it is a non-fraud webpage, the set is considered to belong to non-fraud webpages. The results of the embodiment are shown in the following table: (for each website in each set, we verify the corresponding recognition rate against the judgment provided by the data set)
When λ = 0.9, clustering produces 4 sets. The first website in each set is selected in turn and judged manually as a fraud webpage or a non-fraud webpage. If it is a fraud webpage, the set is considered to belong to fraud webpages; if it is a non-fraud webpage, the set is considered to belong to non-fraud webpages. The results of the embodiment are shown in the following table: (for each website in each set, we verify the corresponding recognition rate against the judgment provided by the data set)
When λ = 0.8, clustering produces 1 set, which marks the completion of Embodiment 1.
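The clustering and labeling of Step 7 can be sketched as below. The sketch assumes the cut matrix comes from a fuzzy equivalent matrix, so that two sites belong to the same set exactly when their rows are identical. The `judge` callback stands in for the manual judgment; it and the function names are hypothetical.

```python
from typing import Callable, Dict, List


def clusters_from_cut(cut: List[List[int]], ids: List[int]) -> List[List[int]]:
    """Group site ids whose rows in the cut matrix are identical."""
    groups: Dict[tuple, List[int]] = {}
    for site_id, row in zip(ids, cut):
        groups.setdefault(tuple(row), []).append(site_id)
    return list(groups.values())


def label_clusters(clusters: List[List[int]],
                   judge: Callable[[int], str]) -> Dict[int, str]:
    """Apply the verdict for the first site of each set to every site in it."""
    verdicts: Dict[int, str] = {}
    for cluster in clusters:
        verdict = judge(cluster[0])  # only the first site is judged manually
        for site_id in cluster:
            verdicts[site_id] = verdict
    return verdicts


cut = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
clusters = clusters_from_cut(cut, ids=[362, 7, 9])
print(clusters)
print(label_clusters(clusters, judge=lambda s: "S" if s == 362 else "F"))
```

Only one manual judgment is needed per set, which is why the number of sets at each λ determines the manual effort.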
Embodiment 2
Step 1: After browsing a webpage, a user evaluates it by choosing one of the four preset webpage labels (F, S, B, U). For example, "362 F U" indicates that the website with id 362 has labels from two users, F and U respectively.
Step 2: To meet the requirements of the embodiment, we use the data set webspam-uk2007 ("WebSpam Collections", http://chato.cl/webspam/datasets/, crawled by the Laboratory of Web Algorithmics, University of Milan, http://law.di.unimi.it/) to verify the recognition rate of the clustering experiment.
Step 3: From the data set, select 100 records that each have labels from 2 users, producing a 100×2 matrix M.
Step 4: Compute the fuzzy similarity matrix from this matrix according to the formulas, obtaining a 100×100 matrix R.
The calculation formulas include:
where $i, j = 1, 2, \ldots, n$ and $n$ is the number of rows of $N$;
where $i, j = 1, 2, \ldots, n$, $n$ is the number of rows of $N$, and $m$ is the number of columns of $N$;
Step 5: For the matrix R produced in Step 4, compute the fuzzy equivalent matrix using the formula. The computation converges at power 16, i.e. $R^{16} \circ R^{16} = R^{16}$; R is still a 100×100 matrix.
The formula is as follows:
$n$ is a natural number; $p$ is the number of rows of $R$;
The computation is repeated until the condition $R^{b} \circ R^{b} = R^{b}$ is met, at which point the matrix has converged;
Step 6: The distinct elements of the matrix, arranged from largest to smallest and denoted λ, are 1 > 0.9 > 0.8. Taking λ = 1, 0.9, 0.8 in turn, compute the corresponding cut matrices. When λ = 1, all values in the matrix less than 1 are replaced with 0, producing the first cut matrix. When λ = 0.9, all values greater than or equal to 0.9 are replaced with 1 and all values less than 0.9 are replaced with 0, producing the second cut matrix. When λ = 0.8, all values greater than or equal to 0.8 are replaced with 1, producing the third cut matrix.
Step 7:
When λ = 1,
clustering produces 8 sets. The first website in each set is selected in turn and judged manually as a fraud webpage or a non-fraud webpage. If it is a fraud webpage, the set is considered to belong to fraud webpages; if it is a non-fraud webpage, the set is considered to belong to non-fraud webpages. The results of the embodiment are shown in the following table: (for each website in each set, we verify the corresponding recognition rate against the judgment provided by the data set)
When λ = 0.9,
clustering produces 2 sets. The first website in each set is selected in turn and judged manually as a fraud webpage or a non-fraud webpage. If it is a fraud webpage, the set is considered to belong to fraud webpages; if it is a non-fraud webpage, the set is considered to belong to non-fraud webpages. The results of the embodiment are shown in the following table: (for each website in each set, we verify the corresponding recognition rate against the judgment provided by the data set)
When λ = 0.8, clustering produces 1 set, which marks the completion of Embodiment 2.
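The verification mentioned in Step 7, comparing each site's propagated verdict with the judgment provided by the data set, might be computed as below. The text does not define the recognition rate precisely; this sketch assumes it is the fraction of sites whose verdict matches the reference label, and the sample values are invented for illustration.

```python
from typing import Dict


def recognition_rate(predicted: Dict[int, str], reference: Dict[int, str]) -> float:
    """Fraction of predicted sites whose verdict equals the reference label."""
    if not predicted:
        raise ValueError("no predictions to evaluate")
    hits = sum(1 for site_id, verdict in predicted.items()
               if reference.get(site_id) == verdict)
    return hits / len(predicted)


# Invented example values, not results from the embodiment.
predicted = {1: "S", 2: "S", 3: "F", 4: "F"}
reference = {1: "S", 2: "F", 3: "F", 4: "F"}
print(recognition_rate(predicted, reference))
```

Computing the rate per set, rather than over all sites, only requires calling the function with the verdicts of one set at a time.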
Embodiment 3
Step 1: After browsing a webpage, a user evaluates it by choosing one of the four preset webpage labels (F, S, B, U). For example, "362 F U" indicates that the website with id 362 has labels from two users, F and U respectively.
Step 2: To meet the requirements of the embodiment, we use the data set webspam-uk2007 ("WebSpam Collections", http://chato.cl/webspam/datasets/, crawled by the Laboratory of Web Algorithmics, University of Milan, http://law.di.unimi.it/) to verify the recognition rate of the clustering experiment.
Step 3: From the data set, select 200 records that each have labels from 2 users, producing a 200×2 matrix M.
Step 4: Compute the fuzzy similarity matrix from this matrix according to the formulas, obtaining a 200×200 matrix R.
The calculation formulas include:
where $i, j = 1, 2, \ldots, n$ and $n$ is the number of rows of $N$;
where $i, j = 1, 2, \ldots, n$, $n$ is the number of rows of $N$, and $m$ is the number of columns of $N$;
Step 5: For the matrix R produced in Step 4, compute the fuzzy equivalent matrix using the formula. The computation converges at power 8, i.e. $R^{8} \circ R^{8} = R^{8}$; R is still a 200×200 matrix.
The formula is as follows:
$n$ is a natural number; $p$ is the number of rows of $R$;
The computation is repeated until the condition $R^{b} \circ R^{b} = R^{b}$ is met, at which point the matrix has converged;
Step 6: The distinct elements of the matrix, arranged from largest to smallest and denoted λ, are 1 > 0.9 > 0.8. Taking λ = 1, 0.9, 0.8 in turn, compute the corresponding cut matrices. When λ = 1, all values in the matrix less than 1 are replaced with 0, producing the first cut matrix. When λ = 0.9, all values greater than or equal to 0.9 are replaced with 1 and all values less than 0.9 are replaced with 0, producing the second cut matrix. When λ = 0.8, all values greater than or equal to 0.8 are replaced with 1, producing the third cut matrix.
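The list of λ levels in Step 6 is simply the set of distinct values in the converged matrix, sorted in descending order. A short sketch, with `cut_levels` as an illustrative name and a made-up 3×3 matrix:

```python
from typing import List


def cut_levels(matrix: List[List[float]]) -> List[float]:
    """Distinct matrix values from largest to smallest; each is one λ level."""
    return sorted({value for row in matrix for value in row}, reverse=True)


equivalent = [[1.0, 0.9, 0.8],
              [0.9, 1.0, 0.8],
              [0.8, 0.8, 1.0]]
levels = cut_levels(equivalent)
print(levels)

# At the smallest level every entry passes the threshold, so the cut matrix
# is all ones and all sites fall into a single set.
lowest = levels[-1]
print(all(value >= lowest for row in equivalent for value in row))
```

This is why the last level in each embodiment (λ = 0.8) always yields exactly one set.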
Step 7:
When λ = 1,
clustering produces 9 sets. The first website in each set is selected in turn and judged manually as a fraud webpage or a non-fraud webpage. If it is a fraud webpage, the set is considered to belong to fraud webpages; if it is a non-fraud webpage, the set is considered to belong to non-fraud webpages. The results of the embodiment are shown in the following table: (for each website in each set, we verify the corresponding recognition rate against the judgment provided by the data set)
When λ = 0.9,
clustering produces 3 sets. The first website in each set is selected in turn and judged manually as a fraud webpage or a non-fraud webpage. If it is a fraud webpage, the set is considered to belong to fraud webpages; if it is a non-fraud webpage, the set is considered to belong to non-fraud webpages. The results of the embodiment are shown in the following table: (for each website in each set, we verify the corresponding recognition rate against the judgment provided by the data set)
When λ = 0.8, clustering produces 1 set, which marks the completion of Embodiment 3.
Claims (1)
1. A method for identifying fraud webpages using fuzzy theory, comprising the following steps:
Step 1:
Users browse webpages and evaluate each one by assigning a user label: "non-fraud webpage F", "fraud webpage S", "ambiguous B", or "unknown U";
Step 2:
At the end of each month, the search engine downloads the data set of all user labels for that month;
Step 3:
The data set is divided into several matrices $M_i$, where $i = 1, 2, \ldots, n$, according to the number of different user labels each webpage has received;
Step 4:
Each matrix $M_i$, denoted $N$, is converted into a fuzzy similarity matrix $R$ with elements $R_{ij}$, where $i, j = 1, 2, \ldots, n$, $n \in R$; the calculation formulas include:
where $i, j = 1, 2, \ldots, n$ and $n$ is the number of rows of $N$;
where $i, j = 1, 2, \ldots, n$, $n$ is the number of rows of $N$, and $m$ is the number of columns of $N$;
Step 5:
The fuzzy similarity matrix is converted into a fuzzy equivalent matrix using the following formula:
$b = 1, 2, \ldots, n$; $n$ is a natural number; $p$ is the number of rows of $R$;
The computation is repeated until the condition $R^{b} \circ R^{b} = R^{b}$ is met, at which point the matrix has converged;
Step 6:
For the converged matrix, every confidence level $\lambda$ in [0, 1] is taken in turn and the corresponding $\lambda$-cut matrix is computed;
Step 7:
For each $\lambda$-cut matrix, clustering produces multiple sets. The first website in each set is selected in turn and judged manually as a fraud webpage or a non-fraud webpage. If it is a fraud webpage, the set is considered to belong to fraud webpages; if it is a non-fraud webpage, the set is considered to belong to non-fraud webpages.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611046454.8A CN106355095B (en) | 2016-11-23 | 2016-11-23 | Method for identifying fraud webpages using fuzzy theory |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611046454.8A CN106355095B (en) | 2016-11-23 | 2016-11-23 | Method for identifying fraud webpages using fuzzy theory |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106355095A CN106355095A (en) | 2017-01-25 |
CN106355095B true CN106355095B (en) | 2018-10-19 |
Family
ID=57862809
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611046454.8A Expired - Fee Related CN106355095B (en) | 2016-11-23 | 2016-11-23 | Method for identifying fraud webpages using fuzzy theory |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106355095B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107194281B (en) * | 2017-05-25 | 2019-07-16 | 成都知道创宇信息技术有限公司 | A kind of anti-fake system based on block chain technology |
CN108985815A (en) * | 2018-06-06 | 2018-12-11 | 阿里巴巴集团控股有限公司 | A kind of user identification method, device and equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102592067A (en) * | 2011-01-17 | 2012-07-18 | 腾讯科技(深圳)有限公司 | Webpage recognition method, device and system |
CN103634306A (en) * | 2013-11-18 | 2014-03-12 | 北京奇虎科技有限公司 | Security detection method and security detection server for network data |
CN104486461A (en) * | 2014-12-29 | 2015-04-01 | 北京奇虎科技有限公司 | Domain name classification method and device and domain name recognition method and system |
CN103425736B (en) * | 2013-06-24 | 2016-02-17 | 腾讯科技(深圳)有限公司 | A kind of web information recognition, Apparatus and system |
CN105827611A (en) * | 2016-04-06 | 2016-08-03 | 清华大学 | Distributed rejection service network attack detection method and system based on fuzzy inference |
CN106021487A (en) * | 2016-05-19 | 2016-10-12 | 浙江工业大学 | Internet of Things semantic event detection method based on fuzzy theory |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102592067A (en) * | 2011-01-17 | 2012-07-18 | 腾讯科技(深圳)有限公司 | Webpage recognition method, device and system |
CN103425736B (en) * | 2013-06-24 | 2016-02-17 | 腾讯科技(深圳)有限公司 | A kind of web information recognition, Apparatus and system |
CN103634306A (en) * | 2013-11-18 | 2014-03-12 | 北京奇虎科技有限公司 | Security detection method and security detection server for network data |
CN104486461A (en) * | 2014-12-29 | 2015-04-01 | 北京奇虎科技有限公司 | Domain name classification method and device and domain name recognition method and system |
CN105827611A (en) * | 2016-04-06 | 2016-08-03 | 清华大学 | Distributed rejection service network attack detection method and system based on fuzzy inference |
CN106021487A (en) * | 2016-05-19 | 2016-10-12 | 浙江工业大学 | Internet of Things semantic event detection method based on fuzzy theory |
Non-Patent Citations (2)
Title |
---|
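A clustering algorithm based on the transitive closure of fuzzy equivalent matrices; Zhao Lei; Computer Knowledge and Technology (《电脑知识与技术》); 20100930; full text * |
Construction method for intuitionistic fuzzy equivalent matrices; Lei Yingjie et al.; Systems Engineering - Theory & Practice (《系统工程理论与实践》); 20070731; full text * |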
Also Published As
Publication number | Publication date |
---|---|
CN106355095A (en) | 2017-01-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104346370B (en) | Picture search, the method and device for obtaining image text information | |
CN106294883B (en) | Based on user behavior data to the method and system analyzed on user behavior figure | |
CN103365839B (en) | The recommendation searching method and device of a kind of search engine | |
CN103020164B (en) | Semantic search method based on multi-semantic analysis and personalized sequencing | |
US8682882B2 (en) | System and method for automatically identifying classified websites | |
US20140195348A1 (en) | Method and apparatus for composing search phrases, distributing ads and searching product information | |
CA2612895A1 (en) | Systems and methods for providing search results | |
CN103389974B (en) | Carry out the method and server of information search | |
CN107315841A (en) | A kind of information search method, apparatus and system | |
CN106407349A (en) | Product recommendation method and device | |
US8489604B1 (en) | Automated resource selection process evaluation | |
EP2649542A2 (en) | Ranking product information | |
CN104636407B (en) | Parameter value training and searching request treating method and apparatus | |
CN101957845B (en) | On-line application system and implementation method thereof | |
CN106776609A (en) | Reprint the statistical method and device of quantity in website | |
CN104881472A (en) | Combined recommendation method of traveling scenic spots based on network data collection | |
CN106777295A (en) | Method and system is recommended in a kind of position search based on semantic matches | |
CN106355095B (en) | Method for distinguishing is known to fraud webpage using fuzzy theory | |
CN107220358A (en) | The recommendation method and device of point of interest | |
CN107203558A (en) | Object recommendation method and apparatus, recommendation information treating method and apparatus | |
CN105630937A (en) | Method and device for searching answers to exam questions | |
CN101308507B (en) | Internet information issue and search method | |
CN104615621B (en) | Correlation treatment method and system in search | |
CN103617221B (en) | Software recommendation method and software recommendation system | |
CN103942698A (en) | Product information comparing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20181019 Termination date: 20201123 |
CF01 | Termination of patent right due to non-payment of annual fee | |