CN106355095A - Method for identifying fraud website by utilizing fuzzy theory - Google Patents
- Publication number
- CN106355095A (application number CN201611046454.8A)
- Authority
- CN
- China
- Prior art keywords
- webpage
- matrix
- fraud
- website
- fraud webpage
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/566—Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
The invention discloses a method for identifying fraud websites using fuzzy theory and relates to fraud website identification technology that does not depend on website features. The identification problem is solved by combining division of labor and cooperation among users with fuzzy theory: website quality is judged by different users, and the data sets marked by those users are analyzed by a computer. This solves the technical problem that existing fraud website identification methods depend heavily on the websites themselves. The method is simple and effective and has important practical value for future search engines.
Description
Technical field
The invention discloses a method for identifying fraud webpages using fuzzy theory. It relates to fraud webpage identification technology that does not depend on webpage features and belongs to the field of Internet security and service technology.
Background
Search engines have become an indispensable tool for Internet users, but, driven by profit, fraud webpages have flooded the Internet. Cheaters use improper means to manipulate webpage ranking by targeting the ranking strategies of search engines, so as to obtain a high ranking out of proportion to a page's actual standing. This interferes with users' access to information and can even harm their interests. Such pages are called fraud webpages. The techniques cheaters use fall into four categories: content-based, link-based, cloaking-based, and redirection-based. Previous anti-fraud research identified fraud by targeting these four techniques, so it depends too heavily on the webpage itself and its recognition results remain valid only briefly. Finding a fraud webpage identification method that does not depend on webpage features is an important and urgent problem.
Summary of the invention
The invention provides a method for identifying fraud webpages using fuzzy theory that does not depend on webpage features. It solves the problem that previous identification methods depend too heavily on the webpage itself and produce recognition results that remain valid only briefly.
The technical scheme of the method comprises the following steps:
Step 1:
After browsing a webpage, a user evaluates it and marks it as one of "non-fraud webpage f", "fraud webpage s", "equivocal b", or "unknown u";
Step 2:
At the end of each month, the data set of all user marks for that month is downloaded through the search engine;
Step 3:
According to the number of different users' marks on each webpage, the data set is divided into several matrices M_i, where i = 1, 2, …, n;
Step 4:
Each matrix M_i, denoted N, is converted into a fuzzy similarity matrix R with elements r_ij ∈ R, where i, j = 1, 2, …, n. The calculation formulas include:
[formula not preserved in the source text]
where i, j = 1, 2, …, n, and n is the number of rows of N;
[formula not preserved in the source text]
where i, j = 1, 2, …, n, n is the number of rows of N, and m is the number of columns of N;
Step 5:
The fuzzy similarity matrix is converted into a fuzzy equivalence matrix using the following formula:
[formula not preserved in the source text]
where n is a natural number and p is the number of rows of R;
the computation is repeated until the condition R^b · R^b = R^b is met, at which point the matrix has converged;
Step 6:
For the converged matrix, all confidence values λ in [0, 1] are selected and the corresponding level matrices (λ-cut matrices) are calculated;
Step 7:
For each level matrix, clustering produces multiple sets. The first website in each set is selected in turn and manually judged to be a fraud webpage or a non-fraud webpage. If it is a fraud webpage, the whole set is considered to belong to fraud webpages; if it is a non-fraud webpage, the whole set is considered to belong to non-fraud webpages.
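The conversion in Step 5 can be sketched as repeated squaring of the similarity matrix until it stops changing. The source text does not preserve the composition formula, so the max-min composition below is an assumption (it is the usual choice for the transitive closure of a fuzzy relation). The function names and the toy matrix are illustrative only and do not come from the patent.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def max_min_compose(a: Matrix, b: Matrix) -> Matrix:
    """Assumed composition: c[i][j] = max over k of min(a[i][k], b[k][j])."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(r: Matrix) -> Tuple[Matrix, int]:
    """Square r repeatedly until r composed with itself equals r.

    Returns the converged matrix and the exponent b at which
    R^b composed with R^b equals R^b.
    """
    exponent = 1
    while True:
        squared = max_min_compose(r, r)
        if squared == r:
            return r, exponent
        r, exponent = squared, exponent * 2


# Toy 4x4 fuzzy similarity matrix (reflexive and symmetric); values are made up.
R = [
    [1.0, 0.9, 0.3, 0.2],
    [0.9, 1.0, 0.8, 0.2],
    [0.3, 0.8, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
]

closure, exponent = transitive_closure(R)
print(exponent)  # exponent b at which the matrix converged
for row in closure:
    print(row)
```

Because the exponent doubles at every iteration, the converged power is always a power of two, which is consistent with the values m = 8 and m = 16 reported in the embodiments.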
The positive effects of the invention are as follows. The fraud webpage identification problem is solved by combining division of labor and cooperation with fuzzy theory: different users determine the quality of a webpage, and a computer analyzes the data set of their marks. This solves the technical problem that existing fraud webpage identification methods depend heavily on the webpage. The technical scheme is simple and effective and has important practical value for future search engines.
Specific embodiments
To illustrate the technical scheme of the invention more clearly, three embodiments are given below, following the steps described in the technical scheme. Those of ordinary skill in the art can apply this technical scheme to practical engineering without creative effort.
Embodiment 1
Step 1: After browsing a webpage, each user chooses one of the four preset marks (f, s, b, u) according to their evaluation of the page. For example, "362 f u" indicates that the website with ID 362 was marked by two users, as f and u respectively.
Step 2: To meet the requirements of the embodiment, the data set WEBSPAM-UK2007 ("Web Spam Collections", http://chato.cl/webspam/datasets/, crawled by the Laboratory of Web Algorithmics, University of Milan, http://law.di.unimi.it/) is used to verify the recognition rate of the clustering experiment.
Step 3: 50 records with 2 user marks each are chosen from the data set, producing a 50×2 matrix M.
Step 4: The fuzzy similarity matrix of this matrix is calculated according to the formulas, giving a 50×50 matrix R.
The calculation formulas include:
[formula not preserved in the source text]
where i, j = 1, 2, …, n, and n is the number of rows of N;
[formula not preserved in the source text]
where i, j = 1, 2, …, n, n is the number of rows of N, and m is the number of columns of N;
Step 5: The fuzzy equivalence matrix of the matrix R produced in Step 4 is calculated using the formula. The result is m = 8, i.e. R^8 · R^8 = R^8; R is still a 50×50 matrix.
The formula is as follows:
[formula not preserved in the source text]
where n is a natural number and p is the number of rows of R;
the computation is repeated until the condition R^b · R^b = R^b is met, at which point the matrix has converged;
Step 6: The distinct elements of the matrix are arranged in descending order and denoted λ: 1 > 0.9 > 0.8. Taking λ = 1, 0.9, 0.8 in turn, the corresponding cut matrix is calculated for each. When λ = 1, all values less than 1 in the matrix are replaced with 0, producing the first level matrix. When λ = 0.9, all values greater than or equal to 0.9 are replaced with 1 and all values less than 0.9 are replaced with 0, producing the second level matrix. When λ = 0.8, all values greater than or equal to 0.8 are replaced with 1, producing the third level matrix.
Step 7:
When λ = 1, clustering produces 5 sets. The first website in each set is selected in turn and manually judged to be a fraud webpage or a non-fraud webpage. If it is a fraud webpage, the whole set is considered to belong to fraud webpages; if it is a non-fraud webpage, the whole set is considered to belong to non-fraud webpages. The results of the embodiment are as follows (for each website in each set, the judgment given by the data set is used to verify the corresponding recognition rate):
When λ = 0.9, clustering produces 4 sets. The first website in each set is again selected in turn and manually judged, and the whole set is assigned to fraud or non-fraud webpages accordingly. The results of the embodiment are as follows (for each website in each set, the judgment given by the data set is used to verify the corresponding recognition rate):
When λ = 0.8, clustering produces 1 set, which marks the completion of Embodiment 1.
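Steps 6 and 7 of the embodiment can be illustrated on a small converged matrix. This is a minimal sketch, not the patent's implementation. The 4×4 matrix is made up, and grouping websites whose rows in the cut matrix are identical is one common way of reading clusters off a λ-cut of a fuzzy equivalence matrix. The names `lambda_cut` and `clusters_from_cut` are hypothetical.

```python
def lambda_cut(matrix, lam):
    """Replace values >= lam with 1 and all other values with 0."""
    return [[1 if value >= lam else 0 for value in row] for row in matrix]


def clusters_from_cut(cut):
    """Group row indices that have identical rows in the cut matrix."""
    groups = {}
    for index, row in enumerate(cut):
        groups.setdefault(tuple(row), []).append(index)
    return list(groups.values())


# Made-up fuzzy equivalence matrix whose distinct values are 1 > 0.9 > 0.8.
E = [
    [1.0, 0.9, 0.8, 0.8],
    [0.9, 1.0, 0.8, 0.8],
    [0.8, 0.8, 1.0, 0.8],
    [0.8, 0.8, 0.8, 1.0],
]

for lam in (1.0, 0.9, 0.8):
    sets = clusters_from_cut(lambda_cut(E, lam))
    print(lam, len(sets), sets)
```

Lowering λ merges sets, so the number of sets shrinks from one per website at λ = 1 to a single set at the smallest value. The embodiment shows the same pattern, going from 5 sets to 4 sets to 1 set.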
Embodiment 2
Step 1: After browsing a webpage, each user chooses one of the four preset marks (f, s, b, u) according to their evaluation of the page. For example, "362 f u" indicates that the website with ID 362 was marked by two users, as f and u respectively.
Step 2: To meet the requirements of the embodiment, the data set WEBSPAM-UK2007 ("Web Spam Collections", http://chato.cl/webspam/datasets/, crawled by the Laboratory of Web Algorithmics, University of Milan, http://law.di.unimi.it/) is used to verify the recognition rate of the clustering experiment.
Step 3: 100 records with 2 user marks each are chosen from the data set, producing a 100×2 matrix M.
Step 4: The fuzzy similarity matrix of this matrix is calculated according to the formulas, giving a 100×100 matrix R.
The calculation formulas include:
[formula not preserved in the source text]
where i, j = 1, 2, …, n, and n is the number of rows of N;
[formula not preserved in the source text]
where i, j = 1, 2, …, n, n is the number of rows of N, and m is the number of columns of N;
Step 5: The fuzzy equivalence matrix of the matrix R produced in Step 4 is calculated using the formula. The result is m = 16, i.e. R^16 · R^16 = R^16; R is still a 100×100 matrix.
The formula is as follows:
[formula not preserved in the source text]
where n is a natural number and p is the number of rows of R;
the computation is repeated until the condition R^b · R^b = R^b is met, at which point the matrix has converged;
Step 6: The distinct elements of the matrix are arranged in descending order and denoted λ: 1 > 0.9 > 0.8. Taking λ = 1, 0.9, 0.8 in turn, the corresponding cut matrix is calculated for each. When λ = 1, all values less than 1 in the matrix are replaced with 0, producing the first level matrix. When λ = 0.9, all values greater than or equal to 0.9 are replaced with 1 and all values less than 0.9 are replaced with 0, producing the second level matrix. When λ = 0.8, all values greater than or equal to 0.8 are replaced with 1, producing the third level matrix.
Step 7:
When λ = 1, clustering produces 8 sets. The first website in each set is selected in turn and manually judged to be a fraud webpage or a non-fraud webpage. If it is a fraud webpage, the whole set is considered to belong to fraud webpages; if it is a non-fraud webpage, the whole set is considered to belong to non-fraud webpages. The results of the embodiment are as follows (for each website in each set, the judgment given by the data set is used to verify the corresponding recognition rate):
When λ = 0.9, clustering produces 2 sets. The first website in each set is again selected in turn and manually judged, and the whole set is assigned to fraud or non-fraud webpages accordingly. The results of the embodiment are as follows (for each website in each set, the judgment given by the data set is used to verify the corresponding recognition rate):
When λ = 0.8, clustering produces 1 set, which marks the completion of Embodiment 2.
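The data preparation in Steps 1, 3, and 4 can be sketched as follows, under several loudly stated assumptions. The records are assumed to be whitespace-separated ("362 f u"). The numeric encoding of the four marks is hypothetical, because the source does not give one. The similarity formula is a stand-in (one minus the mean absolute difference between two rows), because the patent's own formulas are not preserved in this text. All function names are illustrative.

```python
from collections import defaultdict

# Hypothetical numeric encoding of the marks; the source does not specify one.
MARK_VALUE = {"f": 0.0, "b": 0.5, "u": 0.5, "s": 1.0}


def parse_record(line):
    """Split a record such as '362 f u' into (362, ['f', 'u'])."""
    site_id, *marks = line.split()
    return int(site_id), marks


def build_matrices(lines):
    """Step 3: group records by number of user marks, one matrix per group."""
    ids = defaultdict(list)
    rows = defaultdict(list)
    for line in lines:
        site_id, marks = parse_record(line)
        ids[len(marks)].append(site_id)
        rows[len(marks)].append([MARK_VALUE[mark] for mark in marks])
    return ids, rows


def similarity_matrix(n_matrix):
    """Stand-in for Step 4: r_ij = 1 - mean_k |x_ik - x_jk| (an assumption)."""
    m = len(n_matrix[0])  # number of columns of N
    return [
        [1 - sum(abs(a - b) for a, b in zip(row_i, row_j)) / m for row_j in n_matrix]
        for row_i in n_matrix
    ]


# Made-up records; only the '362 f u' format comes from the source.
records = ["362 f u", "17 s s", "905 f f", "44 s b u"]
ids, rows = build_matrices(records)
R = similarity_matrix(rows[2])  # websites marked by exactly two users
print(ids[2])
for row in R:
    print(row)
```

With 100 two-mark records, as in this embodiment, `rows[2]` would be the 100×2 matrix M and `R` the 100×100 similarity matrix. Any encoding that keeps values in [0, 1] keeps this stand-in similarity in [0, 1] as well.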
Embodiment 3
Step 1: After browsing a webpage, each user chooses one of the four preset marks (f, s, b, u) according to their evaluation of the page. For example, "362 f u" indicates that the website with ID 362 was marked by two users, as f and u respectively.
Step 2: To meet the requirements of the embodiment, the data set WEBSPAM-UK2007 ("Web Spam Collections", http://chato.cl/webspam/datasets/, crawled by the Laboratory of Web Algorithmics, University of Milan, http://law.di.unimi.it/) is used to verify the recognition rate of the clustering experiment.
Step 3: 200 records with 2 user marks each are chosen from the data set, producing a 200×2 matrix M.
Step 4: The fuzzy similarity matrix of this matrix is calculated according to the formulas, giving a 200×200 matrix R.
The calculation formulas include:
[formula not preserved in the source text]
where i, j = 1, 2, …, n, and n is the number of rows of N;
[formula not preserved in the source text]
where i, j = 1, 2, …, n, n is the number of rows of N, and m is the number of columns of N;
Step 5: The fuzzy equivalence matrix of the matrix R produced in Step 4 is calculated using the formula. The result is m = 8, i.e. R^8 · R^8 = R^8; R is still a 200×200 matrix.
The formula is as follows:
[formula not preserved in the source text]
where n is a natural number and p is the number of rows of R;
the computation is repeated until the condition R^b · R^b = R^b is met, at which point the matrix has converged;
Step 6: The distinct elements of the matrix are arranged in descending order and denoted λ: 1 > 0.9 > 0.8. Taking λ = 1, 0.9, 0.8 in turn, the corresponding cut matrix is calculated for each. When λ = 1, all values less than 1 in the matrix are replaced with 0, producing the first level matrix. When λ = 0.9, all values greater than or equal to 0.9 are replaced with 1 and all values less than 0.9 are replaced with 0, producing the second level matrix. When λ = 0.8, all values greater than or equal to 0.8 are replaced with 1, producing the third level matrix.
Step 7:
When λ = 1, clustering produces 9 sets. The first website in each set is selected in turn and manually judged to be a fraud webpage or a non-fraud webpage. If it is a fraud webpage, the whole set is considered to belong to fraud webpages; if it is a non-fraud webpage, the whole set is considered to belong to non-fraud webpages. The results of the embodiment are as follows (for each website in each set, the judgment given by the data set is used to verify the corresponding recognition rate):
When λ = 0.9, clustering produces 3 sets. The first website in each set is again selected in turn and manually judged, and the whole set is assigned to fraud or non-fraud webpages accordingly. The results of the embodiment are as follows (for each website in each set, the judgment given by the data set is used to verify the corresponding recognition rate):
When λ = 0.8, clustering produces 1 set, which marks the completion of Embodiment 3.
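The judgment in Step 7 and the recognition-rate check used in the embodiments can be sketched as below. This is an illustrative sketch only. The website IDs, the sets, and the reference labels are made up. The manual judgment is simulated by looking up the reference label of the first website in each set. The function names `label_clusters` and `recognition_rate` are hypothetical.

```python
def label_clusters(clusters, manual_judgment):
    """Give every website the verdict reached for the first website in its set."""
    predicted = {}
    for cluster in clusters:
        verdict = manual_judgment(cluster[0])
        for site in cluster:
            predicted[site] = verdict
    return predicted


def recognition_rate(predicted, reference):
    """Fraction of websites whose propagated verdict matches the reference label."""
    correct = sum(1 for site, verdict in predicted.items() if reference[site] == verdict)
    return correct / len(predicted)


# Made-up reference labels standing in for the judgments supplied with the data set.
reference = {362: "non-fraud", 905: "non-fraud", 8: "fraud", 17: "fraud", 44: "fraud"}
# Made-up clustering result for one value of lambda.
clusters = [[362, 905, 8], [17, 44]]

# The human judge is simulated here by the reference label of the first website.
predicted = label_clusters(clusters, lambda site: reference[site])
print(predicted)
print(recognition_rate(predicted, reference))
```

In this toy run, website 8 inherits the "non-fraud" verdict of website 362, so one of five websites is misclassified. Only one manual judgment per set is needed, at the cost of errors inside sets that mix both kinds of webpage.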
Claims (1)
1. A method for identifying fraud webpages using fuzzy theory, comprising the following steps:
Step 1:
After browsing a webpage, a user evaluates it and marks it as one of "non-fraud webpage f", "fraud webpage s", "equivocal b", or "unknown u";
Step 2:
At the end of each month, the data set of all user marks for that month is downloaded through the search engine;
Step 3:
According to the number of different users' marks on each webpage, the data set is divided into several matrices M_i, where i = 1, 2, …, n;
Step 4:
Each matrix M_i, denoted N, is converted into a fuzzy similarity matrix R with elements r_ij ∈ R, where i, j = 1, 2, …, n. The calculation formulas include:
[formula not preserved in the source text]
where i, j = 1, 2, …, n, and n is the number of rows of N;
[formula not preserved in the source text]
where i, j = 1, 2, …, n, n is the number of rows of N, and m is the number of columns of N;
Step 5:
The fuzzy similarity matrix is converted into a fuzzy equivalence matrix using the following formula:
[formula not preserved in the source text]
where b = 1, 2, …, n; n is a natural number; and p is the number of rows of R;
the computation is repeated until the condition R^b · R^b = R^b is met, at which point the matrix has converged;
Step 6:
For the converged matrix, all confidence values λ in [0, 1] are selected and the corresponding level matrices (λ-cut matrices) are calculated;
Step 7:
For each level matrix, clustering produces multiple sets. The first website in each set is selected in turn and manually judged to be a fraud webpage or a non-fraud webpage. If it is a fraud webpage, the whole set is considered to belong to fraud webpages; if it is a non-fraud webpage, the whole set is considered to belong to non-fraud webpages.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611046454.8A CN106355095B (en) | 2016-11-23 | 2016-11-23 | Method for distinguishing is known to fraud webpage using fuzzy theory |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106355095A true CN106355095A (en) | 2017-01-25 |
CN106355095B CN106355095B (en) | 2018-10-19 |
Family
ID=57862809
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611046454.8A Expired - Fee Related CN106355095B (en) | 2016-11-23 | 2016-11-23 | Method for distinguishing is known to fraud webpage using fuzzy theory |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106355095B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102592067A (en) * | 2011-01-17 | 2012-07-18 | 腾讯科技(深圳)有限公司 | Webpage recognition method, device and system |
CN103634306A (en) * | 2013-11-18 | 2014-03-12 | 北京奇虎科技有限公司 | Security detection method and security detection server for network data |
CN104486461A (en) * | 2014-12-29 | 2015-04-01 | 北京奇虎科技有限公司 | Domain name classification method and device and domain name recognition method and system |
CN103425736B (en) * | 2013-06-24 | 2016-02-17 | 腾讯科技(深圳)有限公司 | A kind of web information recognition, Apparatus and system |
CN105827611A (en) * | 2016-04-06 | 2016-08-03 | 清华大学 | Distributed rejection service network attack detection method and system based on fuzzy inference |
CN106021487A (en) * | 2016-05-19 | 2016-10-12 | 浙江工业大学 | Internet of Things semantic event detection method based on fuzzy theory |
Non-Patent Citations (2)
Title |
---|
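赵磊 (Zhao Lei): "一种基于模糊等价矩阵传递闭包的聚类算法" (A clustering algorithm based on the transitive closure of fuzzy equivalence matrices), 《电脑知识与技术》 (Computer Knowledge and Technology) * |
雷英杰等 (Lei Yingjie et al.): "直觉模糊等价矩阵构造方法" (A method for constructing intuitionistic fuzzy equivalence matrices), 《系统工程理论与实践》 (Systems Engineering - Theory & Practice) * |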
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107194281A (en) * | 2017-05-25 | 2017-09-22 | 成都知道创宇信息技术有限公司 | A kind of anti-fake system based on block chain technology |
CN107194281B (en) * | 2017-05-25 | 2019-07-16 | 成都知道创宇信息技术有限公司 | A kind of anti-fake system based on block chain technology |
CN108985815A (en) * | 2018-06-06 | 2018-12-11 | 阿里巴巴集团控股有限公司 | A kind of user identification method, device and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN106355095B (en) | 2018-10-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20181019 Termination date: 20201123 |