CN1728655A - Method and system for detecting and discriminating counterfeit web page - Google Patents

Info

Publication number
CN1728655A
Authority
CN
China
Prior art keywords
webpage
similarity
similar
network address
true
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200410009873
Other languages
Chinese (zh)
Other versions
CN1319331C (en)
Inventor
刘文印
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Donghua faster Software Co. Ltd.
Original Assignee
刘文印
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 刘文印 filed Critical 刘文印
Priority to CNB2004100098735A priority Critical patent/CN1319331C/en
Publication of CN1728655A publication Critical patent/CN1728655A/en
Application granted granted Critical
Publication of CN1319331C publication Critical patent/CN1319331C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The method comprises: a URL generation step, which generates the required URLs, namely URLs similar to the true URL or all URLs that are needed; a web page similarity calculation step, which calculates the similarity between a possibly similar web page and the true web page in order to determine whether the two are similar, where the possibly similar web page is the page at a generated URL and the true web page is the page at the true URL; and an alarm step, which raises an alarm when a similar web page appears. Web page similarity is calculated by comparing the structure and/or style and/or content and/or color and/or font of the pages. In addition, the detail similarity and the style similarity of the pages are calculated, and the overall similarity is the weighted mean of the detail similarity and the style similarity.

Description

A method and system for detecting and discriminating counterfeit web pages
Technical field
The present invention relates to computer technology in the field of information security, and in particular to a method and system for detecting and discriminating counterfeit web pages, which can detect whether a web page has been counterfeited on the Internet.
Background art
With the rapid development of the Internet, security problems keep emerging. Besides viruses and hacker attacks, there are more and more cases in which websites (especially online financial sites such as banks) are counterfeited in order to trick customers into giving up confidential personal (identity) information such as passwords or credit card numbers. The Hong Kong Monetary Authority has stated that, since June of this year, at least 6 cases of counterfeit online banking involving Hong Kong banks have occurred, and similar cases worldwide are too many to count. According to a report by the non-profit Anti-Phishing Working Group, such phishing cases are increasing at a rate of 50% per month, and in general 5% of people are taken in. A recent survey of 1,335 US Internet users sponsored by TRUSTe (a network security company) reports that 75% of users felt the number of phishing e-mails they received had increased, 35% received phishing e-mails every week, 70% had at some point been lured to a related counterfeit web page, 15% had filled in confidential personal information, and 2% had suffered actual monetary loss. The survey estimates that the United States loses 500 million dollars a year to counterfeit web pages. Most respondents believed that the counterfeited companies are obliged to take measures to protect their own brands and guard against such cases, for example by using technical means to authenticate the e-mails they send and their URLs.
Existing anti-phishing techniques and strategies focus mainly on handling the "bait" used in phishing, that is, the invitation e-mail: for example, detecting and filtering such harmful mail at the client or at the gateway (as in spam filtering), verifying the digital signature of the mail, or verifying the IP address of the sender to determine its authenticity. Such methods are not very reliable and cannot fully solve the problem; they also place a heavy burden on the customer, who must install and learn the related software. What the website side can do includes double verification at login, that is, using software (such as a digital certificate) or a hardware device (such as a smart card) previously provided by the website. However, such methods not only have high management costs but also greatly reduce the convenience of online activities.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and system for detecting and discriminating counterfeit web pages that can proactively search the Internet for web pages similar in content to a given true web page, in order to determine whether that true web page has been counterfeited. Any website can use the system and techniques of this invention to detect automatically whether it is being counterfeited, so that it can take countermeasures and prevent potential losses.
To achieve this object, the invention provides a method for detecting and discriminating counterfeit web pages, used to detect whether a web page has been counterfeited on the Internet, comprising:
A URL generation step, used to generate the required URLs, which include URLs similar to the true URL, or all URLs obtainable from a name server, or all URLs appearing in e-mails obtained from a mail server;
A web page similarity calculation step, used to calculate the similarity between a possibly similar web page and the true web page in order to judge whether the possibly similar web page is similar to the true web page, where the possibly similar web page is the page corresponding to a generated URL and the true web page is the page corresponding to the true URL;
An alarm step, used to raise an alarm when a similar web page appears.
In the described method, the URL generation step further comprises:
Transforming the true URL by replacing symbols in it with similar-looking symbols; or
Adding key prefixes or suffixes to the true URL; or
Transposing the order of the words appearing in the true URL; or
Obtaining all domain names from a name server and selecting from them the domain names similar to the true URL, together with all URLs under those domain names; or
Obtaining all URLs appearing in e-mails from a mail server and selecting from them all URLs similar to the true URL.
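The first three generation options can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the look-alike table, and the affix list are assumptions chosen for the example, and the input is taken to be the list of words that make up the name in the true URL.

```python
from itertools import permutations

# Hypothetical table of visually similar characters (an assumption).
LOOKALIKES = {"l": ["1"], "1": ["l"], "o": ["0"], "0": ["o"]}


def substitute_lookalikes(name, table=LOOKALIKES):
    """Replace one character at a time with each look-alike character."""
    variants = set()
    for i, ch in enumerate(name):
        for alt in table.get(ch, []):
            variants.add(name[:i] + alt + name[i + 1:])
    return variants


def add_affixes(name, affixes):
    """Prepend or append each key word to the name."""
    return {a + name for a in affixes} | {name + a for a in affixes}


def swap_words(words):
    """Return every reordering of the words except the original order."""
    return {"".join(p) for p in permutations(words)} - {"".join(words)}


def candidate_names(words, affixes=("bank", "card")):
    """Union of the three transformations applied to the true name."""
    name = "".join(words)
    return substitute_lookalikes(name) | add_affixes(name, affixes) | swap_words(words)
```

For example, `candidate_names(["bank", "online"])` contains "onlinebank", "bankon1ine", and "bankonlinecard". Each candidate would then be turned into a full URL and fetched for comparison.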
In the described method, the web page similarity calculation step calculates similarity by comparing the visual similarity of the structure and/or style and/or content and/or color and/or font of the possibly similar web page and the true web page, and further comprises:
Step A: calculating the detail similarity and the style similarity between the possibly similar web page and the true web page;
Step B: calculating the overall similarity between the possibly similar web page and the true web page, where the overall similarity is the weighted average of the detail similarity and the style similarity.
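Step B amounts to a two-term weighted average. The sketch below assumes equal default weights, which the text does not specify; the function and parameter names are illustrative only.

```python
def overall_similarity(detail, style, w_detail=0.5, w_style=0.5):
    """Weighted average of detail similarity and style similarity.

    The default weights are an assumption; the source leaves them open.
    """
    total = w_detail + w_style
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_detail * detail + w_style * style) / total
```

Dividing by the sum of the weights keeps the result in the range [0, 1] even when the caller passes weights that do not sum to 1.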
In the described method, the step of calculating the detail similarity between the possibly similar web page and the true web page further comprises:
Step 1: segmenting the possibly similar web page and the true web page into basic blocks, forming a hierarchical block representation of each page;
Step 2: calculating the similarity between the basic blocks of the possibly similar web page and the basic blocks of the true web page and, for each basic block on the true web page, finding the most similar block on the possibly similar web page as its match, where the similarity of two basic blocks is the weighted average of their similarities in each feature;
Step 3: calculating the similarity of the blocks of the pages at each level, where the similarity at a higher level is the weighted average of the similarities of the matched lower-level blocks;
Step 4: calculating the detail similarity between the possibly similar web page and the true web page, which is the weighted average of the similarities of all designated key blocks on the true web page; a key block comprises one or more basic blocks;
Where the weights can be generated automatically according to set rules or preset manually.
In the described method, the step of calculating the detail similarity further comprises a step of judging whether the two basic blocks being compared are text or images. When both basic blocks are text, the features include the text content and/or text color and/or text size and/or text border and/or text font and/or text alignment and/or text link address. When both basic blocks are images, the features include the image content and/or image color and/or image size and/or image source file and/or image link address. When one of the basic blocks is text and the other is an image, the step further comprises recognizing the text in the image by calling OCR or by manual annotation.
In the described method, the step of calculating the detail similarity further comprises a step of comparing the layout structures of the web pages; the matching of basic blocks is based on the positional relationships between the basic blocks or between the designated key blocks.
In the described method, the step of calculating the similarity in each feature further comprises judging the type of the feature's value. When the value is discrete, the similarity in that feature is 1 or 0; when the value is continuous, the similarity depends on the difference between the feature values: the smaller the difference, the greater the similarity.
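The two cases can be sketched as below. The discrete case follows the text directly. For the continuous case the text only requires that similarity grow as the difference shrinks, so the linear falloff and the `scale` parameter used here are assumptions.

```python
def discrete_similarity(a, b):
    """Discrete feature: 1 if the two values are identical, otherwise 0."""
    return 1.0 if a == b else 0.0


def continuous_similarity(a, b, scale):
    """Continuous feature: 1 at zero difference, falling linearly to 0.

    `scale` is the difference at which similarity reaches 0 (an assumed
    parameter; any monotonically decreasing function would fit the text).
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return max(0.0, 1.0 - abs(a - b) / scale)
```

For instance, two font families compare with `discrete_similarity`, while two font sizes in points compare with `continuous_similarity`.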
In the described method, the style similarity of the web pages is given by the correlation coefficient of the histograms of the distribution, over each page, of the values of each feature; if the coefficient is less than 0, the similarity is set to 0. The features include the content of the web page and/or the colors of the web page (including its dominant hue) and/or the borders of the blocks appearing on the page and/or the font and/or the line spacing and/or the alignment and decoration of text.
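For a single feature, the histogram correlation can be sketched as follows. The sketch assumes the Pearson correlation coefficient and represents each page as a plain list of the feature's values (one entry per block); both are assumptions, as are the function names. A flat histogram has no defined correlation, so the sketch returns 0 in that case.

```python
from collections import Counter
from math import sqrt


def histogram(values, bins):
    """Count how often each bin value occurs in the list."""
    counts = Counter(values)
    return [counts.get(b, 0) for b in bins]


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a flat histogram as uncorrelated
    return cov / sqrt(var_x * var_y)


def style_feature_similarity(true_values, false_values):
    """Histogram correlation for one feature, with negatives set to 0."""
    bins = sorted(set(true_values) | set(false_values))
    r = correlation(histogram(true_values, bins), histogram(false_values, bins))
    return max(0.0, r)
```

The per-feature results would then be combined by a weighted average to give the style similarity of the two pages.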
In the described method, the step of calculating the style similarity further comprises calculating the weighted average of the similarities in each feature of the web pages, where the weights can be generated automatically according to set rules or preset manually.
In the described method, the step of segmenting a web page into basic blocks further comprises:
Step 1: retrieving the Document Object Model of the web page;
Step 2: removing useless nodes from the Document Object Model;
Step 3: determining the separators used to separate different regions;
Step 4: determining that, among the descendant nodes of the Document Object Model, the highest-level nodes that contain no separator are the basic blocks of the web page;
Step 5: merging upward level by level according to the principles of proximity, similarity, and relatedness, forming a new hierarchical block representation of the web page.
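Step 4 can be sketched as a recursive walk over the tree. The `Node` class below is a hypothetical stand-in for a DOM node in which separators have already been flagged (step 3) and useless nodes already removed (step 2); a real implementation would work on the browser's DOM instead.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical simplified DOM node used only for this sketch."""
    name: str
    children: list = field(default_factory=list)
    is_separator: bool = False


def contains_separator(node):
    """True if the node or any of its descendants is a separator."""
    return node.is_separator or any(contains_separator(c) for c in node.children)


def basic_blocks(node):
    """Return the highest-level nodes whose subtrees contain no separator."""
    if node.is_separator:
        return []
    if not contains_separator(node):
        return [node]
    blocks = []
    for child in node.children:
        blocks.extend(basic_blocks(child))
    return blocks
```

A node holding two text passages and a button, whose parent also holds a separator, is returned as one basic block; its three children are not, because a separator-free node exists above them.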
In the described method, the key blocks include password fields and/or text fields requiring input and/or corporate logo regions.
In the described method, key blocks can be designated automatically or marked manually by the user.
In the described method, the alarm module raises an alarm when the overall similarity between the possibly similar web page and the true web page exceeds a set threshold; or when their detail similarity exceeds a set threshold; or when their style similarity exceeds a set threshold; or when a key block with a large weight appears on the possibly similar web page; or when a region appears on the possibly similar web page whose similarity to a key block with a large weight reaches a set threshold.
The invention also provides a system for detecting and discriminating counterfeit web pages, used to detect whether a web page has been counterfeited on the Internet, comprising:
A URL generation module, used to generate the required URLs, which include URLs similar to the true URL, or all URLs obtainable from a name server, or all URLs appearing in e-mails obtained from a mail server;
A web page similarity calculation module, used to calculate the similarity between a possibly similar web page and the true web page in order to judge whether the possibly similar web page is similar to the true web page, where the possibly similar web page is the page corresponding to a generated URL and the true web page is the page corresponding to the true URL;
An alarm module, used to raise an alarm when a similar web page appears.
In the described system, the web page similarity calculation module calculates similarity by comparing the visual similarity of the structure and/or style and/or content and/or color and/or font of the pages, and further comprises:
A detail similarity calculation module, used to calculate the detail similarity between the possibly similar web page and the true web page;
A style similarity calculation module, used to calculate the style similarity between the possibly similar web page and the true web page;
An overall similarity calculation module, used to calculate the overall similarity between the possibly similar web page and the true web page, where the overall similarity is the weighted average of the detail similarity and the style similarity.
The invention further provides an e-commerce website offering detection and discrimination of counterfeit web pages, comprising:
A customer commission module, used to receive commissions from customers;
A customer authentication module, used to check whether the customer has paid and to authenticate the customer's identity;
A URL generation module, used to generate URLs similar to the true URL that the customer asks to have checked;
A web page similarity calculation module, used to calculate the similarity between the possibly similar web page and the true web page in order to judge whether the possibly similar web page is similar to the true web page, where the possibly similar web page is the page corresponding to a generated similar URL and the true web page is the page corresponding to the true URL provided by the customer;
A detection result reporting module, used to report the results of detection and discrimination to the customer.
The method and system of the invention are therefore widely applicable. The web pages of any enterprise or individual may be counterfeited, and all of them need this method and system to protect the proprietary information they publish, such as brands, logos, related news, and product information. Any enterprise or individual can run the system of the invention on its own machine to check automatically whether anyone has maliciously counterfeited its web pages, without its customers having to install any software on the client side, so no burden is placed on the customer. Likewise, any service intermediary can run the system on its own machine to check automatically whether anyone has maliciously counterfeited the web pages of its clients (organizations, enterprises, or individuals), and then take appropriate measures. The method of the invention is objective: the web page similarity measurement module measures pages by comparing the visual similarity of their layout structure, overall style, content (text or image content), color, and font. The method and system are multi-granular (multi-level). To obtain the layout structure, the page is first segmented and then merged, extracting meaningful region blocks at several levels. First, for each block at the bottom level of the true web page (called a basic block), the block on the false web page that is most similar to it in color, font, content, and other features is found as its match. Then, on this basis, the similarity of the blocks of the two pages is calculated at each level, and finally the similarity of the two pages is calculated. The method and system allow regions of particular interest to be specified: some key regions can be marked automatically (for example regions containing sensitive blocks, such as text fields for entering passwords and other information, or the corporate logo), and the user can also mark key regions and key words manually. When similarity is calculated, these key regions and key words receive particular attention through larger weights. If an alarm is to be raised as soon as such a key region or key word is found on a false web page, its weight can be set very large, even to 1. If no key region is set, the system assigns weights automatically, either by region size or evenly.
The invention is described below with reference to the drawings and specific embodiments, which are not intended to limit the invention.
Description of drawings
Fig. 1 is a schematic structural diagram of the invention;
Fig. 2 is a flow chart of the similarity calculation module of the invention;
Fig. 3 is a flow chart of the web page segmentation module of the invention;
Fig. 4 shows part of the true eBay web page and its segmentation result;
Fig. 5 shows part of a counterfeit eBay web page and its segmentation result.
Embodiment
As shown in Fig. 1, the system of the invention comprises a module that automatically generates similar URLs or all possible URLs, a web page similarity calculation module that judges whether two web pages are similar, and an alarm module. The system can be installed on the enterprise's own server or on an intermediary's server. When running, the system first calls the URL generation module to generate all URLs similar to the true URL, or all possible URLs. There are many ways to generate similar URLs in this module: replacing symbols in the true URL with similar-looking symbols (such as "1" and "l") in various combinations; adding key prefixes or suffixes such as "bank" and "card"; or transposing the order of the words appearing in the true URL, for example changing "bankonline" into "onlinebank". "All possible URLs" covers every URL that may be related to the true URL and is generated from it by some transformation. The required URLs can even be produced by obtaining all domain names from a domain registration company, or from a name server, and then selecting the domain names similar to the true URL (that is, those whose edit distance to it is below a certain threshold) together with all URLs under them. Alternatively, all URLs appearing in e-mails can be obtained from a mail server, and all those similar to the true URL selected. For each generated URL, the system retrieves its web page (hereinafter the false web page), compares it with the true web page by calling the similarity calculation module, and raises an alarm if the two are judged similar.
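The edit-distance filter on domain names can be sketched as follows. The sketch assumes the Levenshtein distance (insertions, deletions, and substitutions each cost 1), which the text does not name explicitly; the function names, the example names, and the threshold are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def similar_domains(true_domain, domains, threshold):
    """Keep domains other than the true one whose distance is below threshold."""
    return [d for d in domains
            if d != true_domain and edit_distance(true_domain, d) < threshold]
```

With a threshold of 3, a list containing "bankon1ine" and "bank-online" keeps both as candidates for "bankonline", while an unrelated name is dropped.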
As shown in Fig. 2, to judge whether two web pages are similar, the system first calls the web page segmentation module to divide each page into its smallest meaningful units, the basic blocks (step 201); the pages shown in Fig. 4 and Fig. 5 are each divided into a number of basic blocks. It then obtains the overall style features of each page and the appearance (visual) features (including foreground and background colors, fonts, and so on) and content (text or image content) features of each block. Next, on the basis of basic block matching, it calculates the detail similarity, style similarity, and overall similarity of the two pages (step 202). It judges whether the overall similarity is greater than the set threshold (step 203); if so, it raises an alarm (step 204), and otherwise it ends. An alarm is also raised if a key block on the true web page (a single basic block, or a larger block comprising several basic blocks, can be manually designated as a key block) is matched by a highly similar block on the other page.
Fig. 3 is a flow chart of the web page segmentation module. The module first retrieves the page's Document Object Model (DOM), which is a tree representation. For various reasons, some nodes in it occupy no effective region, for example because their width and height are both 0, or because their region is identical to the union of the regions of all their child nodes. These useless nodes are removed first (step 301), and then all separators used to separate different regions are determined. A separator itself has no child nodes and contains no text; it is typically just a long, thin image. Once all separator nodes are known, the highest-level nodes among the descendants that contain no separator are taken as the smallest meaningful units on the page, called basic blocks. For example, basic block 9 in Fig. 4 contains two passages of text and a button. None of these contains a separator, but they are not basic blocks, because block 9 also contains no separator and is at a higher level. Block 9 is the highest-level node containing no separator, since its parent already contains the separator to its right; block 9 is therefore a basic block. Then basic blocks that belong to the same parent, are adjacent, and contain only text are merged upward level by level (step 302) according to the principles of similarity (same font and color), proximity (positions close together), or relatedness (some association between the basic blocks), forming a new hierarchical block representation of the page.
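One level of the merging in step 302 can be sketched as grouping consecutive sibling blocks that satisfy a similarity test. Only the "similar" principle (same font and color) is modeled here; proximity and relatedness are left out, and the dictionary representation of a block is an assumption made for the example.

```python
def merge_similar_siblings(blocks, is_similar):
    """Group consecutive sibling blocks for which is_similar(a, b) holds."""
    groups = []
    for block in blocks:
        if groups and is_similar(groups[-1][-1], block):
            groups[-1].append(block)   # extend the current merged block
        else:
            groups.append([block])     # start a new merged block
    return groups


def same_font_and_color(a, b):
    """The 'similar' principle: identical font and identical color."""
    return a["font"] == b["font"] and a["color"] == b["color"]
```

Applying the function again to the resulting groups, with a test suited to the next level, would build the hierarchy upward one level at a time.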
Fig. 4 shows part of the true eBay web page and its segmentation result. In Fig. 4, this part of the true page is divided into a number of basic blocks, each outlined with a box and labeled with a number from 0 to 22. A basic block is the smallest meaningful unit: for example, basic block 4 is the block asking for the user account, and basic block 5 is the block asking for the user password. Some basic blocks can be combined into larger blocks: basic blocks 0 to 2 can be combined into one larger block, and basic blocks 3 to 8 into another. Some larger blocks can in turn be combined into still larger ones, such as the block formed from basic blocks 3 to 8 together with basic block 9. In this way the hierarchical block representation of the page is formed, with the basic blocks at the bottom, that is, at the lowest level of the hierarchy.
Fig. 5 shows a web page counterfeiting eBay and its segmentation result. Likewise, the page in Fig. 5 is divided into a number of basic blocks, labeled 0' to 22'.
The overall similarity of two web pages is the weighted average of their detail similarity and style similarity. The detail similarity is calculated from the similarities of basic blocks: it is the weighted average of the similarities of all designated key blocks on the true web page (for example, basic blocks 3, 4, and 21 in Fig. 4). The weight of each block can be calculated automatically by rule (for example, in proportion to the area the block occupies) or set in advance by manual annotation. The similarity between two basic blocks is the weighted average of their similarities in each feature (a weight expresses the importance of a feature and is usually set manually in advance). How the similarity in a given feature is calculated depends on the type of the feature's value. If the value is discrete, the similarity depends on whether the two feature values are identical: 1 if identical, 0 otherwise. For example, when calculating the similarity of two blocks in the font feature, if both blocks use the same font, say both the Song typeface, the similarity is 1; if one block uses the Song typeface and the other the Hei typeface, the similarity is 0, meaning the two blocks are dissimilar in this feature. If the feature value is continuous (such as color or font size), the similarity depends on the difference between the values: the smaller the difference, the greater the similarity. By calculating basic block similarities, the basic block on the false web page that matches each basic block on the true web page can be found, namely the block with the greatest similarity (or, if required, the block with the greatest similarity subject to matching page structure).
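Block similarity and best-match search can be sketched as below. For brevity the sketch treats every feature as discrete and represents a block as a dictionary of feature values; both simplifications, and all names, are assumptions.

```python
def block_similarity(true_block, false_block, weights):
    """Weighted average of per-feature similarities (all features discrete)."""
    total = sum(weights.values())
    score = sum(w for name, w in weights.items()
                if true_block.get(name) == false_block.get(name))
    return score / total


def best_match(true_block, false_blocks, weights):
    """Return (index, similarity) of the most similar block on the false page."""
    sims = [block_similarity(true_block, fb, weights) for fb in false_blocks]
    index = max(range(len(sims)), key=sims.__getitem__)
    return index, sims[index]
```

A structure-aware variant would restrict `false_blocks` to the blocks whose position is compatible with the blocks already matched.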
Table 1 lists, for several key blocks in Fig. 4 (basic blocks 3, 4, 5, 11, 12, and 21 of the true web page are set as key blocks), the matching (most similar) blocks in Fig. 5 and the corresponding similarity values. For example, basic block 3' in Fig. 5 is the most similar to basic block 3 in Fig. 4, with a similarity of 0.81. Here the weight of the content feature is set to 20% and the weight of the font feature to 5%. Of the 10 words in basic block 3 of Fig. 4, 3 words ("bidding", "selling", "activities") appear in basic block 3' of Fig. 5, so the content similarity is 0.3. As for font, the font size of basic block 3 in Fig. 4 is "medium" while that of basic block 3' in Fig. 5 is "9pt" (9 points), so the font size similarity is 0. All other features are identical, with similarity 1. The similarity of the two basic blocks is the weighted average of their similarities in each feature: 0.3 × 20% + 0 × 5% + 1 × 75% = 0.81. Likewise, for basic block 4 in Fig. 4 and basic block 4' in Fig. 5, and for basic block 5 in Fig. 4 and basic block 5' in Fig. 5, the font size similarity is 0 and all other features are identical with similarity 1, so the similarity of each pair is 1 - 1 × 5% = 0.95. Basic block 21' in Fig. 5 is likewise the most similar to basic block 21 in Fig. 4; all their features are identical, so the similarity is 1. The similarities of the other basic blocks can be calculated in the same way. At this point the page shown in Fig. 5 can be identified as a counterfeit web page, and the alarm module raises an alarm. Key blocks can be set automatically, for example according to the area a block occupies, or manually, for example by manually marking basic block 21 as a key block.
Basic block in Fig. 4    Basic block in Fig. 5    Similarity
3                        3'                       0.81
4                        4'                       0.95
5                        5'                       0.95
11                       12'                      1
12                       18'                      0.93
21                       21'                      1
Table 1
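The arithmetic behind the first two rows of the table can be checked with a few lines. The weights are those stated in the text (content 20%, font 5%); the remaining 75% is lumped into a single "other" entry here, which is a simplification for the example.

```python
def weighted_similarity(scores, weights):
    """Sum of per-feature similarities multiplied by their weights."""
    return sum(weights[name] * scores[name] for name in weights)


weights = {"content": 0.20, "font": 0.05, "other": 0.75}

# Block 3 vs 3': 3 of 10 words shared, font size differs, the rest identical.
block_3 = weighted_similarity({"content": 0.3, "font": 0.0, "other": 1.0}, weights)

# Block 4 vs 4': only the font size differs.
block_4 = weighted_similarity({"content": 1.0, "font": 0.0, "other": 1.0}, weights)

print(round(block_3, 2), round(block_4, 2))  # prints "0.81 0.95"
```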
When it is necessary to detect whether a false web page contains a large block of the true web page (such as the block formed from basic blocks 3 to 8 in Fig. 4), the large block can be designated as a key block. The similarity of a large block is calculated from the similarities of basic blocks: it is the weighted average of the similarities of the matched basic blocks. Alternatively, when the still larger block formed from the block of basic blocks 3 to 8 together with basic block 9 is designated as the key block, its similarity is the weighted average of the similarity of the block formed from basic blocks 3 to 8 and the similarity of basic block 9.
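The level-by-level calculation can be sketched recursively. In this sketch a block tree is either a basic-block label or a list of (weight, subtree) pairs; that representation, the weights, and the similarity values in the usage note are all hypothetical.

```python
def tree_similarity(tree, leaf_sims):
    """Similarity of a block: looked up for basic blocks, averaged above them.

    `tree` is a basic-block label or a list of (weight, subtree) pairs;
    `leaf_sims` maps each basic-block label to its matched similarity.
    """
    if not isinstance(tree, list):
        return leaf_sims[tree]
    total = sum(weight for weight, _ in tree)
    return sum(weight * tree_similarity(sub, leaf_sims)
               for weight, sub in tree) / total
```

For example, with `leaf_sims = {3: 0.8, 4: 1.0, 9: 0.5}`, the block `[(1, 3), (1, 4)]` scores 0.9, and combining it with block 9 as `[(2, [(1, 3), (1, 4)]), (1, 9)]` weights the two parts 2 to 1.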
In addition, block matching can also be based on layout structure, that is, the positional relationships between blocks are preserved during matching (the positional relationships must also match). In this case, some key blocks can be matched first and used to align the two web pages, so that the other blocks can then be matched.
When calculating the detail similarity of the web pages, each basic block on the true web page is matched to the most similar basic block on the false web page. The features that can be considered when both basic blocks are text are shown in Table 2.
Feature type | Feature name | Description and possible values
Block content | Inner text | Text within the block
Block color | Background color | Background color of the block
Block color | Foreground color | Color of the text in the block
Block color | Link color | Color of the hyperlinks in the block
Block color | Dominant color of background image | Dominant color of the background image; if there is no background image, the background color is used
Block color | Border color | Color of the block border
Block border | Border type | For example dashed, dotted, solid or other
Block border | Border thickness | Width of the border line
Block font | Font size | For example small, medium
Block font | Font family | For example Arial
Block font | Font style | For example bold, italic
Block text | Text alignment | For example left-aligned, justified
Block text | Text decoration | For example underline, overline or blinking text
Block navigation | Relative path (href) | Address the block links to
Table 2
When only the size, color, font and content of a text block are considered, its similarity Sim(Bt, Bf) is calculated as follows:
Sim(Bt, Bf) = ws*Ss(Bt, Bf) + wc*Sc(Bt, Bf) + wf*Sf(Bt, Bf) + wt*St(Bt, Bf),
where Bt denotes a block on the true web page and Bf a block on the false web page. Ss(Bt, Bf) is the block size similarity, given as (min(0, (Wt-Wf)/Wt) + min(0, (Ht-Hf)/Ht))/2, where W is the block width and H is the block height. Sc(Bt, Bf) is the color similarity: 1 if the colors are identical, otherwise 0. Sf(Bt, Bf) is the font-size similarity: 1 if the font sizes are identical, 0.5 if the sizes differ by 1, otherwise 0. St(Bt, Bf) is the text content similarity: the proportion of the keywords on Bt that also appear on Bf. ws, wc, wf and wt are the weights of the respective terms, with ws + wc + wf + wt = 1. Each weight lies in the range [0, 1]: a weight of 0 means that the term does not take part in the calculation, and a weight of 1 means that only that term takes part.
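The text-block formula can be sketched as below. The class `TextBlock`, its fields and the example values are hypothetical. The size term is an assumed stand-in: the printed expression with min(0, ...) can never exceed 0, so the sketch uses 1 minus the mean relative difference in width and height, clipped to [0, 1], and the original may have intended something different. Font sizes are represented as ordinal levels so that "differ by 1" is well defined.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:
    width: float
    height: float
    color: str
    font_level: int   # ordinal font-size level, e.g. 1 = small, 2 = medium
    words: frozenset  # words in the block; for Bt, its keywords


def size_similarity(bt, bf):
    # ASSUMPTION: 1 minus the mean relative size difference, clipped to [0, 1].
    dw = min(1.0, abs(bt.width - bf.width) / bt.width)
    dh = min(1.0, abs(bt.height - bf.height) / bt.height)
    return 1.0 - (dw + dh) / 2


def color_similarity(bt, bf):
    return 1.0 if bt.color == bf.color else 0.0


def font_similarity(bt, bf):
    # 1 if identical, 0.5 if one level apart, otherwise 0.
    return {0: 1.0, 1: 0.5}.get(abs(bt.font_level - bf.font_level), 0.0)


def text_similarity(bt, bf):
    # Share of Bt's keywords that also appear on Bf.
    return len(bt.words & bf.words) / len(bt.words) if bt.words else 0.0


def text_block_similarity(bt, bf, ws, wc, wf, wt):
    if not math.isclose(ws + wc + wf + wt, 1.0):
        raise ValueError("weights must sum to 1")
    return (ws * size_similarity(bt, bf) + wc * color_similarity(bt, bf)
            + wf * font_similarity(bt, bf) + wt * text_similarity(bt, bf))


bt = TextBlock(200, 50, "black", 2, frozenset({"bidding", "selling", "login", "pay"}))
bf = TextBlock(180, 50, "black", 1, frozenset({"bidding", "selling", "offer"}))
score = text_block_similarity(bt, bf, ws=0.25, wc=0.25, wf=0.25, wt=0.25)
```

With these made-up blocks the four terms are 0.95 (size), 1 (color), 0.5 (font) and 0.5 (content), so equal weights give 0.7375.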
When one of the two basic blocks is an image and the other is text, OCR (Optical Character Recognition, for which many off-the-shelf products exist) is invoked to recognize the text in the image, and the similarity is then calculated as in the case where both basic blocks are text. If Bt is the image, its keywords can also be marked manually.
When both basic blocks are images, the feature aspects that can be considered are shown in Table 3.
Feature type | Feature name | Description and possible values
Block content | Alternative content (alt) | Alternative text of the image, commonly used to describe the image
Block color | Dominant image color | The color that occurs most in the image
Block size | Image display width | Width at which the image is displayed
Block size | Image display height | Height at which the image is displayed
Block size | Actual image width | Actual width of the image
Block size | Actual image height | Actual height of the image
Image source file | Image type | For example gif, jpg, bmp
Image source file | File size | Size of the image file in bytes
Image source file | File creation time | Date the image was created
Image source file | Source file (src) | Name of the image source file
Block navigation | Relative path (href) | Address the block links to
Table 3
When only the size and content of an image block are considered, its similarity Sim(Bt, Bf) is calculated as follows:
Sim(Bt, Bf) = ws*Ss(Bt, Bf) + wg*Sg(Bt, Bf),
where Ss(Bt, Bf) is calculated in the same way as the corresponding term for text blocks. Sg(Bt, Bf) is the similarity of the image content features and can be calculated with existing CBIR (content-based image retrieval) methods. ws and wg are the weights of the two terms, with ws + wg = 1.
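A minimal sketch of the image-block formula is given below. The text leaves the choice of CBIR method open; color-histogram intersection is used here only as a simple stand-in for Sg, and the function names, equal default weights and pixel counts are assumptions for illustration.

```python
import math


def histogram_intersection(hist_t, hist_f):
    """Overlap of two normalized color histograms, in [0, 1]."""
    n_t, n_f = sum(hist_t.values()), sum(hist_f.values())
    if n_t == 0 or n_f == 0:
        return 0.0
    return sum(min(count / n_t, hist_f.get(color, 0) / n_f)
               for color, count in hist_t.items())


def image_block_similarity(size_sim, content_sim, ws=0.5, wg=0.5):
    """Sim(Bt, Bf) = ws*Ss + wg*Sg, with ws + wg = 1."""
    if not math.isclose(ws + wg, 1.0):
        raise ValueError("ws and wg must sum to 1")
    return ws * size_sim + wg * content_sim


# Made-up pixel counts per quantized color for a logo and its look-alike.
logo_true = {"red": 6, "white": 4}
logo_fake = {"red": 5, "white": 3, "blue": 2}

sg = histogram_intersection(logo_true, logo_fake)           # content term Sg
sim = image_block_similarity(size_sim=1.0, content_sim=sg)  # same display size
```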
The detail similarity of two web pages is the weighted average of the similarities of all designated key blocks Bi on the true web page: Sim(Pt, Pf) = Σ wi*Sim(Bi, Bf(i)), summed over all key blocks i. Bf(i) is the block on the false web page that matches Bi, that is, the block on the false web page with the greatest similarity to Bi.
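The matching and aggregation described here can be sketched as follows. Blocks are reduced to sets of words and compared with a toy keyword-overlap measure; both simplifications, and all names and values, are assumptions for illustration. Any block similarity function could be passed in instead.

```python
def keyword_overlap(bt_words, bf_words):
    """Toy block similarity: share of Bt's words that also occur in Bf."""
    return len(bt_words & bf_words) / len(bt_words) if bt_words else 0.0


def detail_similarity(key_blocks, weights, suspect_blocks, block_sim):
    """Weighted average of Sim(Bi, Bf(i)), where Bf(i) is the best match for Bi."""
    total = sum(weights)
    score = 0.0
    for block, weight in zip(key_blocks, weights):
        # Bf(i): the suspect-page block with the greatest similarity to Bi.
        best = max((block_sim(block, cand) for cand in suspect_blocks), default=0.0)
        score += weight * best
    return score / total


key_blocks = [{"login", "password"}, {"acme", "bank"}]
suspect_blocks = [{"login", "password", "now"}, {"acme", "shop"}, {"news"}]
detail = detail_similarity(key_blocks, [0.6, 0.4], suspect_blocks, keyword_overlap)
```

Here the first key block finds a perfect match and the second a half match, so the result is 0.6 × 1 + 0.4 × 0.5 = 0.8.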
The overall style features of a web page include the dominant colors, fonts and line spacing of the text and images on the page. The style similarity of two web pages is the weighted average of their similarities in each feature aspect. Table 4 lists the feature aspects that can be considered. The similarity in each feature aspect is given by the correlation coefficient of the distribution histograms of that feature's values on the two web pages; if the coefficient is less than 0, the similarity is set to 0. For example, the histogram of font values on the true web page might be: 5 basic blocks use the Song typeface and 8 use the Hei typeface; the histogram for color might be: 3 blocks use red, 8 use black and 5 use green; and so on. For a large block composed of several basic blocks, the style similarity of the large block can, when needed, be calculated in the same way as the style similarity of a web page.
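The histogram comparison can be sketched with a plain Pearson correlation clipped at 0. The true-page histograms reuse the figures from the example above; the suspect-page histograms, the equal weights and the handling of the degenerate cases (empty or constant histograms, where the coefficient is undefined) are assumptions for illustration.

```python
import math


def histogram_similarity(hist_t, hist_f):
    """Correlation coefficient of two value histograms, clipped below at 0."""
    keys = sorted(set(hist_t) | set(hist_f))
    if not keys:
        return 1.0  # ASSUMPTION: two empty histograms count as identical
    x = [hist_t.get(k, 0) for k in keys]
    y = [hist_f.get(k, 0) for k in keys]
    mean_x, mean_y = sum(x) / len(keys), sum(y) / len(keys)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # ASSUMPTION: the coefficient is undefined for a constant histogram.
        return 1.0 if x == y else 0.0
    return max(0.0, cov / math.sqrt(var_x * var_y))


def style_similarity(features_t, features_f, weights):
    """Weighted average of the per-feature histogram similarities."""
    total = sum(weights.values())
    return sum(w * histogram_similarity(features_t[name], features_f[name])
               for name, w in weights.items()) / total


true_page = {"font": {"Song": 5, "Hei": 8},
             "color": {"red": 3, "black": 8, "green": 5}}
suspect_page = {"font": {"Song": 5, "Hei": 8},
                "color": {"red": 8, "black": 3, "green": 5}}
style = style_similarity(true_page, suspect_page, {"font": 0.5, "color": 0.5})
```

In this made-up case the font histograms are identical (similarity 1) and the color histograms are negatively correlated (similarity clipped to 0).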
Feature type | Feature name | Description and possible values
Page content | Login-related information | Prompts asking the user to enter sensitive information
Page content | Copyright information | Text relating to copyright on the web page
Page content | Page title | Title of the web page
Page color | Background color | Distribution of background colors on the web page
Page color | Foreground color | Distribution of text colors on the web page
Page color | Link color | Distribution of hyperlink colors on the web page
Page color | Dominant image color | Distribution of the dominant colors of the images on the web page
Page color | Main logo color | Main colors occurring in the logo image
Block border | Border type | Distribution of border types on the web page, for example dashed, dotted, solid
Block border | Border width | Distribution of border widths on the web page
Page font | Font size | Distribution of font sizes on the web page
Page font | Font family | Distribution of font families on the web page
Page font | Font style | Distribution of font styles on the web page (for example bold, italic)
Page text | Text alignment | Distribution of text alignment on the web page
Page text | Text decoration | Distribution of text decoration on the web page (for example underline, blinking text)
Table 4
The overall similarity of two web pages is the weighted average of their detail similarity and style similarity. It can be specified that when the overall similarity, the detail similarity or the style similarity exceeds a specified threshold, the web page is judged to be a counterfeit web page and the alarm module raises an alarm. Alternatively, some key blocks with large weights can be designated as key regions: the alarm module raises an alarm when a key region appears on the false web page, or when a region whose similarity to a key region reaches a set threshold appears on the false web page. The conditions under which the alarm module raises an alarm can thus be set as required.
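One way to combine these alarm conditions is sketched below. The function names, the equal weighting of detail and style similarity and both threshold values are hypothetical; the text only says that the thresholds and conditions are configurable.

```python
def overall_similarity(detail, style, w_detail=0.5):
    """Weighted average of detail and style similarity."""
    return w_detail * detail + (1.0 - w_detail) * style


def should_alarm(detail, style, key_region_sims=(), *, w_detail=0.5,
                 page_threshold=0.8, region_threshold=0.9):
    """True if any page-level score or any key-region score triggers an alarm."""
    overall = overall_similarity(detail, style, w_detail)
    if max(overall, detail, style) > page_threshold:
        return True
    # A region on the suspect page resembles a heavily weighted key block.
    return any(sim >= region_threshold for sim in key_region_sims)


print(should_alarm(0.9, 0.5))           # detail similarity alone exceeds 0.8
print(should_alarm(0.5, 0.5))           # nothing exceeds a threshold
print(should_alarm(0.5, 0.5, [0.95]))   # a key region is reproduced
```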
The method of the present invention can be used to provide an e-commerce website offering a counterfeit web page detection and discrimination service. The website comprises: a customer commission module for receiving a customer's commission to have a web page checked; a customer authentication module for checking whether the customer has paid and for authenticating the customer's identity; a web address generation module for generating web addresses similar to the true web address that the customer asks to have checked; a web page similarity calculation module for calculating the similarity between a possibly similar web page and the true web page in order to judge whether they are similar, where the possibly similar web page is the web page at a generated similar web address and the true web page is the web page at the true web address provided by the customer; and a detection result reporting module for reporting the detection and discrimination results to the customer.
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art can make various corresponding changes and modifications according to the present invention, but all such changes and modifications shall fall within the scope of protection of the appended claims.
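The way these modules could fit together is sketched below with every module passed in as a callable. All names, the threshold and the stub data are hypothetical, and the stubs stand in for real address generation, page fetching and similarity calculation; no network access takes place.

```python
def check_customer_site(customer, true_url, *, is_authorized,
                        generate_similar_urls, fetch_page, page_similarity,
                        threshold=0.8):
    """Return (url, score) pairs for suspect pages, most similar first."""
    if not is_authorized(customer):  # authentication / payment check
        raise PermissionError("customer has not paid or failed authentication")
    true_page = fetch_page(true_url)
    findings = []
    for url in generate_similar_urls(true_url):  # address generation module
        page = fetch_page(url)
        if page is None:  # generated address has no page behind it
            continue
        score = page_similarity(true_page, page)  # similarity module
        if score > threshold:
            findings.append((url, score))
    return sorted(findings, key=lambda item: item[1], reverse=True)


# Stub data standing in for real pages (illustrative only).
pages = {"shop.example": "login bank", "sh0p.example": "login bank",
         "shop-sale.example": "shoes"}
report = check_customer_site(
    "alice", "shop.example",
    is_authorized=lambda c: c == "alice",
    generate_similar_urls=lambda u: ["sh0p.example", "shop-sale.example",
                                     "shopp.example"],
    fetch_page=pages.get,
    page_similarity=lambda a, b: 1.0 if a == b else 0.0,
)
```

The returned list is what a result reporting module would pass on to the customer.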

Claims (16)

1. A method for detecting and discriminating counterfeit web pages, used to detect whether a web page has been counterfeited on the Internet, characterized by comprising:
A web address generation step for generating the required web addresses, the web addresses comprising web addresses similar to a true web address, or all web addresses obtainable from a domain name server, or all web addresses appearing in mail obtained from a mail server;
A web page similarity calculation step for calculating the similarity between a possibly similar web page and a true web page in order to judge whether the possibly similar web page is similar to the true web page, wherein the possibly similar web page is the web page corresponding to a generated web address, and the true web page is the web page corresponding to the true web address;
An alarm step for raising an alarm when a similar web page appears.
2. The method for detecting and discriminating counterfeit web pages according to claim 1, characterized in that the web address generation step further comprises:
Replacing symbols in the true web address with similar-looking symbols; or
Adding key prefixes or suffixes to the true web address; or
Transposing the order of the words appearing in the true web address; or
Obtaining all domain names from a domain name server and selecting from them the domain names similar to the true web address and all web addresses under them; or
Obtaining all web addresses appearing in mail from a mail server and selecting from them all web addresses similar to the true web address.
3. The method for detecting and discriminating counterfeit web pages according to claim 1, characterized in that the web page similarity calculation step calculates the visual similarity by comparing the structure and/or style and/or content and/or color and/or font of the possibly similar web page and the true web page, and further comprises:
Step A: calculating the detail similarity and the style similarity between the possibly similar web page and the true web page;
Step B: calculating the overall similarity between the possibly similar web page and the true web page, wherein the overall similarity is the weighted average of the detail similarity and the style similarity.
4. The method for detecting and discriminating counterfeit web pages according to claim 3, characterized in that the step of calculating the detail similarity between the possibly similar web page and the true web page further comprises:
Step 1: segmenting the possibly similar web page and the true web page into basic blocks, forming a hierarchical block representation structure of each web page;
Step 2: calculating the similarity between the basic blocks of the possibly similar web page and the basic blocks of the true web page, and finding, for each basic block on the true web page, the most similar block on the possibly similar web page as its match, wherein the similarity of basic blocks is the weighted average of their similarities in each feature aspect;
Step 3: calculating the similarity of the blocks of the web pages at each level, the similarity at a higher level being the weighted average of the similarities of the matched blocks at the lower level;
Step 4: calculating the detail similarity between the possibly similar web page and the true web page, the detail similarity being the weighted average of the similarities of all designated key blocks on the true web page, a key block comprising one or more basic blocks;
Wherein the weights can be generated automatically according to set rules or preset manually.
5. The method for detecting and discriminating counterfeit web pages according to claim 4, characterized in that the step of calculating the detail similarity further comprises a step of judging whether the two basic blocks being compared are text or images; when both basic blocks are text, the feature aspects comprise the content of the text and/or the color of the text and/or the size of the text and/or the border of the text and/or the font of the text and/or the arrangement of the text and/or the link address of the text; when both basic blocks are images, the feature aspects comprise image content and/or image color and/or image size and/or image source file and/or image link address; when one of the basic blocks is text and the other is an image, the step further comprises recognizing the text of the image by invoking OCR or by manual marking.
6. The method for detecting and discriminating counterfeit web pages according to claim 5, characterized in that the step of calculating the detail similarity further comprises a step of comparing the layout structures of the web pages; the matching of basic blocks is based on the positional relationships between the basic blocks or between the designated key blocks.
7. The method for detecting and discriminating counterfeit web pages according to claim 5 or 6, characterized in that the step of calculating the similarity in each feature aspect further comprises judging the type of the value of the feature aspect; when the value of the feature aspect is discrete, the similarity in that feature aspect takes the value 1 or 0; when the value of the feature aspect is continuous, the similarity depends on the difference between the feature values, a smaller difference giving a greater similarity.
8. The method for detecting and discriminating counterfeit web pages according to claim 3, characterized in that the style similarity of the web pages is given, for each feature aspect of the web pages, by the correlation coefficient of the distribution histograms of the feature values on the web pages, the similarity being set to 0 if the coefficient is less than 0; wherein the feature aspects comprise the content of the web page and/or the colors of the web page, including the dominant color of the web page, and/or the borders of the blocks appearing in the web page and/or the fonts and/or the line spacing and/or the arrangement and decoration of the text.
9. The method for detecting and discriminating counterfeit web pages according to claim 8, characterized in that the step of calculating the style similarity further comprises calculating the weighted average of the similarities in each feature aspect of the web pages; wherein the weights can be generated automatically according to set rules or preset manually.
10. The method for detecting and discriminating counterfeit web pages according to claim 4, characterized in that the step of segmenting the web page into basic blocks further comprises:
Step 1: retrieving the Document Object Model of the web page;
Step 2: removing useless nodes from the Document Object Model;
Step 3: determining the separators used to separate different regions;
Step 4: determining that, among the descendant nodes of the Document Object Model, the highest-level nodes that contain no separator are the basic blocks of the web page;
Step 5: merging upward step by step according to the principles of proximity, similarity and relatedness, forming a new hierarchical block representation structure of the web page.
11. The method for detecting and discriminating counterfeit web pages according to claim 4, characterized in that the key blocks comprise a password region and/or a text region requiring information to be entered and/or a company logo region.
12. The method for detecting and discriminating counterfeit web pages according to claim 11, characterized in that the key blocks can be designated by automatic marking or by manual marking by the user.
13. The method for detecting and discriminating counterfeit web pages according to claim 4, 5, 6, 8, 9, 10, 11 or 12, characterized in that the alarm module raises an alarm when the overall similarity between the possibly similar web page and the true web page is greater than a set threshold; or when the detail similarity between the possibly similar web page and the true web page is greater than a set threshold; or when the style similarity between the possibly similar web page and the true web page is greater than a set threshold; or when a key block with a large weight appears on the possibly similar web page; or when a region whose similarity to a key block with a large weight reaches a set threshold appears on the possibly similar web page.
14. A system for detecting and discriminating counterfeit web pages, used to detect whether a web page has been counterfeited on the Internet, characterized by comprising:
A web address generation module for generating the required web addresses, the web addresses comprising web addresses similar to a true web address, or all web addresses obtainable from a domain name server, or all web addresses appearing in mail obtained from a mail server;
A web page similarity calculation module for calculating the similarity between a possibly similar web page and a true web page in order to judge whether the possibly similar web page is similar to the true web page, wherein the possibly similar web page is the web page corresponding to a generated web address, and the true web page is the web page corresponding to the true web address;
An alarm module for raising an alarm when a similar web page appears.
15. The system for detecting and discriminating counterfeit web pages according to claim 14, characterized in that the web page similarity calculation module calculates the visual similarity by comparing the structure and/or style and/or content and/or color and/or font of the pages, and further comprises:
A detail similarity calculation module for calculating the detail similarity between the possibly similar web page and the true web page;
A style similarity calculation module for calculating the style similarity between the possibly similar web page and the true web page;
An overall similarity calculation module for calculating the overall similarity between the possibly similar web page and the true web page, wherein the overall similarity is the weighted average of the detail similarity and the style similarity.
16. An e-commerce website providing detection and discrimination of counterfeit web pages, characterized by comprising:
A customer commission module for receiving a customer's commission;
A customer authentication module for checking whether the customer has paid and for authenticating the customer's identity;
A web address generation module for generating web addresses similar to the true web address that the customer asks to have checked;
A web page similarity calculation module for calculating the similarity between a possibly similar web page and the true web page in order to judge whether the possibly similar web page is similar to the true web page, wherein the possibly similar web page is the web page corresponding to a generated similar web address, and the true web page is the web page corresponding to the true web address provided by the customer;
A detection result reporting module for reporting the detection and discrimination results to the customer.
CNB2004100098735A 2004-11-25 2004-11-25 Method and system for detecting and discriminating counterfeit web page Active CN1319331C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100098735A CN1319331C (en) 2004-11-25 2004-11-25 Method and system for detecting and discriminating counterfeit web page

Publications (2)

Publication Number Publication Date
CN1728655A true CN1728655A (en) 2006-02-01
CN1319331C CN1319331C (en) 2007-05-30

Family

ID=35927680

Country Status (1)

Country Link
CN (1) CN1319331C (en)


Also Published As

Publication number Publication date
CN1319331C (en) 2007-05-30

Similar Documents

Publication Publication Date Title
CN1728655A (en) Method and system for detecting and discriminating counterfeit web page
KR101702614B1 (en) Online fraud detection dynamic scoring aggregation systems and methods
US9621566B2 (en) System and method for detecting phishing webpages
CN108737423B (en) Phishing website discovery method and system based on webpage key content similarity analysis
TWI437452B (en) Web spam page classification using query-dependent data
Zhang et al. A domain-feature enhanced classification model for the detection of Chinese phishing e-Business websites
Blum et al. Lexical feature based phishing URL detection using online learning
KR101863172B1 (en) Document classification using multiscale text fingerprints
CN102790762A (en) Phishing website detection method based on uniform resource locator (URL) classification
WO2012101623A1 (en) Web element spoofing prevention system and method
CN110222695B (en) Certificate picture processing method and device, medium and electronic equipment
RU2676247C1 (en) Web resources clustering method and computer device
CN104504335A (en) Phishing app detection method and system based on page features and URL features
CN112333185A (en) Domain shadowing detection method and device based on DNS (Domain Name System) resolution
CN106357682A (en) Phishing website detecting method
Tan et al. Hybrid phishing detection using joint visual and textual identity
CN113763057B (en) User identity portrait data processing method and device
Mohammed et al. Phishing Detection Using Machine Learning Algorithms
CN1198223C (en) Pornographic file determination system and method
Yazhmozhi et al. Natural language processing and Machine learning based phishing website detection system
Layton et al. Determining provenance in phishing websites using automated conceptual analysis
CN113112323A (en) Abnormal order identification method, device, equipment and medium based on data analysis
CN111429110A (en) Store standardization auditing method, device, equipment and storage medium
RU2778460C1 (en) Method and apparatus for clustering phishing web resources based on an image of the visual content
Cui Detection and Analysis of Phishing Attacks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: BEIJING BAIWEN BAIDA NETWORK TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: LIU WENYIN

Effective date: 20070824

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20070824

Address after: 100872 Room 1207F, Culture Building, No. 59 Zhongguancun Street, Haidian District, Beijing

Patentee after: Beijing Baiwenbaida Network Technologies Co., Ltd.

Address before: 100083, No. 35, Wanquan new home, Wanquan Road, Haidian District, Beijing, No. 2, -2-202

Patentee before: Liu Wenyin

ASS Succession or assignment of patent right

Owner name: ZHUHAI FASTER SOFTWARE TECHNOLOGY CO.,LTD.

Free format text: FORMER OWNER: BEIJING BAIWEN BAIDA NETWORK TECHNOLOGIES CO., LTD.

Effective date: 20100730

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100872 ROOM 1207F, CULTURE BUILDING, NO.59, ZHONGGUANCUN STREET, HAIDIAN DISTRICT, BEIJING CITY TO: 519080 ROOM 202-204, BUILDING D1, XIYUAN, NANFANG SOFTWARE PARK, ZHUHAI CITY, GUANGDONG PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20100730

Address after: 519080 Room 202-204, Building D1, Xiyuan, Nanfang Software Park, Zhuhai, Guangdong Province

Patentee after: Meng Shengguang

Address before: 100872 Room 1207F, Culture Building, No. 59 Zhongguancun Street, Haidian District, Beijing

Patentee before: Beijing Baiwenbaida Network Technologies Co., Ltd.

C56 Change in the name or address of the patentee
CP03 Change of name, title or address

Address after: 519000 Rooms 3016 and 3018, 3rd Floor, No. 9 Waterfront Road, Jida, Zhuhai, Guangdong Province

Patentee after: Guangdong Donghua faster Software Co. Ltd.

Address before: 519080 Room 202-204, Building D1, Xiyuan, Nanfang Software Park, Zhuhai, Guangdong Province

Patentee before: Meng Shengguang