CN104580254B - Phishing website identification system and method - Google Patents



Publication number
CN104580254B
Authority
CN
China
Prior art keywords
domain name
itself
website
character
target domain
Legal status
Active
Application number
CN201510051628.9A
Other languages
Chinese (zh)
Other versions
CN104580254A (en)
Inventor
陈营营
Current Assignee
Beijing Hongxiang Technical Service Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201510051628.9A
Priority claimed from CN201210224485.3A (external priority; patent CN102801709B)
Publication of CN104580254A
Application granted
Publication of CN104580254B
Legal status: Active


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/51 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems at application loading time, e.g. accepting, rejecting, starting or inhibiting executable software based on integrity or source reliability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a phishing website identification system and method in the field of network security. The system includes a domain name acquisition unit, a domain name statistics unit and a website identification unit.

- The domain name acquisition unit collects all links that appear in a website to be identified and obtains the domain name corresponding to each link.
- The domain name statistics unit counts how many times each domain name appears in the website to be identified and takes the domain name with the most occurrences as the target domain name.
- The website identification unit judges whether the website to be identified is a phishing website according to the target domain name and the website's own domain name.

Because the system and method identify phishing websites from the link relationships within a website, they can effectively recognize new types of phishing websites. They also help enrich the number and types of phishing websites in a phishing website library, which makes further phishing website identification and lookup easier. They have broad application prospects in the field of network security.

Description

Phishing website identification system and method
This patent application is a divisional application of Chinese invention patent application No. 201210224485.3, filed on June 28, 2012 and entitled "Phishing website identification system and method".
Technical field
The present invention relates to the technical field of network security, and in particular to a phishing website identification system and method.
Background art
With the development of the Internet, the number of netizens has grown year by year. Online threats now go beyond traditional Trojan horses and viruses: the number of phishing websites has increased significantly over the past two years.
The current mainstream phishing website identification technique works in three steps:
1. Collect common phishing websites.
2. Build them into a knowledge base.
3. Compute the similarity between a newly discovered web page and the phishing websites in the knowledge base to judge whether the page is a phishing website.
Identifying phishing websites through such a knowledge base can generally only recognize known categories of phishing websites; new types cannot be identified. For example, when the knowledge base contains only phishing websites related to Bank of China, a phishing website counterfeiting the Industrial and Commercial Bank of China cannot be identified.
Summary of the invention
The technical problem to be solved by the present invention is how to provide a phishing website identification system and method that can effectively identify new types of phishing websites.
To solve the above technical problem, the present invention provides a phishing website identification system, which includes a domain name acquisition unit, a domain name statistics unit and a website identification unit;
The domain name acquisition unit is adapted to collect all links appearing in a website to be identified and obtain the domain name corresponding to each link;
The domain name statistics unit is adapted to count the number of times each domain name appears in the website to be identified, find the domain name with the most occurrences, and denote it as the target domain name;
The website identification unit is adapted to judge, according to the target domain name and the own domain name of the website to be identified, whether the website to be identified is a phishing website.
The website identification unit includes a comparison subunit and an identification subunit;
The comparison subunit is adapted to compare the target domain name with the own domain name. When the comparison result shows that the two are the same, it judges that the website to be identified is not a phishing website;
The identification subunit is adapted to act when the target domain name differs from the own domain name. It calculates the ratio between the number of occurrences of the target domain name and the number of occurrences of the own domain name, and calculates the similarity between the two domain names. It then judges, according to the ratio and the similarity, whether the website to be identified is a phishing website.
The identification subunit includes a ratio calculation module, a similarity calculation module and a judgment module;
The ratio calculation module is adapted to calculate the ratio between the number of occurrences of the target domain name and the number of occurrences of the own domain name;
The similarity calculation module is adapted to calculate the similarity between the target domain name and the own domain name;
The judgment module is adapted to judge whether the ratio and the similarity satisfy the following condition: the ratio is greater than a predetermined ratio, and the similarity is greater than a predetermined threshold. If so, it judges that the website to be identified is a phishing website; otherwise, it judges that the website to be identified is not a phishing website.
The similarity calculation module includes a string comparison submodule, an initial value calculation submodule and a final value calculation submodule;
The string comparison submodule is adapted to build a comparison array from the character string of the target domain name and the character string of the own domain name. The target domain name string is placed in the first row of the array and kept fixed. The own domain name string is placed in the second row and moved from left to right. The overlapping characters of the two rows are compared;
The initial value calculation submodule is adapted to calculate one similarity value per alignment of the two strings:
- When the first character of the target domain name is aligned with the last character of the own domain name, it calculates the first similarity calculation value $Q_1$ between the target domain name and the own domain name.
- When the second character of the target domain name is aligned with the last character of the own domain name, it calculates the second similarity calculation value $Q_2$.
- It continues in this way until the last character of the target domain name is aligned with the first character of the own domain name, where it calculates the m-th similarity calculation value $Q_m$.

Here $m = n_1 + n_2 - 1$, where $n_1$ is the string length of the target domain name and $n_2$ is the string length of the own domain name;
The final value calculation submodule is adapted to calculate the similarity $Q_{max}$ between the target domain name and the own domain name according to the following formula:
$Q_{max} = \max\{Q_1, Q_2, Q_3, \ldots, Q_m\}$.
In the initial value calculation submodule, the i-th similarity calculation value $Q_i$ is calculated with the following formula:
$Q_i = M_i^2 \times L_i$
where i is a natural number and $1 \le i \le m$, and
$M_i = s_i / n_{max}$
$L_i = r_i / n_{max}$
The symbols are defined as follows, all for the i-th comparison:
- $r_i$ is the number of overlapping character positions between the own domain name string and the target domain name string.
- $n_{max}$ is the number of characters in the longer of the two strings.
- $L_i$ is the overlap rate of the two strings.
- $s_i$ is the number of characters that both overlap and are identical.
- $M_i$ is the match rate of the two strings.
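As a worked illustration of these definitions, take the example given later in Embodiment 1: own domain name boc.cn ($n_2 = 6$) and target domain name cocc.cn ($n_1 = 7$). This gives $n_{max} = 7$ and $m = 12$. The arithmetic below is our own, not figures from the patent.

- In the 2nd comparison, $r_2 = 2$ and $s_2 = 1$.
- In the 7th comparison, the .cn suffixes coincide. All six characters of boc.cn overlap cocc.cn ($r_7 = 6$), and four of them match ($s_7 = 4$).

```latex
\begin{aligned}
Q_2 &= \left(\tfrac{s_2}{n_{max}}\right)^2 \times \tfrac{r_2}{n_{max}}
     = \left(\tfrac{1}{7}\right)^2 \times \tfrac{2}{7} = \tfrac{2}{343} \approx 0.006,\\
Q_7 &= \left(\tfrac{4}{7}\right)^2 \times \tfrac{6}{7} = \tfrac{96}{343} \approx 0.280.
\end{aligned}
```

Checking all 12 alignments, $Q_7$ is the largest, so $Q_{max} \approx 0.28$ for this pair, well below the 80% threshold preferred in Embodiment 1. Squaring $M_i$ appears to make the score drop quickly when many overlapping characters do not actually match.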
Alternatively, in the initial value calculation submodule, the i-th similarity calculation value $Q_i$ is calculated as follows:
In the i-th comparison, count the characters of the target domain name string that overlap with, and are identical to, characters of the own domain name string. This count is taken as the i-th similarity calculation value $Q_i$.
The system further includes a supplementary identification unit;
The supplementary identification unit is adapted to handle websites that have been judged to be phishing websites:
- It denotes each such website to be identified as a suspected site.
- It performs supplementary identification on the suspected site.
- When the supplementary identification result shows that the suspected site is still a phishing website, it sends the suspected site to the phishing website library.
The domain name corresponding to a link is the absolute address of the link.
The system further includes a website acquisition unit;
The website acquisition unit is adapted to search for newly created websites to serve as websites to be identified.
The present invention also provides a phishing website identification method, which includes the steps of:
Collecting all links appearing in a website to be identified and obtaining the domain name corresponding to each link;
Counting the number of times each domain name appears in the website to be identified, finding the domain name with the most occurrences, and denoting it as the target domain name;
Judging, according to the target domain name and the own domain name of the website to be identified, whether the website to be identified is a phishing website.
The step of judging whether the website to be identified is a phishing website (according to the target domain name and the website's own domain name) further includes the steps of:
Judging whether the target domain name is the same as the own domain name. If so, judging that the website to be identified is not a phishing website and ending the flow; otherwise, performing the next step;
Calculating the ratio between the number of occurrences of the target domain name and the number of occurrences of the own domain name, and calculating the similarity between the target domain name and the own domain name. Then judging, according to the ratio and the similarity, whether the website to be identified is a phishing website.
The step of calculating the ratio and the similarity, and judging from them whether the website to be identified is a phishing website, further includes the steps of:
Calculating the ratio between the number of occurrences of the target domain name and the number of occurrences of the own domain name;
Calculating the similarity between the target domain name and the own domain name;
Judging whether the following condition is satisfied: the ratio is greater than a predetermined ratio, and the similarity is greater than a predetermined threshold. If so, judging that the website to be identified is a phishing website; otherwise, judging that it is not a phishing website.
The step of calculating the similarity between the target domain name and the own domain name further includes the steps of:
Building a comparison array from the character string of the target domain name and the character string of the own domain name. The target domain name string is placed in the first row of the array and kept fixed. The own domain name string is placed in the second row and moved from left to right. The overlapping characters of the two rows are compared;
Calculating one similarity value per alignment:
- When the first character of the target domain name is aligned with the last character of the own domain name, calculating the first similarity calculation value $Q_1$ between the target domain name and the own domain name.
- When the second character of the target domain name is aligned with the last character of the own domain name, calculating the second similarity calculation value $Q_2$.
- Continuing in this way until the last character of the target domain name is aligned with the first character of the own domain name, and calculating the m-th similarity calculation value $Q_m$.

Here $m = n_1 + n_2 - 1$, where $n_1$ is the string length of the target domain name and $n_2$ is the string length of the own domain name;
Calculating the similarity $Q_{max}$ between the target domain name and the own domain name according to the following formula:
$Q_{max} = \max\{Q_1, Q_2, Q_3, \ldots, Q_m\}$.
In the step of calculating the similarity calculation values $Q_1, Q_2, \ldots, Q_m$, the alignments run from the first character of the target domain name aligned with the last character of the own domain name, to the last character of the target domain name aligned with the first character of the own domain name. The i-th similarity calculation value $Q_i$ is calculated with the following formula:
$Q_i = M_i^2 \times L_i$
where i is a natural number and $1 \le i \le m$, and
$M_i = s_i / n_{max}$
$L_i = r_i / n_{max}$
The symbols are defined as follows, all for the i-th comparison:
- $r_i$ is the number of overlapping character positions between the own domain name string and the target domain name string.
- $n_{max}$ is the number of characters in the longer of the two strings.
- $L_i$ is the overlap rate of the two strings.
- $s_i$ is the number of characters that both overlap and are identical.
- $M_i$ is the match rate of the two strings.
Alternatively, in the step of calculating the similarity calculation values $Q_1, Q_2, \ldots, Q_m$ for the successive alignments, the i-th similarity calculation value $Q_i$ is calculated as follows:
In the i-th comparison, count the characters of the target domain name string that overlap with, and are identical to, characters of the own domain name string. This count is taken as the i-th similarity calculation value $Q_i$.
After the step of judging whether the website to be identified is a phishing website, the method further includes the following step. A website judged to be a phishing website is denoted as a suspected site, and supplementary identification is performed on it. When the supplementary identification result shows that the suspected site is still a phishing website, the suspected site is sent to the phishing website library.
The domain name corresponding to a link is the absolute address of the link.
Before the step of collecting all links appearing in the website to be identified and obtaining the corresponding domain names, the method further includes the step of searching for newly created websites to serve as websites to be identified.
The fishing website identifying system and method for the present invention, fishing website is carried out based on the linking relationship in website Identification, can effectively recognize the fishing website of new type;Meanwhile, be conducive in abundant fishing website storehouse the quantity of fishing website and Type, is easy to further fishing website to recognize and search, is with a wide range of applications in network safety filed.
Brief description of the drawings
Fig. 1 is a schematic diagram of the module structure of the phishing website identification system according to Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the module structure of the website identification unit;
Fig. 3 is a schematic diagram of the module structure of the identification subunit;
Fig. 4 is a schematic diagram of the module structure of the similarity calculation module;
Fig. 5 is a schematic diagram of the module structure of the phishing website identification system according to Embodiment 2 of the present invention;
Fig. 6 is a flowchart of the phishing website identification method according to Embodiment 3 of the present invention;
Fig. 7 is a flowchart of the phishing website identification method according to Embodiment 4 of the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples illustrate the present invention but do not limit its scope.
Fig. 1 is a schematic diagram of the module structure of the phishing website identification system according to Embodiment 1 of the present invention. As shown in Fig. 1, the system includes a domain name acquisition unit 100, a domain name statistics unit 200 and a website identification unit 300.
The domain name acquisition unit 100 is adapted to collect all links appearing in the website to be identified and obtain the domain name corresponding to each link. Here, the domain name corresponding to a link is the absolute address of the link. If a link in the website to be identified uses a relative address, it must first be converted into an absolute address.
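A minimal sketch of what the domain name acquisition unit 100 could look like, using only the Python standard library. The patent does not give an implementation, so the following are our assumptions:
- A "link" is any `href` or `src` attribute.
- Only http/https links are kept.
- The link's domain name is the host part of its absolute URL.

The names `LinkCollector` and `link_domains` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Gathers raw link targets from href/src attributes (assumed definition of a 'link')."""

    def __init__(self):
        super().__init__()
        self.raw_links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.raw_links.append(value)


def link_domains(html, page_url):
    """Return the host name of every http/https link, resolving relative addresses first."""
    collector = LinkCollector()
    collector.feed(html)
    domains = []
    for raw in collector.raw_links:
        absolute = urljoin(page_url, raw)       # relative address -> absolute address
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.hostname:
            domains.append(parsed.hostname)     # lower-cased, port removed
    return domains
```

For example, a page at `http://icbc-secure.example/index.html` containing `<a href="/login">` and `<img src="https://www.icbc.com.cn/logo.png">` yields `['icbc-secure.example', 'www.icbc.com.cn']`. A `mailto:` link on the same page would be skipped.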
The domain name statistics unit 200 is adapted to count the number of times each domain name appears in the website to be identified, find the domain name with the most occurrences, and denote it as the target domain name. To do this, the unit can build a key-value table with the domain name as the key and the number of occurrences as the value. It then sorts the domain names by value to obtain the domain name with the most occurrences.
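The key-value table described for the domain name statistics unit 200 maps naturally onto Python's `collections.Counter`. This is a sketch only, and the function name `target_domain` is hypothetical. The patent does not say how ties are resolved. Here the first-encountered domain wins, which is the documented ordering of `Counter.most_common` for equal counts.

```python
from collections import Counter


def target_domain(domains):
    """Build the domain -> occurrences table and return (most frequent domain, table)."""
    counts = Counter(domains)          # key: domain name, value: number of occurrences
    if not counts:
        return None, counts            # page with no links: no target domain
    domain, _ = counts.most_common(1)[0]
    return domain, counts
```

`target_domain(["a.example", "b.example", "b.example"])` returns `"b.example"` as the target domain, together with the full table. The table is kept because the ratio calculation later needs the own domain's count as well.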
The website identification unit 300 is adapted to judge, according to the target domain name and the own domain name of the website to be identified, whether the website to be identified is a phishing website.
Fig. 2 is a schematic diagram of the module structure of the website identification unit. As shown in Fig. 2, the website identification unit 300 further includes a comparison subunit 310 and an identification subunit 320.
The comparison subunit 310 is adapted to compare the target domain name with the own domain name. When the comparison result shows that the two are the same, it judges that the website to be identified is not a phishing website.
The identification subunit 320 is adapted to act when the target domain name differs from the own domain name. It calculates the ratio between the number of occurrences of the target domain name and the number of occurrences of the own domain name, and calculates the similarity between the two domain names. It then judges, according to the ratio and the similarity, whether the website to be identified is a phishing website.
Fig. 3 is a schematic diagram of the module structure of the identification subunit. As shown in Fig. 3, the identification subunit 320 further includes a ratio calculation module 321, a similarity calculation module 322 and a judgment module 323.
The ratio calculation module 321 is adapted to calculate the ratio between the number of occurrences of the target domain name and the number of occurrences of the own domain name.
The similarity calculation module 322 is adapted to calculate the similarity between the target domain name and the own domain name.
Fig. 4 is a schematic diagram of the module structure of the similarity calculation module. As shown in Fig. 4, the similarity calculation module 322 further includes a string comparison submodule 322a, an initial value calculation submodule 322b and a final value calculation submodule 322c.
The string comparison submodule 322a is adapted to build a comparison array from the character string of the target domain name and the character string of the own domain name. The target domain name string is placed in the first row of the array and kept fixed. The own domain name string is placed in the second row and moved from left to right. The overlapping characters of the two rows are compared.
The initial value calculation submodule 322b is adapted to calculate one similarity value per alignment of the two strings:
- When the first character of the target domain name is aligned with the last character of the own domain name, it calculates the first similarity calculation value $Q_1$ between the target domain name and the own domain name.
- When the second character of the target domain name is aligned with the last character of the own domain name, it calculates the second similarity calculation value $Q_2$.
- It continues in this way until the last character of the target domain name is aligned with the first character of the own domain name, where it calculates the m-th similarity calculation value $Q_m$.

Here $m = n_1 + n_2 - 1$, where $n_1$ is the string length of the target domain name and $n_2$ is the string length of the own domain name.
In the initial value calculation submodule 322b, the i-th similarity calculation value $Q_i$ is calculated with the following formula:
$Q_i = M_i^2 \times L_i$
where i is a natural number and $1 \le i \le m$, and
$M_i = s_i / n_{max}$
$L_i = r_i / n_{max}$
The symbols are defined as follows, all for the i-th comparison:
- $r_i$ is the number of overlapping character positions between the own domain name string and the target domain name string.
- $n_{max}$ is the number of characters in the longer of the two strings.
- $L_i$ is the overlap rate of the two strings.
- $s_i$ is the number of characters that both overlap and are identical.
- $M_i$ is the match rate of the two strings.
For example, suppose the own domain name boc.cn moves from left to right while the target domain name cocc.cn stays fixed.
- In the 1st comparison, only the character n overlaps the character c, so $r_1 = 1$ and $s_1 = 0$.
- In the 2nd comparison, the character n overlaps the character o and the character c overlaps the character c, so $r_2 = 2$ and $s_2 = 1$.
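The sliding comparison and the $Q_i = M_i^2 \times L_i$ rule can be sketched in Python as follows. This is our reading of the text, not code from the patent, and the function names are hypothetical. The target string stays fixed while the own domain string slides right. Alignment i = 1 puts the last character of the own domain under the first character of the target, and alignment m puts the first character of the own domain under the last character of the target.

```python
def alignment_counts(target, own):
    """Return [(r_i, s_i) for i = 1..m], with m = len(target) + len(own) - 1."""
    n1, n2 = len(target), len(own)
    counts = []
    for i in range(1, n1 + n2):
        offset = i - n2                  # own[j] sits under target[j + offset]
        r = s = 0
        for j, ch in enumerate(own):
            t = j + offset
            if 0 <= t < n1:
                r += 1                   # overlapping position
                if target[t] == ch:
                    s += 1               # overlapping and identical
        counts.append((r, s))
    return counts


def similarity(target, own):
    """Q_max = max_i M_i^2 * L_i, where M_i = s_i / n_max and L_i = r_i / n_max."""
    if not target or not own:
        return 0.0                       # assumption: empty strings are not similar
    n_max = max(len(target), len(own))
    return max((s / n_max) ** 2 * (r / n_max)
               for r, s in alignment_counts(target, own))
```

With the example from the text, `alignment_counts("cocc.cn", "boc.cn")[:2]` gives `[(1, 0), (2, 1)]`, matching $r_1, s_1, r_2, s_2$ above. `round(similarity("cocc.cn", "boc.cn"), 3)` gives `0.28`, and two identical domain names score exactly `1.0`.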
In addition, the initial value calculation submodule can also calculate the i-th similarity calculation value $Q_i$ as follows:
In the i-th comparison, count the characters of the target domain name string that overlap with, and are identical to, characters of the own domain name string. This count is taken as the i-th similarity calculation value $Q_i$.
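The alternative rule above, where $Q_i$ is simply the number of overlapping and identical characters $s_i$, is sketched below. It uses the same $m = n_1 + n_2 - 1$ alignments; `similarity_by_matches` is a hypothetical name. The result is a character count rather than a fraction. The text does not say how such a count is compared with the predetermined threshold.

```python
def similarity_by_matches(target, own):
    """Variant: Q_i = s_i (overlapping and identical characters); returns max_i Q_i."""
    n1, n2 = len(target), len(own)
    best = 0
    # offset runs from 1 - n2 (alignment i = 1) to n1 - 1 (alignment i = m)
    for offset in range(1 - n2, n1):
        s = sum(1 for j, ch in enumerate(own)
                if 0 <= j + offset < n1 and target[j + offset] == ch)
        best = max(best, s)
    return best
```

For boc.cn against cocc.cn this returns 4. That value comes from the alignment where the ".cn" suffixes coincide.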
Other known methods can also be used to calculate the i-th similarity calculation value $Q_i$. Since they are not the focus of the invention, they are not described further here.
The final value calculation submodule 322c is adapted to calculate the similarity $Q_{max}$ between the target domain name and the own domain name according to the following formula:
$Q_{max} = \max\{Q_1, Q_2, Q_3, \ldots, Q_m\}$.
The judgment module 323 is adapted to judge whether the ratio and the similarity satisfy the following condition: the ratio is greater than a predetermined ratio, and the similarity is greater than a predetermined threshold. If so, it judges that the website to be identified is a phishing website; otherwise, it judges that the website to be identified is not a phishing website. The predetermined ratio and the predetermined threshold can be set and adjusted according to actual use. In this embodiment, the predetermined ratio is preferably 1.0 and the predetermined threshold is preferably 80%.
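A sketch of the comparison subunit 310, ratio calculation module 321 and judgment module 323 working together. It uses this embodiment's preferred values: a ratio of 1.0 and a threshold of 80%, written as 0.8. The similarity function is passed in as a parameter, so any $Q_{max}$ implementation can be plugged in. The function name `is_phishing` and its parameter names are hypothetical. Treating a site that never links to its own domain as having an infinite ratio is also our assumption; the patent does not cover that case.

```python
def is_phishing(target, own, counts, similarity, ratio_threshold=1.0, sim_threshold=0.8):
    """Decide whether a site is phishing from its target domain and own domain.

    counts: mapping domain -> occurrences in the website to be identified.
    similarity: callable (target, own) -> Q_max, expected in [0, 1].
    """
    if target == own:                    # comparison subunit: same domain -> not phishing
        return False
    own_count = counts.get(own, 0)
    # Ratio calculation module; assumption: no self-links counts as an infinite ratio.
    ratio = counts[target] / own_count if own_count else float("inf")
    # Judgment module: both conditions must hold.
    return ratio > ratio_threshold and similarity(target, own) > sim_threshold
```

Consider a page that links to `www.icbc.com.cn` 30 times and to its own domain `icbc-com.cn` 5 times. The ratio is 6.0, so the verdict depends only on whether the similarity exceeds 0.8. The check is deliberately conservative: a site that mostly links to some unrelated domain is not flagged unless the two domain names also look alike.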
Fig. 5 is a schematic diagram of the module structure of the phishing website identification system according to Embodiment 2 of the present invention. As shown in Fig. 5, the system of this embodiment is essentially the same as that of Embodiment 1. The only difference is that it further includes a website acquisition unit 000 and a supplementary identification unit 400.
The website acquisition unit 000 is adapted to search for newly created websites to serve as websites to be identified. Phishing websites are generally newly created. Providing the website acquisition unit 000, and taking only newly created websites as websites to be identified, therefore narrows the identification range and improves the accuracy and speed of identification. Newly created websites can be found in either of the following ways:
- monitoring search engine result pages for specific keywords;
- discovering, through the client, websites that receive few visits from users.
The supplementary identification unit 400 is adapted to denote a website judged to be a phishing website as a suspected site and to perform supplementary identification on it. When the supplementary identification result shows that the suspected site is still a phishing website, the unit sends the suspected site to the phishing website library. The supplementary identification can be performed by manual review. Providing the supplementary identification unit 400 further improves the accuracy of phishing website identification.
Fig. 6 is a flowchart of the phishing website identification method according to Embodiment 3 of the present invention. As shown in Fig. 6, the method includes the steps of:
A: Collect all links appearing in the website to be identified and obtain the domain name corresponding to each link. The domain name corresponding to a link is the absolute address of the link.
B: Count the number of times each domain name appears in the website to be identified, find the domain name with the most occurrences, and denote it as the target domain name.
C: Judge, according to the target domain name and the own domain name of the website to be identified, whether the website to be identified is a phishing website.
Step C further includes the steps of:
C1: Judge whether the target domain name is the same as the own domain name. If so, judge that the website to be identified is not a phishing website and end the flow; otherwise, perform step C2.
C2: Calculate the ratio between the number of occurrences of the target domain name and the number of occurrences of the own domain name, and the similarity between the target domain name and the own domain name. Then judge, according to the ratio and the similarity, whether the website to be identified is a phishing website.
Step C2 further comprises the following steps:
C21: Calculate the ratio between the number of occurrences of the target domain name and the number of occurrences of the own domain name.
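The following is a minimal sketch of step C21. It assumes that the occurrence counts from steps A and B are held in a `collections.Counter` keyed by host name. The description does not say what happens when the own domain name never occurs among the links, which would make the ratio a division by zero. Returning infinity in that case is an assumption made here, and `occurrence_ratio` is a hypothetical name.

```python
import math
from collections import Counter


def occurrence_ratio(counts: Counter, target: str, own: str) -> float:
    """Step C21: occurrences of the target domain divided by occurrences of the own domain.

    Assumption (not in the patent): if the own domain never appears, return infinity,
    treating a page that never links to itself as maximally skewed toward the target.
    """
    own_count = counts[own]  # Counter returns 0 for missing keys
    if own_count == 0:
        return math.inf
    return counts[target] / own_count
```

For example, a page with six links to `paypal.com` and two links to its own host `paypa1-secure.example` gives a ratio of 3.0.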
C22: Calculate the similarity between the target domain name and the own domain name.
Step C22 further comprises the following steps:
C221: Build a comparison array from the character string of the target domain name and the character string of the own domain name. Place the character string of the target domain name in the first row of the comparison array and hold its position fixed. Place the character string of the own domain name in the second row and move it from left to right, comparing the characters that overlap between the two rows.
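The helpers below illustrate the two-row comparison array of step C221. This sketch expresses the shift as an `offset`: the index in the target domain name above which the first character of the own domain name sits. A negative offset means the own domain name still extends past the left end of the target domain name. Both the offset convention and the names `overlapping_pairs` and `render_rows` are assumptions for illustration.

```python
from typing import List, Tuple


def overlapping_pairs(target: str, own: str, offset: int) -> List[Tuple[str, str]]:
    """Character pairs stacked in the two rows when own[0] sits under target[offset].

    Row 1 (target) stays fixed; row 2 (own) is shifted right by `offset`,
    which is negative while the own domain still hangs off the left edge.
    """
    pairs = []
    for j, t_char in enumerate(target):
        k = j - offset  # index of the own-domain character under target[j]
        if 0 <= k < len(own):
            pairs.append((t_char, own[k]))
    return pairs


def render_rows(target: str, own: str, offset: int) -> Tuple[str, str]:
    """Draw the two rows of the comparison array as space-padded text."""
    left = min(0, offset)
    return " " * -left + target, " " * (offset - left) + own


if __name__ == "__main__":
    for row in render_rows("paypal.com", "paypa1.com", -3):
        print(row)
    print(overlapping_pairs("paypal.com", "paypa1.com", -3))
```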
C222: When the first character of the target domain name is aligned with the last character of the own domain name, calculate the first similarity value $Q_1$ between the target domain name and the own domain name. When the second character of the target domain name is aligned with the last character of the own domain name, calculate the second similarity value $Q_2$. Continue in this way until the last character of the target domain name is aligned with the first character of the own domain name, and calculate the m-th similarity value $Q_m$. Here $m = n_1 + n_2 - 1$, where $n_1$ is the string length of the target domain name and $n_2$ is the string length of the own domain name.
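The m alignments of step C222 can be enumerated as shown below. This sketch uses an offset convention that is its own assumption: the offset is the index in the target domain name above which the first character of the own domain name sits. Comparison $i$ then corresponds to offset $i - n_2$, which runs from $1 - n_2$ (first target character over last own character) to $n_1 - 1$ (last target character over first own character). That gives exactly $n_1 + n_2 - 1$ alignments. The names `alignment_offsets` and `overlap_length` are hypothetical.

```python
from typing import List


def alignment_offsets(n1: int, n2: int) -> List[int]:
    """Offset of own[0] relative to target[0] for comparisons i = 1, ..., m.

    i = 1 aligns the first target character with the last own character (offset 1 - n2);
    i = m aligns the last target character with the first own character (offset n1 - 1).
    """
    m = n1 + n2 - 1
    return [i - n2 for i in range(1, m + 1)]


def overlap_length(n1: int, n2: int, offset: int) -> int:
    """Number of stacked character positions at a given offset."""
    return max(0, min(n1, offset + n2) - max(0, offset))
```

For $n_1 = 3$ and $n_2 = 2$, this gives $m = 4$ offsets, `[-1, 0, 1, 2]`, with 1, 2, 2 and 1 overlapping positions respectively. Every alignment overlaps at least one character, which is why the sequence stops at these two end positions.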
In step C222, the i-th similarity value $Q_i$ is calculated as follows:
$$Q_i = M_i^2 \times L_i$$
where $i$ is a natural number with $1 \le i \le m$, and
$$M_i = s_i / n_{max}$$
$$L_i = r_i / n_{max}$$
In these formulas:
- $r_i$ is the number of characters that overlap between the character string of the own domain name and the character string of the target domain name in the i-th comparison.
- $n_{max}$ is the number of characters in the longer of the two character strings.
- $L_i$ is the overlap rate of the two character strings in the i-th comparison.
- $s_i$ is the number of characters that are both overlapping and identical in the i-th comparison.
- $M_i$ is the matching rate of the two character strings in the i-th comparison.
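The function below transcribes the $Q_i$ formula directly, as a sketch. `similarity_value` is a hypothetical name. The alignment is given by an offset, an assumption of this sketch: the index in the target domain name above which the first character of the own domain name sits. Both domain names are assumed to be non-empty so that $n_{max} > 0$.

```python
def similarity_value(target: str, own: str, offset: int) -> float:
    """Q_i = M_i**2 * L_i for the comparison in which own[0] sits under target[offset]."""
    n_max = max(len(target), len(own))  # length of the longer string; assumed non-zero
    r = 0  # r_i: overlapping characters
    s = 0  # s_i: overlapping and identical characters
    for j, t_char in enumerate(target):
        k = j - offset
        if 0 <= k < len(own):
            r += 1
            if t_char == own[k]:
                s += 1
    match_rate = s / n_max    # M_i
    overlap_rate = r / n_max  # L_i
    return match_rate ** 2 * overlap_rate
```

Consider target `paypal.com` and own `paypa1.com` at offset 0. All 10 characters overlap and 9 of them match, so $Q_i = 0.9^2 \times 1.0 = 0.81$. Squaring $M_i$ weights character agreement more heavily than mere overlap.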
Alternatively, in step C222, the i-th similarity value $Q_i$ may be calculated as follows:
In the i-th comparison, count the characters of the target domain name's character string and the own domain name's character string that are both overlapping and identical, and take this count as the i-th similarity value $Q_i$.
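The alternative definition of $Q_i$ reduces to a single count. In the sketch below, `similarity_value_count` is a hypothetical name. The alignment is given by an offset, an assumed convention: the index in the target domain name above which the first character of the own domain name sits.

```python
def similarity_value_count(target: str, own: str, offset: int) -> int:
    """Alternative Q_i: number of characters that both overlap and are identical."""
    return sum(
        1
        for j, t_char in enumerate(target)
        if 0 <= j - offset < len(own) and t_char == own[j - offset]
    )
```

Unlike $M_i^2 \times L_i$, this value is not normalized by string length. A predetermined similarity threshold used with it would therefore presumably be expressed in characters rather than as a fraction.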
C223: Calculate the similarity $Q_{max}$ between the target domain name and the own domain name according to the following formula:
$$Q_{max} = \max\{Q_1, Q_2, Q_3, \ldots, Q_m\}$$
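Steps C221 to C223 can be combined into a single function that computes $Q_{max}$. This is a sketch, not the patented code. `domain_similarity` is a hypothetical name, and the function uses the $M_i^2 \times L_i$ form of $Q_i$ rather than the alternative character count. The loop visits the $m = n_1 + n_2 - 1$ alignments in order, from the first target character over the last own character to the last target character over the first own character, and keeps the largest value.

```python
def domain_similarity(target: str, own: str) -> float:
    """Q_max: the largest Q_i = M_i**2 * L_i over all m = n1 + n2 - 1 alignments."""
    n1, n2 = len(target), len(own)
    n_max = max(n1, n2)  # assumed > 0 (domain names are non-empty)
    best = 0.0
    for i in range(1, n1 + n2):  # i = 1 .. m
        offset = i - n2  # own[0] sits under target[offset]
        r = s = 0
        for j in range(max(0, offset), min(n1, offset + n2)):  # overlapping positions
            r += 1
            s += target[j] == own[j - offset]
        q = (s / n_max) ** 2 * (r / n_max)
        best = max(best, q)
    return best
```

For example, `domain_similarity("paypal.com", "paypa1.com")` is about 0.81. The best alignment is the unshifted one, where 9 of the 10 characters match. Every shifted alignment overlaps at most 9 characters, which caps its value at $0.9^3 = 0.729$.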
C23: Judge whether the following condition is met: the ratio is greater than a predetermined ratio, and the similarity is greater than a predetermined threshold. If it is, judge that the website to be identified is a phishing website. Otherwise, judge that the website to be identified is not a phishing website.
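The decision in steps C1 and C23 can be expressed as a small predicate. The description gives no values for the predetermined ratio or the predetermined threshold, so this hypothetical `is_phishing` function takes them as parameters. Any values used with it, such as the ratio threshold of 2.0 and similarity threshold of 0.5 in the usage below, are illustrative only.

```python
def is_phishing(target: str, own: str, ratio: float, similarity: float,
                ratio_threshold: float, similarity_threshold: float) -> bool:
    """Return True only if the domains differ and both ratio and similarity exceed their thresholds."""
    if target == own:  # C1: the most-linked domain is the site's own domain
        return False
    # C23: both comparisons are strict, as in the description
    return ratio > ratio_threshold and similarity > similarity_threshold


if __name__ == "__main__":
    print(is_phishing("paypal.com", "paypa1.com", 3.0, 0.81, 2.0, 0.5))
```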
Fig. 7 is a flow chart of the phishing website identification method according to Embodiment 4 of the present invention. As shown in Fig. 7, the method of this embodiment is essentially the same as that of Embodiment 3. The only differences are as follows:
Step A' is performed before step A: search for newly created websites to be used as websites to be identified. Newly created websites can be found by monitoring search-engine result pages for particular keywords, or by discovering, through clients, websites that few users visit.
Step D is performed after step C: mark each website to be identified that the judgment result shows to be a phishing website as a suspicious website, and perform supplementary identification on it. If the identification result confirms that the suspicious website is a phishing website, add it to the phishing website database. The supplementary identification may be performed by manual review.
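Step D can be sketched as a review loop. This sketch makes two assumptions. First, the supplementary identification is modelled as a hypothetical callback `review` that returns `True` when a suspicious site is confirmed, for instance a manual reviewer's verdict. Second, the phishing website database is stood in for by a Python `set`. `supplementary_identification` is likewise a hypothetical name.

```python
from typing import Callable, Iterable, Set


def supplementary_identification(
    flagged: Iterable[str],
    review: Callable[[str], bool],
    phishing_library: Set[str],
) -> Set[str]:
    """Step D: mark flagged sites as suspicious, review each, and store the confirmed ones.

    Returns the set of suspicious sites; confirmed sites are added to
    `phishing_library` in place.
    """
    suspicious = set(flagged)  # every site judged to be phishing becomes suspicious
    for site in suspicious:
        if review(site):  # supplementary identification, e.g. manual review
            phishing_library.add(site)
    return suspicious
```

Keeping the automatic judgment and the database update separate means that a false positive from steps A to C only ever reaches the suspicious set, never the phishing website database.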
The phishing website identification system and method according to the embodiments of the present invention identify phishing websites based on the link relationships within a website, and can therefore effectively identify new types of phishing websites. They also help enrich the number and variety of entries in the phishing website database, which facilitates further identification and lookup of phishing websites. The system and method have broad application prospects in the field of network security.
The above embodiments are intended only to illustrate the present invention, not to limit it. Those of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the present invention. All equivalent technical solutions therefore also fall within the scope of the present invention, and the scope of patent protection of the present invention shall be defined by the claims.

Claims (18)

1. A phishing website identification system, comprising: a domain name acquiring unit, a domain name statistics unit, and a website identification unit;
The domain name acquiring unit is adapted to collect all links appearing in a website to be identified and obtain the domain names corresponding to the links;
The domain name statistics unit is adapted to count the number of times each domain name appears in the website to be identified, find the domain name with the most occurrences, and denote it as the target domain name;
The website identification unit is adapted to judge, according to the target domain name and the own domain name of the website to be identified, whether the website to be identified is a phishing website.
2. The system according to claim 1, wherein the website identification unit comprises: a comparison subunit and an identification subunit;
The comparison subunit is adapted to compare the target domain name with the own domain name and, when the comparison result shows that the target domain name is identical to the own domain name, judge that the website to be identified is not a phishing website;
The identification subunit is adapted to, when the target domain name differs from the own domain name, calculate the ratio between the number of occurrences of the target domain name and the number of occurrences of the own domain name, calculate the similarity between the target domain name and the own domain name, and then judge, according to the ratio and the similarity, whether the website to be identified is a phishing website.
3. The system according to claim 2, wherein the identification subunit comprises: a ratio calculation module, a similarity calculation module, and a judgment module;
The ratio calculation module is adapted to calculate the ratio between the number of occurrences of the target domain name and the number of occurrences of the own domain name;
The similarity calculation module is adapted to calculate the similarity between the target domain name and the own domain name;
The judgment module is adapted to judge whether the ratio and the similarity meet the condition that the ratio is greater than a predetermined ratio and the similarity is greater than a predetermined threshold; if the condition is met, judge that the website to be identified is a phishing website; otherwise, judge that the website to be identified is not a phishing website.
4. The system according to claim 3, wherein the similarity calculation module comprises: a character string comparison submodule, an initial value calculation submodule, and a final value calculation submodule;
The character string comparison submodule is adapted to build a comparison array from the character string of the target domain name and the character string of the own domain name, place the character string of the target domain name in the first row of the comparison array with its position held fixed, place the character string of the own domain name in the second row of the comparison array and move it from left to right, and compare the overlapping characters of the two rows;
The initial value calculation submodule is adapted to: calculate the first similarity value $Q_1$ between the target domain name and the own domain name when the first character of the target domain name is aligned with the last character of the own domain name; calculate the second similarity value $Q_2$ between the target domain name and the own domain name when the second character of the target domain name is aligned with the last character of the own domain name; and so on, up to calculating the m-th similarity value $Q_m$ between the target domain name and the own domain name when the last character of the target domain name is aligned with the first character of the own domain name; where $m = n_1 + n_2 - 1$, $n_1$ is the string length of the target domain name, and $n_2$ is the string length of the own domain name;
The final value calculation submodule is adapted to calculate the similarity $Q_{max}$ between the target domain name and the own domain name according to the following formula:
$$Q_{max} = \max\{Q_1, Q_2, Q_3, \ldots, Q_m\}$$
5. The system according to claim 4, wherein the initial value calculation submodule calculates the i-th similarity value $Q_i$ using the following formula:
$$Q_i = M_i^2 \times L_i$$
where $i$ is a natural number with $1 \le i \le m$, and
$$M_i = s_i / n_{max}$$
$$L_i = r_i / n_{max}$$
where $r_i$ is the number of characters that overlap between the character string of the own domain name and the character string of the target domain name in the i-th comparison; $n_{max}$ is the number of characters in the longer of the two character strings; $L_i$ is the overlap rate of the two character strings in the i-th comparison; $s_i$ is the number of characters that are both overlapping and identical in the i-th comparison; and $M_i$ is the matching rate of the two character strings in the i-th comparison.
6. The system according to claim 4, wherein the initial value calculation submodule calculates the i-th similarity value $Q_i$ as follows:
In the i-th comparison, count the characters of the character string of the target domain name and the character string of the own domain name that are both overlapping and identical, and take this count as the i-th similarity value $Q_i$.
7. The system according to claim 1, further comprising: a supplementary identification unit;
The supplementary identification unit is adapted to mark a website to be identified that the judgment result shows to be a phishing website as a suspicious website, perform supplementary identification on the suspicious website, and, when the identification result confirms that the suspicious website is a phishing website, add the suspicious website to a phishing website database.
8. The system according to claim 1, wherein the domain name corresponding to a link is the absolute address of the link.
9. The system according to claim 1, further comprising: a website acquiring unit;
The website acquiring unit is adapted to search for newly created websites to be used as websites to be identified.
10. A phishing website identification method, comprising the steps of:
Collecting all links appearing in a website to be identified and obtaining the domain names corresponding to the links;
Counting the number of times each domain name appears in the website to be identified, finding the domain name with the most occurrences, and denoting it as the target domain name;
Judging, according to the target domain name and the own domain name of the website to be identified, whether the website to be identified is a phishing website.
11. The method according to claim 10, wherein judging, according to the target domain name and the own domain name of the website to be identified, whether the website to be identified is a phishing website further comprises the steps of:
Judging whether the target domain name is identical to the own domain name; if so, judging that the website to be identified is not a phishing website and ending the flow; otherwise, performing the next step;
Calculating the ratio between the number of occurrences of the target domain name and the number of occurrences of the own domain name, and the similarity between the target domain name and the own domain name, and judging, according to the ratio and the similarity, whether the website to be identified is a phishing website.
12. The method according to claim 11, wherein calculating the ratio between the number of occurrences of the target domain name and the number of occurrences of the own domain name, and the similarity between the target domain name and the own domain name, and judging, according to the ratio and the similarity, whether the website to be identified is a phishing website further comprises the steps of:
Calculating the ratio between the number of occurrences of the target domain name and the number of occurrences of the own domain name;
Calculating the similarity between the target domain name and the own domain name;
Judging whether the following condition is met: the ratio is greater than a predetermined ratio and the similarity is greater than a predetermined threshold; if so, judging that the website to be identified is a phishing website; otherwise, judging that the website to be identified is not a phishing website.
13. The method according to claim 12, wherein calculating the similarity between the target domain name and the own domain name further comprises the steps of:
Building a comparison array from the character string of the target domain name and the character string of the own domain name, placing the character string of the target domain name in the first row of the comparison array with its position held fixed, placing the character string of the own domain name in the second row of the comparison array and moving it from left to right, and comparing the overlapping characters of the two rows;
When the first character of the target domain name is aligned with the last character of the own domain name, calculating the first similarity value $Q_1$ between the target domain name and the own domain name; when the second character of the target domain name is aligned with the last character of the own domain name, calculating the second similarity value $Q_2$ between the target domain name and the own domain name; and so on, up to calculating, when the last character of the target domain name is aligned with the first character of the own domain name, the m-th similarity value $Q_m$ between the target domain name and the own domain name; where $m = n_1 + n_2 - 1$, $n_1$ is the string length of the target domain name, and $n_2$ is the string length of the own domain name;
Calculating the similarity $Q_{max}$ between the target domain name and the own domain name according to the following formula:
$$Q_{max} = \max\{Q_1, Q_2, Q_3, \ldots, Q_m\}$$
14. The method according to claim 13, wherein, in the step of calculating the first similarity value $Q_1$ between the target domain name and the own domain name when the first character of the target domain name is aligned with the last character of the own domain name, the second similarity value $Q_2$ when the second character of the target domain name is aligned with the last character of the own domain name, and so on, up to the m-th similarity value $Q_m$ when the last character of the target domain name is aligned with the first character of the own domain name, the i-th similarity value $Q_i$ is calculated by the following formula:
$$Q_i = M_i^2 \times L_i$$
where $i$ is a natural number with $1 \le i \le m$, and
$$M_i = s_i / n_{max}$$
$$L_i = r_i / n_{max}$$
where $r_i$ is the number of characters that overlap between the character string of the own domain name and the character string of the target domain name in the i-th comparison; $n_{max}$ is the number of characters in the longer of the two character strings; $L_i$ is the overlap rate of the two character strings in the i-th comparison; $s_i$ is the number of characters that are both overlapping and identical in the i-th comparison; and $M_i$ is the matching rate of the two character strings in the i-th comparison.
15. The method according to claim 13, wherein, in the step of calculating the first similarity value $Q_1$ between the target domain name and the own domain name when the first character of the target domain name is aligned with the last character of the own domain name, the second similarity value $Q_2$ when the second character of the target domain name is aligned with the last character of the own domain name, and so on, up to the m-th similarity value $Q_m$ when the last character of the target domain name is aligned with the first character of the own domain name, the i-th similarity value $Q_i$ is calculated as follows:
In the i-th comparison, counting the characters of the character string of the target domain name and the character string of the own domain name that are both overlapping and identical, and taking this count as the i-th similarity value $Q_i$.
16. The method according to claim 10, further comprising, after judging according to the target domain name and the own domain name of the website to be identified whether the website to be identified is a phishing website, the step of: marking a website to be identified that the judgment result shows to be a phishing website as a suspicious website, performing supplementary identification on the suspicious website, and, when the identification result confirms that the suspicious website is a phishing website, adding the suspicious website to a phishing website database.
17. The method according to claim 10, wherein the domain name corresponding to a link is the absolute address of the link.
18. The method according to claim 10, further comprising, before collecting all links appearing in the website to be identified and obtaining the domain names corresponding to the links, the step of: searching for newly created websites to be used as websites to be identified.
CN201510051628.9A 2012-06-28 2012-06-28 A kind of fishing website identifying system and method Active CN104580254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510051628.9A CN104580254B (en) 2012-06-28 2012-06-28 A kind of fishing website identifying system and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210224485.3A CN102801709B (en) 2012-06-28 2012-06-28 Phishing website identification system and method
CN201510051628.9A CN104580254B (en) 2012-06-28 2012-06-28 A kind of fishing website identifying system and method

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201210224485.3A Division CN102801709B (en) 2012-06-28 2012-06-28 Phishing website identification system and method

Publications (2)

Publication Number Publication Date
CN104580254A CN104580254A (en) 2015-04-29
CN104580254B true CN104580254B (en) 2017-10-31

Family

ID=53095434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510051628.9A Active CN104580254B (en) 2012-06-28 2012-06-28 A kind of fishing website identifying system and method

Country Status (1)

Country Link
CN (1) CN104580254B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106330861B (en) * 2016-08-09 2020-03-03 中国信息安全测评中心 Website detection method and device
CN106302440B (en) * 2016-08-11 2019-12-10 国家计算机网络与信息安全管理中心 Method for acquiring suspicious phishing websites through multiple channels
CN109391584A (en) * 2017-08-03 2019-02-26 武汉安天信息技术有限责任公司 A kind of recognition methods of doubtful malicious websites and device
CN108173814B (en) * 2017-12-08 2021-02-05 深信服科技股份有限公司 Phishing website detection method, terminal device and storage medium
CN108337259A (en) * 2018-02-01 2018-07-27 南京邮电大学 A kind of suspicious web page identification method based on HTTP request Host information
CN108846672B (en) * 2018-06-25 2021-11-23 北京奇虎科技有限公司 Personalized address generation method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101145902A (en) * 2007-08-17 2008-03-19 东南大学 Fishing webpage detection method based on image processing
US7630987B1 (en) * 2004-11-24 2009-12-08 Bank Of America Corporation System and method for detecting phishers by analyzing website referrals
CN101667979A (en) * 2009-10-12 2010-03-10 哈尔滨工程大学 System and method for anti-phishing emails based on link domain name and user feedback
US7958555B1 (en) * 2007-09-28 2011-06-07 Trend Micro Incorporated Protecting computer users from online frauds
CN102098235A (en) * 2011-01-18 2011-06-15 南京邮电大学 Fishing mail inspection method based on text characteristic analysis
CN102223316A (en) * 2011-06-15 2011-10-19 成都市华为赛门铁克科技有限公司 Method and device for processing electronic mail
CN102801709B (en) * 2012-06-28 2015-03-04 北京奇虎科技有限公司 Phishing website identification system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8578481B2 (en) * 2006-10-16 2013-11-05 Red Hat, Inc. Method and system for determining a probability of entry of a counterfeit domain in a browser


Also Published As

Publication number Publication date
CN104580254A (en) 2015-04-29

Similar Documents

Publication Publication Date Title
CN102801709B (en) Phishing website identification system and method
CN104580254B (en) A kind of fishing website identifying system and method
CN107067020B (en) Image identification method and device
CN105488024B (en) The abstracting method and device of Web page subject sentence
CN102779249B (en) Malware detection methods and scanning engine
CN107992469A (en) A kind of fishing URL detection methods and system based on word sequence
CN107967208A (en) A kind of Python resource sensitive defect code detection methods based on deep neural network
WO2016201938A1 (en) Multi-stage phishing website detection method and system
CN103678528B (en) Electronic homework plagiarism preventing system and method based on paragraph plagiarism detection
CN109117634A (en) Malware detection method and system based on network flow multi-view integration
CN106302438A (en) A kind of method of actively monitoring fishing website of Behavior-based control feature by all kinds of means
CN102222187A (en) Domain name structural feature-based hang horse web page detection method
CN104020845B (en) Acceleration transducer placement-unrelated movement recognition method based on shapelet characteristic
CN104778164B (en) Detection repeats URL method and device
CN107122411A (en) A kind of collaborative filtering recommending method based on discrete multi views Hash
CN106649273A (en) Text processing method and text processing device
CN104268289B (en) The abatement detecting method and device of link URL
CN107958154A (en) A kind of malware detection device and method
CN106121622A (en) A kind of Multiple faults diagnosis approach of Dlagnosis of Sucker Rod Pumping Well based on indicator card
CN105654144A (en) Social network body constructing method based on machine learning
CN107145779A (en) A kind of recognition methods of offline Malware daily record and device
CN107612911A (en) Method based on the infected main frame of DNS flow detections and C&C servers
CN107967332A (en) Enterprise's address recognition methods and identifying system
CN105119876A (en) automatically-generated domain name
CN106330861A (en) Website detection method and apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220727

Address after: 300450 No. 9-3-401, No. 39, Gaoxin 6th Road, Binhai Science Park, Binhai New Area, Tianjin

Patentee after: 3600 Technology Group Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20230713

Address after: 1765, floor 17, floor 15, building 3, No. 10 Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: Beijing Hongxiang Technical Service Co.,Ltd.

Address before: 300450 No. 9-3-401, No. 39, Gaoxin 6th Road, Binhai Science Park, Binhai New Area, Tianjin

Patentee before: 3600 Technology Group Co.,Ltd.
