CN104202291A

CN104202291A - Anti-phishing method based on multi-factor comprehensive assessment method

Info

Publication number: CN104202291A
Application number: CN201410177968.1A
Authority: CN
Inventors: 胡建伟; 崔艳鹏; 李英; 胥红艳; 李蕊; 许乐
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2014-07-11
Filing date: 2014-07-11
Publication date: 2014-12-10

Abstract

The invention relates to an anti-phishing method based on multi-factor comprehensive assessment. The method comprises the following steps: step a, establishing a blacklist and whitelist library of URLs (uniform resource locators), processing a target URL, and judging whether the processed URL is in the blacklist or whitelist; if so, executing step d and feeding the result back to the user directly; otherwise, executing step b to examine the website; step b, examining the website from four aspects: URL-based identification, website behavior and detail feature identification, server-side identification, and crawler-based identification; step c, summarizing the weights and feeding back the result; and step d, displaying the result. The method assesses a site from many angles through a rigorous procedure, so its consideration is comprehensive and its accuracy is high. The suspicious points that were hit and their weights, the webpage links found, the website files, and the judgment criteria are displayed in a graphical interface in a simple and clear manner, so the result is fed back to the user and is also available for examination by relevant professionals.

Description

Anti-phishing method based on multi-factor comprehensive assessment

Technical field

The present invention relates to a method for assessing and defending against phishing attacks, and in particular to an anti-phishing method based on multi-factor comprehensive assessment.

Background technology

At present, Internet fraud occurs again and again and threatens users' personal security. According to statistics, in the first half of 2010 alone, the direct and indirect economic losses that phishing caused to the public and to society exceeded 12 billion yuan. Defending against phishing attacks is therefore urgent. Current security software identifies phishing websites with a single URL-based method that does not address the essence of a phishing website. Blacklist and whitelist identification lags behind, because phishing websites change their URLs frequently; it is a passive form of anti-phishing that presupposes sacrificing the interests of some users. Identification based on page features is slow and inefficient and is easily defeated by a phisher's disguise. In addition, these solutions all face a common problem: a high recognition rate is usually accompanied by a relatively high false alarm rate. Existing traditional anti-phishing methods are therefore clearly inadequate against rapidly changing threats.

Through a summary of existing anti-phishing means and an analysis of a large number of phishing websites, the present invention makes up for the deficiencies of current security software. The present invention analyzes phishing websites from many directions and, by applying a statistical algorithm, a threshold algorithm, linear weighting, a checksum algorithm, and similar means, achieves a very high recognition rate while reducing the false alarm rate.

In view of the above defects, the inventors finally arrived at the present invention through long research and practice.

Summary of the invention

The object of the present invention is to provide an anti-phishing method based on multi-factor comprehensive assessment that overcomes the above technical defects.

To achieve the above object, the invention provides an anti-phishing method based on multi-factor comprehensive assessment, which comprises the following steps:

Step a: establish a blacklist and whitelist library of URLs, process the target URL, and judge whether the processed URL is in the blacklist or whitelist. If it is in the library, execute step d and feed the result back to the user directly; if it is not, execute step b to examine the website;

Step b: examine the website;

The detection covers four aspects: URL-based identification, website behavior and detail feature identification, server-side identification, and crawler-based identification. URL-based identification is carried out first. Website behavior and detail feature identification, server-side identification, and crawler-based identification are then executed in three separate threads; after each detection completes, its total weight is written to a file in a set format to make it easy to summarize and feed back the result;

Step c: summarize the weights and feed back the result;

If the accumulated total weight exceeds the set threshold, a phishing-website danger warning is sent to the user; if it is below the threshold, a detection result of "safe" is fed back to the user;

Step d: display the result.

Preferably, the steps of the URL-based identification are:

Step b11: normalize the format of the URL passed in as a parameter;

Step b12: if the number of domain name levels exceeds a set value, add the corresponding value at the relevant position of the weight-recording array;

Step b13: if the normalized URL is in IP form, add the corresponding value at the relevant position of the weight-recording array;

Step b14: if the URL contains special characters, the address is being disguised with special characters, so add the corresponding value at the relevant position of the weight-recording array;

Step b15: if the path has too many levels, add the corresponding value at the relevant position of the weight-recording array.

Preferably, the process of the website behavior and detail feature identification is:

Step b21: pass in the address to be examined, process the URL to extract the domain name and path, perform a DNS query, and establish a connection with the target;

Step b22: send an HTTP GET request for the extracted path, obtain the page source code, and analyze that source code;

Step b23: analyze the response received.

Preferably, the steps of analyzing the response received are:

Step b231: check whether a Cookie is set in the message header; if not, add the corresponding weight to a global variable;

Step b232: total the script content in the response, divide its length by the total page length to obtain the script proportion, and compare it with the lower threshold; if it is greater than the threshold, add the corresponding value at the relevant position of the weight-recording array;

Step b233: check whether the HTML code is standard, including whether the letter case of attributes in tags conforms to the standard and whether the action target is enclosed in double quotation marks; for each suspicious feature found, increase the multiplier of the corresponding weight by 1;

Step b234: check whether the target of the action attribute in the <form> tag is under the same domain name as the page; if it differs, add weight;

Step b235: when the action target is under the page's own domain name, analyze the GET response, extract the parameters and submit the form, then analyze the response; if the message header contains Location, check whether that address is under the page's own domain name, and if not, add weight;

Step b236: output the result to the weight feedback file in the agreed format so that it can be read when the weights are aggregated.

Preferably, the process of the server-side identification is:

Step b31: process the URL passed in, extract the main domain name, and perform a DNS query; if more than one IP address is returned for it, add no weight; if only one IP address is returned, add the corresponding value at the relevant position of the weight-recording array;

Step b32: look up the IP address; if the target is located in a country where phishing websites are concentrated, add the corresponding value at the relevant position of the weight-recording array;

Step b33: query the normalized domain name and extract from the response the difference between the domain's expiration time and its registration time; if the difference is less than a specified value, add the corresponding value at the relevant position of the weight-recording array; otherwise add nothing;

Step b34: output the result to the weight feedback file in the agreed format.

Preferably, the weight-recording array is first initialized to 0; values in the array are then accumulated during counting, and the result is finally output to the weight feedback file in the agreed format so that it can be read when the weights are aggregated;

The total weight is computed as $G = \sum_i s_i w_i$. If the resulting value G is greater than the upper threshold, the user is warned that the website is dangerous; if G is less than the lower threshold, the user is told that the website is safe; if G lies between the two thresholds, the corresponding degree of suspicion is returned to the user, who is prompted to access the site with caution and advised to learn how to guard against phishing attacks.

Preferably, the crawler-based identification includes page out-degree link count detection;

The steps of the page out-degree link count detection are as follows. After the URL is passed in, the page under test is first crawled to obtain the first-layer out-degree links under the same parent domain name; the links found are returned to the graphical interface for the user to inspect, and their number is recorded;

When second-layer links are selected for testing, no more than 5 out-degree links are drawn at random from the first-layer links. When the second-layer links are crawled, only paths under the parent domain name of the original page are searched, and the total number of their out-degree links is recorded. If the number exceeds the set maximum threshold, the website is considered not suspicious and execution stops;

Finally, the result is output to the weight feedback file.

Preferably, the crawler-based identification further includes detection of the number and types of web page files, with the following steps:

After the URL is passed in, the page under test is first crawled to obtain the first-layer out-degree links under the same parent domain name; the links found are returned to the graphical interface for the user to inspect, and their number is recorded;

When the second-layer links are crawled, only paths under the parent domain name of the original page are searched. The files under the page are checked in turn to determine whether each ends in one of the five types html, htm, shtml, asp, or php; if so, the file name of that first-layer link is recorded. It is then checked whether the link's URL is under the original parent domain name; if so, the link is crawled for second-layer files, and files of the corresponding types are found and counted. If the count exceeds the set maximum threshold, the website is considered not suspicious and execution stops;

Finally, the result is output to the weight feedback file.

Preferably, in the method for described web page files number and species detection, the method that described weights evaluation has taked by stages to judge; First by statistical method, divide number interval, according to giving corresponding suspicious degree S (S ∈ [0,1]) between result location, be then multiplied by whole COEFFICIENT K of dividing; Suspicious degree total weight value: N=SK; Finally total weight value is outputed in weight feedback file with true-to-shape.

Preferably, the upper alarm threshold for the total weight is set to 70 and the lower threshold to 30.

Compared with the prior art, the beneficial effects of the present invention are as follows. Through a summary of existing anti-phishing means and an analysis of a large number of phishing websites, the present invention overcomes the single detection angle of traditional anti-phishing methods. Combining existing detection means, it gathers information and evaluates a site from several aspects, namely URL, website behavior and detail features, server, and crawler; each aspect in turn has several evaluation indices, and the process is rigorous. Using a statistical algorithm and a threshold algorithm, each suspicious point is given a corresponding weight and a comprehensive score is computed at the end, so the consideration is comprehensive and the accuracy is high. The suspicious points hit and their weights, the webpage links found, the website files, and the basis for judgment are presented in a graphical interface that is simple and clear, so the result is fed back to the user and can also be examined by relevant professionals.

Brief description of the drawings

Fig. 1 is a flow chart of the anti-phishing method based on multi-factor comprehensive assessment of the present invention.

Embodiment

The above and other technical features and advantages of the present invention are described in more detail below with reference to the accompanying drawing.

Referring to Fig. 1, which is a flow chart of the anti-phishing method based on multi-factor comprehensive assessment according to the present invention:

Step a: establish a blacklist and whitelist library of URLs, process the target URL, and judge whether the processed URL is in the blacklist or whitelist. If it is in the library, execute step d and feed the result back to the user directly; if it is not, execute step b to examine the website.
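A minimal sketch of the list lookup in step a follows. The text does not say how the URL is processed before the lookup, so reducing it to a lowercase host name is an assumption, and `normalize_url` and `check_lists` are hypothetical names.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to its lowercase host (an assumed normalization)."""
    if "://" not in url:
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()


def check_lists(url, blacklist, whitelist):
    """Return 'phishing', 'safe', or None when the URL is in neither list."""
    host = normalize_url(url)
    if host in blacklist:
        return "phishing"   # go straight to step d
    if host in whitelist:
        return "safe"       # go straight to step d
    return None             # continue with step b
```

A `None` result corresponds to the branch in which the four-aspect detection of step b is run.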

Step b: in the detection process, URL-based identification is carried out first. Because this part executes quickly, there is no need to spend the overhead of creating a dedicated thread for it. Three threads are then started for the detection of the remaining three aspects. After the detection of each of these three parts completes, its total weight is written to the file temp_result.dat in the agreed format to make it easy to summarize and feed back the result.
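The three-thread arrangement can be sketched as follows. The agreed file format is not specified in the text, so the one-line-per-detector "name weight" layout is an assumption, and `run_detectors` and `read_total` are hypothetical helper names; the detector callables stand in for the three real detections.

```python
import threading


def run_detectors(url, detectors, result_path):
    """Run each detector in its own thread and append 'name weight' lines."""
    lock = threading.Lock()  # serialize writes to the shared result file

    def worker(name, detect):
        weight = detect(url)
        with lock:
            with open(result_path, "a", encoding="utf-8") as fh:
                fh.write(f"{name} {weight}\n")

    open(result_path, "w", encoding="utf-8").close()  # start from an empty file
    threads = [threading.Thread(target=worker, args=item)
               for item in detectors.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def read_total(result_path):
    """Sum the weights recorded in the result file."""
    total = 0.0
    with open(result_path, encoding="utf-8") as fh:
        for line in fh:
            _name, weight = line.split()
            total += float(weight)
    return total
```

In this sketch the caller would pass three entries (behavior, server, crawler) and a path such as temp_result.dat, then read the file back when aggregating in step c.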

Step c: summarize the weights and feed back the result. If the accumulated total weight exceeds the agreed threshold, a phishing-website danger warning is sent to the user; if it is below the threshold, a detection result of "safe" is fed back to the user.

The weights are recorded in an array that represents the result. It is first initialized to 0, the values in the array are then accumulated during counting, and the result is finally output to the file temp_result in the agreed format so that it can be read when the weights are aggregated. For the URL, website behavior and detail feature, and server-side aspects, the weights of all suspicious points are first set to 1; then 500 foreign phishing websites published on PhishTank and 500 domestic phishing websites are tested and the number of times each point is hit is counted; each suspicious point is then given a weight according to the result.

The processing of the above detection results mainly applies linear weighting, a threshold algorithm, and a statistical algorithm. The statistical algorithm counts, within a given range, the number of records that satisfy a set condition: a conditional statement tests whether the current record satisfies the condition and, if it does, the count is incremented by one. For the first three parts, we score the above identification results with the linear weighting method of multi-factor comprehensive scoring. It is implemented with two vectors, $S = \langle s_1, s_2, \ldots, s_i, \ldots \rangle$ and $W = \langle w_1, w_2, \ldots, w_i, \ldots \rangle$. In vector S, if a suspicious point above is found to be suspicious, the corresponding element is assigned 1, otherwise 0. In vector W, $w_i$ is the weight corresponding to $s_i$ and is obtained by the statistical algorithm described above.

The total weight is computed as $G = \sum_i s_i w_i$. If the resulting value G is greater than the upper threshold, the user is warned that the website is dangerous; if G is less than the lower threshold, the user is told that the website is safe; if G lies between the two thresholds, the corresponding degree of suspicion is returned to the user, who is prompted to access the site with caution and advised to learn how to guard against phishing attacks. The specific thresholds are also obtained by the statistical algorithm. The upper alarm threshold for the total weight is set to 70 and the lower threshold to 30.
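The linear weighting and the two thresholds can be sketched in a few lines. The thresholds 70 and 30 come from the text; the function names and the example weights in the usage note are hypothetical, and treating a value exactly equal to a threshold as "between the bounds" is an assumption.

```python
def total_weight(s, w):
    """Compute G = sum_i s_i * w_i for indicator vector s and weight vector w."""
    if len(s) != len(w):
        raise ValueError("S and W must have the same length")
    return sum(si * wi for si, wi in zip(s, w))


def classify(g, upper=70, lower=30):
    """Map the total weight G to one of the three outcomes."""
    if g > upper:
        return "dangerous"    # warn the user
    if g < lower:
        return "safe"         # report the site as safe
    return "suspicious"       # between the bounds: advise caution
```

For example, with illustrative weights W = (40, 25, 35) and hits S = (1, 0, 1), G is 75 and the site is classified as dangerous.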

Step d: display the result.

During operation, the program synchronously displays the response to the form submission, the response to the GET request returned by the target website, the geographic location query, the website's out-degree links, the website files, and the suspicious feature points with their weights, so that relevant professionals can understand how it works. After the program finishes running, a different prompt window pops up for the user depending on the total weight.

Step b comprises checks of four aspects: URL-based identification b1, website behavior and detail feature identification b2, server-side identification b3, and crawler-based identification b4. These four checks are described in turn below.

The URL-based identification of the present invention includes blacklist and whitelist identification, an address format check, a check for disguise using special characters, a domain name level check, and a path level check.

URL identification is one of the most widely adopted methods at present. It has advantages such as fast recognition and a 100% recognition rate for blacklist and whitelist entries, and includes URL blacklist techniques and machine-learning-based URL detection techniques. In the present invention, if the URL is in the blacklist or whitelist the user is notified directly, which further improves the accuracy and speed of detection.

The address format check judges whether the form of the address is suspicious. Phishers often use an IP address in place of the whole domain name in a phishing URL. This effectively hides the identity of the server, and such a URL cannot be blocked by shutting down a domain name. Because this rarely occurs for normal websites, it can serve as a sign that a URL is suspicious.

The check for disguise using special characters looks for the ways, other than hiding the domain name behind an IP address, in which phishing websites disguise and forge URLs, typically by hexadecimal encoding or by adding special characters to the URL. One such check is for the use of "@" as a disguise. In a URL, some characters have a special function, and some have a special function depending on their position. If a character cannot be expressed literally, it is sent to the web server in escaped form. The real address that is actually resolved in the URL begins after the "@" sign; this is the principle of the deception.

The domain name level check judges whether the number of domain name levels is normal. In a normal URL, the domain name reflects the content of the website simply. To make users believe that the site they are visiting is legitimate, a phisher may on the one hand choose a domain name resembling that of the legitimate website, and on the other hand append the domain name of a legitimate website to the domain name actually used.

The path level check examines the number of path levels in the URL. A normal URL consists of a domain name, an access path, and access parameters. Phishers work not only on the domain name; they also tend to add content such as the abbreviation of the counterfeited website to the access path that follows, in order to deceive users. This often shows up as a very large number of path levels.

The procedure of the URL-based identification is:

Step b11: normalize the URL passed in as a parameter so that it begins with http://;

Step b12: count the number of "." characters, English letters, and "/" characters in the string. If the number of "." characters exceeds the specified threshold, the number of domain name levels exceeds the set value, so add the corresponding value at the relevant position of the weight-recording array;

Step b13: if there are no English letters and the number of "." characters is 3 (as in 192.168.0.1), the URL is in IP form, so add weight;

Step b14: if the URL contains special characters, such as the "@" character or excessive hexadecimal encoding (for example %XX, where X represents a digit), the address is being disguised with special characters, so add weight;

Step b15: if the number of "/" characters is excessive, the path has too many levels, so add weight.

The weights are recorded in an array that represents the result. It is first initialized to 0, the values in the array are then accumulated during counting, and the result is finally output to the file temp_result in the agreed format so that it can be read when the weights are aggregated.
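Steps b11 to b15 can be sketched as a function that returns one flag per suspicious point. The text does not give the numeric limits, so `MAX_DOTS`, `MAX_SLASHES`, and `MAX_HEX_ESCAPES` are hypothetical; counting dots in the host part and slashes in the path part (rather than in the whole string) is also an assumption made here so that the scheme prefix does not distort the counts.

```python
import re
from urllib.parse import urlsplit

# Hypothetical limits; the source only says "specified threshold" and "too many".
MAX_DOTS = 4
MAX_SLASHES = 5
MAX_HEX_ESCAPES = 3


def url_suspicion_flags(url):
    """Return a dict of booleans, one per URL-level suspicious point."""
    # b11: normalize so the URL starts with a scheme such as http://
    if not re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", url):
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.hostname or ""
    rest = url.split("://", 1)[1]
    hex_escapes = re.findall(r"%[0-9A-Fa-f]{2}", rest)
    return {
        # b12: too many domain name levels
        "too_many_domain_levels": host.count(".") > MAX_DOTS,
        # b13: host written as a dotted IPv4 address
        "ip_form": re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) is not None,
        # b14: '@' disguise or heavy percent-encoding
        "special_chars": "@" in rest or len(hex_escapes) > MAX_HEX_ESCAPES,
        # b15: too many path levels
        "too_many_path_levels": parts.path.count("/") > MAX_SLASHES,
    }
```

Each `True` flag would correspond to setting the matching element of the indicator vector S to 1 before weighting.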

The website behavior and detail feature identification includes a form action check, analysis of the response after form submission, an HTML standardization check, a Cookie setting check, and a script ratio check.

On a phishing website, after a user enters an arbitrary user name and password, the site cannot tell whether the user name and password are genuine, yet it gives the user almost the same response in every case. More than 90% of phishing websites redirect the user to the legitimate website after obtaining the user name and password in order to hide themselves. Others are "progressive": arbitrary input still produces a login-success response, after which the fraud continues. Phishing websites behave this way because they have no database against which to verify the input and exist only to record user names and passwords. This is the essential difference between them and legitimate websites in how user submissions are handled.

The form action check: a form collects different types of user input, and when the user clicks the confirm button the content of the form is sent to another file. The form's action attribute defines the file name of the destination file (for example "html_form_action.asp"). The file defined by the action attribute usually processes the input data it receives. A phishing website may submit the form's login content to the legitimate website, whereas a legitimate website rarely submits a form to another domain name, so this can serve as a suspicious feature.

Analysis of the response after form submission: after a form is submitted, a legitimate website compares the user name and password against its database. A phishing website, by contrast, tends to take some fixed action, such as redirecting the user to the legitimate website, to strengthen its disguise and make it hard for the user to notice. The suspicious feature is that the user is redirected to another domain name that does not belong to the original one.

The HTML standardization check examines how well the website's HTML code conforms to standards. A legitimate website should follow current standards as far as possible, whereas the code of a phishing website is often written more carelessly and is less standardized than that of a legitimate website. Therefore, if the HTML code of a website is found to be non-standard in places, its degree of suspicion is increased.

The Cookie setting check: a Cookie is data (usually encrypted) that some websites store on the user's local terminal in order to identify the user and track sessions. A phishing website, however, usually does not need the functions that a Cookie provides. It is built to extract content such as the user's account information, and it may not even want the user to visit again, since that would increase the risk of being discovered and reported.

The script ratio check: a threshold is set on the basis of statistics; if the ratio of script length to total page length exceeds this threshold, the page is considered suspicious.

The website behavior and detail feature identification process is:

Step b21: pass in the address to be examined, then process the URL to extract the domain name and path, perform a DNS query, and establish a connection with the target;

Step b22: send an HTTP GET request for the extracted path, obtain the page source code, and analyze it. The GET request is constructed to imitate a request from the IE browser;

Step b23: carry out the following analysis steps on the response received:

(1) Check whether a Cookie is set in the message header; if not, add the corresponding weight to the global variable Weight_Sum.

(2) Total all the content between "<script>" and "</script>" in the response and divide its length by the total page length to obtain the script proportion. Compare it with the lower threshold; if it is greater than the threshold, add the corresponding weight to Weight_Sum. Our team collected extensive statistics on the script ratios of legitimate websites and of phishing websites characterized by long scripts, and set the lower threshold at 0.60;

(3) Check whether the HTML code is standard: for example, whether each "<****>" tag found in the response has a closing tag "</****>", whether the letter case of attributes in tags conforms to the standard, whether the action target is enclosed in double quotation marks (""), and so on. For each suspicious feature found, increase the multiplier of the corresponding weight by 1.

(4) Check whether the target of the action attribute in the <form> tag is under the same domain name as the page; if it differs, add weight;

(5) When the action target is under the page's own domain name, analyze the GET response, extract the parameters and submit the form, then analyze the response; if the message header contains Location, check whether that address is under the page's own domain name, and if not, add weight.

(6) Output the result to the file temp_result.dat in the agreed format so that it can be read when the weights are aggregated.
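Checks (1), (2), and (4) can be sketched with the standard-library HTML parser, operating on a response that has already been fetched. The 0.60 threshold is from the text; `PageFeatures` and `analyze_page` are hypothetical names, and treating "under this domain name" as an exact host match is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

SCRIPT_RATIO_THRESHOLD = 0.60  # lower threshold stated in the text


class PageFeatures(HTMLParser):
    """Collect script length and form action targets from a page."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_chars = 0
        self.form_actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        elif tag == "form":
            self.form_actions.append(dict(attrs).get("action") or "")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.script_chars += len(data)


def analyze_page(page_url, html, headers):
    """Return flags for: no cookie set, script-heavy page, foreign form action."""
    parser = PageFeatures()
    parser.feed(html)
    parser.close()
    page_host = urlsplit(page_url).hostname
    ratio = parser.script_chars / len(html) if html else 0.0
    foreign = any(
        urlsplit(urljoin(page_url, action)).hostname != page_host
        for action in parser.form_actions
    )
    return {
        "no_cookie": not any(k.lower() == "set-cookie" for k in headers),
        "script_heavy": ratio > SCRIPT_RATIO_THRESHOLD,
        "foreign_form_action": foreign,
    }
```

The `headers` argument is assumed to be a mapping of response header names to values; relative action targets are resolved against the page URL before the host comparison.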

Server-side identification includes a check of the number of IP addresses corresponding to the domain name, an IP address geolocation check, and a Whois information check.

The traffic of a legitimate website differs greatly from that of a phishing website, so the server-side technology of the two may also differ. According to statistics, more than 90% of phishing websites are hosted overseas to escape domestic legal sanctions. In addition, if the site you are visiting is that of a domestic bank but its domain name resolves to an overseas address, that is also very suspicious. We can therefore also infer from the geographic location of the IP address whether a site is a phishing website. Some researchers also report that phishing websites have short life cycles, which is reflected in the whois information of the site's domain name.

The check of the number of IP addresses under a domain name: requests to a large website are sometimes mapped to different IP addresses because heavily visited domain names use load balancing. DNS load balancing configures several IP addresses for the same host name in the DNS server. When answering DNS queries, the DNS server returns a different resolution result for each query, following the order of the IP addresses in the host records of the DNS file, so that client requests are directed to different machines. Different clients thus reach different servers, which achieves load balancing. A crude phishing website, however, usually has very limited traffic, and its creator will not bear the cost of this technique, so the absence of multiple IP addresses can serve as a feature for identifying a phishing website.

The IP address geolocation check judges whether the IP address is abnormal. For domestic users, we can look up the geographic location of the IP address to be visited and see whether it is within the country and whether it is in one of the most suspicious regions mentioned above, and thereby judge whether it is suspicious.

The Whois information check: according to statistics, the mean survival time of a phishing website is less than one day, the domain names such sites use are often cheap, and those domain names are not used for long. A legitimate website, by contrast, is typically well established: it was registered early, and the difference between its expiration time and its registration time is relatively large. According to our team's tests, this difference is greater than 3 years for most legitimate websites and less than 3 years for most phishing websites. It can therefore serve as a suspicious point for detecting whether a website is a phishing website.

The process of the server-side identification is:

Step b31: process the URL passed in, extract the main domain name, and perform a DNS query using the gethostbyname function in winsock. Count the entries in the h_addr_list address list of the returned hostent structure; if there is more than one, the domain has more than one IP address and no weight is added; otherwise the corresponding weight is added.

Step b32: submit the returned IP address to http://www.ip138.com/ for lookup. According to the statistics of the anti-phishing alliance, for domestic users, if the target is in a country where phishing websites are concentrated, such as the United States, add the corresponding weight.

Step b33: submit the normalized domain name to http://whois.chinaz.com/ for lookup and extract from the response the difference between the domain's expiration time and its registration time; if it is less than the specified value, add weight, otherwise do not. Based on the earlier statistics, the specified value is provisionally set to 3 years.

Step b34: again output the result to temp_result.dat in the agreed format.
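The three server-side checks can be sketched as below. `resolve_ips` uses Python's `socket.gethostbyname_ex` as a stand-in for the winsock call named in the text. The geolocation and whois lookups are left to the caller, because the response formats of the two lookup sites are not described; `server_flags`, the country-code set, and approximating 3 years as 3 × 365 days are all assumptions.

```python
import socket
from datetime import date


def resolve_ips(domain):
    """Return the IPv4 addresses that DNS reports for a domain."""
    _name, _aliases, addresses = socket.gethostbyname_ex(domain)
    return addresses


def server_flags(ips, country, registered, expires,
                 risky_countries=frozenset({"US"}), min_years=3):
    """Return flags for steps b31 to b33 from already-collected data.

    ips        -- list of resolved IP addresses (b31)
    country    -- country code from an IP geolocation lookup (b32)
    registered -- domain registration date from whois (b33)
    expires    -- domain expiration date from whois (b33)
    """
    lifetime_days = (expires - registered).days
    return {
        "single_ip": len(ips) <= 1,
        "risky_country": country in risky_countries,
        "short_registration": lifetime_days < min_years * 365,
    }
```

Separating the network lookups from the flag logic keeps the decision rules easy to check in isolation.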

The crawler-based identification includes page out-degree link count detection and detection of the number and types of web page files.

A web crawler (spider) finds web pages through link addresses. Starting from some page of a website (usually the home page), it reads the content of the page, finds the other link addresses in it, and follows them to the next pages, repeating this until all the pages of the website have been fetched. Crawling thus makes it possible to analyze the structure, scale, and importance of a website.

Page out-degree link count detection: in essence, the Internet is a Web graph formed by hyperlinks. When processing with a web crawler, the Web graph formed by these hyperlinks has to be held in memory. For a page in the Web graph, the links pointing out of it are called the page's "out-degree". The outbound links of a website are called its "first-layer out-degree links", or "first-layer out-degree" for short; the outbound links of the pages reached through the first-layer out-degree are called the "second-layer out-degree links" of the original page, or "second-layer out-degree" for short. Page importance is derived from the two layers of out-degree links together, which improves the relevance and quality of search results. The principle behind the detection is as follows. For a legitimate website, the number of first-layer out-degree links is large, and the second-layer out-degree is also large. A phishing website, by contrast, is run by an individual or a small group, is weakly connected to other websites, and can hardly form a large network structure, so both layers have few out-degree links.

The steps of the page out-degree link count detection are as follows: after the URL is passed in, the crawler first crawls the web page under test and obtains the first-layer out-degree links under the same parent domain; the links found are returned to the graphical interface for the user to inspect, and their number is recorded.

Because the first layer of a website may contain many links, crawling every one of them in turn is impractical in terms of time and efficiency. To solve this, when second-layer links are selected for testing, the program randomly extracts 5 out-degree links from the first-layer links; if there are fewer than 5, all of them are selected and tested. When second-layer links are crawled, only paths under the parent domain of the original page are searched, and the sum of their out-degrees is recorded. To avoid spending too long on a website with a very large number of out-degree links, a maximum threshold is set: if the count exceeds it, the website is considered not suspicious and the detection exits. Experimental statistics show that the two-layer out-degree total of a phishing website generally does not exceed 500, so the threshold is provisionally set to 500.

Finally, the result is written to the file in the agreed format so that the subsequent weighting assessment can call it.
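The two-layer counting procedure above can be sketched as follows. The sample size of 5 and the threshold of 500 come from the text; everything else is an assumption made for illustration: `get_links` is a hypothetical helper returning the links found on a page, and "same parent domain" is simplified to a hostname comparison.

```python
import random
from urllib.parse import urlparse

MAX_OUT_DEGREE = 500  # provisional threshold from the text
SAMPLE_SIZE = 5       # number of first-layer links sampled for layer two


def same_site(url, root_url):
    """Simplification: compare hostnames instead of true parent domains."""
    return urlparse(url).hostname == urlparse(root_url).hostname


def site_links(url, root_url, get_links):
    """De-duplicated links of `url` that stay on the site of `root_url`."""
    return [u for u in dict.fromkeys(get_links(url)) if same_site(u, root_url)]


def two_layer_out_degree(root_url, get_links, rng=random):
    """Return (total, exceeded); exceeded=True means 'not suspicious, exit'."""
    first = site_links(root_url, root_url, get_links)
    total = len(first)
    if total > MAX_OUT_DEGREE:
        return total, True
    # Use every link when there are 5 or fewer, otherwise a random sample.
    sample = first if len(first) <= SAMPLE_SIZE else rng.sample(first, SAMPLE_SIZE)
    for url in sample:
        total += len(site_links(url, root_url, get_links))
        if total > MAX_OUT_DEGREE:
            return total, True
    return total, False
```

Returning the early-exit flag separately from the count lets the caller distinguish a site that was cut off at the threshold from one that was counted in full.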

In the web page file number and type detection, the crawler fetches the web page files, from which the number of files of each type can be obtained. Web page files are mainly of the following kinds: static page files (html, htm), dynamic page files (shtml), and server script files (asp, php, and so on). They reflect the composition and structural layout of the website. The more numerous and varied they are, the deeper the website hierarchy, the stronger the association between the server front end and the back-end library files, the better organized the website, and the more important the website.

A legitimate website is carefully built, with a clear file hierarchy and a complete framework, so its web page files are numerous and their types (divided by function) are fairly complete. A phishing website usually only imitates the pages of a legitimate website; its overall structure is loose and its association with other websites is weak, so both the types and the number of its web page files and server scripts are small. Its back end generally contains only the administrator log that records the captured login information and the php, asp, or similar files that support the site, and the difference shows up directly in the crawler's detection results.

Some phishing website administrators, however, copy the structural features of a legitimate website and add files on the server side. Detecting only the page itself would then find many files of many types in a seemingly reasonable structure, allowing the fake to pass for the genuine. To handle this case, the crawler also fetches the web page files of the second-layer out-degree links: only paths under the parent domain are searched, each is visited in turn, and the types and number of the corresponding files are recorded. Because the second-layer out-degree links created by a phishing website are of low importance and its site framework is simple, it has few web page files there. Analyzing the crawl results of both layers together therefore yields the suspicious degree of the website.

The steps of the web page file number and type detection are as follows:

First, the crawler crawls the web page under test and checks the files under the page one by one, judging whether each ends in one of the five types html, htm, shtml, asp, or php. If so, the file name of this first-layer link is recorded. The crawler then checks whether its URL is under the original parent domain; if it is, the link is crawled for second-layer files, files of the corresponding types are searched for and counted, and the out-degree links are returned in turn to the graphical interface, where they form a tree diagram for the user to inspect. For reasons of time and efficiency, the number of links is selected in the same way as in the link detection above, except that the upper threshold for the two-layer total is changed to 200.
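The file-type check in the step above can be sketched as a small classifier over crawled URLs. The five suffixes come from the text; the function names are hypothetical, and the sketch assumes the file type can be read from the extension of the URL path, ignoring any query string.

```python
import posixpath
from collections import Counter
from urllib.parse import urlparse

PAGE_TYPES = {"html", "htm", "shtml", "asp", "php"}  # the five types in the text


def file_type(url):
    """Return the page type of a URL, or None if it is not one of the five."""
    path = urlparse(url).path  # drops the query string and fragment
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return ext if ext in PAGE_TYPES else None


def count_file_types(urls):
    """Count how many crawled URLs fall into each of the five page types."""
    return Counter(t for t in map(file_type, urls) if t is not None)
```

The number of distinct keys in the returned counter gives the count of file types, and the sum of its values gives the count of files, which are the two quantities the detection compares across the two layers.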

Because this part yields counts, the weight evaluation uses an interval-based judgment. First, the counts are divided into intervals by statistical methods; a suspicious degree S (S ∈ [0,1]) is assigned according to the interval in which the result falls, and S is then multiplied by the overall score coefficient K. The total suspicious weight is N = S × K. Finally, the total weight is output to the file in the agreed format so that it can be called later for feedback.
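The interval-based evaluation N = S × K can be sketched as a lookup over interval boundaries. The boundaries, the per-interval values of S, and the value of K used in the usage note are placeholders chosen for illustration; the text states only that the intervals are derived statistically and that S lies in [0, 1].

```python
from bisect import bisect_right


def suspicion_weight(count, bounds, scores, k):
    """Map a count to a suspicious degree S by interval, then return S * K.

    bounds: ascending interval boundaries, e.g. [10, 50, 200]
    scores: one S value per interval, so len(scores) == len(bounds) + 1
    k:      overall score coefficient K
    """
    if len(scores) != len(bounds) + 1:
        raise ValueError("scores must have exactly one more entry than bounds")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each suspicious degree S must lie in [0, 1]")
    s = scores[bisect_right(bounds, count)]
    return s * k
```

With the hypothetical boundaries [10, 50, 200], scores [1.0, 0.5, 0.25, 0.0], and K = 20, a site with 5 files falls in the first interval and receives the full weight, while a site with 300 files receives none.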

The foregoing is only a preferred embodiment of the present invention and is merely illustrative of the invention, not restrictive. Those skilled in the art will understand that many changes, modifications, and even equivalents may be made within the spirit and scope defined by the claims of the invention, all of which fall within the scope of protection of the present invention.

Claims

1. An anti-phishing method based on a multi-factor comprehensive assessment method, characterized in that it comprises the following steps:

Step b, detect the website;

Step c, summarize and weigh the feedback results;

Step d, obtain the result.

2. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 1, characterized in that the steps of the URL angle recognition method are:

3. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 2, characterized in that the process of the website behavior and detail feature recognition is:

Step b23, analyze the received request.

4. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 3, characterized in that the step of analyzing the received request is:

5. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 1, characterized in that the process of the server angle recognition is:

Step b34, output the result in the agreed format to the weight feedback file.

6. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 2, 4, or 5, characterized in that:

The weight array is initialized to 0; during statistics, values are added into the array; finally, the array is output in the agreed format to the weight feedback file so that it can be called when the weights are aggregated;

7. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 1, characterized in that the crawler angle recognition comprises a page out-degree link count detection;

Finally, the result is output to the weight feedback file.

8. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 1, characterized in that the crawler angle recognition further comprises a method of web page file number and type detection, the steps of which are:

Finally, the result is output to the weight feedback file.

9. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 7 or 8, characterized in that, in the method of web page file number and type detection, the weight evaluation uses an interval-based judgment; first, the counts are divided into intervals by statistical methods, a suspicious degree S (S ∈ [0,1]) is assigned according to the interval in which the result falls, and S is then multiplied by the overall score coefficient K; the total suspicious weight is N = S × K; finally, the total weight is output to the weight feedback file in the specified format.

10. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 9, characterized in that the upper alarm threshold of the total weight is set to 70 and the lower threshold is set to 30.