CN104202291A - Anti-phishing method based on multi-factor comprehensive assessment method - Google Patents

Anti-phishing method based on multi-factor comprehensive assessment method

Publication number
CN104202291A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410177968.1A
Other languages
Chinese (zh)
Inventor
胡建伟
崔艳鹏
李英
胥红艳
李蕊
许乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Application filed by Xidian University
Priority to CN201410177968.1A
Publication of CN104202291A
Legal status: Pending

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to an anti-phishing method based on a multi-factor comprehensive assessment method. The method comprises the following steps: step a, establishing a URL (uniform resource locator) blacklist and whitelist library, processing a target URL, and judging whether the processed URL is in the blacklist or whitelist; if so, executing step d and feeding the result back to the user directly; otherwise, executing step b to carry out further detection of the website; step b, detecting the website in four aspects: URL-based identification, website behavior and detail feature identification, server-side identification and crawler-based identification; step c, summarizing and weighing the feedback results; and step d, displaying the result. The method assesses a website from many angles by a rigorous procedure, is comprehensive in what it considers and is highly accurate. The suspicious points that were hit and their weights, the web page links and site files found, and the basis for the judgment are displayed clearly in a graphical interface, so that the result is fed back to the user and is also available for professionals to examine.

Description

Anti-phishing method based on a multi-factor comprehensive assessment method
Technical field
The present invention relates to a method for assessing and guarding against phishing attacks, and in particular to an anti-phishing method based on a multi-factor comprehensive assessment method.
Background art
At present, Internet fraud occurs frequently and threatens users' privacy and security. According to statistics, in the first half of 2010 alone, the direct and indirect economic losses that phishing caused to the public and to society exceeded 12 billion yuan. Guarding against phishing attacks is therefore urgent. Current security software identifies phishing websites mainly through a single URL-based method, which does not address the essential nature of a phishing website. Blacklist and whitelist identification lags behind: phishing websites change their URLs frequently, so the method is a passive defence that comes at the expense of some users. Identification based on page features is slow and inefficient, and phishers can evade detection by disguise. In addition, these solutions share a common problem: a high recognition rate is usually accompanied by a relatively high false-alarm rate. Existing traditional anti-phishing methods are therefore clearly inadequate against rapidly changing threats.
Based on a summary of existing anti-phishing techniques and an analysis of a large number of phishing websites, the present invention makes up for the deficiencies of current security software. It analyses phishing websites from multiple angles and, by means of a statistical algorithm, a threshold algorithm, linear weighting, a checksum algorithm and similar techniques, achieves a high recognition rate while reducing the false-alarm rate.
In view of the above defects, the inventors arrived at the present invention after long research and practice.
Summary of the invention
The object of the present invention is to provide an anti-phishing method based on a multi-factor comprehensive assessment method that overcomes the above technical defects.
To achieve this object, the invention provides an anti-phishing method based on a multi-factor comprehensive assessment method, comprising the following steps:
Step a: establish a URL blacklist and whitelist library, process the target URL, and judge whether the processed URL is in the blacklist or whitelist. If it is, perform step d and feed the result back to the user directly; if it is not, perform step b and carry out the website detection below;
Step b: detect the website;
The detection covers four aspects: URL-based identification, website behavior and detail feature identification, server-side identification, and crawler-based identification. The URL-based identification is carried out first. The website behavior and detail feature identification, the server-side identification and the crawler-based identification are then each executed in its own thread, three threads in total. After detection completes, the weight values are written to a file in a set format to facilitate summarizing the results;
Step c: summarize and weigh the feedback results;
If the accumulated total weight exceeds the set threshold, a phishing-website danger warning is sent to the user; if it is below the threshold, a detection result of safe is fed back to the user;
Step d: display the result.
Preferably, the URL-based identification comprises the following steps:
Step b11: normalize the format of the URL passed in as a parameter;
Step b12: if the number of domain-name levels exceeds a set value, add the corresponding value at the relevant position of the weight array;
Step b13: if the normalized URL is in IP form, add the corresponding value at the relevant position of the weight array;
Step b14: if the URL contains special characters, indicating that the address is disguised with special characters, add the corresponding value at the relevant position of the weight array;
Step b15: if the path has too many levels, add the corresponding value at the relevant position of the weight array.
Preferably, the website behavior and detail feature identification proceeds as follows:
Step b21: pass in the address to be detected, process the URL to extract the domain name and path, perform a DNS query, and connect to the target;
Step b22: send an HTTP GET request for the extracted path, obtain the page source code and analyze it;
Step b23: analyze the response received.
Preferably, the analysis of the response received comprises:
Step b231: check whether a Cookie is set in the message header; if not, add the corresponding weight to a global variable;
Step b232: total the script content in the response and divide its length by the total page length to obtain the script proportion; compare it with the lower threshold and, if it is greater, add the corresponding value at the relevant position of the weight array;
Step b233: check whether the HTML code is well-formed, including whether the letter case of attributes in tags conforms to the standard and whether the action target is enclosed in double quotation marks; for each suspicious feature found, the multiplier applied to the corresponding weight is increased by 1;
Step b234: check whether the target of the action attribute in the <form> tag is under the same domain name as the page; if not, add weight;
Step b235: when the action target is under the same domain name, analyze the GET response, extract the parameters and submit the form, then analyze the response; if the message header contains Location, check whether that address is under the same domain name and, if not, add weight;
Step b236: output the result to the weight feedback file in the agreed format so that it can be read when the weights are summed.
Preferably, the server-side identification proceeds as follows:
Step b31: process the URL passed in, extract the main domain name and perform a DNS query; if more than one IP address is returned, add no weight; if only one IP address is returned, add the corresponding value at the relevant position of the weight array;
Step b32: look up the IP address; if the target is located in a country where phishing websites are heavily concentrated, add the corresponding value at the relevant position of the weight array;
Step b33: query the normalized domain name and extract from the response the difference between the website's expiration time and its registration time; if it is less than a specified value, add the corresponding value at the relevant position of the weight array; otherwise add nothing;
Step b34: output the result to the weight feedback file in the agreed format.
Preferably, the weight array is initialized to 0, values are added to it as features are counted, and it is finally output to the weight feedback file in the agreed format so that it can be read when the weights are summed;
The total weight is computed as $G = \sum_i s_i w_i$. If G is greater than the upper threshold, the user is warned that the website is dangerous; if G is less than the lower threshold, the user is told that the website is safe; if G lies between the two thresholds, the corresponding degree of suspicion is returned to the user, who is advised to visit with caution and to learn about methods of guarding against phishing attacks.
Preferably, the crawler-based identification includes detection of the number of page out-degree links;
The detection proceeds as follows: after the URL is passed in, the page under test is first crawled to obtain the first-layer out-degree links under the same parent domain name; the links found are returned to the graphical interface for the user to inspect, and their number is recorded;
To select second-layer links for testing, at most 5 out-degree links are drawn at random from the first-layer links. When the second-layer links are crawled, only paths under the parent domain name of the original page are searched, and the sum of their out-degrees is recorded. If the number exceeds the set maximum threshold, the website is considered not suspicious and the check exits;
Finally, the result is output to the weight feedback file.
Preferably, the crawler-based identification further includes detection of the number and types of web page files, with the following steps:
After the URL is passed in, the page under test is first crawled to obtain the first-layer out-degree links under the same parent domain name; the links found are returned to the graphical interface for the user to inspect, and their number is recorded;
When the second-layer links are crawled, only paths under the parent domain name of the original page are searched. The files under the page are checked in turn to determine whether each ends with one of the five types html, htm, shtml, asp or php; if so, the file name of that first-layer link is recorded. If its URL is also under the original parent domain name, the link is crawled for second-layer files, and files of the corresponding types are found and counted. If the number exceeds the set maximum threshold, the website is considered not suspicious and the check exits;
Finally, the result is output to the weight feedback file.
Preferably, in the method for described web page files number and species detection, the method that described weights evaluation has taked by stages to judge; First by statistical method, divide number interval, according to giving corresponding suspicious degree S (S ∈ [0,1]) between result location, be then multiplied by whole COEFFICIENT K of dividing; Suspicious degree total weight value: N=SK; Finally total weight value is outputed in weight feedback file with true-to-shape.
Preferably, the upper alarm threshold for the total weight is set to 70 and the lower threshold to 30.
Compared with the prior art, the beneficial effects of the present invention are as follows. Based on a summary of existing anti-phishing techniques and an analysis of a large number of phishing websites, the invention overcomes the single detection angle of traditional anti-phishing methods. Building on existing detection techniques, it gathers information and evaluates a website from several aspects, namely the URL, website behavior and detail features, the server and the crawler, each with multiple evaluation indicators, so the procedure is rigorous. Using the statistical algorithm and the threshold algorithm, it assigns a weight to each suspicious point and finally produces a combined score, so the assessment is comprehensive and accurate. The suspicious points that were hit and their weights, the web page links and site files found, and the basis for the judgment are presented in a graphical interface that is simple and clear; the result is fed back to the user and can also be examined by professionals.
Brief description of the drawings
Fig. 1 is a flow chart of the anti-phishing method of the present invention.
Detailed description of the embodiments
The above and other technical features and advantages of the present invention are described in more detail below with reference to the accompanying drawing.
Referring to Fig. 1, which is a flow chart of the anti-phishing method based on the multi-factor comprehensive assessment method of the present invention:
Step a: establish a URL blacklist and whitelist library, process the target URL, and judge whether the processed URL is in the blacklist or whitelist. If it is, perform step d and feed the result back to the user directly; if it is not, perform step b and carry out the website detection below.
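As an illustration of step a, the following Python sketch normalizes a URL to its host name and looks it up in the two lists. The list contents, the function names and the choice to match on the host name only are assumptions made for the example; the text does not say how the lists are stored or matched.

```python
from urllib.parse import urlsplit

# Hypothetical list contents, for illustration only.
BLACKLIST = {"phish.example.net"}
WHITELIST = {"www.example.com"}

def normalize(url: str) -> str:
    """Return the lower-cased host of a URL, adding http:// if no scheme is given."""
    if "://" not in url:
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()

def list_lookup(url: str):
    """Return 'phishing', 'safe', or None when the URL is in neither list."""
    host = normalize(url)
    if host in BLACKLIST:
        return "phishing"
    if host in WHITELIST:
        return "safe"
    return None  # not listed: continue with the detection of step b
```

A result of None corresponds to the branch that continues with step b; either of the other two results goes straight to step d.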
Step b: during detection, the URL-based identification is carried out first. Because this part runs quickly, there is no need to spend the overhead of a dedicated thread on it. Three threads then carry out the remaining three aspects of the detection. When these three parts are complete, their weight values are written to the file temp_result.dat in the agreed format to facilitate summarizing the results.
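The scheduling described in step b might be sketched as follows in Python. This is only an illustration under assumptions: the detector functions are hypothetical callables that each return a weight, and the sketch sums their return values in memory rather than writing them to temp_result.dat, whose format the text does not specify.

```python
from concurrent.futures import ThreadPoolExecutor

def run_detectors(url, url_check, threaded_checks):
    """Run the fast URL check inline, then the remaining checks in threads.

    url_check and each item of threaded_checks are hypothetical callables
    that take the URL and return a numeric weight.
    """
    total = url_check(url)  # fast, so no dedicated thread is used
    with ThreadPoolExecutor(max_workers=len(threaded_checks)) as pool:
        # One worker per remaining aspect (three in the described method).
        total += sum(pool.map(lambda check: check(url), threaded_checks))
    return total
```

For example, `run_detectors(url, url_weight, [behavior_weight, server_weight, crawler_weight])` would run the last three (hypothetical) functions concurrently.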
Step c: summarize and weigh the feedback results. If the accumulated total weight exceeds the agreed threshold, a phishing-website danger warning is sent to the user; if it is below the threshold, a detection result of safe is fed back to the user.
The weights are recorded in an array representing the results. It is first initialized to 0, values are added to it as features are counted, and it is finally output to the file temp_result in the agreed format so that it can be read when the weights are summed. For the URL, website behavior and detail feature, and server-side checks, the weight of every suspicious point is first set to 1; 500 foreign phishing websites published on PhishTank and 500 domestic phishing websites are then tested to count how often each point is hit, and a weight is assigned to each suspicious point according to the result.
The processing of the above detection results mainly applies the linear weighting method, the threshold algorithm and the statistical algorithm. The statistical algorithm counts the records within a given range that satisfy a set condition: a conditional statement tests whether the current record satisfies the condition, and the count is incremented by one when it does. For the first three parts, the linear weighting method from multi-factor comprehensive scoring is used to score the above identification results. It is implemented with two vectors, $S = \langle s_1, s_2, \ldots, s_i, \ldots \rangle$ and $W = \langle w_1, w_2, \ldots, w_i, \ldots \rangle$. In vector S, $s_i$ is assigned 1 if the corresponding suspicious point is hit and 0 otherwise; in vector W, $w_i$ is the weight of the corresponding $s_i$ and is obtained by the statistical algorithm described above.
The total weight is computed as $G = \sum_i s_i w_i$. If G is greater than the upper threshold, the user is warned that the website is dangerous; if G is less than the lower threshold, the user is told that the website is safe; if G lies between the two thresholds, the corresponding degree of suspicion is returned to the user, who is advised to visit with caution and to learn about methods of guarding against phishing attacks. The specific thresholds are also obtained by the statistical algorithm. The upper alarm threshold for the total weight is set to 70 and the lower threshold to 30.
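The linear weighting and the three-way decision can be sketched in a few lines of Python. The thresholds 70 and 30 come from the text; the function names and the example weights in the test are hypothetical, since the actual weights are derived statistically and are not listed.

```python
UPPER, LOWER = 70, 30  # alarm thresholds stated in the text

def total_weight(s, w):
    """Compute G as the sum of s_i * w_i, with s_i in {0, 1}."""
    return sum(si * wi for si, wi in zip(s, w))

def verdict(g):
    """Map the total weight G to one of the three outcomes."""
    if g > UPPER:
        return "dangerous"
    if g < LOWER:
        return "safe"
    return "suspicious"  # between the thresholds: advise caution
```

A value of G exactly equal to a threshold is treated here as "suspicious"; the text does not say how the boundary cases are handled, so that choice is an assumption.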
Step d: display the result.
While the program runs, it displays in real time the form-submission response and the GET response returned by the target website, the geographic location lookup, the website's out-degree links, the site files, and the suspicious feature points with their corresponding weights, so that professionals can follow how the method works. After the program finishes, a different prompt window pops up for the user depending on the total weight.
Step b comprises checks in four aspects: URL-based identification b1, website behavior and detail feature identification b2, server-side identification b3 and crawler-based identification b4. These four checks are described in turn below.
URL-based identification: in the present invention this includes blacklist and whitelist identification, an address format check, a check for disguise using special characters, a domain-name level check and a path level check.
URL identification is currently one of the most widely adopted methods. It is fast, and blacklist and whitelist matching achieves a 100% recognition rate for listed sites; it includes URL blacklist techniques and URL detection based on machine learning, among others. In the present invention, a URL found in the blacklist or whitelist triggers a prompt to the user directly, which further improves the accuracy and speed of detection.
The address format check judges whether the form of the address is suspicious. Phishers often use an IP address in place of the domain name in a phishing URL. This effectively hides the identity of the server, and such a URL cannot be disabled by shutting down a domain name. Because this form rarely appears for normal websites, it can serve as an indicator that a URL is suspicious.
The check for disguise using special characters covers the ways, other than hiding the domain name behind an IP address, in which phishing websites disguise and forge URLs, typically by hexadecimal encoding or by inserting special characters into the URL. One check is for disguise using "@". In a URL, some characters always have a special function and others have one depending on their position; a character that is not to be interpreted literally is sent to the web server in escaped form. In a URL containing "@", the address that is actually resolved is the part after the "@" sign, and this is the principle of the deception.
The domain-name level check judges whether the number of domain-name levels is normal. In a normal URL the domain name simply reflects the content of the website. To make users believe that they are visiting a legitimate website, a phisher may choose a domain name that resembles that of the legitimate site, or may append the domain name of a legitimate website to the domain name actually used.
The path level check examines the number of path levels in the URL. A normal URL consists of a domain name, an access path and access parameters. Phishers work not only on the domain name: they also tend to add content such as the abbreviation of the counterfeited website to the access path to deceive users, which usually shows up as a very large number of path levels.
The URL-based identification proceeds as follows:
Step b11: normalize the URL passed in as a parameter so that it begins with http://;
Step b12: count the occurrences of ".", English letters and "/" in the string; if the number of "." exceeds the specified threshold, the number of domain-name levels exceeds the set value, and the corresponding value is added at the relevant position of the weight array;
Step b13: if there are no English letters and the number of "." is 3 (as in 192.168.0.1), the URL is in IP form, and weight is added;
Step b14: if the URL contains special characters such as "@", or uses too many hexadecimal escapes (for example %XX, where X represents a digit), the address is disguised with special characters, and weight is added;
Step b15: if the number of "/" is too large, the path has too many levels, and weight is added.
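Steps b11 to b15 can be sketched as a single Python function that returns which suspicious points are hit. The three numeric limits are hypothetical, because the text leaves the thresholds unspecified; the sketch also counts "." in the host part only and accepts an existing https:// prefix, both of which are simplifying assumptions.

```python
import re

# Hypothetical limits; the patent does not give the actual thresholds.
MAX_DOTS = 4      # more dots than this in the host: too many domain levels
MAX_SLASHES = 5   # more slashes than this after the scheme: path too deep
MAX_HEX = 3       # more %XX escapes than this: disguised with encoding

def url_flags(url: str) -> dict:
    """Return the suspicious points hit by a URL (steps b11 to b15)."""
    if not url.lower().startswith(("http://", "https://")):
        url = "http://" + url                      # b11: normalize
    rest = url.split("://", 1)[1]
    host, _, _path = rest.partition("/")
    dots = host.count(".")
    has_letters = any(c.isalpha() for c in host)
    hex_escapes = re.findall(r"%[0-9A-Fa-f]{2}", url)
    return {
        "many_domain_levels": has_letters and dots > MAX_DOTS,      # b12
        "ip_host": not has_letters and dots == 3,                   # b13
        "special_chars": "@" in url or len(hex_escapes) > MAX_HEX,  # b14
        "deep_path": rest.count("/") > MAX_SLASHES,                 # b15
    }
```

Each flag that is true would add its own weight at the corresponding position of the weight array.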
The weights are recorded in an array representing the results. It is first initialized to 0, values are added to it as features are counted, and it is finally output to the file temp_result in the agreed format so that it can be read when the weights are summed.
The website behavior and detail feature identification includes a form action check, analysis of the response after form submission, an HTML conformance check, a Cookie-setting check and a script ratio check.
On a phishing website, after a user enters an arbitrary user name and password, the site cannot tell whether they are genuine, yet it responds to the user in almost the same way. More than 90% of phishing websites redirect the user to the legitimate website after capturing the user name and password, in order to conceal themselves. Others are "progressive": after arbitrary data is entered, they return a login-success response and go on to further fraudulent content. A phishing website behaves this way because it has no database against which to check the input and only records the user name and password; this is the essential difference between it and a legitimate website in how user submissions are handled.
Form action check: a form collects user input of various types, and when the user clicks the confirm button the contents of the form are sent to another file. The form's action attribute defines the file name of the target file (for example "html_form_action.asp"), which normally processes the input data it receives. A phishing website may submit the login content of its form to the legitimate website, whereas a legitimate website rarely submits a form to another domain name, so this can serve as a suspicious feature.
Analysis of the response after form submission: after a form is submitted, a legitimate website looks up the user name and password in its database and compares them, whereas a phishing website usually takes some fixed action, such as redirecting the user to the legitimate website, to strengthen its disguise and make itself hard to notice. The suspicious feature is that the user is redirected to a domain name other than the original one.
HTML conformance check: checks whether the website's HTML code conforms to the standard. A legitimate website should follow the latest standards as far as possible, whereas the code of a phishing website is usually written more carelessly and is less standardized than that of a legitimate site. If nonstandard HTML is found on a website, its degree of suspicion is therefore increased.
Cookie-setting check: a Cookie is data (usually encrypted) that a website stores on the user's local terminal in order to identify the user and track the session. A phishing website, however, usually has no need for these functions. It is built to extract content such as the user's account information, and its operators do not even want the user to visit again, since that would increase the risk of being discovered and reported.
Script ratio check: a threshold is set from statistics, and a page is considered suspicious if the ratio of script length to total page length exceeds that threshold.
The website behavior and detail feature identification proceeds as follows:
Step b21: pass in the address to be detected, process the URL to extract the domain name and path, perform a DNS query, and connect to the target;
Step b22: send an HTTP GET request for the extracted path, obtain the page source code and analyze it. The GET request is constructed to imitate a request from the IE browser;
Step b23: analyze the response received in the following steps:
(1) Check whether a Cookie is set in the message header; if not, add the corresponding weight to the global variable Weight_Sum.
(2) Total the content between every "<script>" and "</script>" in the response and divide its length by the total page length to obtain the script proportion; compare it with the lower threshold and, if it is greater, add the corresponding weight to Weight_Sum. Based on extensive statistics on the script ratios of legitimate websites and of phishing websites characterized by long scripts, the team set the lower threshold at 0.60;
(3) Check whether the HTML code is well-formed: whether each "<****>" tag found in the response has a matching closing tag "</****>", whether the letter case of attributes in tags conforms to the standard, whether the action target is enclosed in double quotation marks, and so on. For each suspicious feature found, the multiplier applied to the corresponding weight is increased by 1.
(4) Check whether the target of the action attribute in the <form> tag is under the same domain name as the page; if not, add weight;
(5) When the action target is under the same domain name, analyze the GET response, extract the parameters and submit the form, then analyze the response; if the message header contains Location, check whether that address is under the same domain name and, if not, add weight.
(6) Output the result to the file temp_result.dat in the agreed format so that it can be read when the weights are summed.
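Two of the page checks above, the script ratio of sub-step (2) and the form action comparison of sub-step (4), can be illustrated with a short Python sketch that works on page source already fetched. The 0.60 threshold comes from the text; the function names and the regular-expression approach are assumptions made for brevity, and a real implementation would more likely use a proper HTML parser.

```python
import re
from urllib.parse import urljoin, urlsplit

SCRIPT_RATIO_THRESHOLD = 0.60  # lower threshold stated in the text

def script_ratio(html: str) -> float:
    """Length of all <script>...</script> bodies divided by the page length."""
    if not html:
        return 0.0
    bodies = re.findall(r"<script\b[^>]*>(.*?)</script>", html, flags=re.I | re.S)
    return sum(len(b) for b in bodies) / len(html)

def foreign_form_actions(html: str, page_url: str) -> list:
    """Return form action targets whose host differs from the page's host."""
    page_host = urlsplit(page_url).hostname
    actions = re.findall(
        r"""<form\b[^>]*\baction\s*=\s*["']?([^"'\s>]+)""", html, flags=re.I
    )
    # Relative targets are resolved against the page URL before comparing.
    return [a for a in actions
            if urlsplit(urljoin(page_url, a)).hostname != page_host]
```

A page would be flagged when `script_ratio(html) > SCRIPT_RATIO_THRESHOLD` or when `foreign_form_actions` returns a non-empty list.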
The server-side identification includes a check of the number of IP addresses corresponding to the domain name, an IP address geographic location check and a Whois information check.
The traffic of a legitimate website differs greatly from that of a phishing website, so the server-side technology of the two may also differ. According to statistics, more than 90% of phishing websites are hosted overseas to escape domestic legal sanction. Moreover, if the site being visited claims to be a domestic bank but its domain name resolves to an overseas address, that too is highly suspicious. The geographic location of the IP address can therefore also be used to infer whether a site is a phishing website. Some researchers have also noted that phishing websites have short life cycles, which is reflected in the whois information of the domain name.
IP count check for the domain name: visits to a large website are sometimes mapped to different IP addresses, because heavily visited domain names use load balancing. DNS load balancing configures several IP addresses for the same host name on the DNS server. When answering queries, the DNS server returns the IP addresses in the host records in rotation, giving each query a different result, so that different clients are directed to different servers and the load is balanced. A crude phishing website, however, usually has very limited traffic, and its creator will not bear the cost of adopting this technique, so the absence of multiple IP addresses can serve as one feature for identifying phishing websites.
IP address geographic location check: judges whether the IP address is abnormal. For a domestic user, the geographic location of the IP address to be visited can be checked to see whether it is domestic and whether it lies in one of the most suspicious regions mentioned above, and hence whether it is suspicious.
Whois information check: according to statistics, the average lifetime of a phishing website is less than one day; the domain names used are usually cheap and are not registered for long. A legitimate website, by contrast, is usually long established: it was registered early, and the difference between its expiration time and its registration time is larger. According to the team's tests, this difference is greater than 3 years for most legitimate websites and less than 3 years for most phishing websites. It can therefore serve as a suspicious point for detecting whether a website is a phishing website.
The server-side identification proceeds as follows:
Step b31: process the URL passed in, extract the main domain name and perform a DNS query using the gethostbyname function in Winsock. Count the length of the h_addr_list list in the returned hostent structure; if it is greater than 1, more than one IP address lies under the domain name and no weight is added; otherwise the corresponding weight is added.
Step b32: submit the returned IP address to http://www.ip138.com/ for lookup. According to the statistics of the anti-phishing alliance, for domestic users, if the target is in a country where phishing websites are heavily concentrated, such as the United States, weight is added accordingly.
Step b33: submit the normalized domain name to http://whois.chinaz.com/ for lookup and extract from the response the difference between the website's expiration time and its registration time; if it is less than the specified value, add weight, otherwise add none. Based on the earlier statistics, the specified value is provisionally set to 3 years.
Step b34: as before, output the result to temp_result.dat in the agreed format.
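The three server-side checks can be sketched in Python as a pure function over data that has already been looked up. The 3-year limit comes from the text; the function and parameter names are hypothetical, and the sketch uses Python's `socket.gethostbyname_ex` as a stand-in for the Winsock call named above. How the country and the whois dates are obtained from the lookup services is left out, since the response formats are not described.

```python
import socket
from datetime import date

MIN_REGISTRATION_DAYS = 3 * 365  # "3 years", provisional value from the text

def resolve_ips(hostname: str) -> list:
    """Return the IPv4 addresses for a host (performs a real DNS query)."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses

def server_flags(ip_addresses, registered: date, expires: date,
                 country: str, risky_countries: set) -> dict:
    """Return the server-side suspicious points (steps b31 to b33)."""
    return {
        "single_ip": len(set(ip_addresses)) <= 1,                    # b31
        "risky_location": country in risky_countries,                # b32
        "short_registration":
            (expires - registered).days < MIN_REGISTRATION_DAYS,     # b33
    }
```

In use, `resolve_ips(host)` would supply the first argument, and the remaining arguments would come from the geographic and whois lookups.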
Described reptile angle recognition comprises that page outdegree number of links detects and web page files number and species detection.
Web crawlers (Spider) is found webpage by chained address, from the some pages in website (normally homepage), read the content of webpage, find other chained address in webpage, then by them, find next webpage, circulation is always gone down, until all webpages in this website have all been captured.By the method with reptile, can analyze structure, scale and the importance of website.
Described page outdegree number of links detects, and the essence of the Internet is that some Web that formed by hyperlink scheme.When processing with web crawlers, the Web figure that above-mentioned hyperlink need to be formed puts into internal memory.To the webpage in Web figure, the link of its sensing is called " out-degree " of webpage.The departures link of website is called " link of ground floor out-degree " of this website, is called for short " ground floor out-degree "; The departures link of corresponding webpage ground floor out-degree is called " link of second layer out-degree " of original web, is called for short " second layer out-degree "; Web page importance is comprehensively drawn by the two-layer out-degree link of webpage, to improve correlation and the quality of Search Results.The judgment principle that described page outdegree number of links detects: regular website, webpage ground floor out-degree number of links is a lot, and it is also larger that the second layer of webpage goes out the number of degrees.And fishing website is managed by individual or Small Groups, low with other website correlation degrees, be difficult to form compared with large network structure, two-layer out-degree link number is all less.
The steps of the page out-degree link detection are as follows. After the URL is passed in, the page under test is first crawled to obtain the first-layer out-degree links under the same parent domain name; the links found are returned to the graphical interface for the user to inspect, and their number is recorded.
Because the first layer of a website may contain many links, crawling every one of them in turn is impractical in terms of time and efficiency. To address this, when second-layer links are selected for testing, the program randomly selects 5 out-degree links from the first-layer links; if there are fewer than 5, all of them are selected. When the second-layer links are crawled, only paths under the parent domain name of the original page are searched, and the sum of their out-degrees is recorded. To avoid spending too long on websites with a very large out-degree, a maximum threshold is set: if the count exceeds it, the website is considered not suspicious and detection exits. Experimental statistics show that the two-layer out-degree total of a phishing website generally does not exceed 500, so the threshold is provisionally set to 500.
Finally, the result is written to a file in the agreed format so that the subsequent weight evaluation can call it.
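The two-layer counting procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program described in the text: the function names are hypothetical, and `fetch_links` stands in for whatever crawler returns the links found on a page. The sample size 5 and the threshold 500 are taken from the text.

```python
import random
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlparse

MAX_TOTAL_OUT_DEGREE = 500  # provisional threshold from the text
SECOND_LAYER_SAMPLE = 5     # number of first-layer links sampled


def same_parent_domain(url: str, parent_domain: str) -> bool:
    """True if the URL's host is the parent domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == parent_domain or host.endswith("." + parent_domain)


def two_layer_out_degree(start_url: str, parent_domain: str,
                         fetch_links: Callable[[str], List[str]],
                         rng: Optional[random.Random] = None) -> Tuple[int, bool]:
    """Return (two-layer link total, whether the maximum threshold was exceeded)."""
    rng = rng or random.Random()
    # First layer: links on the page under test that stay under the parent domain.
    first_layer = [u for u in fetch_links(start_url)
                   if same_parent_domain(u, parent_domain)]
    total = len(first_layer)
    if total > MAX_TOTAL_OUT_DEGREE:
        return total, True  # treated as not suspicious; stop early
    # Second layer: take all links if there are at most 5, otherwise sample 5 at random.
    if len(first_layer) <= SECOND_LAYER_SAMPLE:
        sample = first_layer
    else:
        sample = rng.sample(first_layer, SECOND_LAYER_SAMPLE)
    for link in sample:
        total += sum(1 for u in fetch_links(link)
                     if same_parent_domain(u, parent_domain))
        if total > MAX_TOTAL_OUT_DEGREE:
            return total, True
    return total, False
```

Passing the crawler in as a callable keeps the counting logic separate from network access, so it can be exercised with an in-memory link table.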
Regarding the detection of the number and types of web page files: by fetching web page files with a crawler, the number of files of each type can be obtained. Web page files are mainly of the following kinds: static page files (html, htm), dynamic page files (shtml), and server script files (asp, php, etc.). They reflect the composition and structural layout of the website. The greater their number and variety, the deeper the site hierarchy, the stronger the association between the server front end and the back-end library files, the more systematic the site, and the more important the site.
A legitimate website is carefully built, with a clear file hierarchy and a complete framework, so it has many web page files and a fairly complete set of file types (classified by function). A phishing website usually only imitates the pages of a legitimate website; its overall structure is loose and its association with other websites is weak, so it has few web page files and server scripts, in both type and number. Its back end generally contains only an administrator log that records login information and the php, asp and similar files that support the site, and the difference can be seen directly from the crawler's detection results.
However, some phishing website operators copy the structural characteristics of legitimate websites and add files on the server side. If only the page itself is examined, its files appear numerous and varied and its structure appears reasonable, so the fake passes for the genuine. To handle this case, the crawler also fetches the web page files of the second-layer out-degree links: only paths under the parent domain name are searched, each is then visited, and the types and numbers of the corresponding files are recorded. Because the second-layer out-degree links of a phishing website are of low importance, its framework is simple and it has few web page files. Analyzing the crawl results of both layers together therefore yields the degree of suspicion of the website.
The steps of the detection of the number and types of web page files are as follows:
The page under test is first crawled, and the files under the page are checked in turn to determine whether each ends with one of the five types html, htm, shtml, asp or php; if so, the file name of that first-layer link is recorded. Its URL is then checked to see whether it is under the original parent domain name; if it is, the second-layer files of that link are crawled, files of the corresponding types are searched for and counted, and the out-degree links are returned in turn to the graphical interface, where they form a tree diagram for the user to inspect. For reasons of time and efficiency, links are selected in the same way as in the link detection part above, except that the upper threshold for the two-layer total is changed to 200.
Because this part produces counts, the weight evaluation uses interval-based judgment. The counts are first divided into intervals by statistical methods; the interval into which the result falls determines the corresponding degree of suspicion S (S ∈ [0,1]), which is then multiplied by the overall score coefficient K. The total suspicion weight is N = S × K. Finally, the total weight is written to a file in the agreed format so that it can be called later for feedback.
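Counting the crawled files by type can be sketched as below. The function name `count_page_files` is hypothetical, and the list of URLs is assumed to come from a crawler that is not shown; the five extensions are those named in the text, with the dynamic page type spelled shtml as in the earlier list of file types.

```python
import posixpath
from collections import Counter
from typing import Iterable
from urllib.parse import urlparse

# The five web page file types named in the text.
PAGE_FILE_TYPES = ("html", "htm", "shtml", "asp", "php")


def count_page_files(urls: Iterable[str]) -> Counter:
    """Count URLs whose path ends in one of the recognised page file types."""
    counts: Counter = Counter()
    for url in urls:
        # Use only the path, so query strings do not hide the extension.
        path = urlparse(url).path
        ext = posixpath.splitext(path)[1].lstrip(".").lower()
        if ext in PAGE_FILE_TYPES:
            counts[ext] += 1
    return counts
```

The number of distinct keys gives the number of file types found, and the sum of the values gives the number of files.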
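The interval-based evaluation can be illustrated with a short sketch. The interval edges and suspicion degrees below are hypothetical placeholders, since the text does not give the statistically derived values; only the relation N = S × K with S in [0, 1] comes from the text.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical interval edges for the count; the text does not list them.
INTERVAL_EDGES = (10, 50, 200)
# Hypothetical suspicion degree S for each interval: fewer files or links
# means a higher degree of suspicion.
SUSPICION_DEGREES = (1.0, 0.5, 0.25, 0.0)


def suspicion_weight(count: int, k: float,
                     edges: Sequence[int] = INTERVAL_EDGES,
                     degrees: Sequence[float] = SUSPICION_DEGREES) -> float:
    """Return N = S * K, where S is chosen by the interval containing count."""
    # bisect_right maps count to the index of its interval:
    # [0, 10) -> 0, [10, 50) -> 1, [50, 200) -> 2, [200, inf) -> 3.
    s = degrees[bisect_right(edges, count)]
    return s * k
```

With K = 20, a site with 5 files would contribute 20.0 under these placeholder intervals, and one with 300 files would contribute 0.0.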
The foregoing is only a preferred embodiment of the present invention and is illustrative rather than restrictive. Those skilled in the art will understand that many changes, modifications and even equivalents may be made within the spirit and scope defined by the claims, all of which fall within the scope of protection of the present invention.

Claims (10)

1. An anti-phishing method based on a multi-factor comprehensive assessment method, characterized in that it comprises the following steps:
Step a, establishing a blacklist and whitelist library of URLs, processing the target URL, and judging whether the processed URL is in the blacklist or whitelist; if it is in the list library, executing step d and feeding the result back to the user directly; if it is not in the list library, executing step b to carry out the subsequent detection of the website;
Step b, detecting the website;
The detection covers four aspects: URL-angle identification, website behavior and detail feature identification, server-angle identification, and crawler-angle identification; the URL-angle identification is carried out first; the website behavior and detail feature identification, the server-angle identification and the crawler-angle identification are then executed in three separate threads, and after detection completes each writes its total weight to a file in a fixed format, to facilitate summarizing and feeding back the results;
Step c, summarizing the feedback results and evaluating the weights;
If the accumulated total weight exceeds the set threshold, a phishing website danger warning is sent to the user; if it is less than the threshold, a detection result of "safe" is fed back to the user;
Step d, displaying the result.
2. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 1, characterized in that the steps of the URL-angle identification are:
Step b11, normalizing the format of the URL passed in as a parameter;
Step b12, if the number of domain name levels exceeds a set value, adding the corresponding value at the relevant position of the weight array;
Step b13, if the normalized URL is in IP form, adding the corresponding value at the relevant position of the weight array;
Step b14, if the URL contains special characters, indicating that the address is disguised with special characters, adding the corresponding value at the relevant position of the weight array;
Step b15, if the path has too many levels, adding the corresponding value at the relevant position of the weight array.
3. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 2, characterized in that the process of the website behavior and detail feature identification is:
Step b21, passing in the address to be detected, processing the URL to extract the domain name and path, performing a DNS query, and establishing a connection with the target;
Step b22, sending an HTTP GET request for the extracted path, obtaining the page source code and analyzing it;
Step b23, analyzing the response received.
4. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 3, characterized in that the step of analyzing the response received comprises:
Step b231, checking whether a Cookie is set in the message header; if not, assigning the corresponding weight to a global variable;
Step b232, measuring the script content in the response and dividing its length by the total page length to obtain the script proportion, and comparing it with a lower threshold; if it is greater than the threshold, adding the corresponding value at the relevant position of the weight array;
Step b233, checking whether the HTML code is standard, including whether the letter case of attributes in tags conforms to the standard and whether the action target is enclosed in double quotation marks; for each suspicious feature met, the multiplier of the corresponding weight is increased by 1;
Step b234, checking whether the target of the action attribute in the <form> tag is the same as this domain name; if it differs, adding weight;
Step b235, when the action target is under this domain name, analyzing the GET response to extract parameters and submitting the form, then analyzing the response; if Location is present in the message header, checking whether that address is under this domain name; if not, adding weight;
Step b236, writing the result to the weight feedback file in the agreed format, so that it can be called when the weights are aggregated.
5. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 1, characterized in that the process of the server-angle identification is:
Step b31, processing the URL passed in, extracting the main domain name, and performing a DNS query; if it resolves to more than one IP, no weight is added; if it resolves to only one IP, the corresponding value is added at the relevant position of the weight array;
Step b32, querying the IP address; if the target is located in a country where phishing websites are widely distributed, the corresponding value is added at the relevant position of the weight array;
Step b33, querying the normalized domain name and extracting from the response the difference between the website's expiration time and its registration time; if it is less than a specified value, the corresponding value is added at the relevant position of the weight array; otherwise no value needs to be recorded in the weight array;
Step b34, writing the result to the weight feedback file in the agreed format.
6. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 2, 4 or 5, characterized in that:
The weight array is initialized to 0, values are accumulated into the array during detection, and the array is finally written to the weight feedback file in the agreed format so that it can be called when the weights are aggregated;
The total weight is calculated as $G = \sum_i s_i w_i$. If the resulting value G is greater than the upper threshold, the user is warned that the website is dangerous; if G is less than the lower threshold, a prompt that the website is safe is returned to the user; if G lies between the lower and upper thresholds, the corresponding degree of suspicion is returned to the user, the user is prompted to access the site with caution, and the user is advised to learn about methods of preventing phishing attacks.
7. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 1, characterized in that the crawler-angle identification comprises detection of the number of page out-degree links;
The steps of the page out-degree link detection are: after the URL is passed in, the page under test is first crawled to obtain the first-layer out-degree links under the same parent domain name, the links found are returned to the graphical interface for the user to inspect, and their number is recorded;
When second-layer links are selected for testing, no more than 5 out-degree links are randomly extracted from the first-layer links; when the second-layer links are crawled, paths under the parent domain name of the original page are searched and the sum of their out-degrees is recorded; if the count exceeds the set maximum threshold, the website is considered not suspicious and execution exits;
Finally, the result is output to the weight feedback file.
8. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 1, characterized in that the crawler-angle identification further comprises a method of detecting the number and types of web page files, the steps of which are:
After the URL is passed in, the page under test is first crawled to obtain the first-layer out-degree links under the same parent domain name, the links found are returned to the graphical interface for the user to inspect, and their number is recorded;
When the second-layer links are crawled, paths under the parent domain name of the original page are searched, and the files under the page are checked in turn to determine whether each ends with one of the five types html, htm, shtml, asp or php; if so, the file name of that first-layer link is recorded; its URL is then checked to see whether it is under the original parent domain name, and if it is, the second-layer files of that link are crawled, and files of the corresponding types are searched for and counted; if the count exceeds the set maximum threshold, the website is considered not suspicious and execution exits;
Finally, the result is output to the weight feedback file.
9. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 7 or 8, characterized in that, in the method of detecting the number and types of web page files, the weight evaluation uses interval-based judgment; the counts are first divided into intervals by statistical methods, the interval into which the result falls determines the corresponding degree of suspicion S (S ∈ [0,1]), which is then multiplied by the overall score coefficient K; the total suspicion weight is N = S × K; finally the total weight is output to the weight feedback file in the specified format.
10. The anti-phishing method based on a multi-factor comprehensive assessment method according to claim 9, characterized in that the upper threshold at which the total weight triggers a warning is set to 70, and the lower threshold is 30.
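As an illustration of the aggregation recited in claim 6 and the thresholds in claim 10, the following sketch computes G and maps it to one of three results. The function names and result labels are hypothetical, reading s_i as the suspicion degree or multiplier of factor i and w_i as its weight is an assumption, and treating a value exactly equal to a threshold as "suspicious" is also an assumption, since the claims do not address the boundary case.

```python
from typing import Sequence

UPPER_THRESHOLD = 70  # upper threshold stated in claim 10
LOWER_THRESHOLD = 30  # lower threshold stated in claim 10


def total_weight(s: Sequence[float], w: Sequence[float]) -> float:
    """Compute G as the sum of s_i * w_i over all factors."""
    if len(s) != len(w):
        raise ValueError("s and w must have the same length")
    return sum(si * wi for si, wi in zip(s, w))


def verdict(g: float, upper: float = UPPER_THRESHOLD,
            lower: float = LOWER_THRESHOLD) -> str:
    """Map the total weight G to a result label (labels are hypothetical)."""
    if g > upper:
        return "dangerous"
    if g < lower:
        return "safe"
    # Between the thresholds, including the boundaries (an assumption).
    return "suspicious"
```

For example, factors with degrees 1, 0 and 0.5 and weights 40, 30 and 20 give G = 50, which falls between the two thresholds.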
CN201410177968.1A 2014-07-11 2014-07-11 Anti-phishing method based on multi-factor comprehensive assessment method Pending CN104202291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410177968.1A CN104202291A (en) 2014-07-11 2014-07-11 Anti-phishing method based on multi-factor comprehensive assessment method

Publications (1)

Publication Number Publication Date
CN104202291A true CN104202291A (en) 2014-12-10

Family

ID=52087518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410177968.1A Pending CN104202291A (en) 2014-07-11 2014-07-11 Anti-phishing method based on multi-factor comprehensive assessment method

Country Status (1)

Country Link
CN (1) CN104202291A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080028444A1 (en) * 2006-07-27 2008-01-31 William Loesch Secure web site authentication using web site characteristics, secure user credentials and private browser
CN101820366A (en) * 2010-01-27 2010-09-01 南京邮电大学 Pre-fetching-based phishing web page detection method
CN103905372A (en) * 2012-12-24 2014-07-02 珠海市君天电子科技有限公司 Method and device for removing false alarm of phishing website

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
梁雪松: "《基于浏览器的钓鱼网站检测技术研究》", 《信息安全与同心保密》 *
谭光林: "《反钓鱼系统的研究与设计》", 《反钓鱼系统的研究与设计》 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104902008A (en) * 2015-04-26 2015-09-09 成都创行信息科技有限公司 Crawler data processing method
CN106354800A (en) * 2016-08-26 2017-01-25 中国互联网络信息中心 Undesirable website detection method based on multi-dimensional feature
WO2018072363A1 (en) * 2016-10-19 2018-04-26 中国互联网络信息中心 Method and device for extending data source
CN106776946A (en) * 2016-12-02 2017-05-31 重庆大学 A kind of detection method of fraudulent website
CN106888220A (en) * 2017-04-12 2017-06-23 恒安嘉新(北京)科技股份公司 A kind of detection method for phishing site and equipment
CN107392022A (en) * 2017-07-20 2017-11-24 北京小度信息科技有限公司 Reptile identification, processing method and relevant apparatus
CN107392022B (en) * 2017-07-20 2020-12-29 北京星选科技有限公司 Crawler identification and processing method and related device
CN107896225A (en) * 2017-12-08 2018-04-10 深信服科技股份有限公司 Fishing website decision method, server and storage medium
CN108243189A (en) * 2018-01-08 2018-07-03 平安科技(深圳)有限公司 A kind of Cyberthreat management method, device, computer equipment and storage medium
CN108243189B (en) * 2018-01-08 2020-08-18 平安科技(深圳)有限公司 Network threat management method and device, computer equipment and storage medium
CN109121004A (en) * 2018-06-29 2019-01-01 深圳市九洲电器有限公司 Set-top box file access protection method and system
CN109121004B (en) * 2018-06-29 2021-02-12 深圳市九洲电器有限公司 Set top box file access protection method and system
CN112966194A (en) * 2021-02-23 2021-06-15 杭州安恒信息技术股份有限公司 Method and system for checking two-dimensional code
CN113420239A (en) * 2021-06-24 2021-09-21 中山大学 Fishing site detection method based on hacker search grammar

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141210
