US20080162449A1 - Dynamic page similarity measurement - Google Patents
- Publication number
- US20080162449A1 (application number US11/617,654)
- Authority
- US
- United States
- Prior art keywords
- web page
- component
- similarity
- components
- given
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/51—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems at application loading time, e.g. accepting, rejecting, starting or inhibiting executable software based on integrity or source reliability
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/235—Update request formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1483—Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2119—Authenticating web pages, e.g. with suspicious links
Definitions
- each likely target web page is associated with a set of scoring rules (which may comprise one or more scoring rules) for scoring features of that target web page if those same features are found on the suspect web page.
- each web page may be thought of as a combination of features.
- These features may include visible characteristics or attributes, such as the color, location, and size of its images or textual information.
- These features may also include background characteristics or attributes that are not necessarily visible to a user. For example, some portion of many web pages may be formed using code that is largely invisible to the user but nevertheless contributes to the transmission, generation, and/or operation of the web page. Examples of these features include the URL strings specifying the destination for the user-input transaction information, HTML strings or other codes to perform computations, etc.
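Since the patent leaves the concrete feature set open, the following Python sketch shows one hypothetical way to pull both visible components (image sources, title text) and background components (form destination URLs) out of a page. The regular expressions and field names are illustrative assumptions, not the patent's method.

```python
import re

def extract_components(html):
    """Collect a few illustrative web page components from raw HTML.

    The component categories below are assumptions chosen to mirror the
    examples in the text: visible features (images, title) and background
    features (URL strings specifying the user-input destination).
    """
    return {
        "images": re.findall(r'<img[^>]*src="([^"]+)"', html),
        "destinations": re.findall(r'<form[^>]*action="([^"]+)"', html),
        "title": re.findall(r"<title>(.*?)</title>", html),
    }

html = ('<title>XYZ Bank Login</title>'
        '<img src="/logo.png">'
        '<form action="https://www.xyzbank.com/login"></form>')
print(extract_components(html)["destinations"])  # ['https://www.xyzbank.com/login']
```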
- the login page of XYZ bank may be associated with a set of scoring rules that gives a high score for a nearly invisible security feature while giving a lower score for an obvious feature, such as a prominently displayed logo. This is because, for example, it may have been judged that a phisher would be less likely to duplicate a nearly invisible and easily overlooked feature than to copy a highly visible logo.
- a set of scoring rules for the login page for XYZ bank may give a particular score for a particular field of content, including for example the domain/port/query/string of a URL and/or the HTML/text string of a URL.
- any feature may be associated with a score, if desired, and the particular score associated with a feature may vary and may even be arbitrary.
- the rule creator may arbitrarily decide that a particular misspelling is intentional, or that a particular background characteristic that can be easily overlooked is intentional, and the absence of that feature in a suspect web page may indicate that the suspect web page is not similar to the target web page at issue.
- the set of scoring rules associated with the login page for XYZ bank would be employed for scoring features found in the suspect web page. In this manner, if the suspect web page has a large number of features in common with the login page for XYZ bank and/or has in common certain high-scoring features, the suspect web page may earn a sufficiently high aggregate score to be deemed similar to the login page for XYZ bank.
- the threshold for deciding whether the aggregate score earned by a suspect web page, when that suspect web page is compared against the login page for XYZ bank, indicates similarity may be implemented in the set of scoring rules for the login page of XYZ bank, for example. As with the determination of how many points a particular feature may be worth, the determination of the particular threshold value for deeming a suspect web page similar may be made empirically by a human or by automated software.
- each potential target web page (e.g., an Acme Store credit card entry page) is associated with its own set of scoring rules, and that set of scoring rules is employed to generate a score for a suspect web page when that suspect web page is compared against the Acme Store credit card entry page.
- the similarity threshold value used to determine whether a suspect web page is similar to the Acme Store credit card entry page is implemented by the set of scoring rules associated with the Acme Store credit card entry page.
- likewise, the similarity threshold value used to determine whether a suspect web page is similar to another potential target web page (e.g., an ABC Bank personal information authentication page) is implemented by the set of scoring rules associated with the ABC Bank personal information authentication page.
- the score associated with each feature and/or the similarity threshold in the set of scoring rules for a particular web page may be continually refined and updated each time a "false positive" or "false negative" (i.e., an erroneous identification of similarity or dissimilarity) occurs.
- the scoring rules may be revised and/or the similarity threshold in the set of scoring rules for that particular web page may be revised upward so that only suspect web pages that have a large number of features in common, or a sufficient number of high-scoring features in common, would be judged to be similar.
- the scoring rules may be revised and/or the similarity threshold in the set of scoring rules for that particular web page may be revised downward so that web pages that are truly similar may be judged to be similar by the set of scoring rules for that particular web page. Since the set of scoring rules is associated with the legitimate web page, the effect of continually improving the scoring rules is increasingly accurate similarity identification as more suspect web pages are tested against the legitimate web page.
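The feedback loop described above can be illustrated with a deliberately simple sketch; the step size and outcome labels are invented for illustration and are not specified by the patent.

```python
def adjust_threshold(threshold, outcome, step=5):
    """Nudge a page's similarity threshold after an erroneous identification.

    A false positive (a dissimilar page wrongly judged similar) raises the
    threshold; a false negative (a similar page wrongly judged dissimilar)
    lowers it. The fixed step is a hypothetical choice; in practice the
    revision could be made empirically by a human or by automated software.
    """
    if outcome == "false_positive":
        return threshold + step
    if outcome == "false_negative":
        return threshold - step
    return threshold

print(adjust_threshold(50, "false_positive"))  # 55
print(adjust_threshold(50, "false_negative"))  # 45
```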
- fuzzy logic or artificial intelligence may be employed to render the comparison process more efficient and/or accurate.
- regular expressions for textual features may be employed in the evaluation of features and can achieve good accuracy.
- a regular expression refers to a string that describes or matches a set of strings, according to certain syntax rules. Regular expressions are known to those skilled in the art and will not be explained in detail herein. Using regular expressions in the creation of the set of scoring rules and in the scoring rules themselves increases the flexibility with which features in the suspect web pages may be identified and scored.
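As a hedged illustration of regex-based scoring rules, the sketch below matches invented patterns against a page's HTML and sums the points of the rules that fire; the patterns, point values, and rule format are all assumptions rather than anything prescribed by the patent.

```python
import re

# Hypothetical scoring rules for an "XYZ Bank login" target page: each rule
# pairs a regular expression with the points awarded when the pattern is
# found in the suspect page's HTML.
RULES = [
    (re.compile(r'action="https?://[^"]*xyzbank\.com[^"]*"'), 30),
    (re.compile(r'<img[^>]*src="[^"]*xyzbank_logo[^"]*"'), 10),
    (re.compile(r'Welcome to XYZ Bank', re.IGNORECASE), 5),
]

def score_page(html, rules=RULES):
    """Sum the points of every rule whose pattern appears in the page."""
    return sum(points for pattern, points in rules if pattern.search(html))

# A sham page copies the logo but posts credentials elsewhere, so only the
# logo rule matches.
html = '<img src="/img/xyzbank_logo.png"><form action="http://phish.example/login">'
print(score_page(html))  # 10
```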
- FIG. 2 shows, in accordance with an embodiment of the invention, the high level steps for preparing the set of likely target web pages for similarity comparison.
- the set of likely target web pages are selected on the basis of website type and web page type.
- with respect to website type, websites that are popular and/or provide money, goods, or services tend to be targets for phishers and may thus be chosen, in an embodiment.
- with respect to web page type, web pages that request transaction information from users tend to be targets of phishers and may thus be chosen, in an embodiment.
- both the website type filter and web page type filter may be employed to select the set of likely target web pages.
- a human operator may select and add web pages to the set of likely target web pages if it is believed that those web pages may be phishing targets.
- web pages may also be included based on other criteria designed to select web pages deemed likely to be susceptible to phishing attacks.
- each likely target web page in the set of likely target web pages is processed to generate a set of scoring rules for features in that web page.
- a feature may represent any attribute or characteristic of a web page, whether or not visually perceptible to a human.
- a human operator may manually designate the features worthy of scoring and the score associated with each of the web page features.
- software may be employed to scan through a web page and/or the code implementing the web page and assign scores to some or all of the features found.
- each web page and its set of scoring rules are stored ( 206 ) for subsequent use in similarity determination with a suspect web page.
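The preparation flow of FIG. 2 might be sketched as follows, under assumed data shapes: filter candidate pages by website type and web page type, then store each surviving page's scoring rules and similarity threshold for later use. The type labels and storage layout are illustrative only.

```python
# In-memory stand-in for the store of likely target pages (step 206).
TARGET_STORE = {}

def prepare_target(url, site_type, page_type, scoring_rules, threshold):
    """Apply the website-type and web-page-type filters, then store the
    page together with its own scoring rules and similarity threshold.

    The filter sets are assumptions that echo the text: financial and
    shopping sites, and pages that collect transaction information.
    """
    if site_type not in {"financial", "shopping"}:
        return False
    if page_type not in {"login", "payment", "personal-info"}:
        return False
    TARGET_STORE[url] = {"rules": scoring_rules, "threshold": threshold}
    return True

prepare_target("https://www.xyzbank.com/login", "financial", "login",
               {"logo": 10, "form": 20}, threshold=25)
print(len(TARGET_STORE))  # 1
```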
- FIG. 3 shows, in accordance with an embodiment of the present invention, the steps for performing similarity analysis for a suspect web page.
- the suspect web page is received.
- the suspect web page is compared against each likely target web page in the set of likely target web pages.
- web pages in the set of likely target web pages may optionally be re-ordered based on information gleaned from the suspect web page such that those likely target web pages that have a high probability of a similarity match are tested first. For example, if text or an image in the suspect web page suggests that the suspect web page is a login web page for a particular enterprise, likely target login web pages for that particular enterprise may be tested first.
- the set of scoring rules for the likely target web page currently being tested is employed to score features found in the suspect web page. If the aggregate score exceeds (or equals, in an embodiment) a certain similarity threshold (as determined by step 306), that likely target web page is identified as the web page that is similar to the suspect web page (308). Thereafter, analysis may be performed on the suspect web page to determine whether the suspect web page indeed represents an attempt to perform a phishing attack on the identified similar target web page.
- changes may be made to the selection of features, the scoring of features, and/or the similarity threshold associated with the set of scoring rules for the target web page that was misidentified as being similar to the suspect web page. If all likely target web pages are exhausted and no similar web pages are found, a report is then provided, noting that a similar web page is not found among the set of likely target web pages. In this case, the similarity testing may proceed against additional web pages that were not included in the set of likely target web pages or the operator may be notified and the method of FIG. 3 may simply end after notification.
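The FIG. 3 comparison loop could look roughly like this sketch, in which each likely target carries its own rules and threshold and the first target whose threshold is met is reported; all names and values are illustrative assumptions.

```python
def find_similar_target(suspect_components, targets):
    """Compare a suspect page against each likely target in turn.

    targets: list of (name, rules, threshold), where rules maps a
    component to the points it earns when found in the suspect page.
    Returns the first target whose threshold is met, or None when the
    set of likely targets is exhausted without a match.
    """
    for name, rules, threshold in targets:
        score = sum(points for component, points in rules.items()
                    if component in suspect_components)
        if score >= threshold:
            return name   # similar target identified (step 308)
    return None           # report: no similar page among likely targets

targets = [
    ("XYZ Bank login", {"logo": 10, "login-form": 20, "marker": 50}, 60),
    ("Acme Store card entry", {"card-form": 40, "acme-logo": 25}, 50),
]
suspect = {"card-form", "acme-logo", "logo"}
print(find_similar_target(suspect, targets))  # Acme Store card entry
```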
- embodiments of the invention are able to ascertain the identity of the target web page in a highly efficient manner.
- the set of likely target web pages may be made smaller. Since each likely target web page is associated with its own scoring rules, much flexibility is afforded to entities who own those likely target web pages in deciding whether the suspect web page is sufficiently similar. If an erroneous similarity determination is made, changes to the scoring rules and/or the similarity threshold may be made, enabling the similarity determination process to become more accurate over time.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
A method for determining which web page among multiple candidate web pages is similar to a given web page. For each candidate web page, a set of scoring rules is provided to score the components therein. When the given web page is compared against a candidate web page, each component that is found in both the given web page and the candidate web page under examination is given a score in accordance with the set of scoring rules that is specific to that web page under examination. A composite similarity score is computed for each comparison between the given web page and a candidate web page. If the composite similarity score exceeds a predefined threshold value for a comparison between the given web page and a candidate web page, that candidate web page is deemed similar to the given web page.
Description
- Phishing represents a fraudulent technique employed to obtain confidential transaction information (such as user name, password, financial information, credit card information, etc.) from computer users for misuse. In phishing, the phisher employs a phishing server to send an apparently official electronic communication (such as an official looking email) to the victim. For example, if a phisher wishes to obtain confidential information to access a victim's account at XYZ bank, the email would typically come from an XYZ bank email address and contain official-looking logos and language to deceive the victim into believing that the email is legitimate.
- Further, the phisher's email typically includes language urging the victim to access the website of XYZ bank in order to verify some information or to confirm some transaction. The email also typically includes a link for use by the victim to supposedly access the website of XYZ bank. However, when the victim clicks on the link included in the email, the victim is taken instead to a sham website set up in advance by the phisher. The sham website, referred to herein as the phishing website, would then ask for confidential information from the victim. Since the victim had been told in advance that the purpose of clicking on the link is to verify some account information or to confirm some transaction, many victims unquestioningly enter the requested information. Once the confidential information is collected by the phisher, the phisher can subsequently employ the information to perpetrate fraud on the victim by stealing money from the victim's account, by purchasing goods using the account funds, etc.
- FIG. 1 illustrates an example of a phishing attack. In FIG. 1, a phisher 102 (typically an email server that is under the control of a human phisher) sends an official-looking email 104 designed to convince a recipient 108 that the email is sent by a legitimate business, such as by bank 106. The email may, for example, attempt to convince the recipient 108 to update his account by clicking on an attached link to access a web page. If the recipient 108 clicks on the link, the web page that opens would then request the user to enter the user's confidential information such as userid, password, account number, etc.
- However, since the web page did not come from the legitimate business 106, the user's confidential information is sent (110) to a phishing website 112. Phishing website 112 then collects the user's confidential information to allow the phisher to perpetrate fraud on the user.
- Because phishers actually divert the victim to a website other than the website of the legitimate business that the victim intended to visit, some knowledgeable users may be able to spot the difference in the website domain names and may become alert to the possibility that a phishing attack is being attempted. For example, if a victim is taken to a website whose domain name "http://218.246.224.203/icons/cgi-bin/xyzbank/login.php" appears in the browser's URL address bar, that victim may be alert to the fact that the phisher's website URL address as shown on the browser's URL toolbar is different from the usual "http://www.xyzbank.com/us/cgi-bin/login.php" and may refuse to furnish the confidential information out of suspicion. However, it is known that many users are not sophisticated or always vigilant against phishing attempts. Accordingly, relying on users to stay on guard against phishing attempts has proven to be an inadequate response to the phishing problem.
- The invention relates, in an embodiment, to a computer-implemented method for ascertaining which web page among a plurality of candidate web pages is similar to a given web page. The method includes extracting a set of web page components from the given web page. The method also includes comparing the given web page against each of the plurality of candidate web pages in turn. The comparing results in a composite similarity score for the set of web page components. The composite similarity score is computed from scores assigned to individual ones of the set of web page components in accordance with a set of scoring rules associated with the web page that is under examination for similarity, wherein a web page component of the set of web page components is associated with a first score if the web page component also exists in the web page that is under examination for similarity. The web page component of the set of web page components is associated with a second score, different from the first score, if the web page component does not exist in the web page that is under examination for similarity. If the composite similarity score exceeds a predefined threshold, the method also includes designating the given web page similar to the web page that is under examination for similarity.
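A minimal sketch of the claimed scoring scheme, assuming a two-score rule per component (one score if the component is present in the page under examination, a different score if it is absent); the data shapes and values are invented for illustration.

```python
def composite_similarity(given_components, candidate_components, rules):
    """Sum per-component scores according to the candidate's scoring rules.

    rules maps each component to an assumed (present_score, absent_score)
    pair: the first score applies when the component also exists in the
    page under examination, the second when it does not.
    """
    score = 0
    for component in given_components:
        present, absent = rules.get(component, (0, 0))
        score += present if component in candidate_components else absent
    return score

# The logo and login form appear on both pages; the hidden marker does not,
# so it contributes its (negative) absent score.
rules = {"logo.gif": (10, 0), "login-form": (20, 0), "hidden-marker": (50, -10)}
given = {"logo.gif", "login-form", "hidden-marker"}
candidate = {"logo.gif", "login-form"}
print(composite_similarity(given, candidate, rules))  # 10 + 20 - 10 = 20
```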
- These and other features of the present invention will be described in more detail below in the detailed description of the invention and in conjunction with the following figures.
- The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
- FIG. 1 illustrates an example of a phishing attack.
- FIG. 2 shows, in accordance with an embodiment of the invention, the high level steps for preparing the set of likely target web pages for similarity comparison.
- FIG. 3 shows, in accordance with an embodiment of the present invention, the steps for performing similarity analysis for a suspect web page.
- The present invention will now be described in detail with reference to a few embodiments thereof as illustrated in the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without some or all of these specific details. In other instances, well known process steps and/or structures have not been described in detail in order to not unnecessarily obscure the present invention.
- Various embodiments are described herein below, including methods and techniques. It should be kept in mind that the invention might also cover articles of manufacture that include a computer-readable medium on which computer-readable instructions for carrying out embodiments of the inventive technique are stored. The computer-readable medium may include, for example, semiconductor, magnetic, opto-magnetic, optical, or other forms of computer-readable medium for storing computer-readable code. Further, the invention may also cover apparatuses for practicing embodiments of the invention. Such apparatus may include circuits, dedicated and/or programmable, to carry out tasks pertaining to embodiments of the invention. Examples of such apparatus include a general-purpose computer and/or a dedicated computing device when appropriately programmed and may include a combination of a computer/computing device and dedicated/programmable circuits adapted for the various tasks pertaining to embodiments of the invention.
- Since the purpose of a phishing web page is to divert the user-input information to a website controlled by the phisher, this fact provides a possible approach to detecting whether a particular web page is being used in an attempt to commit phishing fraud. If the counterpart legitimate web page can be determined, it is then possible to determine whether the transaction information destination (i.e., the location to which the respective web pages specify that user input data be sent) would be the same for both the legitimate web page and the suspect web page (e.g., one under investigation to ascertain whether that web page is attempting to commit a phishing fraud). If the transaction information destinations are different for the two web pages, that difference is an indication that a phishing fraud may be underway.
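The transaction-information-destination check described above might be sketched as follows, assuming HTML input and reusing the example URLs from the background discussion; the extraction regex is an illustrative assumption.

```python
import re

def form_destinations(html):
    """Collect the form "action" URLs, i.e., where user input data is sent."""
    return set(re.findall(r'<form[^>]*action="([^"]+)"', html, re.IGNORECASE))

def destinations_differ(legit_html, suspect_html):
    """A mismatch in destinations suggests the suspect page diverts user
    input somewhere the legitimate page does not — a phishing indication."""
    return form_destinations(legit_html) != form_destinations(suspect_html)

legit = '<form action="https://www.xyzbank.com/us/cgi-bin/login.php">'
suspect = '<form action="http://218.246.224.203/icons/cgi-bin/xyzbank/login.php">'
print(destinations_differ(legit, suspect))  # True
```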
- However, the aforementioned approach is operative only if the identity of the counterpart legitimate web page can be ascertained from the suspect web page. Ascertaining whether a given web page is sufficiently similar to a suspect web page, such that the given web page is likely the counterpart legitimate web page that the suspect web page is attempting to emulate, is a subject of the present invention herein.
- In accordance with embodiments of the present invention, there are provided methods and apparatus for dynamically ascertaining whether a given web page is sufficiently similar to a suspect web page such that the given web page is likely the counterpart legitimate web page that the suspect web page is attempting to emulate. Since there are potentially billions of web pages in existence today, it would be impractical to test a suspect web page against every web page in existence to determine whether they are similar. Even if there is sufficient computing power to do so, the amount of time required to make such a similarity determination would render the technique impractical in use.
- The inventors herein realize, however, that given the scope of the phishing problem, the set of web pages to be tested for similarity against a suspect web page is substantially smaller and more manageable than the set of all available web pages. It is reasoned that the majority of phishing attempts will be focused on a few types of web pages, including, for example, those that collect transaction information from the user. Accordingly, web pages that merely implement static presentations of data do not present the same degree of phishing risk as a web page that collects, for example, the user's login data, the user's financial data, or any of the user's personal, financial, and/or confidential data.
- Furthermore, it is reasoned that the majority of phishing attempts would also be focused on certain known types of websites. For example, the large majority of phishing attempts will be motivated by financial fraud, and thus the target websites are likely to be found among financial institution sites (such as banks, on-line trading accounts, and online payment accounts), shopping sites (such as sites that allow the user to purchase goods and have the goods shipped to a particular address upon entering the user's financial and/or login data), and generally any website that provides goods and/or services upon the user's presentation of authenticating and/or financial/personal data.
- Of these websites, it is reasoned that a large majority of phishing attempts will again be focused on those that are most popular, since the user whom the phisher is attempting to deceive would more likely have an account at a popular online store than at a relatively obscure one. By progressively narrowing down the set of possible target websites and web pages, the number of web pages to be tested for similarity against a suspect phishing web page can be kept manageably small for computational purposes. Even by focusing only on the top dozens or hundreds of target websites and web pages (which may be identified by performing a study of past phishing attempts, for example), it is possible to provide a heightened level of protection against phishing via the ability to identify the target web page a large majority of the time, and to determine whether their transaction information destinations are the same.
- The inventors herein also provide techniques to efficiently test a particular potential target web page for similarity with a suspect web page. In accordance with an embodiment of the invention, each likely target web page is associated with a set of scoring rules (which may comprise one or more scoring rules) for scoring features of that target web page if those same features are found on the suspect web page.
- To elaborate, each web page may be thought of as a combination of features. These features may include visible characteristics or attributes, such as the color, location, and size of its images or textual information. These features may also include background characteristics or attributes that are not necessarily visible to a user. For example, some portion of many web pages may be formed using code that is largely invisible to the user but nevertheless contributes to the transmission, generation, and/or operation of the web page. Examples of these features include the URL strings specifying the destination for the user-input transaction information, HTML strings or other codes to perform computations, etc.
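- The notion of a web page as a combination of visible and background features can be sketched as follows. This is a minimal illustration only; the `Feature` type, the field names, and the example values are assumptions for the sake of the example, not structures described in the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    """One attribute of a web page, visible or background."""
    name: str      # e.g. "form_action_url", "logo_image" (illustrative names)
    value: str     # the feature's content: a URL string, HTML snippet, text, etc.
    visible: bool  # True for user-visible attributes, False for background code

# A page modeled as a collection of features: the form's destination URL is a
# background feature, while the displayed logo is a visible one.
page = [
    Feature("form_action_url", "https://login.example-bank.com/submit", False),
    Feature("logo_image", "logo.png", True),
]

visible = [f for f in page if f.visible]
```

Under this sketch, both kinds of features are available for scoring, even though only the visible ones would be apparent to a user viewing the page.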
- Since the set of likely target web pages are limited in number given the scope of the phishing problem, it is possible to manually (i.e., performed by a human) or automatically (i.e., performed in an automated manner using software) generate rules for scoring features of a particular target web page.
- For example, the login page of XYZ bank may be associated with a set of scoring rules that gives a high score for a nearly invisible security feature while giving a lower score for an obvious feature, such as a prominently displayed logo. This is because, for example, it may have been judged that a phisher would be less likely to duplicate a nearly invisible and easily overlooked feature than to copy a highly visible logo. As another example, such a set of scoring rules for the login page for XYZ bank may give a particular score for a particular field of content, including, for example, the domain/port/query string of a URL and/or the HTML/text string of a URL.
- Generally speaking, any feature may be associated with a score, if desired, and the particular score associated with a feature may vary and may even be arbitrary. For example, the rule creator may decide that a particular misspelling, or a particular background characteristic that can be easily overlooked, is intentional, and that the absence of that feature in a suspect web page may indicate that the suspect web page is not similar to the target web page at issue.
- Thus, when a suspect web page is compared against the login page for XYZ bank for the purpose of determining whether the suspect web page and the login page for XYZ bank are similar, the set of scoring rules associated with the login page for XYZ bank would be employed for scoring features found in the suspect web page. In this manner, if the suspect web page has a large number of features in common with the login page for XYZ bank and/or has in common certain high-scoring features, the suspect web page may earn a sufficiently high aggregate score to be deemed similar to the login page for XYZ bank.
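- The scoring of a suspect page against one target's rule set can be sketched as follows. The rule representation (a mapping from feature name to an expected value and a point award), the specific features, and the point values are all illustrative assumptions; the patent does not prescribe a data format.

```python
def score_suspect(suspect_features, scoring_rules):
    """Aggregate score for a suspect page under one target page's rules.

    suspect_features: dict mapping feature name -> value extracted from
        the suspect page.
    scoring_rules: dict mapping feature name -> (expected value, points);
        points are awarded only when the suspect page carries the same
        feature with the same value as the target page.
    """
    total = 0
    for name, (expected, points) in scoring_rules.items():
        if suspect_features.get(name) == expected:
            total += points
    return total

# Hypothetical rules for the "XYZ bank login page": the near-invisible
# security feature outweighs the prominently displayed logo, per the
# reasoning in the text above.
xyz_rules = {
    "hidden_security_comment": ("<!-- xyz-v2 -->", 50),
    "logo_image": ("xyz_logo.gif", 10),
}

# A suspect page that copied the hidden feature but swapped the logo.
suspect = {"hidden_security_comment": "<!-- xyz-v2 -->", "logo_image": "other.gif"}
score = score_suspect(suspect, xyz_rules)
```

Here the suspect page earns only the points for the hidden feature; whether that aggregate clears the similarity threshold is decided by the same rule set, as the following paragraphs describe.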
- The threshold for deciding whether an aggregate score earned by a suspect web page, when that suspect web page is compared against the login page for XYZ bank, indicates similarity may be implemented in the set of scoring rules for the login page of XYZ bank, for example. As with the determination of how many points a particular feature may be worth, the determination of the particular threshold value for deeming a suspect web page similar may be made empirically by a human or by automated software.
- The point is that each potential target web page (e.g., the Acme Store credit card entry page) is associated with a set of scoring rules for its features, and that set of scoring rules is employed to generate a score for a suspect web page when that suspect web page is compared against the Acme Store credit card entry page. Furthermore, the similarity threshold value used to determine whether a suspect web page is similar to the Acme Store credit card entry page is implemented by the set of scoring rules associated with the Acme Store credit card entry page.
- When the suspect web page is compared against another potential target web page (e.g., ABC Bank personal information authentication page), the set of scoring rules associated with that potential target web page (e.g., ABC Bank personal information authentication page) would be employed instead to generate the similarity score. Further, the similarity threshold value to determine whether a suspect web page is similar to the ABC Bank personal information authentication page is implemented by the set of scoring rules associated with the ABC Bank personal information authentication page.
- In this manner, it is possible for each web page or website owner to decide the importance placed on each individual feature of his web page for the purpose of deciding whether another web page is sufficiently similar. In an embodiment, the score associated with each feature and/or the similarity threshold in the set of scoring rules for a particular web page may be continually refined and updated each time a "false positive" or an erroneous identification of similarity or dissimilarity occurs. For example, if the similarity threshold is so low that suspect web pages are often misidentified as being similar to a particular web page, the scoring rules may be revised and/or the similarity threshold in the set of scoring rules for that particular web page may be revised upward so that only suspect web pages that have a large number of features in common, or that have a sufficient number of high-scoring features in common, would be judged to be similar.
- As another example, if the similarity threshold is so high that no suspect web page is ever identified as being similar to a particular web page even though a suspect web page is the same as that particular web page (i.e., failing to identify that the two web pages are similar), the scoring rules may be revised and/or the similarity threshold in the set of scoring rules for that particular web page may be revised downward so that web pages that are truly similar may be judged to be similar by the set of scoring rules for that particular web page. Since the set of scoring rules is associated with the legitimate web page, the effect of continually improving the scoring rules is increasingly accurate similarity identification as more suspect web pages are tested against the legitimate web page.
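- The feedback loop described in the two paragraphs above can be sketched as a simple adjustment rule: raise the threshold after a false positive, lower it after a missed match. The fixed step size and the outcome labels are arbitrary choices for this illustration; the patent leaves the refinement policy to a human or to automated software.

```python
def refine_threshold(threshold, outcome, step=5):
    """Adjust one target page's similarity threshold after an error.

    outcome: "false_positive" if a dissimilar page was judged similar
             (threshold too low -> raise it), or
             "false_negative" if a truly similar page was not recognized
             (threshold too high -> lower it, but never below zero).
    """
    if outcome == "false_positive":
        return threshold + step
    if outcome == "false_negative":
        return max(0, threshold - step)
    return threshold  # correct result: leave the threshold unchanged

t = refine_threshold(60, "false_positive")  # misidentification: tighten
t = refine_threshold(t, "false_negative")   # missed match: loosen again
```

In practice the feature scores themselves could be adjusted alongside the threshold; this sketch varies only the threshold for brevity.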
- In an embodiment, fuzzy logic or artificial intelligence may be employed to render the comparison process more efficient and/or accurate. In some embodiments, regular expressions for textual features may be employed in the evaluation of features and can achieve good accuracy. In the context of the present application, a regular expression refers to a string that describes or matches a set of strings, according to certain syntax rules. Regular expressions are known to those skilled in the art and will not be explained in detail herein. Using regular expressions in the creation of the set of scoring rules and in the scoring rules themselves increases the flexibility with which features in the suspect web pages may be identified and scored.
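- A regex-based scoring rule can be sketched as follows: instead of requiring an exact feature value, the rule awards points when the feature matches a pattern describing a set of acceptable strings. The bank domain and the pattern below are invented for the example.

```python
import re

def regex_rule_score(feature_value, pattern, points):
    """Award `points` if the feature's textual value matches the pattern."""
    return points if re.search(pattern, feature_value) else 0

# Award points for any form-action URL on the legitimate (hypothetical)
# bank's domain, regardless of the exact path or query string.
url_pattern = r"^https://(www\.)?xyzbank\.example/.*"

s1 = regex_rule_score("https://xyzbank.example/login?lang=en", url_pattern, 30)
s2 = regex_rule_score("https://xyzbank.evil.example/login", url_pattern, 30)
```

Note how the anchored pattern rejects the look-alike domain in the second URL even though it contains the bank's name, which is exactly the flexibility the paragraph above attributes to regular-expression rules.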
- The features and advantages of the invention may be better understood with reference to the figures and discussions that follow.
FIG. 2 shows, in accordance with an embodiment of the invention, the high level steps for preparing the set of likely target web pages for similarity comparison. In step 202, the set of likely target web pages is selected on the basis of website type and web page type. With respect to website type, websites that are popular and/or provide money, goods, or services tend to be sites that are targets for phishers and may thus be chosen in an embodiment. - With respect to web page type, web pages that request transaction information from users (including, for example, login information, any confidential and/or financial transaction information, etc.) tend to be web pages that are targets of phishers and may thus be chosen, in an embodiment. In an embodiment, both the website type filter and the web page type filter may be employed to select the set of likely target web pages. Alternatively or additionally, a human operator may select and add web pages to the set of likely target web pages if it is believed that those web pages may be phishing targets. In these or other embodiments, web pages may also be included based on other criteria designed to select web pages deemed likely to be susceptible to phishing attacks.
- In step 204, each likely target web page in the set of likely target web pages is processed to generate a set of scoring rules for features in that web page. As discussed, a feature may represent any attribute or characteristic of a web page, whether or not visually perceptible to a human. In an embodiment, a human operator may manually designate the features worthy of scoring and the score associated with each of the web page features. In another embodiment, software may be employed to scan through a web page and/or the code implementing the web page and assign scores to some or all of the features found. - After each web page in the set of likely target web pages is processed, each web page and its set of scoring rules are stored (206) for subsequent use in similarity determination with a suspect web page.
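- The preparation flow of FIG. 2 can be sketched as follows: filter candidates by type (step 202), build each surviving page's scoring rules (step 204), and store them keyed by page (step 206). The data shapes, the filter, and the rule builder here are illustrative assumptions, not the patent's implementation.

```python
def prepare_targets(candidate_pages, is_likely_target, make_rules):
    """Build the stored set of likely target pages and their scoring rules.

    candidate_pages: iterable of page identifiers.
    is_likely_target: predicate applying the website/page type filters (step 202).
    make_rules: function producing a page's set of scoring rules (step 204);
        here a dict of feature name -> (expected value, points).
    """
    store = {}
    for page in candidate_pages:
        if is_likely_target(page):
            store[page] = make_rules(page)  # stored per page (step 206)
    return store

# Hypothetical example: only login-type pages are kept as likely targets.
store = prepare_targets(
    ["xyz-bank/login", "news/article-42"],
    is_likely_target=lambda p: "login" in p,
    make_rules=lambda p: {"logo_image": ("logo.gif", 10)},
)
```

Because each stored entry pairs a page with its own rule set, the later comparison step can look up rules per target, matching the per-page flexibility the text emphasizes.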
FIG. 3 shows, in accordance with an embodiment of the present invention, the steps for performing similarity analysis for a suspect web page. In step 302, the suspect web page is received. In step 304, the suspect web page is compared against each likely target web page in the set of likely target web pages. In an embodiment, web pages in the set of likely target web pages may optionally be re-ordered based on information gleaned from the suspect web page such that those likely target web pages that have a high probability of a similarity match are tested first. For example, if text or an image in the suspect web page suggests that the suspect web page is a login web page for a particular enterprise, likely target login web pages for that particular enterprise may be tested first. - Generally speaking, the set of scoring rules for the likely target web page currently being tested is employed to score features found in the suspect web page. If the aggregate score exceeds (or equals, in an embodiment) a certain similarity threshold (as determined by step 306), that likely target web page is identified as the web page that is similar to the suspect web page (308). Thereafter, analysis may be performed on the suspect web page to determine whether the suspect web page indeed represents an attempt to perform a phishing attack on the identified similar target web page.
- On the other hand, if the aggregate score is below (or equal to, in another embodiment) the similarity threshold, that likely target web page is not identified as the web page that is similar to the suspect web page (310). Thereafter, comparison of the suspect web page against the likely target web pages continues until similarity is found.
- In an embodiment, if a subsequent analysis ascertains that the similarity determination result from the steps of FIG. 3 is erroneous, changes may be made to the selection of features, the scoring of features, and/or the similarity threshold associated with the set of scoring rules for the target web page that was misidentified as being similar to the suspect web page. If all likely target web pages are exhausted and no similar web page is found, a report is then provided, noting that a similar web page is not found among the set of likely target web pages. In this case, the similarity testing may proceed against additional web pages that were not included in the set of likely target web pages, or the operator may be notified and the method of FIG. 3 may simply end after notification. In an embodiment, if more than one target web page is determined to be similar to the suspect web page, no result will be drawn for this suspect web page, and the scoring rules may be revised iteratively to avoid this case. This embodiment is intended to minimize "false positives," as in the case wherein multiple web pages are determined to be similar and the result is thus inconclusive. - As can be appreciated from the foregoing, embodiments of the invention are able to ascertain the identity of the target web page in a highly efficient manner. By filtering the available web pages based on likely website types and likely web page types, and further in view of the phishing problem to be solved, the set of likely target web pages may be made smaller. Since each likely target web page is associated with its own scoring rules, much flexibility is afforded to the entities who own those likely target web pages in deciding whether the suspect web page is sufficiently similar. If an erroneous similarity determination is made, changes to the scoring rules and/or the similarity threshold may be made, enabling the similarity determination process to become more accurate over time.
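- The comparison loop of FIG. 3, including the no-match and inconclusive multiple-match outcomes, can be sketched as follows. The data shapes, the example target, and the scoring function are illustrative assumptions, not the patent's actual implementation.

```python
def find_similar(suspect_features, targets, score_fn):
    """Identify which likely target page, if any, the suspect resembles.

    targets: dict mapping page identifier -> (scoring rules, threshold),
        each target carrying its own rules and threshold as described above.
    Returns the single similar page (step 308), or None when no target
    matches or when multiple targets match (inconclusive; the rules would
    then be revised to avoid the ambiguity).
    """
    matches = [page for page, (rules, threshold) in targets.items()
               if score_fn(suspect_features, rules) >= threshold]
    if len(matches) == 1:
        return matches[0]
    return None  # no similar page found, or multiple matches: no result drawn

def score_fn(suspect, rules):
    # Award each rule's points when the suspect carries the same feature value.
    return sum(points for name, (expected, points) in rules.items()
               if suspect.get(name) == expected)

# One hypothetical likely target with its rules and per-page threshold.
targets = {"xyz-bank/login": ({"logo_image": ("logo.gif", 10)}, 10)}
match = find_similar({"logo_image": "logo.gif"}, targets, score_fn)
```

Once a single similar target is identified, the transaction information destinations of the two pages can be compared to decide whether a phishing attempt is underway, as described earlier.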
- While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents, which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and apparatuses of the present invention. Additionally, it is intended that the abstract section, having a limit to the number of words that can be provided, be furnished for convenience to the reader and not to be construed as limiting of the claims herein. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
Claims (21)
1. A computer-implemented method for ascertaining which web page among a plurality of candidate web pages is similar to a given web page, comprising:
extracting a set of web page components from said given web page;
comparing said given web page against each of said plurality of candidate web pages in turn, said comparing resulting in a composite similarity score for said set of web page components, said composite similarity score being computed from scores assigned to individual ones of said set of web page components in accordance with a set of scoring rules associated with said web page that is under examination for similarity, wherein a web page component of said set of web page components is associated with a first score if said web page component also exists in said web page that is under examination for similarity, and said web page component of said set of web page components is associated with a second score different from said first score if said web page component does not exist in said web page that is under examination for similarity; and
if said composite similarity score exceeds a predefined threshold, designating said given web page similar to said web page that is under examination for similarity.
2. The method of claim 1 wherein said set of web page components includes at least a URL string.
3. The method of claim 1 wherein said set of web page components includes an image element.
4. The method of claim 1 wherein said web page component represents text.
5. The method of claim 4 wherein said web page component is tested for similarity using a regular expression.
6. The method of claim 1 wherein said web page component is visible.
7. The method of claim 1 wherein said web page component is invisible.
8. The method of claim 1 wherein said comparing is performed until a similar web page is found.
9. The method of claim 1 further comprising providing a warning indication if multiple web pages of said plurality of web pages are deemed similar to said given web page.
10. A computer-implemented method for designating a given web page similar or dissimilar with respect to a reference web page, comprising:
extracting a set of web page components from said given web page;
computing, using a set of scoring rules associated with said reference web page, a composite similarity score for said set of web page components, said composite similarity score being computed from scores assigned to individual ones of said set of web page components, wherein a web page component of said set of web page components is assigned a first score if said web page component also exists in said reference web page, and said web page component of said set of web page components is assigned a second score different from said first score if said web page component does not exist in said reference web page; and
if said composite similarity score exceeds a predefined threshold, designating said given web page similar to said reference web page.
11. The method of claim 10 wherein said set of web page components includes at least a URL string.
12. The method of claim 10 wherein said set of web page components includes an image element.
13. The method of claim 10 wherein said web page component represents text.
14. The method of claim 13 wherein said web page component is tested for similarity using a regular expression.
15. The method of claim 10 wherein said web page component is visible.
16. The method of claim 10 wherein said web page component is invisible.
17. An article of manufacture comprising a computer storage medium for storing thereon computer readable code for ascertaining which web page among a plurality of candidate web pages is similar to a given web page, comprising:
computer readable code for extracting a set of web page components from said given web page;
computer readable code for comparing said given web page against each of said plurality of candidate web pages in turn, said comparing resulting in a composite similarity score for said set of web page components, said composite similarity score being computed from scores assigned to individual ones of said set of web page components in accordance with a set of scoring rules associated with said web page that is under examination for similarity, wherein a web page component of said set of web page components is associated with a first score if said web page component also exists in said web page that is under examination for similarity, and said web page component of said set of web page components is associated with a second score different from said first score if said web page component does not exist in said web page that is under examination for similarity; and
computer readable code for designating, if said composite similarity score exceeds a predefined threshold, said given web page similar to said web page that is under examination for similarity.
18. The article of manufacture of claim 17 wherein said set of web page components includes at least a URL string.
19. The article of manufacture of claim 17 wherein said set of web page components includes an image element.
20. The article of manufacture of claim 17 wherein said web page component represents text.
21. The article of manufacture of claim 20 wherein said web page component is tested for similarity using a regular expression.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/617,654 US20080162449A1 (en) | 2006-12-28 | 2006-12-28 | Dynamic page similarity measurement |
US16/548,269 US11042630B2 (en) | 2006-12-28 | 2019-08-22 | Dynamic page similarity measurement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/617,654 US20080162449A1 (en) | 2006-12-28 | 2006-12-28 | Dynamic page similarity measurement |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/548,269 Continuation US11042630B2 (en) | 2006-12-28 | 2019-08-22 | Dynamic page similarity measurement |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080162449A1 true US20080162449A1 (en) | 2008-07-03 |
Family
ID=39585407
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/617,654 Abandoned US20080162449A1 (en) | 2006-12-28 | 2006-12-28 | Dynamic page similarity measurement |
US16/548,269 Active US11042630B2 (en) | 2006-12-28 | 2019-08-22 | Dynamic page similarity measurement |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/548,269 Active US11042630B2 (en) | 2006-12-28 | 2019-08-22 | Dynamic page similarity measurement |
Country Status (1)
Country | Link |
---|---|
US (2) | US20080162449A1 (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5477451A (en) * | 1991-07-25 | 1995-12-19 | International Business Machines Corp. | Method and system for natural language translation |
US5857212A (en) * | 1995-07-06 | 1999-01-05 | Sun Microsystems, Inc. | System and method for horizontal alignment of tokens in a structural representation program editor |
US6138129A (en) * | 1997-12-16 | 2000-10-24 | World One Telecom, Ltd. | Method and apparatus for providing automated searching and linking of electronic documents |
US6266664B1 (en) * | 1997-10-01 | 2001-07-24 | Rulespace, Inc. | Method for scanning, analyzing and rating digital information content |
US20030088643A1 (en) * | 2001-06-04 | 2003-05-08 | Shupps Eric A. | Method and computer system for isolating and interrelating components of an application |
US20040158799A1 (en) * | 2003-02-07 | 2004-08-12 | Breuel Thomas M. | Information extraction from html documents by structural matching |
US20040163043A1 (en) * | 2003-02-10 | 2004-08-19 | Kaidara S.A. | System method and computer program product for obtaining structured data from text |
US20050108630A1 (en) * | 2003-11-19 | 2005-05-19 | Wasson Mark D. | Extraction of facts from text |
US7051368B1 (en) * | 1999-11-09 | 2006-05-23 | Microsoft Corporation | Methods and systems for screening input strings intended for use by web servers |
US20080010683A1 (en) * | 2006-07-10 | 2008-01-10 | Baddour Victor L | System and method for analyzing web content |
US20080046738A1 (en) * | 2006-08-04 | 2008-02-21 | Yahoo! Inc. | Anti-phishing agent |
US20080115214A1 (en) * | 2006-11-09 | 2008-05-15 | Rowley Peter A | Web page protection against phishing |
US20080133540A1 (en) * | 2006-12-01 | 2008-06-05 | Websense, Inc. | System and method of analyzing web addresses |
US7457823B2 (en) * | 2004-05-02 | 2008-11-25 | Markmonitor Inc. | Methods and systems for analyzing data related to possible online fraud |
Family Cites Families (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1586054A4 (en) * | 2002-12-13 | 2010-12-08 | Symantec Corp | Method, system, and computer program product for security within a global computer network |
US20070107053A1 (en) * | 2004-05-02 | 2007-05-10 | Markmonitor, Inc. | Enhanced responses to online fraud |
US9203648B2 (en) * | 2004-05-02 | 2015-12-01 | Thomson Reuters Global Resources | Online fraud solution |
US7913302B2 (en) * | 2004-05-02 | 2011-03-22 | Markmonitor, Inc. | Advanced responses to online fraud |
US8041769B2 (en) * | 2004-05-02 | 2011-10-18 | Markmonitor Inc. | Generating phish messages |
US7992204B2 (en) * | 2004-05-02 | 2011-08-02 | Markmonitor, Inc. | Enhanced responses to online fraud |
US20060080735A1 (en) * | 2004-09-30 | 2006-04-13 | Usa Revco, Llc | Methods and systems for phishing detection and notification |
US7630987B1 (en) * | 2004-11-24 | 2009-12-08 | Bank Of America Corporation | System and method for detecting phishers by analyzing website referrals |
US20060123478A1 (en) * | 2004-12-02 | 2006-06-08 | Microsoft Corporation | Phishing detection, prevention, and notification |
JP2006221242A (en) * | 2005-02-08 | 2006-08-24 | Fujitsu Ltd | Authentication information fraud prevention system, program, and method |
US20060253584A1 (en) * | 2005-05-03 | 2006-11-09 | Dixon Christopher J | Reputation of an entity associated with a content item |
US7590707B2 (en) * | 2006-08-07 | 2009-09-15 | Webroot Software, Inc. | Method and system for identifying network addresses associated with suspect network destinations |
US8578481B2 (en) * | 2006-10-16 | 2013-11-05 | Red Hat, Inc. | Method and system for determining a probability of entry of a counterfeit domain in a browser |
US20080163369A1 (en) * | 2006-12-28 | 2008-07-03 | Ming-Tai Allen Chang | Dynamic phishing detection methods and apparatus |
US20080162449A1 (en) * | 2006-12-28 | 2008-07-03 | Chen Chao-Yu | Dynamic page similarity measurement |
US9521161B2 (en) * | 2007-01-16 | 2016-12-13 | International Business Machines Corporation | Method and apparatus for detecting computer fraud |
US8205255B2 (en) * | 2007-05-14 | 2012-06-19 | Cisco Technology, Inc. | Anti-content spoofing (ACS) |
US9148445B2 (en) * | 2008-05-07 | 2015-09-29 | Cyveillance Inc. | Method and system for misuse detection |
US8850570B1 (en) * | 2008-06-30 | 2014-09-30 | Symantec Corporation | Filter-based identification of malicious websites |
US8448245B2 (en) * | 2009-01-17 | 2013-05-21 | Stopthehacker.com, Jaal LLC | Automated identification of phishing, phony and malicious web sites |
US8943588B1 (en) * | 2012-09-20 | 2015-01-27 | Amazon Technologies, Inc. | Detecting unauthorized websites |
US9621566B2 (en) * | 2013-05-31 | 2017-04-11 | Adi Labs Incorporated | System and method for detecting phishing webpages |
- 2006-12-28: US application US11/617,654 filed, published as US20080162449A1 (status: abandoned)
- 2019-08-22: US application US16/548,269 filed, granted as US11042630B2 (status: active)
Cited By (91)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8161561B1 (en) * | 2004-10-05 | 2012-04-17 | Symantec Corporation | Confidential data protection through usage scoping |
US10007723B2 (en) | 2005-12-23 | 2018-06-26 | Digimarc Corporation | Methods for identifying audio or video content |
US10735381B2 (en) | 2006-08-29 | 2020-08-04 | Attributor Corporation | Customized handling of copied content based on owner-specified similarity thresholds |
US9842200B1 (en) | 2006-08-29 | 2017-12-12 | Attributor Corporation | Content monitoring and host compliance evaluation |
US9436810B2 (en) | 2006-08-29 | 2016-09-06 | Attributor Corporation | Determination of copied content, including attribution |
US10242415B2 (en) | 2006-12-20 | 2019-03-26 | Digimarc Corporation | Method and system for determining content treatment |
US20200042696A1 (en) * | 2006-12-28 | 2020-02-06 | Trend Micro Incorporated | Dynamic page similarity measurement |
US11042630B2 (en) * | 2006-12-28 | 2021-06-22 | Trend Micro Incorporated | Dynamic page similarity measurement |
US20080172741A1 (en) * | 2007-01-16 | 2008-07-17 | International Business Machines Corporation | Method and Apparatus for Detecting Computer Fraud |
US9521161B2 (en) * | 2007-01-16 | 2016-12-13 | International Business Machines Corporation | Method and apparatus for detecting computer fraud |
US8707459B2 (en) * | 2007-01-19 | 2014-04-22 | Digimarc Corporation | Determination of originality of content |
US20080178302A1 (en) * | 2007-01-19 | 2008-07-24 | Attributor Corporation | Determination of originality of content |
US20150073944A1 (en) * | 2008-06-05 | 2015-03-12 | Craze, Inc. | Method and system for classification of venue by analyzing data from venue website |
US8528079B2 (en) * | 2008-08-12 | 2013-09-03 | Yahoo! Inc. | System and method for combating phishing |
US20100043071A1 (en) * | 2008-08-12 | 2010-02-18 | Yahoo! Inc. | System and method for combating phishing |
US20100095375A1 (en) * | 2008-10-14 | 2010-04-15 | Balachander Krishnamurthy | Method for locating fraudulent replicas of web sites |
US8701185B2 (en) * | 2008-10-14 | 2014-04-15 | At&T Intellectual Property I, L.P. | Method for locating fraudulent replicas of web sites |
US10395018B2 (en) | 2010-11-29 | 2019-08-27 | Biocatch Ltd. | System, method, and device of detecting identity of a user and authenticating a user |
US11269977B2 (en) | 2010-11-29 | 2022-03-08 | Biocatch Ltd. | System, apparatus, and method of collecting and processing data in electronic devices |
US12101354B2 (en) * | 2010-11-29 | 2024-09-24 | Biocatch Ltd. | Device, system, and method of detecting vishing attacks |
US20240080339A1 (en) * | 2010-11-29 | 2024-03-07 | Biocatch Ltd. | Device, System, and Method of Detecting Vishing Attacks |
US20160307201A1 (en) * | 2010-11-29 | 2016-10-20 | Biocatch Ltd. | Contextual mapping of web-pages, and generation of fraud-relatedness score-values |
US10897482B2 (en) | 2010-11-29 | 2021-01-19 | Biocatch Ltd. | Method, device, and system of back-coloring, forward-coloring, and fraud detection |
US11838118B2 (en) * | 2010-11-29 | 2023-12-05 | Biocatch Ltd. | Device, system, and method of detecting vishing attacks |
US11580553B2 (en) | 2010-11-29 | 2023-02-14 | Biocatch Ltd. | Method, device, and system of detecting mule accounts and accounts used for money laundering |
US10776476B2 (en) | 2010-11-29 | 2020-09-15 | Biocatch Ltd. | System, device, and method of visual login |
US10747305B2 (en) | 2010-11-29 | 2020-08-18 | Biocatch Ltd. | Method, system, and device of authenticating identity of a user of an electronic device |
US10728761B2 (en) | 2010-11-29 | 2020-07-28 | Biocatch Ltd. | Method, device, and system of detecting a lie of a user who inputs data |
US10917431B2 (en) | 2010-11-29 | 2021-02-09 | Biocatch Ltd. | System, method, and device of authenticating a user based on selfie image or selfie video |
US10032010B2 (en) | 2010-11-29 | 2018-07-24 | Biocatch Ltd. | System, device, and method of visual login and stochastic cryptography |
US10037421B2 (en) | 2010-11-29 | 2018-07-31 | Biocatch Ltd. | Device, system, and method of three-dimensional spatial user authentication |
US10049209B2 (en) | 2010-11-29 | 2018-08-14 | Biocatch Ltd. | Device, method, and system of differentiating between virtual machine and non-virtualized device |
US10055560B2 (en) | 2010-11-29 | 2018-08-21 | Biocatch Ltd. | Device, method, and system of detecting multiple users accessing the same account |
US10069852B2 (en) | 2010-11-29 | 2018-09-04 | Biocatch Ltd. | Detection of computerized bots and automated cyber-attack modules |
US11425563B2 (en) | 2010-11-29 | 2022-08-23 | Biocatch Ltd. | Method, device, and system of differentiating between a cyber-attacker and a legitimate user |
US10083439B2 (en) | 2010-11-29 | 2018-09-25 | Biocatch Ltd. | Device, system, and method of differentiating over multiple accounts between legitimate user and cyber-attacker |
US11330012B2 (en) * | 2010-11-29 | 2022-05-10 | Biocatch Ltd. | System, method, and device of authenticating a user based on selfie image or selfie video |
US10949757B2 (en) | 2010-11-29 | 2021-03-16 | Biocatch Ltd. | System, device, and method of detecting user identity based on motor-control loop model |
US11314849B2 (en) | 2010-11-29 | 2022-04-26 | Biocatch Ltd. | Method, device, and system of detecting a lie of a user who inputs data |
US10164985B2 (en) | 2010-11-29 | 2018-12-25 | Biocatch Ltd. | Device, system, and method of recovery and resetting of user authentication factor |
US10834590B2 (en) | 2010-11-29 | 2020-11-10 | Biocatch Ltd. | Method, device, and system of differentiating between a cyber-attacker and a legitimate user |
US10949514B2 (en) | 2010-11-29 | 2021-03-16 | Biocatch Ltd. | Device, system, and method of differentiating among users based on detection of hardware components |
US10262324B2 (en) | 2010-11-29 | 2019-04-16 | Biocatch Ltd. | System, device, and method of differentiating among users based on user-specific page navigation sequence |
US10298614B2 (en) * | 2010-11-29 | 2019-05-21 | Biocatch Ltd. | System, device, and method of generating and managing behavioral biometric cookies |
US11250435B2 (en) * | 2010-11-29 | 2022-02-15 | Biocatch Ltd. | Contextual mapping of web-pages, and generation of fraud-relatedness score-values |
US10621585B2 (en) * | 2010-11-29 | 2020-04-14 | Biocatch Ltd. | Contextual mapping of web-pages, and generation of fraud-relatedness score-values |
US10404729B2 (en) | 2010-11-29 | 2019-09-03 | Biocatch Ltd. | Device, method, and system of generating fraud-alerts for cyber-attacks |
US10474815B2 (en) | 2010-11-29 | 2019-11-12 | Biocatch Ltd. | System, device, and method of detecting malicious automatic script and code injection |
US10476873B2 (en) | 2010-11-29 | 2019-11-12 | Biocatch Ltd. | Device, system, and method of password-less user authentication and password-less detection of user identity |
US11223619B2 (en) | 2010-11-29 | 2022-01-11 | Biocatch Ltd. | Device, system, and method of user authentication based on user-specific characteristics of task performance |
US11210674B2 (en) | 2010-11-29 | 2021-12-28 | Biocatch Ltd. | Method, device, and system of detecting mule accounts and accounts used for money laundering |
US20210329030A1 (en) * | 2010-11-29 | 2021-10-21 | Biocatch Ltd. | Device, System, and Method of Detecting Vishing Attacks |
US10586036B2 (en) | 2010-11-29 | 2020-03-10 | Biocatch Ltd. | System, device, and method of recovery and resetting of user authentication factor |
US9065850B1 (en) * | 2011-02-07 | 2015-06-23 | Zscaler, Inc. | Phishing detection systems and methods |
US9996441B2 (en) * | 2011-05-16 | 2018-06-12 | Intuit Inc. | System and method for building a script for a web page using an existing script from a similar web page |
US20180067833A1 (en) * | 2011-05-16 | 2018-03-08 | Intuit Inc. | System and method for automated web site information retrieval scripting using untraine |
US8561185B1 (en) * | 2011-05-17 | 2013-10-15 | Google Inc. | Personally identifiable information detection |
US9015802B1 (en) * | 2011-05-17 | 2015-04-21 | Google Inc. | Personally identifiable information detection |
US20120304291A1 (en) * | 2011-05-26 | 2012-11-29 | International Business Machines Corporation | Rotation of web site content to prevent e-mail spam/phishing attacks |
US9148444B2 (en) * | 2011-05-26 | 2015-09-29 | International Business Machines Corporation | Rotation of web site content to prevent e-mail spam/phishing attacks |
US20130204860A1 (en) * | 2012-02-03 | 2013-08-08 | TrueMaps LLC | Apparatus and Method for Comparing and Statistically Extracting Commonalities and Differences Between Different Websites |
WO2013134350A1 (en) * | 2012-03-09 | 2013-09-12 | Digilant, Inc. | Look-alike website scoring |
US8990933B1 (en) * | 2012-07-24 | 2015-03-24 | Intuit Inc. | Securing networks against spear phishing attacks |
US20140068016A1 (en) * | 2012-08-28 | 2014-03-06 | Greg Howett | System and Method for Web Application Acceleration |
US10108525B2 (en) | 2013-06-14 | 2018-10-23 | International Business Machines Corporation | Optimizing automated interactions with web applications |
US10929265B2 (en) | 2013-06-14 | 2021-02-23 | International Business Machines Corporation | Optimizing automated interactions with web applications |
US10127132B2 (en) | 2013-06-14 | 2018-11-13 | International Business Machines Corporation | Optimizing automated interactions with web applications |
US20150205808A1 (en) * | 2014-01-22 | 2015-07-23 | International Business Machines Corporation | Storing information to manipulate focus for a webpage |
US9680910B2 (en) * | 2014-01-22 | 2017-06-13 | International Business Machines Corporation | Storing information to manipulate focus for a webpage |
US10497006B2 (en) * | 2014-10-08 | 2019-12-03 | Facebook, Inc. | Systems and methods for processing potentially misidentified illegitimate incidents |
US9621570B2 (en) | 2015-03-05 | 2017-04-11 | AO Kaspersky Lab | System and method for selectively evolving phishing detection rules |
EP3065367A1 (en) * | 2015-03-05 | 2016-09-07 | AO Kaspersky Lab | System and method for automated phishing detection rule evolution |
US10719765B2 (en) | 2015-06-25 | 2020-07-21 | Biocatch Ltd. | Conditional behavioral biometrics |
US11238349B2 (en) | 2015-06-25 | 2022-02-01 | Biocatch Ltd. | Conditional behavioural biometrics |
US10834090B2 (en) | 2015-07-09 | 2020-11-10 | Biocatch Ltd. | System, device, and method for detection of proxy server |
US10523680B2 (en) * | 2015-07-09 | 2019-12-31 | Biocatch Ltd. | System, device, and method for detecting a proxy server |
US10069837B2 (en) | 2015-07-09 | 2018-09-04 | Biocatch Ltd. | Detection of proxy server |
US11323451B2 (en) | 2015-07-09 | 2022-05-03 | Biocatch Ltd. | System, device, and method for detection of proxy server |
US10097580B2 (en) | 2016-04-12 | 2018-10-09 | Microsoft Technology Licensing, Llc | Using web search engines to correct domain names used for social engineering |
EP3465455A4 (en) * | 2016-05-23 | 2020-01-01 | Greathorn, Inc. | Computer-implemented methods and systems for identifying visually similar text character strings |
US11055395B2 (en) | 2016-07-08 | 2021-07-06 | Biocatch Ltd. | Step-up authentication |
US10198122B2 (en) | 2016-09-30 | 2019-02-05 | Biocatch Ltd. | System, device, and method of estimating force applied to a touch surface |
US10579784B2 (en) | 2016-11-02 | 2020-03-03 | Biocatch Ltd. | System, device, and method of secure utilization of fingerprints for user authentication |
US10685355B2 (en) | 2016-12-04 | 2020-06-16 | Biocatch Ltd. | Method, device, and system of detecting mule accounts and accounts used for money laundering |
US10397262B2 (en) | 2017-07-20 | 2019-08-27 | Biocatch Ltd. | Device, system, and method of detecting overlay malware |
US10805346B2 (en) * | 2017-10-01 | 2020-10-13 | Fireeye, Inc. | Phishing attack detection |
US10970394B2 (en) | 2017-11-21 | 2021-04-06 | Biocatch Ltd. | System, device, and method of detecting vishing attacks |
EP3871094A4 (en) * | 2018-10-23 | 2022-07-27 | Functionize, Inc. | Software test case maintenance |
WO2020086773A1 (en) | 2018-10-23 | 2020-04-30 | Cser Tamas | Software test case maintenance |
US11606353B2 (en) | 2021-07-22 | 2023-03-14 | Biocatch Ltd. | System, device, and method of generating and utilizing one-time passwords |
US20230188563A1 (en) * | 2021-12-09 | 2023-06-15 | Blackberry Limited | Identifying a phishing attempt |
Also Published As
Publication number | Publication date |
---|---|
US11042630B2 (en) | 2021-06-22 |
US20200042696A1 (en) | 2020-02-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11042630B2 (en) | Dynamic page similarity measurement | |
US10951636B2 (en) | Dynamic phishing detection methods and apparatus | |
Tan et al. | PhishWHO: Phishing webpage detection via identity keywords extraction and target domain name finder | |
Jain et al. | Phishing detection: analysis of visual similarity based approaches | |
US7802298B1 (en) | Methods and apparatus for protecting computers against phishing attacks | |
RU2607229C2 (en) | Systems and methods of dynamic indicators aggregation to detect network fraud | |
US9621566B2 (en) | System and method for detecting phishing webpages | |
CN104217160B (en) | Method and system for detecting Chinese phishing websites | |
US8191148B2 (en) | Classifying a message based on fraud indicators | |
US8806622B2 (en) | Fraudulent page detection | |
US20090089859A1 (en) | Method and apparatus for detecting phishing attempts solicited by electronic mail | |
US20220030029A1 (en) | Phishing Protection Methods and Systems | |
US20140115704A1 (en) | Homoglyph monitoring | |
US10341382B2 (en) | System and method for filtering electronic messages | |
Abbasi et al. | A comparison of tools for detecting fake websites | |
CN116366338B (en) | Risk website identification method and device, computer equipment and storage medium | |
JP4781922B2 (en) | Link information verification method, system, apparatus, and program | |
Deepa | Phishing website detection using novel features and machine learning approach | |
Nivedha et al. | Improving phishing URL detection using fuzzy association mining | |
KR20070067651A (en) | Method on prevention of phishing through analysis of the internet site pattern | |
JP2007133488A (en) | Information transmission source verification method and device | |
JP2007233904A (en) | Forged site detection method and computer program | |
Sharathkumar et al. | Phishing site detection using machine learning | |
Oko et al. | Development of a phishing site detection plugin to safeguard online transaction services | |
US11323476B1 (en) | Prevention of credential phishing based upon login behavior analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | Owner name: TREND MICRO INCORPORATED, JAPAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; Assignors: CHAO-YU, CHEN; PENG-SHIH, PU; YU-FANG, TSAI. Reel/Frame: 019115/0008. Effective date: 20061228 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |