US10505979B2 - Detection and warning of imposter web sites - Google Patents
Detection and warning of imposter web sites
- Publication number
- US10505979B2 (application US15/153,975)
- Authority
- US
- United States
- Prior art keywords
- web page
- data
- visited
- determining
- legitimate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1483—Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/44—Program or device authentication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2111—Location-sensitive, e.g. geographical location, GPS
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2119—Authenticating web pages, e.g. with suspicious links
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
Definitions
- the present disclosure relates in general to evaluating the source of an actual or requested electronic communication through a network. More specifically, the present disclosure relates to methodologies, systems and computer program products for efficiently and effectively detecting and/or identifying a network communication that is from an imposter source.
- phishing refers to the nefarious practice of using electronic communications to entice individuals to voluntarily disclose confidential information such as usernames, passwords and credit card details to a requester that, unknown to the individual, is masquerading as a trustworthy entity.
- phishing communications purport to be from legitimate sources such as popular social web sites, auction sites, banks, online payment processors or information technology (IT) administrators.
- Phishing communications can take a variety of forms. For example, phishing communications are often carried out by email spoofing or instant messaging in which email messages are created with a forged sender address and broadcast to a wide number of valid email addresses. The forged email may direct users to enter details at an imposter or fake website whose look and feel are almost identical to the legitimate web site. It would be beneficial to provide the capability of detecting and/or identifying a network communication that is from an imposter source.
- Embodiments are directed to a computer-implemented method of identifying an imposter web page.
- the method includes extracting, using a processor system, visited web page data from a visited web page.
- the method further includes determining, using the processor system, that the visited web page is an imposter web page, based at least in part on determining, using the processor system, that website location data of the visited web page does not match website location data of at least one legitimate web page, as well as determining that text data associated with image data of the visited web page matches text data associated with image data of the at least one legitimate web page.
- Embodiments are further directed to a system for identifying an imposter web page.
- the system includes a memory and a processor system communicatively coupled to the memory.
- the processor system is configured to perform a method that includes extracting visited web page data from a visited web page.
- the method further includes determining that the visited web page is an imposter web page, based at least in part on determining that website location data of the visited web page does not match web site location data of at least one legitimate web page, as well as determining that text data associated with image data of the visited web page matches text data associated with image data of the at least one legitimate web page.
- Embodiments are directed to a computer program product for identifying an imposter web page.
- the computer program product includes a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se.
- the program instructions are readable by a processor system to cause the processor system to perform a method that includes extracting visited web page data from a visited web page.
- the method further includes determining that the visited web page is an imposter web page, based at least in part on determining that website location data of the visited web page does not match web site location data of at least one legitimate web page, as well as determining that text data associated with image data of the visited web page matches text data associated with image data of the at least one legitimate web page.
- FIG. 1 depicts a block diagram illustrating an example of a network communications system in accordance with one or more embodiments.
- FIG. 2 depicts a flow diagram illustrating a methodology in accordance with one or more embodiments.
- FIG. 3 depicts an example lookup table in accordance with one or more embodiments.
- FIG. 4 depicts a block diagram of a computer system in accordance with one or more embodiments.
- FIG. 5 depicts a computer program product in accordance with one or more embodiments.
- the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having,” “contains” or “containing,” or any other variation thereof, are intended to cover a non-exclusive inclusion.
- a composition, a mixture, process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but can include other elements not expressly listed or inherent to such composition, mixture, process, method, article, or apparatus.
- the term “exemplary” is used herein to mean “serving as an example, instance or illustration.” Any embodiment or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or designs.
- the terms “at least one” and “one or more” may be understood to include any integer number greater than or equal to one, i.e. one, two, three, four, etc.
- the term “a plurality” may be understood to include any integer number greater than or equal to two, i.e. two, three, four, five, etc.
- the term “connection” may include both an indirect “connection” and a direct “connection.”
- the term “phishing” is used in the present disclosure to refer broadly to the multiple variations of phishing, including but not limited to threats such as “pharming,” “vishing” and “spear phishing.”
- Pharming is a hacker's attack aiming to redirect a website's traffic to another (bogus) website. Pharming can be conducted either by changing the hosts file on a victim's computer or by exploitation of a vulnerability in DNS server software.
- Vishing is the practice of leveraging voice over internet protocol (VoIP) technology to trick members of the public into revealing private personal and financial information for the purpose of financial reward. Spear phishing describes any highly targeted phishing attack.
- Spear phishers send e-mail that appears genuine to all the employees or members within a certain company, government agency, organization, or group. Whereas traditional phishing scams are designed to steal information from individuals, spear phishing scams work to gain access to a company's entire computer system.
- phishing and its variations attempt to socially engineer a message recipient into providing personal or confidential information that can later be used by the phisher to conduct fraudulent activities.
- the methods used by the phisher to coax the information from his/her victims are as many as they are varied, but one of the most commonly encountered vectors is through an e-mail that appears to be from the victim's financial institution.
- the e-mail purports a problem with the victim's account, which requires the victim to follow an embedded URL that takes the victim to a fraudulent Web site. Once at the site, the victim is prompted to “login” thereby enabling the phisher to steal a copy of the victim's authentication credentials.
- the present disclosure and exemplary embodiments described herein provide methodologies, systems and computer program products for evaluating the source of an actual or requested electronic communication through a network. More specifically, the present disclosure and exemplary embodiments described herein provide methodologies, systems and computer program products for efficiently and effectively detecting and/or identifying a network communication that is from an imposter source.
- the network includes the internet.
- the terms “web page” and “website” are substantially interchangeable. For example, it is understood that a web page is either itself a website or one of several pages of a website. It is further understood that a website can include one or more web pages.
- a methodology extracts and stores local copies of legitimate web page data from the legitimate web sites that are visited by the user computing device.
- the legitimate web page data may be stored at the user computing device, or may be stored remotely and searched/accessed by the user computing device when needed.
- the legitimate stored web page data may also be shared among a user's multiple authorized and connected computing devices.
- the legitimacy of a web site may be determined in a variety of ways, including, for example, the number of times the user computing device accesses the particular website.
- the legitimate stored web page data may include image data, text data directly associated with (e.g., contained within) the image data, text data not directly associated with (e.g., not contained within) the image data, page structure/layout data, web site location data and the like. More specifically, in accordance with one or more embodiments, the text data directly associated with image data is developed by analyzing the image data and converting text image data of the image data into searchable text data using techniques such as optical character recognition (OCR).
- the legitimate stored web page data may be stored in a lookup table in memory to facilitate searching.
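- As an illustration only, the lookup table described above could be sketched as an in-memory mapping keyed by website location data. The names `PageRecord`, `LookupTable` and `visit_count`, and the default of three visits before a site counts as legitimate, are assumptions made for this sketch and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PageRecord:
    """Hypothetical record of the web page data stored for one website."""
    location: str                      # website location data, e.g. a URL host
    image_text: str = ""               # text contained within images (e.g. from OCR)
    page_text: str = ""                # text not contained within images
    layout: List[str] = field(default_factory=list)  # page structure/layout data
    visit_count: int = 0               # assumed legitimacy signal


class LookupTable:
    """Hypothetical lookup table keyed by website location data."""

    def __init__(self, min_visits: int = 3) -> None:
        self.min_visits = min_visits   # assumed threshold, not from the disclosure
        self._records: Dict[str, PageRecord] = {}

    def record_visit(self, record: PageRecord) -> None:
        """Store or refresh a page record and count the visit."""
        previous = self._records.get(record.location)
        record.visit_count = (previous.visit_count if previous else 0) + 1
        self._records[record.location] = record

    def legitimate(self, location: str) -> Optional[PageRecord]:
        """Return the record only if the site has been visited often enough."""
        rec = self._records.get(location)
        return rec if rec and rec.visit_count >= self.min_visits else None
```

- Keying the table by location keeps the first check (is this location already known?) to a single dictionary lookup; other storage formats would serve equally well.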
- when the user computing device visits a web page, the disclosed methodology, prior to allowing the user to submit information at the web page, compares the website location data (e.g., a URL) of the visited web page data with the legitimate website location data of the legitimate websites stored in memory. If this comparison results in a match, the user computing device determines that the visited site is a website that the user computing device has successfully visited multiple times in the past, and concludes that the visited website is most likely a legitimate website.
- if this comparison does not result in a match, the user computing device does not have enough information to reliably determine whether the visited website is a legitimate website or an imposter website.
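- A minimal sketch of the location comparison follows, assuming the website location data is a URL and that two locations match when their host names are equal. That normalization, and the function names `location_key` and `location_matches`, are assumptions of this sketch; the disclosure does not specify how locations are compared.

```python
from urllib.parse import urlsplit


def location_key(url: str) -> str:
    """Reduce a URL to a lower-case host name (an assumed normalization)."""
    host = urlsplit(url).hostname or ""
    return host.lower()


def location_matches(visited_url: str, legitimate_urls: list) -> bool:
    """True if the visited host equals the host of any stored legitimate URL."""
    visited = location_key(visited_url)
    return bool(visited) and any(visited == location_key(u) for u in legitimate_urls)
```

- For example, `location_matches("https://www.bank-example.test/login", ["https://www.bank.example/login"])` returns `False`, which is the case where further analysis is needed.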
- an additional level of analysis and user notification are provided by developing web page data for the visited website and making selected comparisons between the visited web page data and the stored legitimate web page data.
- the visited web page data, similar to the stored legitimate web page data, may include image data, text data directly associated with (e.g., contained within) the image data, text data not directly associated with (e.g., not contained within) the image data, page structure/layout data, website location data and the like. More specifically, in accordance with one or more embodiments, the text data directly associated with image data is developed by analyzing the image data and converting text image data of the image data into searchable text data using techniques such as optical character recognition (OCR).
- the selected comparisons performed according to one or more embodiments include a comparison between the visited web page's text data that is directly associated with (e.g., contained within) the visited web page's image data, and the legitimate stored web page's text data that is directly associated with (e.g., contained within) the legitimate web page's image data.
- the selected comparisons performed according to one or more embodiments may further include a comparison between the visited web page's text data that is not directly associated with (e.g., not contained within) the visited web page's image data, and the legitimate stored web page's text data that is not directly associated with (e.g., not contained within) the legitimate web page's image data.
- the selected comparisons performed according to one or more embodiments may further include a comparison between the visited web page's page structure/layout data, and the legitimate stored web page's page structure/layout data.
- the above-described determination of a match can be made by developing a confidence level for the comparison, and counting the comparison result as a match if the confidence level of the comparison result exceeds a predetermined threshold.
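- The threshold test can be sketched as follows. The disclosure does not say how the confidence level is computed, so this sketch substitutes the similarity ratio from Python's `difflib.SequenceMatcher` as a stand-in; the function names and the threshold value of 0.9 are likewise assumptions.

```python
from difflib import SequenceMatcher


def confidence(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, used here as the confidence level."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Count the comparison as a match when the confidence exceeds the threshold."""
    return confidence(a, b) > threshold
```

- A higher threshold reduces false imposter warnings at the cost of missing imposters that alter more of the copied content.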
- the user computing device may generate a notification and present the notification to the user.
- the notification may take a variety of forms, including for example disabling the user computing device's access to the imposter website, displaying a warning, reporting the imposter website determination to a central location with information about the imposter website/webpage or providing the option to go to the legitimate website/webpage that the imposter website/webpage is attempting to mimic.
- FIG. 1 depicts a block diagram illustrating a network communications system 100 in accordance with one or more embodiments.
- System 100 includes multiple user computing devices, two examples of which are shown in FIG. 1 as computing device 110 and computing device 120 .
- computing device 110 and computing device 120 may each be any of a wide variety of computing devices such as a smart phone, a smartwatch, a laptop, a desktop and the like. Additional details of an exemplary configuration for computing devices 110, 120 are depicted at computer system 400, which is shown in FIG. 4 and described in greater detail later in this disclosure.
- computing devices 110 , 120 may be stand-alone and unrelated. In one or more embodiments, computing devices 110 , 120 may be related computing devices of the same user, or may be related computing devices of different users that are part of a group such as the different users in a family wireless data plan, or the different users in a crowdsourcing group. Although two computing devices 110 , 120 are shown, it is understood that more than two computing devices may be provided.
- Computing devices 110, 120 include web browsers 112, 122, respectively.
- Web browsers 112 , 122 provide the capability of communicating through and with internet/network 130 . It should be understood that descriptions herein of the operations of one of the computing devices 110 , 120 apply equally to both of the computing devices 110 , 120 .
- System 100 further includes various websites (i.e., web hosting servers) that can be accessed by computing devices 110 , 120 through internet/network 130 .
- a webpage and/or website is actually an electronic text file stored on the hard drive of a computer that is constantly connected to the Internet (e.g., internet/network 130).
- such a computer is often referred to as a web hosting server.
- when a computing device calls up or “visits” a webpage/website, usually by clicking on a link from another web page, the computing device uses the internet to contact this web hosting server and retrieves or downloads this text file.
- the computing device then reads this text file with the aid of a program called a “web browser” and displays it on the screen according to the formatting and specifications laid out in it.
- These electronic text files are written in a special language called hypertext markup language (HTML).
- For ease of illustration and explanation, two example websites are shown in FIG. 1, namely a legitimate website 140 and an imposter website 160. Although two websites 140, 160 are shown, it is understood that more than two websites may be included.
- Legitimate website 140 as well as imposter website 160 , are each, in effect, a web hosting server that houses pages of linked data that form the content of the given web site. Included among these pages are web page data that may be extracted and stored by computing devices 110 , 120 . The extracted web page data may be stored at the user computing device 110 , 120 , or may be stored remotely and searched/accessed by the user computing device 110 , 120 when needed.
- the stored web page data may also be shared among computing devices 110 , 120 for embodiments wherein computing devices 110 , 120 are related.
- the web page data that may be extracted and stored for legitimate website 140 are shown in FIG. 1 as image data 142, text data directly associated with (e.g., contained within) the image data 144, text data not directly associated with (e.g., not contained within) the image data 146, page structure/layout data 148 and website location data (e.g., a URL) 150.
- the web page data that may be extracted from imposter website 160 are shown in FIG. 1 as image data 162, text data directly associated with (e.g., contained within) the image data 164, text data not directly associated with (e.g., not contained within) the image data 166, page structure/layout data 168 and website location data (e.g., a URL) 170.
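- The extraction of these categories of web page data from a page's HTML can be sketched with Python's standard `html.parser` module. The class name `PageDataExtractor`, the dictionary keys, and the use of the sequence of opening tags as the page structure/layout data are assumptions of this sketch. The text contained within images would require a separate OCR step on the downloaded image data, which is not performed here.

```python
from html.parser import HTMLParser


class PageDataExtractor(HTMLParser):
    """Collects image sources, non-image text and a tag sequence from HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.image_sources = []   # where the image data would be fetched from
        self.text_parts = []      # text not contained within images
        self.layout = []          # opening tags in order, a crude layout signature
        self._skip = 0            # depth inside <script>/<style>, which hold no visible text

    def handle_starttag(self, tag, attrs):
        self.layout.append(tag)
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())


def extract_page_data(html: str) -> dict:
    """Parse an HTML string and return the collected page data."""
    parser = PageDataExtractor()
    parser.feed(html)
    parser.close()
    return {
        "image_sources": parser.image_sources,
        "page_text": " ".join(parser.text_parts),
        "layout": parser.layout,
    }
```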
- Website location data 150 , 170 may take a variety of forms, including for example an address of a web hosting server on the internet or other network.
- an address is an identifier assigned to each device on the network.
- a device's address is known generally as its “internet protocol address” (IP address), which is a numerical representation of the device's virtual location on the Internet.
- the host device's IP address is used to locate the host device and provide access to content from the website.
- the web domain www.google.com actually represents a numerical IP address, which could be, for example, 73.14.213.99.
- a downstream DNS system (not shown) matches or routes the entered domain name to an IP address, then uses the numerical IP address to locate and provide access to the host server device associated with that address.
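- To make the "numerical representation" concrete, the short sketch below uses Python's standard `ipaddress` module to show that the dotted form of the example address above is one 32-bit number. It does not perform a DNS lookup, which would need network access; the variable names are chosen for illustration.

```python
import ipaddress

address = ipaddress.ip_address("73.14.213.99")   # example address from the text
as_integer = int(address)                        # the same address as one 32-bit number
round_trip = ipaddress.ip_address(as_integer)    # back to dotted-quad form
```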
- FIG. 2 depicts a flow diagram illustrating a methodology 200 that may be performed by system 100 in accordance with one or more embodiments.
- FIG. 3 depicts an example lookup table 300 that may be utilized in connection with system 100 performing methodology 200 in accordance with one or more embodiments.
- Lookup table 300 accumulates web page data extracted and stored from legitimate websites such as legitimate website 140 shown in FIG. 1 .
- the format and content of lookup table 300 is an example, and many formats and options are available without departing from the scope of the present disclosure.
- Lookup table 300 may be stored in and accessed through computing devices 110 , 120 . A description of the operation of system 100 will now be provided with reference to system 100 , methodology 200 and lookup table 300 .
- Computing device 110 extracts and stores local copies of legitimate web page data from the legitimate web sites (e.g., legitimate website 140 ) that are visited by computing device 110 (block 202 ).
- the legitimate web page data may be stored at computing device 110, or may be stored at a remote location (not shown) and searched/accessed by computing device 110 when needed.
- the legitimate stored web page data may also be shared among a user's multiple authorized and connected computing devices (e.g., computing device 120 ).
- the legitimacy of a website may be determined in a variety of ways, including, for example, the number of times user computing device 110 accesses the particular website.
- the legitimate stored web page data may include image data 142 , text data directly associated with (e.g., contained within) the image data 144 , text data not directly associated with (e.g., not contained within) the image data 146 , page structure/layout data 148 , website location data 150 and the like. More specifically, in accordance with one or more embodiments, the text data 144 directly associated with image data is developed by analyzing the image data 142 and converting text image data of the image data 142 into searchable text data using techniques such as optical character recognition (OCR).
- the legitimate stored web page data may be stored in lookup table 300, which is part of computing device 110, to facilitate searching.
- when computing device 110 visits a web page, such as a web page of imposter website 160, prior to allowing the user to submit information at the visited web page, computing device 110 compares the visited website location data 170 (e.g., a URL) of the visited web page data with the legitimate website location data of the legitimate websites stored in lookup table 300 (block 204). If this comparison results in a match, computing device 110 determines that the visited site is a website that the user computing device has successfully visited multiple times in the past, and concludes that the visited website is most likely a legitimate website (blocks 206, 208).
- if this comparison does not result in a match, computing device 110 does not have enough information to reliably determine whether the visited website is a legitimate website or an imposter website.
- an additional level of analysis and user notification are provided by extracting web page data from the visited website and making selected comparisons between the visited web page data and the stored legitimate web page data (blocks 210 , 212 ).
- the visited web page data may include image data 162, text data directly associated with (e.g., contained within) the image data 164, text data not directly associated with (e.g., not contained within) the image data 166, page structure/layout data 168, website location data 170 and the like. More specifically, in accordance with one or more embodiments, text data 164 directly associated with image data is extracted by analyzing image data 162 and converting text image data (e.g., 164) of image data into searchable text data using techniques such as optical character recognition (OCR).
- the selected comparisons performed according to one or more embodiments include a comparison between the visited web page's text data 164 that is directly associated with (e.g., contained within) the visited web page's image data, and the legitimate stored web page's text data 144 that is directly associated with (e.g., contained within) the legitimate web page's image data.
- the selected comparisons performed according to one or more embodiments may further include a comparison between the visited web page's text data 166 that is not directly associated with (e.g., not contained within) the visited web page's image data, and the legitimate stored web page's text data 146 that is not directly associated with (e.g., not contained within) the legitimate web page's image data.
- the selected comparisons performed according to one or more embodiments may further include a comparison between the visited web page's page structure/layout data 168 , and the legitimate stored web page's page structure/layout data 148 .
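- The three selected comparisons can be sketched as one function that returns a confidence level per comparison. The dictionary keys (`image_text`, `page_text`, `layout`), the function names, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions of this sketch rather than details of the disclosure.

```python
from difflib import SequenceMatcher


def similarity(a, b) -> float:
    """Similarity in [0, 1]; works for strings and for lists of tags."""
    return SequenceMatcher(None, a, b).ratio()


def compare_pages(visited: dict, legitimate: dict) -> dict:
    """Return one confidence level per selected comparison."""
    return {
        key: similarity(visited[key], legitimate[key])
        for key in ("image_text", "page_text", "layout")
    }
```

- Keeping the three confidence levels separate lets a caller apply a different threshold to each comparison, or require only some of them to match.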
- if any one of the above-described comparisons for a given visited website does not result in a match, or if any combination of the above-described comparisons for a given visited website does not result in a match, and if it was determined that the website location data (e.g., a URL) of the given visited website is not stored among the stored legitimate web page data of the user computing device, no conclusion is reached about whether the visited website is an imposter website that is attempting to mimic a legitimate website (blocks 216, 218).
- if any one of the above-described comparisons for a given visited website results in a match, or if any combination of the above-described comparisons for a given visited website results in a match, and if it was determined that the website location data (e.g., a URL) of the given visited website is not stored among the stored legitimate web page data of the user computing device, it is determined that the visited website is an imposter website that is attempting to mimic a legitimate website (blocks 216, 218).
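- The decision logic of methodology 200 can be summarized in a short sketch. The names `Verdict` and `classify`, the threshold value of 0.9, and the policy that a single matching comparison is enough to flag an imposter are assumptions of this sketch; an embodiment could instead require a particular comparison, or a combination of comparisons, to match.

```python
from enum import Enum


class Verdict(Enum):
    LIKELY_LEGITIMATE = "likely legitimate"
    IMPOSTER = "imposter"
    NO_CONCLUSION = "no conclusion"


def classify(location_matched: bool, confidences: dict, threshold: float = 0.9) -> Verdict:
    """Combine the location check with the content comparisons."""
    if location_matched:
        # Known location: the site has been visited before and is most likely legitimate.
        return Verdict.LIKELY_LEGITIMATE
    # Unknown location: content matching a stored legitimate page suggests mimicry.
    if any(cl > threshold for cl in confidences.values()):
        return Verdict.IMPOSTER
    return Verdict.NO_CONCLUSION
```

- For example, `classify(False, {"image_text": 0.97, "layout": 0.4})` returns `Verdict.IMPOSTER`, because the location is unknown while the image text closely matches a stored legitimate page.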
- the above-described determination of a match can be made by generating, using standard statistics techniques, a confidence level for the comparison, and counting the comparison result as a match if the confidence level of the comparison result exceeds a predetermined threshold (block 214). More specifically, the extent to which two compared items match can be expressed as a confidence level (CL). When the value of a CL is below a predetermined threshold (TH) (i.e., CL<TH), it can be concluded that the comparison did not result in a match. If CL>TH, it can be concluded that the comparison did result in a match. Many different predetermined TH levels may be provided depending on the desired level of accuracy.
- the user computing device may generate a notification and present the notification to the user (blocks 220 , 222 ).
- the notification may take a variety of forms, including, for example, disabling the user computing device's access to the imposter website, displaying a warning, reporting the imposter website determination to a central location with information about the imposter website/webpage, or providing the option to go to the legitimate website/webpage that the imposter website/webpage is attempting to mimic.
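The notification forms listed above can be modelled as a small set of actions attached to one message. The sketch below is an assumption-laden illustration: the `Action` members, the `Notification` record, the `build_notification` helper, and the example URLs are hypothetical names that do not appear in the disclosure, and actual blocking or reporting is left out.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    BLOCK_ACCESS = auto()            # disable access to the imposter site
    SHOW_WARNING = auto()            # display a warning to the user
    REPORT_TO_CENTRAL = auto()       # report the determination to a central location
    OFFER_LEGITIMATE_SITE = auto()   # offer a link to the mimicked legitimate site


@dataclass(frozen=True)
class Notification:
    imposter_url: str
    legitimate_url: str
    actions: tuple
    message: str


def build_notification(
    imposter_url: str,
    legitimate_url: str,
    actions=(Action.SHOW_WARNING, Action.OFFER_LEGITIMATE_SITE),
) -> Notification:
    """Assemble the user-facing text for the selected actions."""
    parts = [f"Warning: {imposter_url} appears to imitate {legitimate_url}."]
    if Action.BLOCK_ACCESS in actions:
        parts.append("Access to this page has been disabled.")
    if Action.OFFER_LEGITIMATE_SITE in actions:
        parts.append(f"Go to the legitimate site instead: {legitimate_url}")
    return Notification(imposter_url, legitimate_url, tuple(actions), " ".join(parts))
```

Keeping the actions as data lets a caller combine several forms, for instance blocking and reporting at once, without changing the message-building code.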
- FIG. 4 depicts a high level block diagram of a computer system 400, which may be used to implement one or more embodiments of the present disclosure. More specifically, computer system 400 may be used to implement hardware components of computing devices 110, 120 shown in FIG. 1 and lookup table 300 shown in FIG. 3. Although one exemplary computer system 400 is shown, computer system 400 includes a communication path 426, which connects computer system 400 to additional systems (not depicted) and may include one or more wide area networks (WANs) and/or local area networks (LANs) such as the Internet, intranet(s), and/or wireless communication network(s). Computer system 400 and the additional systems are in communication via communication path 426, e.g., to communicate data between them.
- Computer system 400 includes one or more processors, such as processor 402 .
- Processor 402 is connected to a communication infrastructure 404 (e.g., a communications bus, cross-over bar, or network).
- Computer system 400 can include a display interface 406 that forwards graphics, textual content, and other data from communication infrastructure 404 (or from a frame buffer not shown) for display on a display unit 408 .
- Computer system 400 also includes a main memory 410 , preferably random access memory (RAM), and may also include a secondary memory 412 .
- Secondary memory 412 may include, for example, a hard disk drive 414 and/or a removable storage drive 416 , representing, for example, a floppy disk drive, a magnetic tape drive, or an optical disk drive.
- Removable storage drive 416 reads from and/or writes to a removable storage unit 418 in a manner well known to those having ordinary skill in the art.
- Removable storage unit 418 represents, for example, a floppy disk, a compact disc, a magnetic tape, or an optical disk, etc. which is read by and written to by removable storage drive 416 .
- removable storage unit 418 includes a computer readable medium having stored therein computer software and/or data.
- secondary memory 412 may include other similar means for allowing computer programs or other instructions to be loaded into the computer system.
- Such means may include, for example, a removable storage unit 420 and an interface 422 .
- Examples of such means may include a program package and package interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 420 and interfaces 422 which allow software and data to be transferred from the removable storage unit 420 to computer system 400 .
- Computer system 400 may also include a communications interface 424 .
- Communications interface 424 allows software and data to be transferred between the computer system and external devices. Examples of communications interface 424 may include a modem, a network interface (such as an Ethernet card), a communications port, or a PCMCIA slot and card, etcetera.
- Software and data transferred via communications interface 424 are in the form of signals which may be, for example, electronic, electromagnetic, optical, or other signals capable of being received by communications interface 424 . These signals are provided to communications interface 424 via communication path (i.e., channel) 426 .
- Communication path 426 carries signals and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link, and/or other communications channels.
- In the present disclosure, the terms “computer program medium,” “computer usable medium,” and “computer readable medium” are used to generally refer to media such as main memory 410 and secondary memory 412, removable storage drive 416, and a hard disk installed in hard disk drive 414.
- Computer programs (also called computer control logic) are stored in main memory 410 and/or secondary memory 412. Computer programs may also be received via communications interface 424.
- Such computer programs when run, enable the computer system to perform the features of the present disclosure as discussed herein.
- the computer programs when run, enable processor 402 to perform the features of the computer system. Accordingly, such computer programs represent controllers of the computer system.
- one or more embodiments of the present disclosure provide technical benefits and effects, including specifically performing OCR on images of web pages to determine if the text in the image is intended to mimic the content of another web page.
- font and other changes to text in the images could deceive known anti-phishing methodologies that do not check this content.
- explicitly checking the fields in a form of a web page catches more actual phishing attempts, because the form fields of an imposter web page must exist in order to harvest confidential information from the user.
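The form-field check described above can be illustrated with a short sketch that extracts the fields of a page and measures their overlap with those of a stored legitimate page. Everything here is an assumption made for illustration: the disclosure does not say how form fields are represented or scored, so the `(name, type)` pairs, the Jaccard overlap, and the names `form_fields` and `field_overlap` are hypothetical. OCR of page images is out of scope for the sketch.

```python
from html.parser import HTMLParser


class _FormFieldCollector(HTMLParser):
    """Collects (name, type) pairs for form controls that carry a name."""

    def __init__(self):
        super().__init__()
        self.fields = set()

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select", "textarea"):
            attr_map = dict(attrs)
            name = attr_map.get("name")
            if name:
                # Fall back to the tag name when no type attribute is present.
                kind = attr_map.get("type") or tag
                self.fields.add((name.lower(), kind.lower()))


def form_fields(html: str) -> set:
    """Return the set of named form controls found in the markup."""
    collector = _FormFieldCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


def field_overlap(visited: set, stored: set) -> float:
    """Jaccard overlap in [0, 1]; defined as 0.0 when both sets are empty."""
    if not visited and not stored:
        return 0.0
    return len(visited & stored) / len(visited | stored)
```

A page that asks for the same user name and password fields as a stored legitimate login page scores high on this measure even when its visible text is delivered as images, which is the situation the form-field comparison is meant to catch.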
- Referring now to FIG. 5, a computer program product 500 in accordance with an embodiment that includes a computer readable storage medium 502 and program instructions 504 is generally shown.
- the present disclosure may be a system, a method, and/or a computer program product.
- the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
- the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
- the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
- a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
- the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Bioethics (AREA)
- Medical Informatics (AREA)
- Information Transfer Between Computers (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/153,975 US10505979B2 (en) | 2016-05-13 | 2016-05-13 | Detection and warning of imposter web sites |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/153,975 US10505979B2 (en) | 2016-05-13 | 2016-05-13 | Detection and warning of imposter web sites |
Publications (2)
Publication Number | Publication Date |
---|---|
US20170331855A1 US20170331855A1 (en) | 2017-11-16 |
US10505979B2 true US10505979B2 (en) | 2019-12-10 |
Family
ID=60297578
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/153,975 Expired - Fee Related US10505979B2 (en) | 2016-05-13 | 2016-05-13 | Detection and warning of imposter web sites |
Country Status (1)
Country | Link |
---|---|
US (1) | US10505979B2 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10958683B2 (en) * | 2018-04-26 | 2021-03-23 | Wipro Limited | Method and device for classifying uniform resource locators based on content in corresponding websites |
EP3888335A4 (en) * | 2018-11-26 | 2022-08-10 | Cyberfish Ltd. | Phishing protection methods and systems |
US10885373B2 (en) * | 2018-12-28 | 2021-01-05 | Citrix Systems, Inc. | Systems and methods for Unicode homograph anti-spoofing using optical character recognition |
CN111461545B (en) * | 2020-03-31 | 2023-11-10 | 北京深演智能科技股份有限公司 | Method and device for determining machine access data |
CN112347402A (en) * | 2020-10-21 | 2021-02-09 | 上海淇玥信息技术有限公司 | Illegal website/APP automatic identification method, system and electronic device |
CN115796145B (en) * | 2022-11-16 | 2023-09-08 | 珠海横琴指数动力科技有限公司 | Webpage text acquisition method, system, server and readable storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060168659A1 (en) * | 2004-12-27 | 2006-07-27 | Atsuhisa Saitoh | Security information estimating apparatus, a security information estimating method, a security information estimating program, and a recording medium thereof |
US20080046738A1 (en) | 2006-08-04 | 2008-02-21 | Yahoo! Inc. | Anti-phishing agent |
US8621616B2 (en) | 2009-03-24 | 2013-12-31 | Alibaba Group Holding Limited | Method and system for identifying suspected phishing websites |
US20140173726A1 (en) * | 2012-12-19 | 2014-06-19 | Dropbox, Inc. | Methods and systems for preventing unauthorized acquisition of user information |
US8776220B2 (en) | 2011-10-18 | 2014-07-08 | Institute For Information Industry | Phishing detecting system and method operative to compare web page images to a snapshot of a requested web page |
CN104143008A (en) | 2014-08-11 | 2014-11-12 | 北京奇虎科技有限公司 | Method and device for detecting phishing webpage based on picture matching |
US20140359760A1 (en) * | 2013-05-31 | 2014-12-04 | Adi Labs, Inc. | System and method for detecting phishing webpages |
US20160063541A1 (en) | 2013-05-23 | 2016-03-03 | Computer Network Information Center, Chinese Academy Of Sciences | Method for detecting brand counterfeit websites based on webpage icon matching |
-
2016
- 2016-05-13 US US15/153,975 patent/US10505979B2/en not_active Expired - Fee Related
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060168659A1 (en) * | 2004-12-27 | 2006-07-27 | Atsuhisa Saitoh | Security information estimating apparatus, a security information estimating method, a security information estimating program, and a recording medium thereof |
US20080046738A1 (en) | 2006-08-04 | 2008-02-21 | Yahoo! Inc. | Anti-phishing agent |
US8621616B2 (en) | 2009-03-24 | 2013-12-31 | Alibaba Group Holding Limited | Method and system for identifying suspected phishing websites |
US8776220B2 (en) | 2011-10-18 | 2014-07-08 | Institute For Information Industry | Phishing detecting system and method operative to compare web page images to a snapshot of a requested web page |
US20140173726A1 (en) * | 2012-12-19 | 2014-06-19 | Dropbox, Inc. | Methods and systems for preventing unauthorized acquisition of user information |
US20160063541A1 (en) | 2013-05-23 | 2016-03-03 | Computer Network Information Center, Chinese Academy Of Sciences | Method for detecting brand counterfeit websites based on webpage icon matching |
US20140359760A1 (en) * | 2013-05-31 | 2014-12-04 | Adi Labs, Inc. | System and method for detecting phishing webpages |
CN104143008A (en) | 2014-08-11 | 2014-11-12 | 北京奇虎科技有限公司 | Method and device for detecting phishing webpage based on picture matching |
Also Published As
Publication number | Publication date |
---|---|
US20170331855A1 (en) | 2017-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12081503B2 (en) | Determining authenticity of reported user action in cybersecurity risk assessment | |
US10505979B2 (en) | Detection and warning of imposter web sites | |
Wu et al. | Effective defense schemes for phishing attacks on mobile computing platforms | |
US8813239B2 (en) | Online fraud detection dynamic scoring aggregation systems and methods | |
US10574697B1 (en) | Providing a honeypot environment in response to incorrect credentials | |
US20130263263A1 (en) | Web element spoofing prevention system and method | |
Patil et al. | Survey on malicious web pages detection techniques | |
US8996669B2 (en) | Internet improvement platform with learning module | |
US11838320B2 (en) | Proxy server and navigation code injection to prevent malicious messaging attacks | |
US9712532B2 (en) | Optimizing security seals on web pages | |
Buchanan et al. | Analysis of the adoption of security headers in HTTP | |
US20210377304A1 (en) | Machine learning to determine command and control sites | |
US20160373262A1 (en) | Systems and methods for digital certificate security | |
US20150067832A1 (en) | Client Side Phishing Avoidance | |
US20220188402A1 (en) | Real-Time Detection and Blocking of Counterfeit Websites | |
US20160381047A1 (en) | Identifying and Assessing Malicious Resources | |
US11159566B2 (en) | Countering phishing attacks | |
Mishra et al. | Intelligent phishing detection system using similarity matching algorithms | |
Masoud et al. | On tackling social engineering web phishing attacks utilizing software defined networks (SDN) approach | |
US12069080B2 (en) | Malware detection using document object model inspection | |
Wang et al. | A cost-effective ocr implementation to prevent phishing on mobile platforms | |
EP3965362A1 (en) | Machine learning to determine domain reputation, content classification, phishing sites, and command and control sites | |
US10484422B2 (en) | Prevention of rendezvous generation algorithm (RGA) and domain generation algorithm (DGA) malware over existing internet services | |
ThreatLab | Zscaler threatlabz 2023 phishing report | |
US20200045078A1 (en) | Resource Security System Using Fake Connections |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HARDEE, CHRISTOPHER J.;JOROFF, STEVEN R.;NESBITT, PAMELA A.;AND OTHERS;SIGNING DATES FROM 20160422 TO 20160508;REEL/FRAME:038589/0179 |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20231210 |