GB2418500A - Detection, quarantine and modification of dangerous web pages - Google Patents

Publication number
GB2418500A
Authority
GB
United Kingdom
Prior art keywords
file
computer
database
rules
web
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0421476A
Other versions
GB0421476D0 (en)
Inventor
Alyn Hockey
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Clearswift Ltd
Original Assignee
Clearswift Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Clearswift Ltd
Priority to GB0421476A
Publication of GB0421476D0
Priority to PCT/GB2005/003647 (WO2006035201A1)
Publication of GB2418500A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2119 Authenticating web pages, e.g. with suspicious links

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Storage Device Security (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Dangerous files, which may contain malicious code or unsafe links, scripts or other content are detected by a file type detection means (80). Upon detection of a particular type of file (optionally by reference to a database of file types), e.g. HTML, the file is moved to a quarantine area. Once quarantined, the file is scanned for particular unsafe content (84), and a database (82) is referred to, to determine the precise action to be taken for the particular file type. If detected, the particular content is removed (86), and the file is moved to a safe area (90) for viewing. If no unsafe content is found, the file is also moved to the safe area (90). Thus both safe and potentially unsafe files may be viewed safely. The processing may take place on a client or a web proxy.

Description

SAFE VIEWING OF WEB PAGES
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a method and apparatus for preventing the damage that web pages may cause to a user's computer.
DESCRIPTION OF RELATED ART
There are around 4 billion web pages indexed by the search engine Google at the time of writing. The first web pages were made up simply of text, described by the markup language HyperText Markup Language (HTML), which also includes the ability to insert images into the text. Since the original invention of HTML in the early 1990s, the number and types of codes that can be placed into web pages have increased immensely: not only are there various versions of HTML, but it is also now possible to include general programming language elements such as JScript or Microsoft's Visual Basic Scripting Edition. These languages provide wide support for controlling the appearance of a page of information. Finally, web pages have been further extended to run almost any program through Microsoft's ActiveX controls and Sun's Java language.
With such a wide range of options, it is no wonder that errors in the way browsers render these pages are commonplace. Some of the errors are simple: for example, the user's version of the browser program may be unable to parse the codes in the page, so the page is distorted. In other cases, the careful "sandbox" environment that is supposed to contain web-based programs can be breached, so allowing a program to perform actions it is not supposed to be able to perform, such as accessing the hard disk of the computer on which the web page is being viewed. In other cases, the codes can lure unwary users into downloading and running programs that are potentially dangerous, such as viruses and so on.
The threat of such unwanted behaviour should not be underestimated. In simple cases, a web page might contain genuine HTML that can cause older versions of browsers to crash. In more extreme cases, a web page might cause a program to be installed that monitors passwords being typed in and sends the information to organised crime sites.
In order to monitor this situation, various solutions have been proposed. Standard anti-virus programs, such as those described in US Patent 5,319,776, monitor the end user's computer for anything written to disk. A common extension to such anti-virus programs is the ability to retrieve updates of recently known viruses over the internet.
Other solutions involve monitoring from within the web browser and quarantining anything that looks unwanted.
Such solutions require every user to run the anti-virus product.
US Patent 6,785,732 discloses a web server that can be set up to check for virus files arriving through web pages, and thus perform monitoring for many users.
In all these cases, the user may still want to read the web page that is potentially harmful, even after receiving a warning about the content, but the very act of reading it will cause the unwanted payload to be activated.
SUMMARY OF THE INVENTION
The present invention relates to a method and an apparatus, which seek to allow a user to view web pages in a safe manner, even when it has been identified that the web pages have potentially dangerous content.
According to a first aspect of the present invention, there is provided a method of altering data stored in a file, according to a predefined set of rules, comprising the following steps: a) identifying the type of the file; b) consulting a database file, specific to the identified file type, for rules that define changes to be made to files of that file type; and c) making changes to the file according to the rules so defined.
According to a second aspect of the present invention, there is provided a computer system, adapted to operate in accordance with the method according to the first aspect of the invention.
According to a third aspect of the present invention, there is provided a computer program product, containing computer-readable code, for causing a computer to operate in accordance with the method according to the first aspect of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block schematic diagram of a computer system, including a computer in accordance with the present invention.
Figure 2 is a representation of a part of the computer of Figure 1.
Figure 3 is a flow chart illustrating a method according to the present invention.
Figure 4 is a block schematic diagram of a computer system, including a computer in accordance with another embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Figure 1 is a block schematic diagram of a computer system, in accordance with the present invention. Specifically, a user computer 10 has a central processor (CPU) 12, a disk 14, and a network interface 16. It will be appreciated by the person skilled in the art that the user computer 10, which is generally conventional, has various other features, which will be well known to the person skilled in the art. However, these features will not be described, except in so far as they are relevant to the operation of the present invention.
The user computer 10 has a connection via its network interface 16 over the internet 20 to a first web server computer 30. As is well known, information can be stored on the web server 30 in the form of web pages, and the user of the computer 10 can access these web pages, and download the stored information for viewing on the computer 10.
The user of the computer 10 can also access a second web server computer 32, which contains a threat database 34, as will be described in more detail below.
Figure 2 is a further block schematic diagram, representing the contents of the disk 14 of the user computer 10. Again, it will be appreciated by the person skilled in the art that the disk contains additional data, which will be well known to the person skilled in the art, but which will not be described, except in so far as it is relevant to the operation of the present invention.
As shown in Figure 2, the disk 14 stores operating system software 36. The disk 14 also contains a program directory 40, which itself contains an initialization file 42, a database file 44, browser software 46, and a file processing program 48. The disk 14 also contains one or more quarantine areas 50, and one or more safe areas 52.
Figure 3 is a flow chart, illustrating the operation of the file processing program 48, in accordance with a preferred embodiment of the present invention.
In step 70 of the process, the computer 10 is turned on by the user in the conventional way. The operating system software 36 then arranges that the file processing program 48 runs whenever the computer is turned on. The program 48 starts to run in step 72 of the process shown in Figure 3.
When the program 48 starts up, it reads the initialization file 42, which contains various optional settings and also describes the location on the disk 14 of the one or more quarantine areas 50 and the safe areas 52. Thus, the program records the options and, in step 74, starts monitoring the defined quarantine directories 50.
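As an illustration of the start-up step, the following is a minimal sketch of reading such an initialization file. The INI layout, the section and key names, and the directory paths are all assumptions made for the example; the description does not define a file format.

```python
import configparser

# Hypothetical initialization file contents; the description defines no format.
EXAMPLE_INI = """
[options]
poll_interval_seconds = 30

[areas]
quarantine = /var/quarantine/browser, /var/quarantine/antivirus
safe = /var/safe
"""

def split_paths(value):
    """Split a comma-separated list of directories into clean path strings."""
    return [part.strip() for part in value.split(",") if part.strip()]

def load_settings(text):
    """Parse the initialization text into a plain dictionary of settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "poll_interval": parser.getint("options", "poll_interval_seconds"),
        "quarantine_dirs": split_paths(parser.get("areas", "quarantine")),
        "safe_dirs": split_paths(parser.get("areas", "safe")),
    }

settings = load_settings(EXAMPLE_INI)
```

The resulting dictionary gives the monitoring loop its polling interval and the locations of the quarantine and safe areas.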
In operation of the computer 10, the user may use the browser software 46 to access web pages that are stored on remote web servers, such as the web server 30, shown in Figure 1. When the user wishes to view a web page, the browser software 46 causes a copy of the web page to be downloaded onto the computer 10 using the Hypertext Transfer Protocol (HTTP). Web pages, or other files, that are determined by the browser program 46, or by an extension to the browser program, to be potentially dangerous or unsuitable are placed in the quarantine area 50, and an alternative web page, describing what has been done, is displayed to the user. Similarly, web pages or other files can be placed in the quarantine area 50 by antivirus programs, web-based monitors, web proxies, or other types of technology that can identify such pages. Files placed in the quarantine area 50 are preferably timestamped, for use in later steps of the process, as described below.
In step 76 of the process, it is determined whether the directory content of the quarantine directories 50 has changed. If there has been no change since the quarantine directories 50 were last monitored, the process returns to step 74, where, after a predetermined time, the quarantine directories are again monitored. As is well known in the art, the file processing program can record the time of each directory scan, and compare this with the timestamps on the files in the quarantine directories, in order to determine which files are new.
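The timestamp comparison described for step 76 might be sketched as follows. The function name and the use of file modification times are assumptions for illustration; a temporary directory stands in for the quarantine area 50.

```python
import os
import tempfile
import time

def find_new_files(directories, last_scan_time):
    """Return paths whose modification time is later than the previous scan."""
    new_files = []
    for directory in directories:
        for entry in os.scandir(directory):
            if entry.is_file() and entry.stat().st_mtime > last_scan_time:
                new_files.append(entry.path)
    return sorted(new_files)

# Demonstration: one file older than the last scan, one newer.
with tempfile.TemporaryDirectory() as quarantine:
    old_path = os.path.join(quarantine, "old.html")
    new_path = os.path.join(quarantine, "new.html")
    for path in (old_path, new_path):
        with open(path, "w") as handle:
            handle.write("<html></html>")
    last_scan = time.time()
    os.utime(old_path, (last_scan - 60, last_scan - 60))  # before the last scan
    os.utime(new_path, (last_scan + 60, last_scan + 60))  # after the last scan
    detected = find_new_files([quarantine], last_scan)
    detected_names = [os.path.basename(p) for p in detected]
```

Only the file stamped after the recorded scan time is reported, so files already handled on an earlier pass are not processed again.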
If, in step 76, one or more new files are detected then, in step 80, the exact filetype of each new file is detected. For example, it is determined whether the file is generated using the Microsoft Word word processing program. In a preferred embodiment of the invention, the filetype is determined by examining the filename, and specifically the file extension, that is ".doc" in the case of a file generated using the Microsoft Word word processing program. In an alternative embodiment of the invention, the filetype may be determined by examining the content of the file.
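Both ways of detecting the filetype in step 80 can be sketched briefly. The extension table, the type labels and the function names below are hypothetical; the content check looks only at the leading bytes and is far cruder than a production detector would be.

```python
import os

# Hypothetical extension table; a real deployment would cover more types.
EXTENSION_TYPES = {".htm": "html", ".html": "html", ".doc": "msword"}

# Header of the OLE2 compound file container used by legacy .doc files
# (other formats share this container, so the match is only a hint).
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

def detect_by_name(filename):
    """Preferred embodiment: infer the type from the file extension."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_TYPES.get(extension)

def detect_by_content(data):
    """Alternative embodiment: infer the type from the leading bytes."""
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    if data.startswith(OLE2_MAGIC):
        return "msword"
    return None

def detect_filetype(filename, data):
    """Try the filename first, then fall back to examining the content."""
    return detect_by_name(filename) or detect_by_content(data)
```

Falling back to content inspection is one possible way of combining the two embodiments; the description presents them as alternatives rather than as a chain.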
Having detected the filetype, then, in step 82 of the process, a relevant part of the database file 44 is consulted. The database file 44 contains parts which are relevant to many of the different filetypes which may be detected. In step 83, it is determined whether the database file 44 contains any entries for the filetype detected in step 80. If not, the process returns to step 74, where, after a predetermined time, the quarantine directories are again monitored. If it is determined in step 83 that the database file 44 does contain entries for the filetype detected in step 80, the process passes to step 84.
In each part of the database file 44, there are descriptions of data that may be contained within files of the relevant type, and which may be unsafe. For example, the data might be potentially unsafe tags in HTML files that may contain scripts, programs, images or references to other websites which may themselves have unsafe content.
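One possible shape for such a database file is sketched below, with the lookup of steps 82 and 83. The JSON layout, the "remove_tags" key and the listed tag names are assumptions for illustration only; the description does not specify how the database file 44 is organised.

```python
import json

# Hypothetical database file contents, keyed by file type.
DATABASE_TEXT = """
{
  "html": {"remove_tags": ["script", "object", "applet", "embed", "iframe"]}
}
"""

def rules_for(database, filetype):
    """Steps 82 and 83: return the rules for a file type, or None if absent."""
    return database.get(filetype)

database = json.loads(DATABASE_TEXT)
html_rules = rules_for(database, "html")
word_rules = rules_for(database, "msword")  # no entry, so the file is skipped
```

A result of None corresponds to the branch in which the process returns to step 74 without modifying the file.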
As mentioned above, these potentially unsafe data inside the files are identified by reference to the database file 44. This database is updated over the internet 20 from the threat database 34. Thus, the provider of the web server 32 can continually maintain the threat database 34, and this can then be used to update the database file 44 on a regular basis.
The database file 44 can be updated periodically, either by the file processing program 48 itself or by some other scheduled event, allowing potential threats to be identified in a timely manner. The maintenance of the database file 44 can operate in a way which is similar to the way that anti-virus definitions are updated in existing commercially available products.
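A periodic refresh of the local database file could look like the sketch below. The fetch callable stands in for whatever transport retrieves the master copy, and the JSON validation assumes a JSON-encoded database; both are assumptions, as the description leaves the update mechanism open.

```python
import json
import os
import tempfile

def refresh_database(fetch, local_path):
    """Replace the local database file with a freshly fetched, validated copy."""
    payload = fetch()       # bytes retrieved from the master threat database
    json.loads(payload)     # raises ValueError if the copy is corrupt
    directory = os.path.dirname(os.path.abspath(local_path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        os.replace(temp_path, local_path)  # swap in the new copy in one step
    except BaseException:
        os.unlink(temp_path)
        raise

# Demonstration with a stand-in fetch function instead of a network call.
with tempfile.TemporaryDirectory() as workdir:
    db_path = os.path.join(workdir, "threats.json")
    refresh_database(lambda: b'{"html": {"remove_tags": ["script"]}}', db_path)
    with open(db_path, "rb") as handle:
        stored = json.load(handle)
```

Writing to a temporary file and then replacing the old one means the file processing program never reads a half-written database.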
The process then passes to step 84, in which it is determined if the files, which have been newly added to the quarantine area, contain any of these potentially unsafe data.
Thus, in this embodiment of the invention, files are sent to a quarantine area, and it is then determined whether those files contain any of the potentially unsafe data. In another embodiment of the invention, the database file may be consulted, in order to determine whether to send the files to the quarantine area.
For each newly added file, if it is determined in step 84 that the file does not contain any of the potentially unsafe data relevant to that filetype, the process jumps to step 90.
However, if it is determined in step 84 that the file contains one or more of the potentially unsafe data relevant to that filetype, then, in step 86, the potentially unsafe data is removed.
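For HTML files, steps 84 and 86 can be illustrated with Python's standard html.parser module. This is a sketch under stated assumptions rather than the described implementation: the class and function names are hypothetical, the list of unsafe tags is supplied by the caller, and comments and doctype declarations are simply dropped.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not open a skipped region.
VOID_TAGS = {"embed", "img", "br", "hr", "input", "link", "meta"}

class TagStripper(HTMLParser):
    """Rebuild a page while dropping listed elements and everything inside them."""

    def __init__(self, unsafe_tags):
        super().__init__(convert_charrefs=False)
        self.unsafe_tags = set(unsafe_tags)
        self.skip_depth = 0   # greater than zero while inside an unsafe element
        self.output = []
        self.removed = 0      # number of unsafe elements found (step 84)

    def handle_starttag(self, tag, attrs):
        if tag in self.unsafe_tags:
            self.removed += 1
            if tag not in VOID_TAGS:
                self.skip_depth += 1
        elif self.skip_depth == 0:
            self.output.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag in self.unsafe_tags:
            self.removed += 1
        elif self.skip_depth == 0:
            self.output.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.unsafe_tags:
            if tag not in VOID_TAGS and self.skip_depth > 0:
                self.skip_depth -= 1
        elif self.skip_depth == 0:
            self.output.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.output.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.output.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.output.append(f"&#{name};")

def sanitize(html_text, unsafe_tags):
    """Return the cleaned page and the number of unsafe elements removed."""
    stripper = TagStripper(unsafe_tags)
    stripper.feed(html_text)
    stripper.close()
    return "".join(stripper.output), stripper.removed

page = '<p>Hello <b>world</b></p><script>alert("x")</script><p>Bye &amp; thanks</p>'
cleaned, removed = sanitize(page, ["script"])
```

If an unsafe element is never closed, everything after it is discarded, which errs on the side of removing too much rather than too little.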
Next, in step 90, the files that could be dealt with, or that were determined in step 84 not to contain any unsafe data, are moved from the quarantine area 50 to the safe area 52.
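Step 90 amounts to moving a file between two directories, as in the following sketch. The function name and directory names are hypothetical, and temporary directories stand in for the quarantine area 50 and the safe area 52.

```python
import os
import shutil
import tempfile

def release_to_safe_area(path, safe_dir):
    """Step 90: move a processed file out of quarantine into the safe area."""
    os.makedirs(safe_dir, exist_ok=True)
    destination = os.path.join(safe_dir, os.path.basename(path))
    shutil.move(path, destination)
    return destination

# Demonstration using temporary directories.
with tempfile.TemporaryDirectory() as root:
    quarantine_dir = os.path.join(root, "quarantine")
    safe_dir = os.path.join(root, "safe")
    os.makedirs(quarantine_dir)
    source = os.path.join(quarantine_dir, "page.html")
    with open(source, "w") as handle:
        handle.write("<p>cleaned</p>")
    moved_to = release_to_safe_area(source, safe_dir)
    still_quarantined = os.path.exists(source)
    now_safe = os.path.exists(moved_to)
```

The sketch does not handle a file of the same name already being present in the safe area; a real program would need a policy for that case.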
The process finally returns to step 74, where, after a predetermined time, the quarantine directories are again monitored. Thus, as is conventional, the user of the computer 10 can be warned that a requested web page has been identified as potentially unsafe, and has been put into the quarantine area 50.
Further, the user can now be informed that a safe copy of the file is available in the safe area 52, and that, although the potentially unsafe data, such as certain tags, scripts, programs or references to other websites, have been removed, the text, and any normal formatting, are retained, so that the contents of the page can be read, albeit perhaps not quite as the web page designer originally intended.
Figures 1-3 relate to an embodiment of the present invention, in which the file processing program 48 runs on a client computer 10, and modifies files stored in a quarantine area of the disk 14 of the computer.
In an alternative embodiment of the invention, a web page processing program runs on a web server, through which multiple client computers can access the internet.
Figure 4 is a block schematic diagram of a computer system operating in accordance with this alternative embodiment of the invention. Specifically, Figure 4 shows a computer 110, having a CPU 112 and a disk 114, and acting as a web proxy server in a manner which is generally conventional, as will be well known to the person skilled in the art. The web proxy server 110 has a connection over the internet 20 to a first web server computer 30. As is well known, information can be stored on the web server 30 in the form of web pages, and the users of other internet-connected computers can access these web pages, and download the stored information for viewing on their computers. The web proxy server 110 can also access a second web server computer 32, which contains a threat database 34, as described with reference to Figure 1.
As is well known to the person skilled in the art, users of client computers can connect to the internet 20 through the web proxy server 110, and the web proxy server 110 collects web pages and other data for clients, using HTTP. Again, in a generally conventional way, the web proxy server 110 may pre-process the data which it collects, and may keep a local copy for reasons of efficiency. Figure 4 shows two such client computers 120, 122, although it will be appreciated that any number of such client computers may be connected in this way.
In this second embodiment of the invention, the file processing program runs on the web proxy server 110, but otherwise operates generally in accordance with Figure 2 and the associated description. Thus, when a web page is identified by the proxy software as potentially unsafe, and is moved to a quarantine area of the disk 114, a web page is presented to the client computer, identifying the problem and informing the user. If the program is able to remove the potentially unsafe data, the web page presented to the client computer can also contain a hypertext link to the safe version of the file.
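The notice page that the proxy presents in place of a quarantined page could be generated as in this sketch. The wording of the notice, the function name and the link path are assumptions for illustration; the link is included only when a safe version exists.

```python
import html

def build_notice(url, safe_link=None):
    """Build the page shown to the client in place of a quarantined page."""
    lines = [
        "<html><body>",
        f"<p>The page {html.escape(url)} was identified as potentially unsafe "
        "and has been quarantined.</p>",
    ]
    if safe_link is not None:
        # Offer the cleaned copy only when the unsafe data could be removed.
        lines.append(f'<p><a href="{html.escape(safe_link)}">View the safe version</a></p>')
    lines.append("</body></html>")
    return "\n".join(lines)

notice_without_link = build_notice("http://example.com/a")
notice_with_link = build_notice("http://example.com/a", "/safe/a.html")
```

Escaping the URL before embedding it keeps the notice page itself from carrying markup taken from the quarantined address.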
In a further, related, embodiment of the invention, the web proxy software itself can advantageously perform many of the steps of the process shown in Figure 3. For example, the web proxy software can identify potentially dangerous web pages or other files, then modify those pages if possible using the relevant database file.
There is therefore described a system which allows a user to have access to a potentially dangerous file, after the potentially dangerous part of the file contents has been removed.

Claims (19)

1. A method of altering data stored in a file, according to a predefined set of rules, comprising the following steps: a) identifying the type of the file; b) consulting a database file, specific to the identified file type, for rules that define changes to be made to files of that file type; and c) making changes to the file according to the rules so defined.
2. A method as claimed in claim 1, further comprising: periodically automatically refreshing the database file by accessing a master copy held on a server accessible over the internet.
3. A method as claimed in claim 1 or 2, comprising identifying the type of the file by reference to the name of the file.
4. A method as claimed in claim 1 or 2, comprising identifying the type of the file by reference to its content.
5. A method as claimed in any one of claims 1 to 4, comprising scanning one or more directories for files to be processed.
6. A method as claimed in claim 5, comprising scanning a quarantine directory for files placed in said quarantine directory by a software program.
7. A method as claimed in any one of claims 1 to 4, further comprising: receiving a request for said file; downloading said file from a remote server via the Hypertext Transfer protocol; subsequently identifying the type of the file, consulting said database file, and making changes to the file; and thereafter, delivering said changed file according to said request.
8. A method as claimed in any preceding claim, wherein said rules define changes comprising removing suspect tags from said file.
9. A method as claimed in claim 8, wherein said suspect tags define programs contained within said file.
10. A method as claimed in claim 8, wherein said suspect tags define scripts contained within said file.
11. A method as claimed in claim 8, wherein said suspect tags define links to suspect web sites, contained within said file.
12. A method as claimed in any preceding claim, further comprising: updating said database file from a website on a remote server.
13. A method as claimed in any preceding claim, comprising: identifying a file as potentially suspect; storing said potentially suspect file in a quarantine area of a computer memory; and subsequently identifying the type of the file, consulting said database file, and making changes to the file.
14. A method as claimed in claim 13, comprising: determining whether files received over the internet should be identified as potentially suspect.
15. A computer program product, comprising computer-readable code, wherein said code is adapted to cause a computer to perform the method according to one of claims 1 to 14.
16. A computer, programmed to perform a method according to one of claims 1 to 14.
17. A computer as claimed in claim 16, wherein said computer has an internet connection, and said method is applied to files received over the internet.
18. A computer as claimed in claim 17, wherein said computer is programmed to operate as a web proxy server, and the web proxy server software performs the method according to one of claims 1 to 14.
19. A method of altering data stored in a web page, according to a predefined set of rules, comprising the following steps: a) receiving a request for a web page from a requesting one of a plurality of connected client computers; b) retrieving said web page over a computer network from a remote server; c) determining whether said web page contains potentially unsafe data; d) consulting a database file for rules that define changes to be made to web pages, wherein said database file is updated over said computer network with rules retrieved from a master threat database; e) making changes to the web page according to the rules so defined; and f) presenting the changed web page to the requesting client computer.
GB0421476A 2004-09-27 2004-09-27 Detection, quarantine and modification of dangerous web pages Withdrawn GB2418500A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB0421476A GB2418500A (en) 2004-09-27 2004-09-27 Detection, quarantine and modification of dangerous web pages
PCT/GB2005/003647 WO2006035201A1 (en) 2004-09-27 2005-09-22 Safe viewing of web pages

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0421476A GB2418500A (en) 2004-09-27 2004-09-27 Detection, quarantine and modification of dangerous web pages

Publications (2)

Publication Number Publication Date
GB0421476D0 GB0421476D0 (en) 2004-10-27
GB2418500A true GB2418500A (en) 2006-03-29

Family

ID=33397337

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0421476A Withdrawn GB2418500A (en) 2004-09-27 2004-09-27 Detection, quarantine and modification of dangerous web pages

Country Status (2)

Country Link
GB (1) GB2418500A (en)
WO (1) WO2006035201A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090070873A1 (en) * 2007-09-11 2009-03-12 Yahoo! Inc. Safe web based interactions

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6275937B1 (en) * 1997-11-06 2001-08-14 International Business Machines Corporation Collaborative server processing of content and meta-information with application to virus checking in a server network
US7234167B2 (en) * 2001-09-06 2007-06-19 Mcafee, Inc. Automatic builder of detection and cleaning routines for computer viruses

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1237065A2 (en) * 1996-09-05 2002-09-04 Cheyenne Software International Sales Corp. Anti-virus agent for use with databases and mail servers
WO2001073523A2 (en) * 2000-03-24 2001-10-04 Mcafee.Com Corporation Method and system for detecting viruses on handheld computers
GB2367714A (en) * 2000-07-07 2002-04-10 Messagelabs Ltd Monitoring e-mail traffic for viruses
US20020116639A1 (en) * 2001-02-21 2002-08-22 International Business Machines Corporation Method and apparatus for providing a business service for the detection, notification, and elimination of computer viruses
GB2383444A (en) * 2002-05-08 2003-06-25 Gfi Software Ltd Detecting a potentially malicious executable file

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SnapFiles, "Anti-Virus Tools" [online], see "GFI DownloadSecurity for ISA Server" (2/3/2004) & "Norton AntiVirus" (22/4/2004). Available from http://www.snapfiles.com/shareware/security/swvirus.html [8/11/2004] *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8051482B2 (en) 2006-10-31 2011-11-01 Hewlett-Packard Development Company, L.P. Nullification of malicious code by data file transformation
EP2111017A1 (en) * 2008-04-17 2009-10-21 Zeus Technology Limited Supplying web pages
US8332515B2 (en) 2008-04-17 2012-12-11 Riverbed Technology, Inc. System and method for serving web pages
EP3051774A1 (en) * 2008-04-17 2016-08-03 Riverbed Technology, Inc. Supplying web pages

Also Published As

Publication number Publication date
GB0421476D0 (en) 2004-10-27
WO2006035201A1 (en) 2006-04-06

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)