GB2369203A - Protection of intellectual property rights on a network - Google Patents

Protection of intellectual property rights on a network

Info

Publication number
GB2369203A
Authority
GB
United Kingdom
Prior art keywords
media
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0024360A
Other versions
GB0024360D0 (en)
Inventor
Joseph Matthews
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Assertion Ltd
Original Assignee
ASSERTION Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ASSERTION Ltd
Priority to GB0024360A
Publication of GB0024360D0
Publication of GB2369203A
Application status is Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

A method of protecting a user's intellectual property rights comprises the steps of creating a digital identifier for the user's media, searching the network for possibly infringing media and creating a digital identifier for that media, comparing the two identifiers and producing a notification if the two identifiers are similar. Preferably at least five identifiers are produced for the original and offending media. Preferably, the notification is made to a human operator for a final determination. Preferably the search of the network is done using search spiders and the media retrieved is stored in a cache prior to the comparison step. The identifiers may be stored in a central database and copies held on a number of machines which may be in different countries, each machine having its own search spider. In one embodiment the network may be the Internet.

Description

PROTECTION OF INTELLECTUAL PROPERTY RIGHTS ON A NETWORK OF COMPUTERS

The invention relates to methods of protecting intellectual property rights in a user's media on a network of computers, and particularly, although not exclusively, on the Internet.

The Internet, described below in relation to Figure 7, has provided a means of sharing digital media and content, located on publicly accessible networked computers, between vast and increasing numbers of users. The increased availability of information has coincided with improvements in network bandwidth and data compression techniques, greatly improving the quality of media available to general users.

This present situation has created new opportunities for media creators and publishers, exploiting new distribution channels and new domestic, academic and commercial audiences. The possibility of copyright infringement and hence the loss of valuable intellectual property is a very real difficulty associated with Internet media distribution.

Digital Rights Management (DRM) systems (see for example www.magex.com and www.intertrust.com) based on encrypted delivery packages can help, but they are often complex to use. Their effectiveness is limited because, once the media has been delivered and released from the DRM container, unauthorised redistribution in an unencrypted form to unlicensed users can take place.

The recent widespread use of peer-to-peer media sharing networks such as Napster (see www.techweb.com/wire/story/TWB20000821S0003) has also increased the possibility of copyright infringement.

The invention seeks to overcome at least some of the problems of the prior art.

According to the invention there is provided a method of protecting intellectual property rights on a network of computers, and a system for carrying out such a method, as set out in the accompanying claims.

A specific embodiment of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 shows a schematic overview of the system;
Figure 2 shows one way of using the system to generate revenue;
Figure 3 shows an example of signature and key generation for image files;
Figure 4 shows four stages in the process carried out by the system;
Figure 5 shows examples of information displayed on computer screens to allow the overall performance and status of the system to be monitored;
Figure 6 is a further overview of the system, showing error, match and mission control panels used for displaying system information to operators;
Figure 7 shows a schematic overview of part of the Internet;
Figure 8 shows a general overview of the database; and
Figure 9 is a general overview of the system incorporated into the Internet.

A system representing one embodiment of the invention will be described. It is assumed that the system is controlled and maintained by a system operator, which may be a company, for the benefit of system users, who may be clients of the system operator.

Figure 1 shows a schematic overview of the system. A company control centre 2 makes use of a database 46 (an overview of which is shown in Figure 8) in order to search the Internet 6 using search spiders 8. Search Spiders 8 are the software agents that navigate and enumerate pages within an Internet domain. The search spiders 8 can search for data relating to for example music, sounds generally, images, video, software and text.

The system therefore has benefits for the music, film, publishing, image and software industries for example, as indicated by boxes 2, 4, 10, 12 and 14 respectively. The system can also be used by business users 16 and website designers 18 among others.

Figure 2 shows one way of using the system to generate revenue. At step 20 the system operator provides its terms of business to potential users of the system, which may include other website owners. The system operator may contact directly (marketing, sales) and indirectly (Terms & Conditions, resellers, PR) digital media content providers to offer IP protection.

At step 22 users of the system have their IP protected by the search spiders 8 which search for unauthorised use of the IP on other websites. The users may pay the system operator for this service, which directly creates revenue 24.

If unauthorised use of the user's IP is detected, the user can be notified, and the user can either use the company legal department or be directed to a suitable law firm (step 26) selected by the system operator. The system operator can form relationships or even partnerships with such law firms, which can also generate revenue 24 in the form of commission or retainers paid by the law firm to the system operator.

At step 28 revenue 24 can be generated from the information retrieved by the search spiders 8. The data retrieved by the search spiders 8 can be used to form profiles of different websites, which may be useful to other businesses or to businesses planning new ventures.

Referring to Figure 3, the system identifies digital media without altering the media in any way. Unlike digital watermarking techniques (see for example www.cc.gatech.edu/~mjm/dw/watermarking.html) that alter the source media, the present system identifies media according to high-level content information. The present system uses the 'look' of an image or the 'sound' of a music file rather than its binary data content, therefore providing detection that is resilient to media changes.

The system uses detection methods which use multiple characterisation algorithms. Each type of media, for example images, has a corresponding group of signatures with 'adapter' or translator algorithms to translate the different binary file formats within the group, e.g. from jpg, gif, bmp, tiff, to a generic format. About ten algorithms may be used for each media type. In the case of images, they examine colour content, characteristic shapes, textures, scattered samples, Fourier harmonics (see for example http://mathworld.wolfram.com/FourierSeries.html) and so on. Each signature algorithm produces a binary result of several kilobytes in size as a record of the result. This is further processed into several small 'key' integer values to provide very broad categorisation, allowing for a specified degree of correlation within the databases.

Figure 3 shows an example relating to image files, but the same process is applicable to files relating to other media. The starting point is the image files 30 representing the user's images, which can be in a number of different formats. These are translated into a generic format 34 by translator 32. A number of different signature algorithms, represented by 36 to 42 (although typically about 10 are used), then process the generic image 34. Each signature algorithm 36 to 42 produces a signature file, having a size of 1 to 2 KB, which identifies the generic image 34. The signature files may relate to different characteristics of the image, such as shape and colour for example.

A number of key algorithms, represented generally by 44, are then used to process each signature file to produce a key, consisting of between 1 and 10 integers, corresponding to each signature file.

The signature files and keys are stored in database 46.
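The signature-then-key pipeline of Figure 3 can be illustrated with a toy sketch. The specification does not disclose its actual signature or key algorithms, so everything below is an assumption made for illustration: the class name `SignatureKeySketch`, the use of a 64-bin colour histogram as the "signature", and the choice of the fullest bins as the "key". An array of packed 0xRRGGBB pixels stands in for the generic image 34.

```java
import java.util.Arrays;

/** Toy illustration of Figure 3 (generic image -> signature -> key). Not the patent's algorithms. */
final class SignatureKeySketch {

    /**
     * Hypothetical signature algorithm: a 64-bin colour histogram
     * (4 levels per channel) over pixels packed as 0xRRGGBB.
     */
    static int[] colourSignature(int[] rgbPixels) {
        int[] bins = new int[64];
        for (int p : rgbPixels) {
            int r = (p >> 16) & 0xFF;
            int g = (p >> 8) & 0xFF;
            int b = p & 0xFF;
            bins[(r >> 6) * 16 + (g >> 6) * 4 + (b >> 6)]++;
        }
        return bins;
    }

    /**
     * Hypothetical key algorithm: the indices of the `count` fullest bins,
     * fullest first (ties go to the lower index).
     */
    static int[] key(int[] signature, int count) {
        if (count < 1 || count > signature.length) {
            throw new IllegalArgumentException("count must be between 1 and " + signature.length);
        }
        int[] key = new int[count];
        boolean[] used = new boolean[signature.length];
        for (int k = 0; k < count; k++) {
            int best = -1;
            for (int i = 0; i < signature.length; i++) {
                if (!used[i] && (best < 0 || signature[i] > signature[best])) {
                    best = i;
                }
            }
            used[best] = true;
            key[k] = best;
        }
        return key;
    }

    public static void main(String[] args) {
        int[] pixels = {0xFF0000, 0xFF0000, 0x0000FF}; // two red pixels, one blue
        int[] signature = colourSignature(pixels);
        System.out.println("key = " + Arrays.toString(key(signature, 2)));
    }
}
```

The point of the sketch is the size relationship the description relies on: the signature is the larger record, and the key is a handful of integers derived from it that is cheap to store and compare.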

When the search spiders 8 search the Internet 6, the same process is repeated for each potentially infringing image which is found. Figure 3 therefore applies equally to the process carried out on suspect images. The suspect image file 30 is translated to a generic image 34, and the signature algorithms 36 to 42 then produce signature files for the suspect image. The key algorithms 44 are then used to produce keys for the suspect image, and the signature files and keys are stored in database 46.

In order to determine whether an IP infringement has taken place, the keys of the suspect image are first compared against the keys for the user's images. If the correlation is sufficiently high, then the next stage is to compare the signature files of the suspect image against those of the user's image. If the correlation for a number of different signatures is sufficiently high then the final stage is a visual comparison carried out by a person who is employed by, or acting for, the system operator.

It will be appreciated that this multi-stage process allows images which are very different from the user's images to be eliminated relatively quickly, and with less processing power than would be the case if the signatures, or even the image files themselves, were compared in the first instance.
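A minimal sketch of this staged filtering follows, under stated assumptions. The class `StagedMatchSketch`, the per-integer key tolerance, the mean square difference used for the signature stage, and the thresholds are all hypothetical; the specification only says that keys are compared first, then signatures, and that surviving candidates go to a person.

```java
/** Illustrative staged comparison: cheap key test, then signature test, then human review. */
final class StagedMatchSketch {

    enum Outcome { REJECTED_BY_KEYS, REJECTED_BY_SIGNATURES, REFER_TO_OPERATOR }

    /** Stage 1 (cheap): keys correlate if every integer differs by at most keyTolerance. */
    static boolean keysCorrelate(int[] a, int[] b, int keyTolerance) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (Math.abs(a[i] - b[i]) > keyTolerance) {
                return false;
            }
        }
        return true;
    }

    /** Stage 2 (costlier): mean square difference between two signatures of equal length. */
    static double meanSquareDifference(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("signature lengths differ");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum / a.length;
    }

    /**
     * Index i of each array holds the output of signature algorithm i.
     * At least minMatches algorithms must agree at each stage to proceed.
     */
    static Outcome compare(int[][] suspectKeys, double[][] suspectSigs,
                           int[][] knownKeys, double[][] knownSigs,
                           int keyTolerance, double maxMsd, int minMatches) {
        int keyMatches = 0;
        for (int i = 0; i < knownKeys.length; i++) {
            if (keysCorrelate(suspectKeys[i], knownKeys[i], keyTolerance)) {
                keyMatches++;
            }
        }
        if (keyMatches < minMatches) {
            return Outcome.REJECTED_BY_KEYS; // signatures are never touched
        }
        int signatureMatches = 0;
        for (int i = 0; i < knownSigs.length; i++) {
            if (meanSquareDifference(suspectSigs[i], knownSigs[i]) <= maxMsd) {
                signatureMatches++;
            }
        }
        return signatureMatches < minMatches
                ? Outcome.REJECTED_BY_SIGNATURES
                : Outcome.REFER_TO_OPERATOR;
    }

    public static void main(String[] args) {
        int[][] knownKeys = {{10, 20}, {5}};
        double[][] knownSigs = {{1.0, 2.0}, {0.0}};
        int[][] suspectKeys = {{11, 19}, {50}};
        double[][] suspectSigs = {{1.0, 2.0}, {9.0}};
        System.out.println(compare(suspectKeys, suspectSigs, knownKeys, knownSigs, 2, 0.5, 1));
    }
}
```

Most unrelated media exits at the first return, which is where the saving in processing power comes from.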

The system user's media is either digitised by the system operator or provided in digital form to the system operator, either before or after its release. The media is processed with the same or similar algorithms to those used in its detection. The results are then stored in a central 'DNA' database 46. The small integer 'keys' mentioned above are used to simplify correlation between suspect and known media within database 46.

When a match between keys is found in the database, the original signature is retrieved and compared with the suspect signature. Methods based on least mean square difference values and other correlation techniques can be used. Genetic algorithms and other methods are used to find methods for processing the signatures into keys, maximising the effectiveness of the correlation at the fastest early stages of matching.

Key algorithms can be coded in C and C++ to maximise efficiency.

Figure 4 shows four stages in the process carried out by the system. In stage 1, the client 48 (ie the system user) provides the system operator with the media 50 that it wishes to protect. The media is provided to the system operator's headquarters 52 and/or national office 108, where the necessary signatures and keys are generated (at step 54) and stored in database 46 in accordance with the method of Figure 3. The client 48 also provides information relating to and identifying the client (at step 56), priority scheduling information (at step 58) which identifies for which of the client's media protection is most important, and at step 60 information relating to known or likely pirates or pirate websites at which the client 48 believes that infringement is likely to take place. All of this information is also stored in database 46.

Stage 2 represents schematically the process by which a pirate 62 (ie a person or organisation infringing the client's IP) uploads infringing media 64 to the Internet 6.

Stage 3 represents schematically the process carried out by the system in which the search spiders 8 search the Internet 6 for potentially infringing media. The media identified by the search spiders 8 is stored in a file cache 66, and is then analysed by media analysers 68, using information from the database 46. The media analysers 68 represent computers which carry out the comparisons of keys and signatures generated as mentioned in Figure 3 above. The data stored in the file cache 66 is gradually diminished as the media analysers 68 carry out the required analysis, and the file cache 66 therefore acts as a buffer.

At stage 4, the final comparisons between the client's media and potentially infringing media are carried out by human operators, who make the necessary visual or audio comparison. The client 48 is informed of the results, and if necessary the company legal department or an external law firm 72 can be involved.

Figure 5 shows examples of information displayed on computer screens. Step 1a represents the search spiders 8 (displayed on a computer screen) downloading domains to scan, which in Step 1b are fed into the database 46. Step 2 represents the Media Analyser 68 (displayed on a computer screen) feeding assumed matches into database 46. Step 3a shows the Match Control Panel 78 downloading assumed matches and feeding confirmed matches, at Step 3b, into database 46. Mission Control Panel 74 (displayed on a computer screen) allows the overall performance and status of the system to be monitored, using information from database 46.

Referring to Figure 6, the database 46 is made available for interrogation by 'spider servers', which are applications written in Java with an Enterprise Java Beans infrastructure. These manage groups of Search Spiders 8 distributed over several computers. Search Spiders 8 are the software agents that navigate and enumerate pages within an Internet domain. Having navigated and enumerated pages within an Internet domain the spiders 8 download digital media that falls within simple type and size characteristics. Media Analyser agents 68 then apply the signature algorithms to the previously downloaded media and log any suspect correlation within the database 46.

This is done asynchronously so that a file cache may be placed between the Search Spiders 8 and Media Analysers 68, making the most of Internet download bandwidth as it varies with time. Errors encountered by the Search Spiders 8 are also logged within database 46, and can be displayed on a computer screen 76. The results of matches can also be displayed on a suitable computer screen 78.
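The decoupling of spiders from analysers through a file cache is a producer-consumer arrangement. The following is a minimal sketch of that idea only: the class `FileCacheSketch`, the bounded in-memory queue standing in for the file cache 66, the sentinel value and the "analysed:" prefix are all assumptions, not details from the specification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Illustrative producer-consumer buffer between a spider and a media analyser. */
final class FileCacheSketch {

    // Hypothetical end-of-work marker; assumed never to be a real media name.
    private static final String END = "__END__";

    static List<String> crawlAndAnalyse(List<String> mediaNames, int cacheCapacity) {
        BlockingQueue<String> fileCache = new ArrayBlockingQueue<>(cacheCapacity);
        List<String> analysed = new ArrayList<>();

        // Spider: downloads as fast as bandwidth allows, blocking only when the cache is full.
        Thread spider = new Thread(() -> {
            try {
                for (String name : mediaNames) {
                    fileCache.put(name);
                }
                fileCache.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Analyser: drains the cache at its own pace, blocking only when it is empty.
        Thread analyser = new Thread(() -> {
            try {
                for (String item = fileCache.take(); !item.equals(END); item = fileCache.take()) {
                    analysed.add("analysed:" + item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        spider.start();
        analyser.start();
        try {
            spider.join();
            analyser.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return analysed;
    }

    public static void main(String[] args) {
        System.out.println(crawlAndAnalyse(List.of("a.jpg", "b.gif", "c.bmp"), 2));
    }
}
```

Because neither side waits for the other except at the cache limits, download bandwidth and analysis capacity can each be used as they become available.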

Targeting the Search Spiders 8 is managed by the central database 46, according to client priority information, logging the domains where previous infringements occur and gathering information from tip-offs (either from the client 48 or from suitable third parties) and human operators 70. Existing web content categorisation databases can be exploited to target known pirate sites. By feeding positive results back into the database, the Search Spiders 8 become self-learning and increasingly effective.
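One way to picture this feedback loop is a per-domain score that rises whenever a match is confirmed, so that the domain is scanned sooner next time. The sketch below is hypothetical throughout: the class `SpiderTargetingSketch`, the fields, and the weights 5 and 2 are illustrative assumptions, and the specification does not state how targeting is actually computed.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative scoring of domains for spider targeting; weights are arbitrary assumptions. */
final class SpiderTargetingSketch {

    static final class Domain {
        final String name;
        final int clientPriority;   // from the client's priority scheduling information
        int confirmedInfringements; // fed back from operator-confirmed matches
        int tipOffs;                // from clients or third parties

        Domain(String name, int clientPriority) {
            this.name = name;
            this.clientPriority = clientPriority;
        }

        void recordConfirmedInfringement() {
            confirmedInfringements++;
        }

        void recordTipOff() {
            tipOffs++;
        }

        int score() {
            return clientPriority + 5 * confirmedInfringements + 2 * tipOffs;
        }
    }

    /** Returns domain names, highest score first; ties are broken alphabetically. */
    static List<String> scanOrder(List<Domain> domains) {
        List<Domain> sorted = new ArrayList<>(domains);
        sorted.sort((a, b) -> {
            int byScore = Integer.compare(b.score(), a.score());
            return byScore != 0 ? byScore : a.name.compareTo(b.name);
        });
        List<String> names = new ArrayList<>();
        for (Domain d : sorted) {
            names.add(d.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Domain a = new Domain("a.example", 3);
        Domain b = new Domain("b.example", 1);
        List<Domain> all = List.of(a, b);
        System.out.println("before feedback: " + scanOrder(all));
        b.recordConfirmedInfringement(); // a positive result is fed back
        System.out.println("after feedback:  " + scanOrder(all));
    }
}
```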

The human operators 70 respond to possible matches reported by the system via match control panel 78, either confirming or denying the match. They are provided with client information and a priority rating for the media found. The operator 70 then responds accordingly, informing the owner of the media and offering further company legal services and/or external lawyers. Human operators 70 oversee critical system parameters such as bandwidth and free server disk space via a central mission control panel 74. Error information from the search spiders 8 is also passed on to the operators 70 via the match control panel 78. Action taken, such as providing the spiders 8 with passwords to access particular sites, is then registered in the database 46.

Figure 7 shows a schematic overview of part of the Internet 6. Three national backbones 80 are shown, located in three different countries, and connected by three international links 82. Home users 84, businesses 86 and corporate web hosts 88 are connected to the Internet via Internet Service Providers (ISPs) 90, sometimes by the use of wireless or fibre-optic links 92. Hosting bunkers 94 are available in each country to allow businesses to obtain more rapid access to parts of the Internet in that country. In this regard it should be appreciated that the international links 82 are relatively slow at transmitting information compared to links within countries.

Figure 8 shows a general overview of database 46, with each box 150 representing a separate table in the database 46.

Figure 9 is a general overview of the system incorporated into the Internet 6 shown in Figure 7. A copy of database 46 is provided at a number of hosting bunkers 94, which may be located in different countries. Each hosting bunker 94 contains a "search spider farm", which is a collection of search spiders 8 residing on a number of computers 96.

A fault tolerant load balancer 98 shares the Internet bandwidth between the computers 96. The copies of database 46 are protected by a security firewall 100, and provided with local backup 102.

Still referring to Figure 9, there are also provided a number of national search centres 104, which essentially contain the same components as the hosting bunkers 94 except that they lack a copy of database 46. Instead, the results from the spiders are fed back to the copies of database 46 at other locations. Typically, the system would use one hosting bunker and two to three search centres per country.

The system also interacts with service users 106, which may include clients and informers. In addition to the company headquarters 52 of the system operator, there may also be separate national offices 108 also linked to the system.

A distributed and modular architecture is used throughout the system. The modularity allows for the rapid adoption of new media types and allows easy system maintenance.

Remote national search centres 104, located in Internet hosting bunkers, connect to the central database 46 or an intermediate replication of it. This allows the necessary download bandwidth to be achieved, unconstrained by limited international Internet links 82. Search routines for particular domains are allocated to particular national search centres 104 according to geographic hosting region, again facilitating rapid downloads.
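The allocation of domains to search centres by hosting region amounts to a lookup with a fallback. In the sketch below the class `SearchCentreAllocationSketch`, the region codes, the centre names and the fallback centre are all hypothetical; how the hosting region of a domain is determined is outside the sketch and is simply supplied as input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative allocation of domains to national search centres by hosting region. */
final class SearchCentreAllocationSketch {

    /**
     * Groups domains under the search centre that serves their hosting region.
     * Domains hosted in a region with no centre are given to fallbackCentre.
     */
    static Map<String, List<String>> allocate(Map<String, String> hostingRegionByDomain,
                                              Map<String, String> centreByRegion,
                                              String fallbackCentre) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : hostingRegionByDomain.entrySet()) {
            String centre = centreByRegion.getOrDefault(entry.getValue(), fallbackCentre);
            plan.computeIfAbsent(centre, k -> new ArrayList<>()).add(entry.getKey());
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, String> regions = new LinkedHashMap<>();
        regions.put("example.co.uk", "GB");
        regions.put("example.fr", "FR");
        regions.put("example.jp", "JP"); // no centre in this region in the example

        Map<String, String> centres = new LinkedHashMap<>();
        centres.put("GB", "london-centre");
        centres.put("FR", "paris-centre");

        System.out.println(allocate(regions, centres, "headquarters"));
    }
}
```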

The software components that form the core of the system have a high degree of modularity. This enables continual updating and allows for the possibility of redundancy in the system to boost reliability. New modules may be written to accommodate new digital media formats as they are invented, allowing their monitoring by the system. Other modules allow interfacing with all popular Internet access methods including HTTP, FTP, IRC, ICQ, RealMedia etc. and provide for future diversification.

Existing embedded signature techniques such as those used by Digimarc (see www.digimarc.com), where the source media is altered to contain a signature, may also be incorporated into the system. This allows the system to be complementary to existing watermarking techniques, embracing them as part of the system.

In order to improve the detection response time and make best use of the available computing and bandwidth resources, the searches are split between broad sweeps of the Internet or network and narrowly focused surveys of known media hosting sites.

Intelligence for targeting these searches will be gathered from Internet categorisation databases, feedback from clients and industry associations, tip-offs from users, responses from a network of home users in return for incentives and from the results of previous searches.

The search results reported to the system users are prioritised according to the importance of the media and the hosting bandwidth and level of public accessibility to the infringing host. Using network statistic databases, the system can distinguish between sites that present an immediate threat and those that can only support a few simultaneous users. This information will dictate whether immediate contact is made with the client or whether the result should be recorded as part of a regular report.
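The decision between immediate contact and a regular report can be sketched as a simple rule. The class `ResultTriageSketch`, the 1 to 10 importance scale, and both thresholds are assumptions chosen only to make the example concrete; the specification names the inputs (media importance, hosting capacity, public accessibility) but not how they are combined.

```java
/** Illustrative triage of a confirmed match; thresholds are arbitrary assumptions. */
final class ResultTriageSketch {

    enum Action { IMMEDIATE_CONTACT, REGULAR_REPORT }

    // Hypothetical thresholds, not taken from the specification.
    static final int IMPORTANT_MEDIA = 7;     // importance on an assumed 1-10 scale
    static final int HIGH_CAPACITY_USERS = 100; // simultaneous users the host can serve

    /**
     * A match is urgent only if the media matters to the client AND the host
     * is publicly reachable AND it can serve many users at once.
     */
    static Action triage(int mediaImportance, int simultaneousUsersSupported,
                         boolean publiclyAccessible) {
        boolean urgent = publiclyAccessible
                && mediaImportance >= IMPORTANT_MEDIA
                && simultaneousUsersSupported >= HIGH_CAPACITY_USERS;
        return urgent ? Action.IMMEDIATE_CONTACT : Action.REGULAR_REPORT;
    }

    public static void main(String[] args) {
        System.out.println(triage(9, 500, true)); // important media on a high-capacity public host
        System.out.println(triage(9, 5, true));   // same media on a host serving only a few users
    }
}
```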

Sample source code is set out on the following pages for implementing the spiders, media analysers and session bean in one embodiment of the invention. <img class="EMIRef" id="024180628-00110001" />

<tb> zu <tb> Example <SEP> 1 <SEP> : <SEP> Spider <SEP> Enterprise <SEP> Java <SEP> Bean <SEP> Source <SEP> Code <tb> <img class="EMIRef" id="024180628-00110002" />

<tb> import <SEP> java. <SEP> rmi. <SEP> RemoteException <SEP> ; <tb> import <SEP> javax. <SEP> ejb. <SEP> SessionBean <SEP> ; <tb> import javax. ejb. SessionContext ; import javax. naming. InitialContext ; <img class="EMIRef" id="024180628-00110003" />

<tb> import <SEP> javax. <SEP> rmi. <SEP> PortableRemoteObject <SEP> ; <tb> import <SEP> javax. <SEP> ejb. <SEP> DuplicateKeyException <SEP> ; <tb> import <SEP> javax. <SEP> ejb. <SEP> CreateException <SEP> ; <tb> import javax. ejb. FinderException; import javax. ejb. EJBException ; import java. util. Date ; import java. util. Vector ; import java. util. Enumeration; public class EJBSpiderBean implements SessionBean t public Integer enterScan (Integer DomainID, Date dateStart, String ScanStatus) throws DuplicateKeyException, CreateException, RemoteException { tblScan theEntry = null ; tblScanHome hometblScan = gettblScanHome () ; <img class="EMIRef" id="024180628-00110004" />

<tb> try <SEP> { <tb> theEntry <SEP> = <SEP> hometblScan. <SEP> create <SEP> (DomainID, <SEP> dateStart, <SEP> ScanStatus) <SEP> ; <tb> } <SEP> catch <SEP> (java. <SEP> rmi. <SEP> RemoteException <SEP> e) <SEP> { <tb> throw <SEP> new <SEP> EJBException <SEP> ("EJBSpiderBean, <SEP> enterScan <SEP> :"+e. <SEP> getMessage <SEP> ()) <SEP> ; <tb> } <tb> return <SEP> theEntry. <SEP> getScanID <SEP> () <SEP> ; <tb> ) <tb> public <SEP> void <SEP> setDateFinish <SEP> (Integer <SEP> ScanID, <SEP> Date <SEP> dateFinish) <tb> throws <SEP> RemoteException <tb> { <tb> tblScanHome <SEP> hometblScan=gettblScanHome <tb> tblScan <SEP> ts <SEP> = <SEP> null <SEP> ; <tb> try <SEP> { <tb> ts <SEP> = <SEP> hometblScan. <SEP> findByPrimaryKey <SEP> (ScanID) <SEP> ; <tb> } catch (FinderException fe) { System. out. println ("FinderException at EJBSpiderBean. setDateFinish") ; fe. printStackTrace () ; try ( enterError (ScanID, fe. getClass (). getName (), fe. getMessage (), "SpiderBean. setDateFinish (ScanID"+ScanID. intValue () +")","Medium", false); } catch (Exception re) { <img class="EMIRef" id="024180628-00110005" />

<tb> System. <SEP> out. <SEP> println <SEP> ("Exception <SEP> entering <SEP> error <SEP> in <SEP> spiderbean, <tb> setDateFinish <SEP> (scan. <SEP> findbyPrimaryKey)") <SEP> ; <tb> ) <tb> } <tb> ts. <SEP> setDateFinish <SEP> (dateFinish) <SEP> ; <tb> } public void setScanStatus (Integer ScanID, String ScanStatus) <img class="EMIRef" id="024180628-00110006" />

<tb> throws <SEP> RemoteException <tb> ( <tb> tblScanHome <SEP> hometblScan=gettblScanHome <SEP> () <SEP> ; <tb> tblScan <SEP> ts <SEP> = <SEP> null <SEP> ; <tb> try <SEP> f <tb> ts <SEP> = <SEP> hometblScan. <SEP> findByPrimaryKey <SEP> (ScanID) <SEP> ; <tb> } catch (FinderException fe) { System. out. println ("FinderException at EJBSpiderBean. setScanStatus"); fe. printStackTrace () ; try ( enterError (ScanID, fe. getClass (). getName (), fe. getMessage (), "SpiderBean. setScanStatus (ScanID"+ScanID. intValue () +")","Medium", false); } catch (Exception re) { System. out. println ("Exception entering error in spiderbean, setScanStatus (scan. findbyPrimaryKey)") ; } } ts. setScanStatus (ScanStatus) ; } public void enterError (Integer ScanID, String ErrorType, String <img class="EMIRef" id="024180628-00120001" />

<tb> ErrorMessage, <SEP> String <SEP> ErrorSource, <SEP> String <SEP> Priority, <SEP> boolean <SEP> Successlndex) <tb> throws <SEP> DuplicateKeyException, <SEP> CreateException <SEP> { <tb> tblError <SEP> theEntry <SEP> = <SEP> null <SEP> ; <tb> tblErrorHome <SEP> hometblError <SEP> = <SEP> gettblErrorHome <SEP> () <SEP> ; <tb> try <SEP> ( <tb> theEntry <SEP> = <SEP> hometblError. <SEP> create <SEP> (ScanID, <SEP> ErrorType, <SEP> ErrorMessage, <tb> ErrorSource, Priority, SuccessIndex) ; } catch (java. rmi. RemoteException e) { throw new EJBException ("EJBSpiderBean, enterError : "+e. getMessage ()) ; } } public Integer enterSourcePage (Integer ScanID, String FileName, String Path, String URL) <img class="EMIRef" id="024180628-00120002" />

<tb> throws <SEP> RemoteException, <SEP> DuplicateKeyException, <SEP> CreateException <SEP> { <tb> tblSourcePage theEntry = null; System. out. printinf"Scan ID"+ScanID) ; System. out. println ("FileName"+FileName) ; System. out. println ("Path"+Path) ; System. out. println ("URL"+URL); IblSourcePageHome hometblSourcePage = gettblSourcePageHome () ; try { theEntry = hometblSourcePage. create (ScanID, FileName, Path, URL); } catch (java. rmi. RemoteException e) { <img class="EMIRef" id="024180628-00120003" />

<tb> throw <SEP> new <SEP> EJBException <SEP> ("EJBSpiderBean, <SEP> enterSourcePage <SEP> : <tb> "+e. <SEP> getMessage <SEP> ()) <SEP> ; <tb> } <tb> return <SEP> theEntry. <SEP> getSourcePageID <SEP> () <SEP> ; <tb> } <tb> public <SEP> void <SEP> enterFoundMedia <SEP> (Integer <SEP> SourcePageID, <SEP> String <SEP> FileName, <SEP> String <tb> Path, <SEP> String <SEP> MediaType, <SEP> String <SEP> TestStatus) <tb> throws DuplicateKeyException, CreateException { tblFoundMedia theEntry = null; tblFoundMediaHome hometblFoundMedia = gettblFoundMediaHome () ; try { theEntry = hometblFoundMedia. create (SourcePageID, FileName, Path, MediaType, TestStatus); <img class="EMIRef" id="024180628-00130001" />

<tb> } <SEP> catch <SEP> (java. <SEP> rmi. <SEP> RemoteException <SEP> e) <SEP> { <tb> throw <SEP> new <SEP> EJBException <SEP> ("EJBSpiderBean, <SEP> enterFoundMedia <SEP> : <tb> "+e. <SEP> getMessage <SEP> ()) <SEP> ; <tb> } <tb> } <tb> public <SEP> Vector <SEP> getDomain <SEP> () <tb> { <tb> tblDomainHome <SEP> home <SEP> = <SEP> gettblDomainHome <SEP> () <SEP> ; <tb> Enumeration <SEP> enum <SEP> = <SEP> null <SEP> ; <tb> Vector <SEP> dom <SEP> = <SEP> new <SEP> Vector <SEP> () <SEP> ; <tb> tblDomain <SEP> next <SEP> ; <tb> Date <SEP> domNextScan <SEP> = <SEP> null <SEP> ; <tb> try <SEP> ( <tb> enum <SEP> = <SEP> home. <SEP> findDomains <SEP> () <SEP> ; <tb> System. <SEP> out. <SEP> println <SEP> ("finddomains <SEP> returned"+enum. <SEP> toString <SEP> ()) <SEP> ; <tb> } <SEP> catch <SEP> (Exception <SEP> ex) <SEP> { <tb> System. <SEP> out. <SEP> println <SEP> ("Exception <SEP> in <SEP> SpiderBean <SEP> getDOmain, <SEP> from <tb> findDomains") <SEP> ; <tb> ex. <SEP> printStackTrace <SEP> () <SEP> ; <tb> //DODO <SEP> 23/8/00 <SEP> Enter <SEP> errors <SEP> in <SEP> DB <tb> ) <tb> if <SEP> (enum <SEP> ! <SEP> =null) <tb> { <tb> next <SEP> = <SEP> (tblDomain) <SEP> enum. <SEP> nextElement <SEP> () <SEP> ; <tb> try <SEP> ( <tb> domNextScan <SEP> = <SEP> next. <SEP> getNextScanTimeO <SEP> ; <tb> } <SEP> catch <SEP> (RemoteException <SEP> re) <SEP> { <tb> System. <SEP> out. <SEP> println <SEP> ("Remote <SEP> exception <SEP> getting <SEP> next <SEP> scan <SEP> time") <SEP> ; <tb> } <tb> while <SEP> (enum. <SEP> hasMoreElements <SEP> ()) <tb> { <tb> tblDomain <SEP> td <SEP> = <SEP> (tblDomain) <SEP> enum. <SEP> nextElement <SEP> () <SEP> ; <tb> Date <SEP> d <SEP> = <SEP> null <SEP> ; <tb> try <SEP> ( <tb> d <SEP> = <SEP> td. <SEP> getNextScanTimeO <SEP> ; <tb> } <SEP> catch <SEP> (RemoteException <SEP> re) <SEP> { <tb> System. <SEP> out. <SEP> println <SEP> ("Remote <SEP> exception <SEP> getting <SEP> next <SEP> scan <SEP> time") <SEP> ; <tb> } <tb> if <SEP> (d. 
<SEP> before <SEP> (domNextScan)) <SEP> { <tb> next <SEP> = <SEP> td <SEP> ; <tb> domNextScan <SEP> = <SEP> d <SEP> ; <tb> } <tb> } <tb> try <SEP> { <tb> dom. <SEP> add <SEP> (next. <SEP> getDomainID <SEP> ()) <SEP> ; <tb> dom. <SEP> add <SEP> (next. <SEP> getDomainName <SEP> ()) <SEP> ; <tb> next. <SEP> setDateLastScan <SEP> (new <SEP> Date <SEP> ()) <SEP> ; <tb> } <SEP> catch <SEP> (RemoteException <SEP> re) <SEP> { <tb> System. <SEP> out. <SEP> println <SEP> ("Remote <SEP> Ex <SEP> at <SEP> SpiderBean. <SEP> getDomain") <SEP> ; <tb> re. <SEP> printStackTrace <SEP> () <SEP> ; <tb> } <tb> } <tb> return <SEP> dom <SEP> ; <tb> } <tb> private <SEP> tblDomainHome <SEP> gettblDomainHome <SEP> () <tb> { <tb> tblDomainHome <SEP> hometblDomain=null <SEP> ; <tb> try <SEP> ( <tb> InitialContext <SEP> ctx <SEP> = <SEP> new <SEP> InitialContext <SEP> () <SEP> ; <tb> Object objref = ctx. lookup ("ejb. tblDomainHome") ; hometblDomain = (tblDomainHome) PortableRemoteObject. narrow (objref, tblDomainHome. class) ; } catch (Exception NamingException) { NamingException. printStackTrace () ; } return hometblDomain ; } <img class="EMIRef" id="024180628-00140001" />

<tb> private <SEP> tblScanHome <SEP> gettblScanHome <SEP> () <tb> { <tb> tblScanHome <SEP> hometblScan=null <SEP> ; <tb> try <SEP> ( <tb> InitialContext <SEP> ctx <SEP> = <SEP> new <SEP> InitialContext <SEP> () <SEP> ; <tb> Object objref = ctx. lookup ("ejb. tblScanHome") hometblScan = (tblScanHome) PortableRemoteObject. narrow (objref, tblScanHome. class) ; } catch (Exception NamingException) { NamingException. printStackTrace() ; } return hometblScan ; } private tblErrorHome gettblErrorHome() { <img class="EMIRef" id="024180628-00140002" />

<tb> tblErrorHome <SEP> hometblError=null <SEP> ; <tb> try <SEP> { <tb> InitialContext <SEP> ctx <SEP> = <SEP> new <SEP> InitialContext <SEP> () <SEP> ; <tb> Object objref = ctx. lookup("ejb.tblErrorHome"); hometblError = (tbIErrorHome) PortableRemoteObject. narrow (objref, tblErrorHome. class) ; } catch (Exception NamingException) { NamingException. printStackTrace() ; } return hometblError ; } private tbISourcePageHome gettblSourcePageHome () { <img class="EMIRef" id="024180628-00140003" />

        tblSourcePageHome hometblSourcePage = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblSourcePageHome");
            hometblSourcePage = (tblSourcePageHome) PortableRemoteObject.narrow(objref, tblSourcePageHome.class);
        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }
        return hometblSourcePage;
    }

    private tblFoundMediaHome gettblFoundMediaHome()

    {
        tblFoundMediaHome hometblFoundMedia = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblFoundMediaHome");
            hometblFoundMedia = (tblFoundMediaHome) PortableRemoteObject.narrow(objref, tblFoundMediaHome.class);

        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }
        return hometblFoundMedia;
    }

    public void setNoPages(Integer DomainID, Integer NoPages) throws RemoteException {
        tblDomainHome hometblDomain = gettblDomainHome();
        tblDomain td = null;
        try {
            td = hometblDomain.findByPrimaryKey(DomainID);
        } catch (FinderException fe) {
            System.out.println("FinderException at EJBSpiderBean.setNoPages");
            fe.printStackTrace();
        }
        td.setNoPages(NoPages);
    }

    public void setNoDownloads(Integer DomainID, Integer NoDownloads) throws RemoteException {
        tblDomainHome hometblDomain = gettblDomainHome();
        tblDomain td = null;
        try {
            td = hometblDomain.findByPrimaryKey(DomainID);
        } catch (FinderException fe) {
            System.out.println("FinderException at EJBSpiderBean.setNoDownloads");
            fe.printStackTrace();
        }

        td.setNoDownloads(NoDownloads);
    }

    public void ejbCreate() {}
    public void setSessionContext(SessionContext context) {}
    public void ejbRemove() {}
    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void ejbLoad() {}
    public void ejbStore() {}
}

Example 2: Media Analyser Source Code

import java.rmi.RemoteException;
import javax.ejb.SessionBean;

import javax.ejb.SessionContext;
import javax.naming.InitialContext;
import javax.rmi.PortableRemoteObject;
import javax.ejb.DuplicateKeyException;
import javax.ejb.CreateException;
import javax.ejb.FinderException;
import javax.ejb.EJBException;
import java.util.Date;
import java.util.Vector;
import java.util.Enumeration;
import java.util.Arrays;

public class EJBMediaAnalyserBean implements SessionBean {

    public Vector getUnscannedMedia() {
        //TODO 29/8/00 Find by scanID, go through children I guess.....
        tblFoundMediaHome hometblFoundMedia = gettblFoundMediaHome();
        Enumeration enum = null;
        Vector vec = new Vector();
        try {

            enum = hometblFoundMedia.findUnscannedMedia();
            while (enum.hasMoreElements()) {
                tblFoundMedia theEntry = (tblFoundMedia) enum.nextElement();
                vec.add((Integer) theEntry.getPrimaryKey());
            }
        } catch (Exception e) {
            throw new EJBException("EJBMediaAnalyserBean, getUnscannedMedia: " + e.getMessage());
        }
        return vec;
    }

    public String getMediaFileName(Integer FoundMediaID) {
        tblFoundMediaHome hometblFoundMedia = gettblFoundMediaHome();
        tblFoundMedia theEntry = null;
        String s = "No such FoundMediaID";
        try {
            theEntry = hometblFoundMedia.findByPrimaryKey(FoundMediaID);
            s = theEntry.getFileName();
        } catch (Exception ex) {

            throw new EJBException("EJBMediaAnalyserBean, getMediaPath : " + ex.getMessage());
        }
        return s;
    }

    public boolean compareFingerPrints(Integer FoundMediaID, byte[] array) {
        //TODO 24/8/00 Can move this into a finder method? Can I pass enum from entity to session to Client?
        //Eventually will need to pass pattern ID to narrow search...
        tblFingerPrintDataHome hometblFingerPrintData = gettblFingerPrintDataHome();
        boolean match = false;
        try {
            Enumeration enum = hometblFingerPrintData.findAllFingerPrintData();
            if (enum != null) {

                while (enum.hasMoreElements()) {
                    tblFingerPrintData theEntry = (tblFingerPrintData) enum.nextElement();
                    if (Arrays.equals(theEntry.getBlobData(), array)) {
                        match = true;
                        //TODO 22/8/00 Similarity Index
                        insertAssumedMatch(FoundMediaID, theEntry.getMediaID(), new Integer(0));
                    }
                }
            }
        } catch (Exception e) {
            throw new EJBException("EJBMediaAnalyserBean, compareFingerPrints : " + e.getMessage());
        }
        try {
            if (match) {
                setTested(FoundMediaID);
            } else {
                deleteFoundMedia(FoundMediaID);
            }
        } catch (RemoteException re) {
            throw new EJBException("EJBMediaAnalyserBean, compareFingerPrints(2ndEx) :" + re.getMessage());
        }
        return match;
    }

    private void insertAssumedMatch(Integer FoundMediaID, Integer MediaID, Integer SimilarityIndex) {
        tblAssumedMatch theEntry = null;
        tblAssumedMatchHome hometblAssumedMatch = gettblAssumedMatchHome();

        try {
            theEntry = hometblAssumedMatch.create(FoundMediaID, MediaID, SimilarityIndex);
        } catch (Exception e) {
            throw new EJBException("EJBMediaAnalyserBean, insertAssumedMatch :" + e.getMessage());
        }
    }

    public void insertFPD(Integer MediaID, byte[] Data, Integer Size) {
        tblFingerPrintData theEntry = null;

        tblFingerPrintDataHome hometblFPD = gettblFingerPrintDataHome();
        try {
            theEntry = hometblFPD.create(MediaID, Data, Size);
        } catch (Exception e) {
            throw new EJBException("EJBMediaAnalyserBean, insertFPD : " + e.getMessage());
        }

    }

    private void deleteFoundMedia(Integer FoundMediaID)
            throws RemoteException {
        tblFoundMedia theEntry = null;
        tblFoundMediaHome hometblFoundMedia = gettblFoundMediaHome();
        try {
            theEntry = hometblFoundMedia.findByPrimaryKey(FoundMediaID);
            theEntry.remove();
        } catch (Exception e) {

            throw new EJBException("EJBMediaAnalyserBean, deleteFoundMedia : " + e.getMessage());
        }
    }

    private void setTested(Integer FoundMediaID)

            throws RemoteException {
        tblFoundMedia theEntry = null;
        tblFoundMediaHome hometblFoundMedia = gettblFoundMediaHome();
        try {
            theEntry = hometblFoundMedia.findByPrimaryKey(FoundMediaID);
            theEntry.setTestStatus("Analysed");
        } catch (Exception e) {
            throw new EJBException("EJBMediaAnalyserBean, setTested : " + e.getMessage());
        }
    }

    /*
    public void enterError(Integer ScanID, String ErrorType, String

            ErrorMessage, String ErrorSource, String Priority, String SuccessIndex)
            throws DuplicateKeyException, CreateException {
        tblError theEntry = null;
        tblErrorHome hometblError = gettblErrorHome();
        try {
            theEntry = hometblError.create(ScanID, ErrorType, ErrorMessage, ErrorSource, Priority, SuccessIndex);

        } catch (java.rmi.RemoteException e) {
            throw new EJBException("EJBMediaAnalyserBean, enterError : " + e.getMessage());
        }
    } */

    private tblAssumedMatchHome gettblAssumedMatchHome() {

        tblAssumedMatchHome hometblAssumedMatch = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblAssumedMatchHome");
            hometblAssumedMatch = (tblAssumedMatchHome) PortableRemoteObject.narrow(objref, tblAssumedMatchHome.class);
        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }

        return hometblAssumedMatch;
    }

    private tblFoundMediaHome gettblFoundMediaHome() {
        tblFoundMediaHome hometblFoundMedia = null;
        try {

            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblFoundMediaHome");
            hometblFoundMedia = (tblFoundMediaHome) PortableRemoteObject.narrow(objref, tblFoundMediaHome.class);
        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }
        return hometblFoundMedia;
    }

    private tblFingerPrintDataHome gettblFingerPrintDataHome() {

        tblFingerPrintDataHome hometblFingerPrintData = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblFingerPrintDataHome");
            hometblFingerPrintData =
                (tblFingerPrintDataHome) PortableRemoteObject.narrow(objref, tblFingerPrintDataHome.class);
        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }
        return hometblFingerPrintData;
    }

    private tblErrorHome gettblErrorHome() {
        tblErrorHome hometblError = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblErrorHome");
            hometblError = (tblErrorHome) PortableRemoteObject.narrow(objref, tblErrorHome.class);
        } catch (Exception NamingException) {

            NamingException.printStackTrace();
        }
        return hometblError;
    }

    public void ejbCreate() {}
    public void setSessionContext(SessionContext context) {}
    public void ejbRemove() {}
    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void ejbLoad() {}
    public void ejbStore() {}
}

Example 3: Spider Source Code

import java.io.*;
import java.net.*;
import java.util.*;
import java.rmi.RemoteException;
import javax.rmi.PortableRemoteObject;

class Spider {
    private String m_startURL;
    private String m_strBaseDirectory;

    private String m_strStartingBaseDirectory;
    private BufferedReader m_bufferedReader;
    private Hashtable m_mediaPathVector;
    private Hashtable m_foundLinks;
    private Hashtable m_servedLinks;
    private int m_maxPage = 10;
    private Hashtable m_externalLinks;
    private Integer m_ScanID;
    private Integer m_DomainID;
    private int noDownloads;
    private EJBSpider spider;

    public Spider(EJBSpiderHome ejbSH) {

        try {
            spider = (EJBSpider) PortableRemoteObject.narrow(ejbSH.create(), EJBSpider.class);
        } catch (Exception ex) {
            System.out.println("Cannot create EJBSpider Session bean");
            ex.printStackTrace();
            System.exit(1);
        }
        try {
            m_bufferedReader = new BufferedReader(new FileReader("Spider.class"));
        } catch (FileNotFoundException e) {
            System.out.println("Error : " + e.toString() + " " + e.getMessage());
        }
        m_mediaPathVector = new Hashtable();
        m_foundLinks = new Hashtable();
        m_servedLinks = new Hashtable();
        m_externalLinks = new Hashtable();

        noDownloads = 0;
    }

    private void getBasicDirectory(String strUrl) {
        if ((strUrl == null) || (strUrl.length() == 0))
            System.out.println("error!!!... netsearch won't work without a proper base directory...");
        if (strUrl.endsWith("/")) // url ends with '/' --> do not want to have this one (-:
            strUrl = strUrl.substring(0, (strUrl.length() - 1));
        int lastOccurence = strUrl.lastIndexOf("/");
        //starting after http://
        if (lastOccurence > 8)
            m_strBaseDirectory = strUrl.substring(0, lastOccurence /*+ 1*/);

        else
            m_strBaseDirectory = strUrl;
        if (!m_strBaseDirectory.endsWith("/"))
            m_strBaseDirectory = m_strBaseDirectory.concat("/");
        System.out.println("@m_strBaseDirectory :" + m_strBaseDirectory);
    }

    public boolean search() {
        boolean back = false;
        try {
            Vector v = spider.getDomain();
            m_DomainID = (Integer) v.elementAt(0);

            //m_DomainID = new Integer(2);
            //m_startURL = "http://www.microsoft.com";
            m_startURL = (String) v.elementAt(1);
            System.out.println("DomainID" + m_DomainID);
            System.out.println("StartURL:" + m_startURL);
            m_ScanID = spider.enterScan(m_DomainID, new Date(), "Started");
        } catch (Exception ex) {
            System.out.println("Ex at spider getDomain or enterScan");
            ex.printStackTrace();
            try {
                spider.enterError(m_ScanID, ex.getClass().getName(), ex.getMessage(), "Spider.search(getDom/enterScan)", "Medium", false);
            } catch (Exception re) {

                System.out.println("Exception entering error in search(getDom/enterScan)");
            }
            System.exit(1);
        }
        getBasicDirectory(m_startURL);
        m_strStartingBaseDirectory = m_strBaseDirectory;
        System.out.println("getting started..." + "\n");
        m_foundLinks.clear();
        m_servedLinks.clear();
        String[] s = downloadHTMLPage(m_strStartingBaseDirectory);
        try {
            //Limit length of filename to 215 chars, otherwise get strange SQL Exception
            if (((String) s[0]).length() > 215) s[0] = ((String) s[0]).substring(0, 215);

            Integer spID = spider.enterSourcePage(m_ScanID, s[0], s[1], s[2]);
            Vector ar = new Vector();
            ar.add(m_strStartingBaseDirectory);
            ar.add(spID);

            m_foundLinks.put(new Integer(m_foundLinks.size() + 1), ar); // first url (-:
        } catch (Exception re) {
            System.out.println("Unable to enter 1st SourcePage");
            re.printStackTrace();
            try {

                spider.enterError(m_ScanID, re.getClass().getName(), re.getMessage(), "Spider.search(Sourcepage)", "Medium", false);
            } catch (Exception e) {
                System.out.println("Exception entering error in search(enterSourcePage)");
            }
            System.exit(1);
        }
        System.out.println("get links to all media at complete homepage..." + "\n");
        if (getAllLinks()) {
            System.out.println("downloading htmlpages\n");
            //downloadHtml_Pages();
            System.out.println("downloading media" + "\n");
            downloadMedia();
            back = true;

            try {
                System.out.println("SCANSTATUS" + m_ScanID + "Scan completed");
                spider.setScanStatus(m_ScanID, "Scan completed");
                spider.setDateFinish(m_ScanID, new Date());
            } catch (RemoteException re) {
                try {
                    spider.enterError(m_ScanID, re.getClass().getName(), re.getMessage(), "Spider.search(setScanStatus/setDateFinish)", "Medium", false);
                } catch (Exception e) {
                    System.out.println("Exception entering error in search(setScanStatus/setDateFinish)");
                }
                System.out.println("RemoteException at setScanStatus/setDateFinish");
                re.printStackTrace();
            }
        }
        else
            back = false;
        for (int i = 1; i < m_externalLinks.size() + 1; i++)
            System.out.println("m_externalLinks: " + (String) m_externalLinks.get(new Integer(i)));
        System.out.println("m_externalLinks.size() :" + m_externalLinks.size());
        System.out.println("m_servedLinks.size() :" + m_servedLinks.size());
        System.out.println("m_foundLinks.size() : " + m_foundLinks.size());
        System.out.println("m_mediaPathVector.size() :" + m_mediaPathVector.size());

        System.out.println("\nreturned :" + back);
        return back;
    }

    private boolean getAllLinks() {
        if (m_foundLinks.isEmpty()) return false; // does not make much sense... (-:
        boolean back = true;
        int counter = 0;
        while (!(m_foundLinks.isEmpty()) && back && (counter < m_maxPage)) {
            back = false; // want a proper end!!!!!
            //get link out of m_foundLinks
            Enumeration keys = m_foundLinks.keys();
            String pageURL = new String("");
            Integer sourcePageID = null;
            Integer key = new Integer(0);
            if (keys.hasMoreElements()) {
                key = (Integer) keys.nextElement();
                Vector a = (Vector) m_foundLinks.get(key);
                pageURL = (String) a.elementAt(0);
                sourcePageID = (Integer) a.elementAt(1);
            }
            //test link
            if ((pageURL != null) && sourcePageID != null && (testIfWebpage(pageURL))) {
                System.out.println("*** pageURL:" + pageURL);
                getBasicDirectory(pageURL); //update base directory
                try {
                    URL strURL = new URL(pageURL);
                    //System.out.println(strURL.toString() + ";;;");
                    m_bufferedReader = new BufferedReader(new InputStreamReader(strURL.openStream()));
                    String inputLine = new String();
                    //get next Line http-code
                    while ((inputLine = getNextLineHTTPCode()) != null) {
                        // System.out.println("inputLine : " + inputLine);
                        //contains inputLine an important tag?
                        int indexImportantTag = -1;
                        while ((indexImportantTag = getImportantTag(inputLine)) != -1) {
                            System.out.println("indexImportantTag :" + indexImportantTag);
                            String pathName = new String();
                            //which tag is it???
                            if ((inputLine.substring(indexImportantTag)).startsWith("href=")
                                    || (inputLine.substring(indexImportantTag)).startsWith("HREF=")) {
                                //link to another page
                                inputLine = inputLine.substring(indexImportantTag + "href=".length());
                                pathName = getPathName(inputLine);
                                // System.out.println("pathName: " + pathName);
                                if (pathName != null) dealWithPageLink(pathName, sourcePageID);
                            }

                            else // so only "SRC" and "src" stays over
                            {
                                //link to another page or media???
                                inputLine = inputLine.substring(indexImportantTag + "src=".length());
                                pathName = getPathName(inputLine);
                                // System.out.println("pathName :" + pathName);

                                if ((pathName.endsWith(".html")) || (pathName.endsWith(".HTML"))
                                        || (pathName.endsWith(".htm")) || (pathName.endsWith(".HTM"))
                                        || (pathName.endsWith(".shtm")) || (pathName.endsWith(".SHTM"))
                                        || (pathName.endsWith(".cgi")) || (pathName.endsWith(".CGI"))
                                        || (pathName.endsWith(".asp")) || (pathName.endsWith(".ASP"))
                                        || (pathName.endsWith(".cfm")) || (pathName.endsWith(".CFM"))
                                        || (pathName.endsWith(".jsp")) || (pathName.endsWith(".JSP"))
                                        || (pathName.endsWith(".sml")) || (pathName.endsWith(".SML"))) {
                                    //link to another page
                                    if (pathName != null) dealWithPageLink(pathName, sourcePageID);
                                } else if ((pathName.endsWith(".gif")) || (pathName.endsWith(".GIF"))
                                        || (pathName.endsWith(".jpeg")) || (pathName.endsWith(".JPEG"))
                                        || (pathName.endsWith(".jpe")) || (pathName.endsWith(".JPE"))
                                        || (pathName.endsWith(".jpg")) || (pathName.endsWith(".JPG"))
                                        || (pathName.endsWith(".bmp")) || (pathName.endsWith(".BMP"))
                                        || (pathName.endsWith(".tif")) || (pathName.endsWith(".TIF"))) {
                                    //link to media
                                    if (pathName != null)

                                        dealWithMediaLink(pathName, sourcePageID);
                                }
                                else
                                    if ((pathName != null) && !pathName.equals(""))
                                        m_externalLinks.put(new Integer(m_externalLinks.size()), pathName);
                            }
                        } //while important tag
                    } //while read line from url
                    m_bufferedReader.close();
                } catch (IOException e) {
                    System.out.println("Error:" + e.toString() + " " + e.getMessage());
                    e.printStackTrace();
                    //counter--;
                    back = false;
                    //m_out.println("S2" + e.toString()); //tell server about it
                    try {
                        spider.enterError(m_ScanID, e.getClass().getName(), e.getMessage(), "Spider.getAllLinks", "Medium", false);
                    } catch (Exception re) {

                        System.out.println("Exception entering error in getAllLinks");
                    }
                }
                //completely read page gets moved from found to served vector
                if (saveLinkAsServed(pageURL))
                    if (m_foundLinks.remove(key) != null) back = true; //proper end
                counter++;

            } //end while m_found is empty?...
            if (pageURL != null) {
                if (m_foundLinks.remove(key) != null) {
                    counter back = true; //link is no webpage... delete it and just get the next one...
                    System.out.println("... i am here + back..." + back);
                }
            }
        }
        try {
            spider.setNoPages(m_DomainID, new Integer(counter));
            spider.setScanStatus(m_ScanID, "Got All Links");
        } catch (RemoteException re) {
            System.out.println("RemoteException at setNoPages/setScanStatus");
            re.printStackTrace();

            try {
                spider.enterError(m_ScanID, re.getClass().getName(), re.getMessage(), "Spider.getAllLinks(setNoPages/setScanStatus)", "Medium", false);
            } catch (Exception e) {
                System.out.println("Exception entering error in getAllLinks(setNoPages/setScanStatus)");
            }
        }
        System.out.println("getAllLinks() returned:" + back);
        return back;
    }

    private String getNextLineHTTPCode() {
        // System.out.println("... i am here
        String inputLine = new String();
        try {
            inputLine = m_bufferedReader.readLine();

            if (inputLine != null) {
                //inputLine = inputLine.toLowerCase(); // webservers are case-sensitive!!!
                int index = inputLine.indexOf(" ");
                while (inputLine.indexOf(" ") > -1) {
                    StringBuffer strBuf = new StringBuffer(inputLine);
                    strBuf.deleteCharAt(index);
                    inputLine = strBuf.toString();

                    index = inputLine.indexOf(" ");
                }
            }
        }
        catch (IOException e) {
            System.out.println("Error :" + e.toString() + " " + e.getMessage());
            e.printStackTrace();
            try {
                spider.enterError(m_ScanID, e.getClass().getName(), e.getMessage(), "Spider.getNextLineHTTPCode", "Medium", false);
            } catch (Exception re) {
                System.out.println("Exception entering error in getNextLineHTTPCode");
            }
        }

        return inputLine;
    }

    private boolean testIfWebpage(String pageLink) {
        boolean back = false;
        if (pageLink.endsWith("/")) pageLink = pageLink.substring(0, (pageLink.length() - 1));
        int lastOc = pageLink.lastIndexOf(".");
        String strEnd = new String();
        if (lastOc != -1) strEnd = pageLink.substring(lastOc + 1, pageLink.length());
        if ((strEnd.compareToIgnoreCase("html") == 0) ||
            (strEnd.compareToIgnoreCase("HTML") == 0) ||
            (strEnd.compareToIgnoreCase("htm") == 0) ||
            (strEnd.compareToIgnoreCase("HTM") == 0) ||

            (strEnd.compareToIgnoreCase("shtm") == 0) ||
            (strEnd.compareToIgnoreCase("SHTM") == 0) ||
            (strEnd.compareToIgnoreCase("sml") == 0) ||
            (strEnd.compareToIgnoreCase("SML") == 0) ||
            // (strEnd.compareToIgnoreCase("cgi") == 0) || (strEnd.compareToIgnoreCase("CGI") == 0) ||
            (strEnd.compareToIgnoreCase("asp") == 0) ||
            (strEnd.compareToIgnoreCase("ASP") == 0) ||
            (strEnd.compareToIgnoreCase("cfm") == 0) ||
            (strEnd.compareToIgnoreCase("CFM") == 0) ||
            (strEnd.compareToIgnoreCase("jsp") == 0) ||
            (strEnd.compareToIgnoreCase("JSP") == 0) ||
            (strEnd.compareToIgnoreCase("com") == 0) ||
            (strEnd.compareToIgnoreCase("net") == 0) ||
            (strEnd.compareToIgnoreCase("de") == 0) ||
            (strEnd.compareToIgnoreCase("uk") == 0))
            back = true;
        System.out.println(pageLink + "testIfWebpage returned :" + back);
        return back;
    }

    private int getImportantTag(String inputLine) {
        System.out.println("getImportantTag#inputLine :" + inputLine);
        int array[] = new int[4];
        array[0] = inputLine.indexOf("href=\"");
        array[1] = inputLine.indexOf("HREF=\"");
        array[2] = inputLine.indexOf("src=\"");
        array[3] = inputLine.indexOf("SRC=\"");
        Arrays.sort(array);
        int back = -1;
        for (int i = 0; i < array.length; i++) {
            if (array[i] != -1) {
                back = array[i];
                i = 369; // forces the loop to end after the first hit
            }
        }
        System.out.println("getImportantTag#back :" + back);
        return back;
    }

    private String getPathName(String inputLine) {
        System.out.println("getPathName#in :" + inputLine);
        String pathName = new String();
        //pathName has to be BEFORE next ImportantTag
        // (as important Tag for this path has been removed before calling this method)
        int nextIndex = getImportantTag(inputLine);

        System.out.println("nextIndex : " + nextIndex);
        if (nextIndex == 0)
            return null; //must not be - important tag is supposed to be cut
        else {
            int pathStartIndex = inputLine.indexOf("\"");

            int pathEndIndex = inputLine.indexOf("\"", pathStartIndex + 1);
            if (pathStartIndex == -1) {
                //path is in next line --> get it!
                inputLine = getNextLineHTTPCode();
                //new search for path
                pathStartIndex = inputLine.indexOf("\"");
                pathEndIndex = inputLine.indexOf("\"", pathStartIndex + 1);
                nextIndex = getImportantTag(inputLine);
                if (pathStartIndex == -1) return null; //must not be!!! (something went wrong!!!)
            }
            if (pathEndIndex == -1) pathEndIndex = inputLine.length(); // last " got lost (-:
            // System.out.println("pathStartIndex : " + pathStartIndex + " pathEndIndex : " + pathEndIndex);
            //get pathname - finally (-:
            pathName = inputLine.substring(pathStartIndex + 1, pathEndIndex);
        }
        // System.out.println("getPathName#back : " + pathName);

<tb> return <SEP> pathName <SEP> ; <tb> } <tb> private <SEP> void <SEP> dealWithPageLink <SEP> (String <SEP> foundpathName, <SEP> Integer <SEP> sourcePageID <tb> ) <tb> { System. out. println ( "dealWithPageLink&num;begin : "+ foundpathName) ; iff (foundpathName. endsWith (". mp3")) # (foundpathName. endsWith (". MP3")) <img class="EMIRef" id="024180628-00280005" />

<tb> I <SEP> I <SEP> (f <SEP> oundpathname. <SEP> endswi <SEP> th <SEP> avi <tb> (foundpathName. <SEP> endsWith <SEP> (". <SEP> AVI")) <tb> I <SEP> I <SEP> (foundpathName. <SEP> endsWith <SEP> (". <SEP> mov")) <SEP> II <tb> (foundpathName. <SEP> endsWith <SEP> (". <SEP> MOV")) <tb> Il <SEP> (foundpathName. <SEP> endsWith <SEP> (". <SEP> wav")) <SEP> II <tb> (foundpathName. <SEP> endsWith <SEP> (". <SEP> WAV")) <tb> I <SEP> i <SEP> (foundpathName. <SEP> endsWith <SEP> (". <SEP> mp2")) <SEP> II <tb> (foundpathName. <SEP> endsWith <SEP> (". <SEP> MP2"))) <tb> { <tb> dealWithMediaLink <SEP> (foundpathName, <SEP> sourcePageID) <SEP> ; <tb> return <SEP> ; <tb> } <tb> <img class="EMIRef" id="024180628-00290001" />

        if (foundpathName == "" ||
            foundpathName.toLowerCase().indexOf("mailto:") != -1) return;
        //pre-preparations
        String temp = m_strBaseDirectory;
        foundpathName = foundpathName.trim();
        if (foundpathName.endsWith("/")) foundpathName =
            foundpathName.substring(0, foundpathName.length() - 1);
        if (foundpathName.startsWith("/"))
        {
            foundpathName = foundpathName.substring(1);
            getBasicDirectory(m_strStartingBaseDirectory);
            foundpathName = m_strBaseDirectory + foundpathName;
            System.out.println("i am here" + foundpathName);
        }
        while (foundpathName.indexOf("../") != -1) //go back in directories
        {
            StringBuffer strBuf = new StringBuffer(foundpathName);
            getBasicDirectory(m_strBaseDirectory);
            int oc = foundpathName.indexOf("../");
            strBuf = strBuf.replace(0, (oc + 3), m_strBaseDirectory);
            foundpathName = strBuf.toString();
        }
        if ((lastDirectory(m_strBaseDirectory)).compareToIgnoreCase(
            firstDirectory(foundpathName)) == 0)
        {
            //to avoid: http://.../dir1/dir1/example.htm
            getBasicDirectory(m_strBaseDirectory);
            foundpathName = m_strBaseDirectory.concat(foundpathName);
        }
        if ((foundpathName.startsWith(m_strBaseDirectory)) ||
            (foundpathName.startsWith(m_strStartingBaseDirectory)))
        {
            //already formatted
            //TODO Can sort this out better when use String hash function because
            //Then we know where the path hashes to.
            Enumeration enum = m_foundLinks.elements();
            boolean indexA = false;
            while (enum.hasMoreElements() && !indexA) {
                Vector v = (Vector) enum.nextElement();
                if (((String) v.elementAt(0)).equalsIgnoreCase(foundpathName))
                    indexA = true;
            }
            /*enum = m_serverLinks.elements();
            boolean indexB = false;
            while (enum.hasMoreElements() && !indexB) {
                Object[] obj = enum.nextElement();
                if ((String) obj[0].equalsIgnoreCase(foundpathName))
                    indexB = true;
            } */
            boolean indexB = m_servedLinks.contains(foundpathName);
            if (!indexA && !indexB)
            {
                int key = m_foundLinks.size();
                while (m_foundLinks.containsKey(new Integer(key))) key++;
                String[] s = downloadHTMLPage(foundpathName);
                try {
                    //Limit length of filename to 215 chars, otherwise get strange SQL Exception
                    if (((String) s[0]).length() > 215)
                        s[0] = ((String) s[0]).substring(0, 215);
                    Integer spID = spider.enterSourcePage(m_ScanID,
                        s[0], s[1], s[2]);
                    Vector ar = new Vector();
                    ar.add(foundpathName);
                    ar.add(spID);
                    m_foundLinks.put(new Integer(key), ar);
                } catch (Exception re) {
                    System.out.println("Unable to enter SourcePage");
                    re.printStackTrace();
                    try {
                        spider.enterError(m_ScanID, re.getClass().getName(), re.getMessage(),
                            "Spider.dealWithPageLink(enterSourcePage)", "Medium", false);
                    } catch (Exception e) {
                        System.out.println("Exception entering error in dealWithPageLink(enterSourcePage)");
                    }
                }
            }
        }

        else
        {
            if ((foundpathName.startsWith("http://")) || (foundpathName.startsWith("HTTP://"))
                || (foundpathName.startsWith("ftp://")) || (foundpathName.startsWith("FTP://"))
                || (foundpathName.startsWith("www.")) || (foundpathName.startsWith("WWW.")))
            {
                m_externalLinks.put(new Integer(m_externalLinks.size()
                    + 1), foundpathName); //found link away from baseDirectory
            }
            else
            {
                if (!(foundpathName.startsWith(m_strBaseDirectory)))
                { //relative link
                    foundpathName = m_strBaseDirectory.concat(foundpathName);
                    Enumeration enum = m_foundLinks.elements();
                    boolean indexA = false;
                    while (enum.hasMoreElements() && !indexA) {
                        Vector v = (Vector) enum.nextElement();
                        if (((String) v.elementAt(0)).equalsIgnoreCase(foundpathName))
                            indexA = true;
                    }
                    boolean indexB = m_servedLinks.contains(foundpathName);
                    /* enum = m_serverLinks.elements();
                    boolean indexB = false;
                    while (enum.hasMoreElements() && !indexB) {
                        Object[] obj = enum.nextElement();
                        if ((String) obj[0].equalsIgnoreCase(foundpathName))
                            indexB = true;
                    } */
                    if (!indexA && !indexB)
                    {
                        int key = m_foundLinks.size();
                        while (m_foundLinks.containsKey(new Integer(key))) key++;
                        String[] s = downloadHTMLPage(foundpathName);
                        try {
                            //Limit length of filename to 215 chars, otherwise get strange SQL Exception
                            if (((String) s[0]).length() > 215)
                                s[0] =
                                    ((String) s[0]).substring(0, 215);
                            Integer spID =
                                spider.enterSourcePage(m_ScanID, s[0], s[1], s[2]);
                            Vector v = new Vector();
                            v.add(foundpathName);
                            v.add(spID);
                            m_foundLinks.put(new Integer(key),
                                v);
                        } catch (Exception re) {
                            System.out.println("Unable to enter SourcePage");
                            re.printStackTrace();
                            try {
                                spider.enterError(m_ScanID,
                                    re.getClass().getName(), re.getMessage(),
                                    "Spider.dealWithPageLink(enterSourcePage2)", "Medium", false);
                            } catch (Exception e) {
                                System.out.println("Exception entering error in dealWithPageLink(enterSourcePage2)");
                            }
                        }

                    }
                }
            }
        }
        m_strBaseDirectory = temp;
        System.out.println(
            "dealWithPageLink#foundpathName: " + foundpathName);
    }

    private void dealWithMediaLink(String foundpathName, Integer sourcePageID)
    {
        // System.out.println("dealWithMediaLink#foundpathName: " + foundpathName);
        if (foundpathName == "") return;
        //save media link
        String temp = m_strBaseDirectory;
        if (foundpathName.endsWith("/")) foundpathName =
            foundpathName.substring(0, foundpathName.length() - 1);
        if (foundpathName.startsWith("/"))
        {
            foundpathName = foundpathName.substring(1);
            getBasicDirectory(m_strStartingBaseDirectory);
            foundpathName = m_strBaseDirectory + foundpathName;
            //System.out.println("i am here" + foundpathName);
        }
        while (foundpathName.indexOf("../") != -1)
        {
            StringBuffer strBuf = new StringBuffer(foundpathName);
            getBasicDirectory(m_strBaseDirectory);
            int oc = foundpathName.indexOf("../");
            strBuf = strBuf.replace(0, (oc + 3), m_strBaseDirectory);
            foundpathName = strBuf.toString();
        }
        if ((lastDirectory(m_strBaseDirectory)).compareToIgnoreCase(
            firstDirectory(foundpathName)) == 0)
        {
            //to avoid: http://.../dir1/dir1/example.gif
            getBasicDirectory(m_strBaseDirectory);
            foundpathName = m_strBaseDirectory.concat(foundpathName);
        }
        if (foundpathName.startsWith("/")) foundpathName = foundpathName.substring(1);
        if ((foundpathName.startsWith(m_strBaseDirectory)) ||
            (foundpathName.startsWith(m_strStartingBaseDirectory)))
        {
            //TODO sort out hashTable keys
            Enumeration enum = m_mediaPathVector.elements();
            boolean inVector = false;
            while (enum.hasMoreElements() && !inVector) {
                Vector v = (Vector) enum.nextElement();
                if (((String) v.elementAt(0)).equals(foundpathName))
                    inVector = true;
            }
            if (!inVector) {
                Vector vec = new Vector();
                vec.add(foundpathName);
                vec.add(sourcePageID);
                m_mediaPathVector.put(new
                    Integer(m_mediaPathVector.size() + 1), vec); // readily formatted link
            }
        }
        else
        {
            if (!foundpathName.startsWith(m_strBaseDirectory))
            {
                foundpathName =
                    m_strBaseDirectory.concat(foundpathName);
                //relative link
                Enumeration enum = m_mediaPathVector.elements();
                boolean inVector = false;
                while (enum.hasMoreElements() && !inVector) {
                    Vector v = (Vector) enum.nextElement();
                    if (((String) v.elementAt(0)).equals(foundpathName))
                        inVector = true;
                }
                if (!inVector) {
                    Vector vec = new Vector();
                    vec.add(foundpathName);
                    vec.add(sourcePageID);
                    m_mediaPathVector.put(new
                        Integer(m_mediaPathVector.size() + 1), vec); // readily formatted link
                }
            }
        }
        m_strBaseDirectory = temp;
        // System.out.println("dealWithMediaLink#foundpathName: " + foundpathName);
    }

    private String firstDirectory(String link) //returns ONLY the name (no " or /)
    {
        System.out.println("firstDirectory#link: "
            + link);
        int firstOc = -1;
        if (link.startsWith("/")) firstOc = 1;
        else firstOc = 0;
        int nextOc = link.indexOf("/", firstOc);
        String back = new String();
        if ((nextOc != -1) && (firstOc != -1)) back = link.substring(firstOc, nextOc);
        else back = link;
        System.out.println("firstDirectory#back: " + back);
        return back;
    }

    private String lastDirectory(String link) //returns ONLY the name (no " or /)
    {
        System.out.println("lastDirectory#link: " +
            link);
        if (link.endsWith("/")) link = link.substring(0, link.length() - 1);
        else if ((link.lastIndexOf("/")) != -1)
            link = link.substring(link.lastIndexOf("/"), link.length() - 1);
        int nextOc = link.lastIndexOf("/");
        String back = new String();
        if (nextOc != -1) back = link.substring(nextOc + 1, link.length());
        else back = link;
        System.out.println("lastDirectory#back: " + back);
        return back;
    }

    private boolean saveLinkAsServed(String link)
    {
        if (link == "") return false;
        int key = m_servedLinks.size();
        while (m_servedLinks.containsKey(new Integer(key))) key++;
        m_servedLinks.put(new Integer(key), link);
        return true;
    }

    private void downloadMedia()
    {
        //downloading media
        for (int i = 1; i < m_mediaPathVector.size()/*+1*/; i++)
        {
            //getting just the names
            Vector vec = (Vector) m_mediaPathVector.get(new Integer(i));
            String fileName = (String) vec.elementAt(0);
            System.out.println(fileName);
            int lastOc = fileName.lastIndexOf("/");
            if (lastOc != -1) fileName = fileName.substring(lastOc + 1);
            //Limit length of filename to 215 chars
            if (fileName.length() > 215)
                fileName = (fileName).substring(0, 215);
            //Check for chars not allowed by windows.
            fileName = fileName.replace(':', '@');
            fileName = fileName.replace('*', '@');
            fileName = fileName.replace('?', '@');
            fileName = fileName.replace('"', '@');
            fileName = fileName.replace('<', '@');
            fileName = fileName.replace('>', '@');
            fileName = fileName.replace('|', '@');
            /* String fileName = ((String) (m_mediaPathVector.get(new Integer(i))));
            fileName = fileName.replace(...);
            fileName = fileName.replace(':', '*');
            */
            try
            {
                URL mediaURL = new URL((String) vec.elementAt(0));
                InputStream mediaIn = mediaURL.openStream();
                //*** preparation for check
                //save media in file --> found a db match !!!
                File file = new File(
                    "c:/netsertion/spider/downloadMedia/" + fileName);
                FileOutputStream fileOut = new FileOutputStream(file);
                byte mediaInData[] = new byte[500];
                int sumReading = 0;
                int reading = mediaIn.read(mediaInData, 0, 500);
                sumReading = reading;
                while (reading != -1)
                {
                    fileOut.write(mediaInData, 0, reading);
                    reading = mediaIn.read(mediaInData, 0, 500);
                    sumReading += reading;
                }
                spider.enterFoundMedia((Integer) vec.elementAt(1),
                    fileName, "c:/netsertion/spider/downloadMedia", "TODO", "Unscanned");
                noDownloads++; //counting how many downloads have been done
            } catch (RemoteException re) {
                System.out.println("RemoteException at downloadMedia");
                re.printStackTrace();
            }
            catch (Exception e)
            {
                System.out.println("Error: " + e.toString() + " " +
                    e.getMessage());
                e.printStackTrace();
                try {
                    spider.enterError(m_ScanID,
                        e.getClass().getName(), e.getMessage(), "Spider.downloadMedia", "Medium",
                        false);
                } catch (Exception re) {
                    System.out.println("Exception entering error in downloadMedia");
                }
            }
        }
        try {
            spider.setNoDownloads(m_DomainID, new Integer(noDownloads));
        } catch (RemoteException re) {

            System.out.println("RemoteException setting NoDownloads");
        }
    }

    private void downloadHtml_Pages()
    {
        //downloading media
        for (int i = 1; i < m_servedLinks.size()/*+1*/; i++)
        {
            System.out.println("###" + ((String) (m_servedLinks.get(new Integer(i)))).toString());
            //getting just the names
            String fileName = ((String) (m_servedLinks.get(new Integer(i))));
            int lastOc = fileName.lastIndexOf("/");
            if (lastOc != -1) fileName = fileName.substring(lastOc + 1);
            //Limit length of filename to 215 chars
            if (fileName.length() > 215)
                fileName = (fileName).substring(0, 215);
            //Check for chars not allowed by windows.
            fileName = fileName.replace(':', '@');
            fileName = fileName.replace('*', '@');
            fileName = fileName.replace('?', '@');
            fileName = fileName.replace('"', '@');
            fileName = fileName.replace('<', '@');
            fileName = fileName.replace('>', '@');
            fileName = fileName.replace('|', '@');
            /* String fileName = ((String) (m_servedLinks.get(new Integer(i))));
            fileName = fileName.replace(...);
            fileName = fileName.replace(':', '*');
            */
            String inputLine = new String();
            try {
                Integer a = new Integer(i);
                String s = (String) m_servedLinks.get(a);
                URL httpURL = new URL(s);
                BufferedReader reader = new BufferedReader(new InputStreamReader(httpURL.openStream()));
                File file = new File("html-pages/" + fileName);
                FileWriter writer = new FileWriter(file);
                while ((inputLine = reader.readLine()) != null) {
                    //System.out.println("inputLine: " + inputLine);
                    writer.write(inputLine + "\n");
                }
                reader.close();
                writer.close();
            }
            catch (Exception e)
            {
                System.out.println("Error: " + e.toString() + " " +
                    e.getMessage());
                e.printStackTrace();
                try {
                    spider.enterError(m_ScanID, e.getClass().getName(),
                        e.getMessage(), "Spider.downloadHtml_Pages", "Medium", false);
                } catch (Exception re) {
                    System.out.println("Exception entering error in downloadHTMLPages");

                }
            }
        }
    }

    private String[] downloadHTMLPage(String page)
    {
        String fileName = page;
        if (fileName.endsWith("/")) fileName = fileName.substring(0, fileName.length() - 1);
        int lastOc = fileName.lastIndexOf("/");
        if (lastOc != -1) fileName = fileName.substring(lastOc + 1);
        //Limit length of filename to 215 chars
        if (fileName.length() > 215)
            fileName = (fileName).substring(0, 215);
        //Check for chars not allowed by windows.
        fileName = fileName.replace(':', '@');
        fileName = fileName.replace('*', '@');
        fileName = fileName.replace('?', '@');
        fileName = fileName.replace('"', '@');
        fileName = fileName.replace('<', '@');
        fileName = fileName.replace('>', '@');
        fileName = fileName.replace('|', '@');
        System.out.println("FILENAME:" + fileName);
        String inputLine = new String();
        try {
            URL httpURL = new URL(page);
            BufferedReader reader = new BufferedReader(new
                InputStreamReader(httpURL.openStream()));
            File file = new File("c:/netsertion/spider/html-pages/" +
                fileName);
            FileWriter writer = new FileWriter(file);
            while ((inputLine = reader.readLine()) != null) {
                //System.out.println("inputLine: " + inputLine);
                writer.write(inputLine + "\n");
            }
            reader.close();
            writer.close();
        }
        catch (Exception e)
        {
            System.out.println("Error: " + e.toString() + " " +
                e.getMessage());
            e.printStackTrace();
            try {
                spider.enterError(m_ScanID, e.getClass().getName(),
                    e.getMessage(), "Spider.downloadHTMLPage", "Medium", false);
            } catch (Exception re) {
                System.out.println("Exception entering error in downloadHTMLPage");
            }
        }
        String[] s = {fileName, "c:/netsertion/spider/html-pages/", page};
        return s;
    }
}
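
The link handling in the listing above (resolving "/" and "../" links against a base directory, recognising media files by extension, and deriving a Windows-safe file name of at most 215 characters) can be sketched more compactly with the standard library. This is an illustrative sketch only, not the listing's own method: the class name LinkSketch and its method names are hypothetical, and java.net.URI.resolve is swapped in for the hand-written getBasicDirectory logic, whose body is not shown in this section.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper; names are not taken from the patent listing.
final class LinkSketch {
    // Extensions the listing treats as media links.
    private static final String[] MEDIA_EXTENSIONS = {".mp3", ".avi", ".mov", ".wav", ".mp2"};
    // Length limit used by the listing for stored file names.
    private static final int MAX_NAME_LENGTH = 215;

    // Resolves a link found on a page against that page's URL.
    // Handles absolute URLs, root-relative "/x" links and "../" segments.
    // URI.create throws IllegalArgumentException for malformed input (e.g. spaces).
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link.trim()).normalize().toString();
    }

    // Case-insensitive version of the listing's chain of endsWith checks.
    static boolean isMediaLink(String link) {
        String lower = link.toLowerCase(Locale.ROOT);
        for (String ext : MEDIA_EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    // Takes the last path segment, truncates it, and replaces characters
    // that Windows does not allow in file names with '@'.
    static String toFileName(String url) {
        String name = url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
        int lastSlash = name.lastIndexOf('/');
        if (lastSlash != -1) {
            name = name.substring(lastSlash + 1);
        }
        if (name.length() > MAX_NAME_LENGTH) {
            name = name.substring(0, MAX_NAME_LENGTH);
        }
        return name.replaceAll("[:*?\"<>|]", "@");
    }
}
```

For example, LinkSketch.resolve("http://example.com/a/b/page.htm", "../img/song.mp3") yields a URL under /a/img/, which isMediaLink then classifies as media. The example.com URLs are placeholders.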

Claims (14)

  1. A method of protecting intellectual property rights in a user's media on a network of computers, the method comprising the steps of: a) receiving user media in which the user enjoys intellectual property rights; b) generating at least one user digital identification signature from each item of said user media; c) searching said network of computers for potentially infringing media; d) generating at least one suspect digital identification signature for each item of said potentially infringing media; e) comparing said user and suspect digital identification signatures to determine their degree of similarity; and f) producing a notification if said degree of similarity exceeds a predetermined level.
  2. A method as claimed in claim 1, which further comprises the steps of: a) generating at least one user key, consisting of between one and ten integers, from the user's media; b) generating a suspect key, consisting of between one and ten integers, from each item of potentially infringing media; and c) before carrying out step (e) of claim 1, comparing said user and suspect keys for each item of potentially infringing media to determine their degree of similarity, and carrying out step (e) of claim 1 only if the degree of similarity of the keys exceeds a predetermined value.
  3. A method as claimed in claim 1 or 2, which further includes the step of human operators comparing items of user media against items of potentially infringing media for which said notification has been produced.
  4. A method as claimed in any preceding claim in which said searching step is carried out by a number of search spiders.
  5. A method as claimed in claim 4, wherein media retrieved by said search spiders is stored in a file cache prior to any of said comparing steps being carried out.
  6. A method as claimed in any preceding claim, wherein said user media, potentially infringing media and digital identification signatures are stored in a central database.
  7. A method as claimed in claim 6, wherein copies of said database are provided at a number of hosting bunkers.
  8. A method as claimed in claim 7, wherein at least some of said hosting bunkers reside in different countries.
  9. A method as claimed in any one of claims 6 to 8, when also dependent directly or indirectly on claim 4 or 5, wherein each bunker is provided with said search spiders.
  10. A method as claimed in any preceding claim, wherein said network of computers is the Internet.
  11. A method as claimed in any preceding claim, wherein at least 5 user digital identification signatures are used for each item of said user media.
  12. A method as claimed in any preceding claim, wherein at least 5 user digital identification signatures are used for each item of said potentially infringing media.
  13. A method as claimed in any preceding claim, wherein each digital identification signature is stored in a digital identification signature file.
  14. A system for carrying out the method of any preceding claim, the system comprising at least one computer programmed to generate said user and suspect digital identification signatures, and to compare said signatures to determine their degree of similarity.
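
The two-stage comparison of claims 1 and 2 (a cheap key comparison that gates the full signature comparison, followed by a notification when the signature similarity exceeds a predetermined level) can be illustrated as follows. This is a minimal sketch under stated assumptions: the claims do not say how keys or signatures are computed or compared, so the representation as int arrays with values in 0..255, the similarity measure, and all names (SignatureMatchSketch, Suspect, findMatches) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of claims 1(e)-(f) and 2(c); not the patent's algorithm.
final class SignatureMatchSketch {

    // One item of potentially infringing media with its precomputed key and signature.
    static final class Suspect {
        final String url;
        final int[] key;        // claim 2: between one and ten integers
        final int[] signature;  // claim 1(d): suspect digital identification signature

        Suspect(String url, int[] key, int[] signature) {
            this.url = url;
            this.key = key;
            this.signature = signature;
        }
    }

    // Assumed similarity measure: 1 minus the mean absolute difference,
    // scaled so that values in 0..255 give a result in [0, 1].
    static double similarity(int[] a, int[] b) {
        if (a.length != b.length || a.length == 0) {
            return 0.0;
        }
        long diff = 0;
        for (int i = 0; i < a.length; i++) {
            diff += Math.abs(a[i] - b[i]);
        }
        return 1.0 - (double) diff / (255.0 * a.length);
    }

    // Returns the URLs for which a notification would be produced.
    static List<String> findMatches(int[] userKey, int[] userSignature,
                                    List<Suspect> suspects,
                                    double keyLevel, double signatureLevel) {
        List<String> notifications = new ArrayList<>();
        for (Suspect s : suspects) {
            // Claim 2(c): skip the signature comparison unless the keys are similar enough.
            if (similarity(userKey, s.key) <= keyLevel) {
                continue;
            }
            // Claim 1(e)-(f): compare signatures and notify above the predetermined level.
            if (similarity(userSignature, s.signature) > signatureLevel) {
                notifications.add(s.url);
            }
        }
        return notifications;
    }
}
```

The point of the key stage is cost: comparing at most ten integers per item is much cheaper than comparing full signatures, so most unrelated media can be discarded before the more expensive step.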
GB0024360A 2000-10-05 2000-10-05 Protection of intellectual property rights on a network Withdrawn GB2369203A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0024360A GB2369203A (en) 2000-10-05 2000-10-05 Protection of intellectual property rights on a network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0024360A GB2369203A (en) 2000-10-05 2000-10-05 Protection of intellectual property rights on a network

Publications (2)

Publication Number Publication Date
GB0024360D0 GB0024360D0 (en) 2000-11-22
GB2369203A true GB2369203A (en) 2002-05-22

Family

ID=9900708

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0024360A Withdrawn GB2369203A (en) 2000-10-05 2000-10-05 Protection of intellectual property rights on a network

Country Status (1)

Country Link
GB (1) GB2369203A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7707224B2 (en) 2006-11-03 2010-04-27 Google Inc. Blocking of unlicensed audio content in video files on a video hosting website
US7987368B2 (en) 2005-10-28 2011-07-26 Microsoft Corporation Peer-to-peer networks with protections
US8180920B2 (en) 2006-10-13 2012-05-15 Rgb Networks, Inc. System and method for processing content
US8411752B2 (en) 2008-10-29 2013-04-02 Nds Limited Video signature
US8627509B2 (en) 2007-07-02 2014-01-07 Rgb Networks, Inc. System and method for monitoring content
US8640179B1 (en) 2000-09-14 2014-01-28 Network-1 Security Solutions, Inc. Method for using extracted features from an electronic work
US9135674B1 (en) 2007-06-19 2015-09-15 Google Inc. Endpoint based video fingerprinting
US9247276B2 (en) 2008-10-14 2016-01-26 Imagine Communications Corp. System and method for progressive delivery of media content
US9282131B2 (en) 2009-01-20 2016-03-08 Imagine Communications Corp. System and method for splicing media files
US9294728B2 (en) 2006-01-10 2016-03-22 Imagine Communications Corp. System and method for routing content
US9473812B2 (en) 2008-09-10 2016-10-18 Imagine Communications Corp. System and method for delivering content
US9633014B2 (en) 2009-04-08 2017-04-25 Google Inc. Policy based video content syndication

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0567680A1 (en) * 1992-04-30 1993-11-03 International Business Machines Corporation Pattern recognition and validation, especially for hand-written signatures
US5521984A (en) * 1993-06-10 1996-05-28 Verification Technologies, Inc. System for registration, identification and verification of items utilizing unique intrinsic features
US5544255A (en) * 1994-08-31 1996-08-06 Peripheral Vision Limited Method and system for the capture, storage, transport and authentication of handwritten signatures
US5987456A (en) * 1997-10-28 1999-11-16 University Of Massachusetts Image retrieval by syntactic characterization of appearance

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9824098B1 (en) 2000-09-14 2017-11-21 Network-1 Technologies, Inc. Methods for using extracted features to perform an action associated with identified action information
US9807472B1 (en) 2000-09-14 2017-10-31 Network-1 Technologies, Inc. Methods for using extracted feature vectors to perform an action associated with a product
US9781251B1 (en) 2000-09-14 2017-10-03 Network-1 Technologies, Inc. Methods for using extracted features and annotations associated with an electronic media work to perform an action
US9832266B1 (en) 2000-09-14 2017-11-28 Network-1 Technologies, Inc. Methods for using extracted features to perform an action associated with identified action information
US10108642B1 (en) 2000-09-14 2018-10-23 Network-1 Technologies, Inc. System for using extracted feature vectors to perform an action associated with a work identifier
US10073862B1 (en) 2000-09-14 2018-09-11 Network-1 Technologies, Inc. Methods for using extracted features to perform an action associated with selected identified image
US8640179B1 (en) 2000-09-14 2014-01-28 Network-1 Security Solutions, Inc. Method for using extracted features from an electronic work
US8656441B1 (en) 2000-09-14 2014-02-18 Network-1 Technologies, Inc. System for using extracted features from an electronic work
US8782726B1 (en) 2000-09-14 2014-07-15 Network-1 Technologies, Inc. Method for taking action based on a request related to an electronic media work
US8904464B1 (en) 2000-09-14 2014-12-02 Network-1 Technologies, Inc. Method for tagging an electronic media work to perform an action
US8904465B1 (en) 2000-09-14 2014-12-02 Network-1 Technologies, Inc. System for taking action based on a request related to an electronic media work
US10063936B1 (en) 2000-09-14 2018-08-28 Network-1 Technologies, Inc. Methods for using extracted feature vectors to perform an action associated with a work identifier
US10063940B1 (en) 2000-09-14 2018-08-28 Network-1 Technologies, Inc. System for using extracted feature vectors to perform an action associated with a work identifier
US9256885B1 (en) 2000-09-14 2016-02-09 Network-1 Technologies, Inc. Method for linking an electronic media work to perform an action
US10057408B1 (en) 2000-09-14 2018-08-21 Network-1 Technologies, Inc. Methods for using extracted feature vectors to perform an action associated with a work identifier
US9536253B1 (en) 2000-09-14 2017-01-03 Network-1 Technologies, Inc. Methods for linking an electronic media work to perform an action
US9558190B1 (en) 2000-09-14 2017-01-31 Network-1 Technologies, Inc. System and method for taking action with respect to an electronic media work
US9544663B1 (en) 2000-09-14 2017-01-10 Network-1 Technologies, Inc. System for taking action with respect to a media work
US9348820B1 (en) 2000-09-14 2016-05-24 Network-1 Technologies, Inc. System and method for taking action with respect to an electronic media work and logging event information related thereto
US10205781B1 (en) 2000-09-14 2019-02-12 Network-1 Technologies, Inc. Methods for using extracted features to perform an action associated with selected identified image
US9883253B1 (en) 2000-09-14 2018-01-30 Network-1 Technologies, Inc. Methods for using extracted feature vectors to perform an action associated with a product
US9529870B1 (en) 2000-09-14 2016-12-27 Network-1 Technologies, Inc. Methods for linking an electronic media work to perform an action
US9282359B1 (en) 2000-09-14 2016-03-08 Network-1 Technologies, Inc. Method for taking action with respect to an electronic media work
US9538216B1 (en) 2000-09-14 2017-01-03 Network-1 Technologies, Inc. System for taking action with respect to a media work
US9805066B1 (en) 2000-09-14 2017-10-31 Network-1 Technologies, Inc. Methods for using extracted features and annotations associated with an electronic media work to perform an action
US7987368B2 (en) 2005-10-28 2011-07-26 Microsoft Corporation Peer-to-peer networks with protections
US9294728B2 (en) 2006-01-10 2016-03-22 Imagine Communications Corp. System and method for routing content
US8180920B2 (en) 2006-10-13 2012-05-15 Rgb Networks, Inc. System and method for processing content
US8301658B2 (en) 2006-11-03 2012-10-30 Google Inc. Site directed management of audio components of uploaded video files
US9336367B2 (en) 2006-11-03 2016-05-10 Google Inc. Site directed management of audio components of uploaded video files
US9424402B2 (en) 2006-11-03 2016-08-23 Google Inc. Blocking of unlicensed audio content in video files on a video hosting website
US7707224B2 (en) 2006-11-03 2010-04-27 Google Inc. Blocking of unlicensed audio content in video files on a video hosting website
US9135674B1 (en) 2007-06-19 2015-09-15 Google Inc. Endpoint based video fingerprinting
US8627509B2 (en) 2007-07-02 2014-01-07 Rgb Networks, Inc. System and method for monitoring content
US9473812B2 (en) 2008-09-10 2016-10-18 Imagine Communications Corp. System and method for delivering content
US9247276B2 (en) 2008-10-14 2016-01-26 Imagine Communications Corp. System and method for progressive delivery of media content
US8411752B2 (en) 2008-10-29 2013-04-02 Nds Limited Video signature
US9282131B2 (en) 2009-01-20 2016-03-08 Imagine Communications Corp. System and method for splicing media files
US9633014B2 (en) 2009-04-08 2017-04-25 Google Inc. Policy based video content syndication

Also Published As

Publication number Publication date
GB0024360D0 (en) 2000-11-22

Similar Documents

Publication Publication Date Title
US6401118B1 (en) Method and computer program product for an online monitoring search engine
US7372976B2 (en) Content indexing and searching using content identifiers and associated metadata
AU762283B2 (en) Content addressable information encapsulation, representation, and transfer
Mohay Computer and intrusion forensics
KR100781730B1 (en) System and method for electronically managing composite documents
US9275053B2 (en) Decoding a watermark and processing in response thereto
US9910856B2 (en) Information source agent systems and methods for distributed data storage and management using content signatures
US7095871B2 (en) Digital asset management and linking media signals with related data using watermarks
US7415731B2 (en) Content addressable information encapsulation, representation, and transfer
US7587617B2 (en) Data repository and method for promoting network storage of data
US20120324227A1 (en) System For Generating Fingerprints Based On Information Extracted By A Content Delivery Network Server
US6735699B1 (en) Method and system for monitoring use of digital works
CN101226537B (en) Creation and persistence of action metadata
US7228565B2 (en) Event reporting between a reporting computer and a receiving computer
US20120185505A1 (en) Methods and computer program products for accelerated web browsing
KR101084768B1 (en) Issuing a digital rights management (DRM) license for content based on cross-forest directory information
US20080250159A1 (en) Cybersquatter Patrol
US7243147B2 (en) Systems and methods for the detection and management of network assets
US7613704B2 (en) Enterprise digital asset management system and method
US8645838B2 (en) Method for enhancing content using persistent content identification
US20070016951A1 (en) Systems and methods for identifying sources of malware
US8020209B2 (en) System and method of monitoring and controlling application files
US20040260933A1 (en) Method of preventing tampering of program by using unique number, method of upgrading obfuscated program, and apparatus thereof
US8683031B2 (en) Methods and systems for scanning and monitoring content on a network
JP4320195B2 (en) File storage service system, file management apparatus, file management method, ID assignment NAS server, and file reading method

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)