GB2369203A

GB2369203A - Protection of intellectual property rights on a network

Info

Publication number: GB2369203A
Application number: GB0024360A
Authority: GB
Inventors: Joseph Matthews
Original assignee: Assertion Ltd
Current assignee: Assertion Ltd
Priority date: 2000-10-05
Filing date: 2000-10-05
Publication date: 2002-05-22
Also published as: GB0024360D0

Abstract

A method of protecting a user's intellectual property rights comprises the steps of creating a digital identifier for the user's media, searching the network for possibly infringing media, creating a digital identifier for the infringing media, comparing the two identifiers and producing a notification if the two identifiers are similar. Preferably at least five identifiers are produced for the original and offending media. Preferably, the notification is made to a human operator for a final determination. Preferably the search of the network is done using search spiders and the media retrieved is stored in a cache prior to the comparison step. The identifiers may be stored in a central database and copies held on a number of machines which may be in different countries, each machine having its own search spider. In one embodiment the network may be the Internet.

Description

PROTECTION OF INTELLECTUAL PROPERTY RIGHTS ON A NETWORK OF COMPUTERS

The invention relates to methods of protecting intellectual property rights in a user's media on a network of computers, and particularly, although not exclusively, on the Internet.

The Internet, described below in relation to Figure 7, has provided a means of sharing digital media and content, located on publicly accessible networked computers, between vast and increasing numbers of users. The increased availability of information has coincided with improvements in network bandwidth and data compression techniques, greatly improving the quality of media available to general users.

This present situation has created new opportunities for media creators and publishers, exploiting new distribution channels and new domestic, academic and commercial audiences. The possibility of copyright infringement and hence the loss of valuable intellectual property is a very real difficulty associated with Internet media distribution.

Digital Rights Management (DRM) systems (see for example www.magex.com and www.intertrust.com) based on encrypted delivery packages can help, but they are often complex to use. Their effectiveness is limited because, once the media has been delivered and released from the DRM container, unauthorised redistribution in an unencrypted form to unlicensed users can take place.

The recent widespread use of peer-to-peer media sharing networks such as Napster (see www.techweb.com/wire/story/TWB20000821S0003) has also increased the possibility of copyright infringement.

The invention seeks to overcome at least some of the problems of the prior art.

According to the invention there is provided a method of protecting intellectual property rights on a network of computers, and a system for carrying out such a method, as set out in the accompanying claims.

A specific embodiment of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:

Figure 1 shows a schematic overview of the system;
Figure 2 shows one way of using the system to generate revenue;
Figure 3 shows an example of signature and key generation for image files;
Figure 4 shows four stages in the process carried out by the system;
Figure 5 shows examples of information displayed on computer screens to allow the overall performance and status of the system to be monitored;
Figure 6 is a further overview of the system, showing error, match and mission control panels used for displaying system information to operators;
Figure 7 shows a schematic overview of part of the Internet;
Figure 8 shows a general overview of the database; and
Figure 9 is a general overview of the system incorporated into the Internet.

A system representing one embodiment of the invention will be described. It is assumed that the system is controlled and maintained by a system operator, which may be a company, for the benefit of system users, who may be clients of the system operator.

Figure 1 shows a schematic overview of the system. A company control centre 2 makes use of a database 46 (an overview of which is shown in Figure 8) in order to search the Internet 6 using search spiders 8. Search spiders 8 are the software agents that navigate and enumerate pages within an Internet domain. The search spiders 8 can search for data relating to, for example, music, sounds generally, images, video, software and text.

The system therefore has benefits for the music, film, publishing, image and software industries, for example, as indicated by boxes 2, 4, 10, 12 and 14 respectively. The system can also be used by business users 16 and website designers 18, among others.

Figure 2 shows one way of using the system to generate revenue. At step 20 the system operator provides its terms of business to potential users of the system, which may include other website owners. The system operator may contact directly (marketing, sales) and indirectly (Terms & Conditions, resellers, PR) digital media content providers to offer IP protection.

At step 22 users of the system have their IP protected by the search spiders 8 which search for unauthorised use of the IP on other websites. The users may pay the system operator for this service, which directly creates revenue 24.

If unauthorised use of the user's IP is detected, the user can be notified, and the user can either use the company legal department or be directed to a suitable law firm (step 26) selected by the system operator. The system operator can form relationships or even partnerships with such law firms, which can also generate revenue 24 in the form of commission or retainers paid by the law firm to the system operator.

At step 28 revenue 24 can be generated from the information retrieved by the search spiders 8. The data retrieved by the search spiders 8 can be used to form profiles of different websites, which may be useful to other businesses or to businesses planning new ventures.

Referring to Figure 3, the system identifies digital media without altering the media in any way. Unlike digital watermarking techniques (see for example www.cc.gatech.edu/~mjm/dw/watermarking.html) that alter the source media, the present system identifies media according to high-level content information. The present system uses the 'look' of an image or the 'sound' of a music file rather than its binary data content, therefore providing detection that is resilient to media changes.

The system uses detection methods which use multiple characterisation algorithms. Each type of media, for example images, has a corresponding group of signatures with 'adapter' or translator algorithms to translate the different binary file formats within the group, e.g. from jpg, gif, bmp, tiff, to a generic format. About ten algorithms may be used for each media type. In the case of images, they examine colour content, characteristic shapes, textures, scattered samples, Fourier harmonics (see for example http://mathworld.wolfram.com/FourierSeries.html) and so on. Each signature algorithm produces a binary result of several kilobytes in size as a record of the result. This is further processed into several small 'key' integer values to provide very broad categorisation, allowing for a specified degree of correlation within the databases.

Figure 3 shows an example relating to image files, but the same process is applicable to files relating to other media. The starting point is the image files 30 representing the user's images, which can be in a number of different formats. These are translated into a generic format 34 by translator 32. A number of different signature algorithms, represented by 36 to 42 (although typically about 10 are used), are then used to process the generic image 34. Each signature algorithm 36 to 42 produces a signature file, having a size of 1 to 2 KB, which identifies the generic image 34. The signature files may relate to different characteristics of the image, such as shape and colour for example.
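The specification does not disclose the individual signature algorithms. As an illustration only, a colour-content signature could be sketched as a coarse colour histogram computed over the generic image. The class name `ColourSignature`, the packed 0xRRGGBB pixel layout and the 64-bin resolution are all assumptions made for this sketch, and the result is far smaller than the 1 to 2 KB signature files described above.

```java
/** Hypothetical colour-content signature: a coarse 64-bin RGB histogram. */
final class ColourSignature {
    private ColourSignature() {}

    /**
     * @param rgbPixels pixels of the generic image, packed as 0xRRGGBB (assumed layout)
     * @return 64 bytes; each byte is the share of pixels in one colour bin, scaled to 0..255
     */
    static byte[] compute(int[] rgbPixels) {
        if (rgbPixels.length == 0) {
            throw new IllegalArgumentException("image has no pixels");
        }
        int[] counts = new int[64];
        for (int p : rgbPixels) {
            int r = (p >> 22) & 0x3; // top two bits of the red channel
            int g = (p >> 14) & 0x3; // top two bits of the green channel
            int b = (p >> 6) & 0x3;  // top two bits of the blue channel
            counts[(r << 4) | (g << 2) | b]++;
        }
        byte[] signature = new byte[64];
        for (int i = 0; i < 64; i++) {
            // Normalise by image size so that rescaled copies give similar values.
            signature[i] = (byte) ((long) counts[i] * 255 / rgbPixels.length);
        }
        return signature;
    }
}
```

Because the histogram is normalised by pixel count and ignores pixel positions, a resized or re-encoded copy of an image would be expected to produce a similar signature, which is the kind of resilience to media changes the description aims for.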

A number of key algorithms, represented generally by 44, are then used to process each signature file to produce a key, consisting of between 1 and 10 integers, corresponding to each signature file.
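The key algorithms themselves are not set out in the description. One possible reduction, shown here purely as an assumption, splits a signature file into contiguous bands and records the mean byte value of each band as one integer of the key. The class and method names (`SignatureKeys`, `deriveKeys`) are hypothetical.

```java
/** Hypothetical key algorithm: reduces a signature file to a few coarse integers. */
final class SignatureKeys {
    private SignatureKeys() {}

    /**
     * Splits the signature into keyCount contiguous bands and returns the
     * mean unsigned byte value (0..255) of each band.
     */
    static int[] deriveKeys(byte[] signature, int keyCount) {
        if (keyCount < 1 || keyCount > 10) {
            throw new IllegalArgumentException("a key holds between 1 and 10 integers");
        }
        if (signature.length < keyCount) {
            throw new IllegalArgumentException("signature shorter than key");
        }
        int[] keys = new int[keyCount];
        for (int k = 0; k < keyCount; k++) {
            int start = (int) ((long) k * signature.length / keyCount);
            int end = (int) ((long) (k + 1) * signature.length / keyCount);
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += signature[i] & 0xFF; // treat bytes as unsigned
            }
            keys[k] = (int) (sum / (end - start));
        }
        return keys;
    }
}
```

Integers of this kind can be indexed and range-queried in an ordinary database table, which is one plausible reading of why the description favours small integer keys for the first, broad stage of matching.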

The signature files and keys are stored in database 46.

When the search spiders 8 search the Internet 6, the same process is repeated for each potentially infringing image which is found. Figure 3 therefore applies equally to the process carried out on suspect images. The suspect image file 30 is translated to a generic image 34, and the signature algorithms 36 to 42 then produce signature files for the suspect image. The key algorithms 44 are then used to produce keys for the suspect image, and the signature files and keys are stored in database 46.

In order to determine whether an IP infringement has taken place, the keys of the suspect image are first compared against the keys for the user's images. If the correlation is sufficiently high, then the next stage is to compare the signature files of the suspect image against those of the user's image. If the correlation for a number of different signatures is sufficiently high then the final stage is a visual comparison carried out by a person who is employed by, or acting for, the system operator.

It will be appreciated that this multi-stage process allows images which are very different from the user's images to be eliminated relatively quickly, and with less processing power than would be the case if the signatures, or even the image files themselves, were compared in the first instance.
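The multi-stage elimination can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `StagedMatcher`, its thresholds and the pluggable similarity function are all hypothetical, and each media item is assumed to carry one key and one signature per signature algorithm, in the same order.

```java
import java.util.function.ToDoubleBiFunction;

/** Hypothetical three-stage filter: keys first, then signatures, then a human. */
final class StagedMatcher {
    enum Outcome { REJECTED_BY_KEYS, REJECTED_BY_SIGNATURES, NEEDS_HUMAN_REVIEW }

    private final int keyTolerance;          // max difference allowed per key integer (assumed)
    private final double signatureThreshold; // min similarity for a signature to count (assumed)
    private final int minMatching;           // how many algorithms must agree (assumed)
    private final ToDoubleBiFunction<byte[], byte[]> similarity;

    StagedMatcher(int keyTolerance, double signatureThreshold, int minMatching,
                  ToDoubleBiFunction<byte[], byte[]> similarity) {
        this.keyTolerance = keyTolerance;
        this.signatureThreshold = signatureThreshold;
        this.minMatching = minMatching;
        this.similarity = similarity;
    }

    Outcome classify(int[][] suspectKeys, int[][] knownKeys,
                     byte[][] suspectSigs, byte[][] knownSigs) {
        // Stage 1: cheap integer comparison eliminates most unrelated media.
        int keyHits = 0;
        for (int a = 0; a < knownKeys.length; a++) {
            if (keysClose(suspectKeys[a], knownKeys[a])) {
                keyHits++;
            }
        }
        if (keyHits < minMatching) {
            return Outcome.REJECTED_BY_KEYS;
        }
        // Stage 2: the more expensive signature comparison runs only for survivors.
        int sigHits = 0;
        for (int a = 0; a < knownSigs.length; a++) {
            if (similarity.applyAsDouble(suspectSigs[a], knownSigs[a]) >= signatureThreshold) {
                sigHits++;
            }
        }
        // Stage 3 is the visual or audio comparison, which is left to a person.
        return sigHits >= minMatching ? Outcome.NEEDS_HUMAN_REVIEW : Outcome.REJECTED_BY_SIGNATURES;
    }

    private boolean keysClose(int[] a, int[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (Math.abs(a[i] - b[i]) > keyTolerance) {
                return false;
            }
        }
        return true;
    }
}
```

The ordering reflects the cost argument made above: the signature comparison is never reached for media that the key comparison has already ruled out.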

The system user's media is either digitised by the system operator or provided in digital form to the system operator, either before or after its release. The media is processed with the same or similar algorithms to those used in its detection. The results are then stored in a central 'DNA' database 46. The small integer 'keys' mentioned above are used to simplify correlation between suspect and known media within database 46.

When a match between keys is found in the database, the original signature is retrieved and compared with the suspect signature. Methods based on least mean square difference values and other correlation techniques can be used. Genetic algorithms and other methods are used to find methods for processing the signatures into keys, maximising the effectiveness of the correlation at the fastest early stages of matching.
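A comparison based on mean square difference might look like the sketch below. It assumes two signatures of equal length whose bytes are treated as unsigned samples. The names `SignatureDistance`, `meanSquaredDifference` and `correlates`, and the idea of a single acceptance threshold, are assumptions for illustration.

```java
/** Hypothetical signature comparison using the mean squared difference. */
final class SignatureDistance {
    private SignatureDistance() {}

    /** Returns the mean of the squared per-byte differences; 0.0 means identical. */
    static double meanSquaredDifference(byte[] original, byte[] suspect) {
        if (original.length == 0 || original.length != suspect.length) {
            throw new IllegalArgumentException("signatures must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < original.length; i++) {
            int d = (original[i] & 0xFF) - (suspect[i] & 0xFF); // unsigned difference
            sum += (double) d * d;
        }
        return sum / original.length;
    }

    /** True if the difference is small enough to treat the signatures as correlated. */
    static boolean correlates(byte[] original, byte[] suspect, double maxDifference) {
        return meanSquaredDifference(original, suspect) <= maxDifference;
    }
}
```

A low value indicates that the suspect signature is close to the original. The threshold would in practice have to be tuned per signature algorithm, since the description does not give one.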

Key algorithms can be coded in C and C++ to maximise efficiency.

Figure 4 shows four stages in the process carried out by the system. In stage 1, the client 48 (ie the system user) provides the system operator with the media 50 that it wishes to protect. The media is provided to the system operator's headquarters 52 and/or national office 108, where the necessary signatures and keys are generated (at step 54) and stored in database 46 in accordance with the method of Figure 3. The client 48 also provides information relating to and identifying the client (at step 56), priority scheduling information (at step 58) which identifies for which of the client's media protection is most important, and at step 60 information relating to known or likely pirates or pirate websites at which the client 48 believes that infringement is likely to take place. All of this information is also stored in database 46.

Stage 2 represents schematically the process by which a pirate 62 (ie a person or organisation infringing the client's IP) uploads infringing media 64 to the Internet 6.

Stage 3 represents schematically the process carried out by the system in which the search spiders 8 search the Internet 6 for potentially infringing media. The media identified by the search spiders 8 is stored in a file cache 66, and is then analysed by media analysers 68, using information from the database 46. The media analysers 68 represent computers which carry out the comparisons of keys and signatures generated as mentioned in Figure 3 above. The data stored in the file cache 66 is gradually diminished as the media analysers 68 carry out the required analysis, and the file cache 66 therefore acts as a buffer.

At stage 4, the final comparisons between the client's media and potentially infringing media are carried out by human operators, who make the necessary visual or audio comparison. The client 48 is informed of the results, and if necessary the company legal department or an external law firm 72 can be involved.

Figure 5 shows examples of information displayed on computer screens. Step 1a represents the search spiders 8 (displayed on a computer screen) downloading domains to scan, which in Step 1b are fed into the database 46. Step 2 represents the Media Analyser 68 (displayed on a computer screen) feeding assumed matches into database 46. Step 3a shows the Match Control Panel 78 downloading assumed matches and, at Step 3b, feeding confirmed matches into database 46. Mission Control Panel 74 (displayed on a computer screen) allows the overall performance and status of the system to be monitored, using information from database 46.

Referring to Figure 6, the database 46 is made available for interrogation by 'spider servers', which are applications written in Java with an Enterprise Java Beans infrastructure. These manage groups of Search Spiders 8 distributed over several computers. Search Spiders 8 are the software agents that navigate and enumerate pages within an Internet domain. Having navigated and enumerated pages within an Internet domain, the spiders 8 download digital media that falls within simple type and size characteristics. Media Analyser agents 68 then apply the signature algorithms to the previously downloaded media and log any suspect correlation within the database 46.
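The "simple type and size characteristics" are not specified further. A spider's download decision could be sketched as a check on file extension and reported size, as below. The class `DownloadFilter` and its parameters are hypothetical, and the specific extensions and byte limits a deployment would use are not given in the description.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical download filter: accept media by file extension and size range. */
final class DownloadFilter {
    private final Set<String> extensions = new HashSet<>();
    private final long minBytes;
    private final long maxBytes;

    DownloadFilter(Set<String> extensions, long minBytes, long maxBytes) {
        for (String ext : extensions) {
            this.extensions.add(ext.toLowerCase(Locale.ROOT));
        }
        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
    }

    /** Returns true if the file looks like media worth downloading for analysis. */
    boolean accepts(String fileName, long sizeBytes) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false; // no extension, so the type is unknown
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return extensions.contains(ext) && sizeBytes >= minBytes && sizeBytes <= maxBytes;
    }
}
```

A lower size bound of this kind would, for instance, let a spider skip small icons and page decorations that are unlikely to be a client's protected images.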

This is done asynchronously so that a file cache may be placed between the Search Spiders 8 and Media Analysers 68, making the most of Internet download bandwidth as it varies with time. Errors encountered by the Search Spiders 8 are also logged within database 46, and can be displayed on a computer screen 76. The results of matches can also be displayed on a suitable computer screen 78.
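The buffering role of the file cache can be illustrated with a bounded queue shared by a spider thread and an analyser thread. This is a sketch under assumptions: the class `FileCacheBuffer`, the use of file names in place of downloaded files, and the end-of-scan sentinel are all hypothetical, and a real cache would hold files on disk rather than strings in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical file cache acting as a buffer between one spider and one analyser. */
final class FileCacheBuffer {
    private static final String END_OF_SCAN = "<end-of-scan>"; // sentinel, not a real file

    private FileCacheBuffer() {}

    /** Runs a spider and an analyser concurrently; returns files in the order analysed. */
    static List<String> downloadAndAnalyse(List<String> mediaFiles, int cacheCapacity) {
        BlockingQueue<String> fileCache = new ArrayBlockingQueue<>(cacheCapacity);
        List<String> analysed = new ArrayList<>();

        Thread spider = new Thread(() -> {
            try {
                for (String file : mediaFiles) {
                    fileCache.put(file); // blocks while the cache is full
                }
                fileCache.put(END_OF_SCAN);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread analyser = new Thread(() -> {
            try {
                String file;
                while (!(file = fileCache.take()).equals(END_OF_SCAN)) { // blocks while empty
                    analysed.add(file); // stands in for signature generation and matching
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        spider.start();
        analyser.start();
        try {
            spider.join();
            analyser.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the scan", e);
        }
        return analysed;
    }
}
```

Neither side waits on the other except when the cache is completely full or empty, so downloads can proceed at whatever rate the network allows while analysis catches up.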

Targeting the Search Spiders 8 is managed by the central database 46, according to client priority information, logging the domains where previous infringements occur and gathering information from tip-offs (either from the client 48 or from suitable third parties) and human operators 70. Existing web content categorisation databases can be exploited to target known pirate sites. By feeding positive results back into the database, the Search Spiders 8 become self-learning and increasingly effective.
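One way to picture this targeting is as a score per domain that rises with client priority, past infringements and tip-offs, with the highest-scoring domains scanned first. The sketch below is an assumption throughout: the class `SpiderTargeting`, the `Domain` fields and the weights 3, 2 and 1 are invented for illustration and do not come from the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical targeting rule: scan the highest-scoring domains first. */
final class SpiderTargeting {
    private SpiderTargeting() {}

    static final class Domain {
        final String name;
        final int clientPriority;    // from the client's priority scheduling information
        final int pastInfringements; // confirmed matches previously logged for the domain
        final int tipOffs;           // tip-offs received about the domain

        Domain(String name, int clientPriority, int pastInfringements, int tipOffs) {
            this.name = name;
            this.clientPriority = clientPriority;
            this.pastInfringements = pastInfringements;
            this.tipOffs = tipOffs;
        }
    }

    /** Weights are illustrative assumptions only. */
    static int score(Domain d) {
        return 3 * d.clientPriority + 2 * d.pastInfringements + d.tipOffs;
    }

    /** Returns a new list ordered from highest to lowest score. */
    static List<Domain> scanOrder(List<Domain> domains) {
        List<Domain> ordered = new ArrayList<>(domains);
        ordered.sort((a, b) -> Integer.compare(score(b), score(a)));
        return ordered;
    }
}
```

Each confirmed match increments `pastInfringements` for its domain, so positive results fed back into the database raise that domain's place in later scans, which is the feedback loop the paragraph above describes.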

The human operators 70 respond to possible matches reported by the system via match control panel 78, either confirming or denying the match. They are provided with client information and a priority rating for the media found. The operator 70 then responds accordingly, informing the owner of the media and offering further company legal services and/or external lawyers. Human operators 70 oversee critical system parameters such as bandwidth and free server disk space via a central mission control panel 74. Error information from the search spiders 8 is also passed on to the operators 70 via the match control panel 78. Action taken, such as providing the spiders 8 with passwords to access particular sites, is then registered in the database 46.

Figure 7 shows a schematic overview of part of the Internet 6. Three national backbones 80 are shown, located in three different countries, and connected by three international links 82. Home users 84, businesses 86 and corporate web hosts 88 are connected to the Internet via Internet Service Providers (ISPs) 90, sometimes by the use of wireless or fibre-optic links 92. Hosting bunkers 94 are available in each country to allow businesses to obtain more rapid access to parts of the Internet in that country. In this regard it should be appreciated that the international links 82 are relatively slow at transmitting information compared to links within countries.

Figure 8 shows a general overview of database 46, with each box 150 representing a separate table in the database 46.

Figure 9 is a general overview of the system incorporated into the Internet 6 shown in Figure 7. A copy of database 46 is provided at a number of hosting bunkers 94, which may be located in different countries. Each hosting bunker 94 contains a "search spider farm", which is a collection of search spiders 8 residing on a number of computers 96.

A fault tolerant load balancer 98 shares the Internet bandwidth between the computers 96. The copies of database 46 are protected by a security firewall 100, and provided with local backup 102.

Still referring to Figure 9, there are also provided a number of national search centres 104, which essentially contain the same components as the hosting bunkers 94 except that they lack a copy of database 46. Instead, the results from the spiders are fed back to the copies of database 46 at other locations. Typically, the system would use one hosting bunker and two to three search centres per country.

The system also interacts with service users 106, which may include clients and informers. In addition to the company headquarters 52 of the system operator, there may also be separate national offices 108 also linked to the system.

A distributed and modular architecture is used throughout the system. The modularity allows for the rapid adoption of new media types and allows easy system maintenance.

Remote national search centres 104, located in Internet hosting bunkers, connect to the central database 46 or an intermediate replication of it. This allows the necessary download bandwidth to be achieved, unconstrained by limited international Internet links 82. Search routines for particular domains are allocated to particular national search centres 104 according to geographic hosting region, again facilitating rapid downloads.
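The allocation of domains to search centres by hosting region can be sketched as a simple lookup with a fallback. The class `SearchCentreRouter`, the use of country codes as region identifiers and the centre names are assumptions made for this illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical router: sends each domain to the search centre in its hosting country. */
final class SearchCentreRouter {
    private final Map<String, String> centreByCountry = new HashMap<>();
    private final String defaultCentre;

    SearchCentreRouter(String defaultCentre) {
        this.defaultCentre = defaultCentre;
    }

    /** Registers the search centre that serves domains hosted in the given country. */
    void register(String countryCode, String centreName) {
        centreByCountry.put(countryCode.toUpperCase(Locale.ROOT), centreName);
    }

    /** Returns the local centre if one exists, otherwise the default centre. */
    String centreFor(String hostingCountryCode) {
        return centreByCountry.getOrDefault(
                hostingCountryCode.toUpperCase(Locale.ROOT), defaultCentre);
    }
}
```

Routing a scan to a centre in the same country as the host keeps the bulk download traffic on national links, with only the comparatively small results crossing the slower international links.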

The software components that form the core of the system have a high degree of modularity. This enables continual updating and allows for the possibility of redundancy in the system to boost reliability. New modules may be written to accommodate new digital media formats as they are invented, allowing their monitoring by the system. Other modules allow interfacing with all popular Internet access methods, including HTTP, FTP, IRC, ICQ, RealMedia etc., and provide for future diversification.

Existing embedded signature techniques such as those used by Digimarc (see www.digimarc.com), where the source media is altered to contain a signature, may also be incorporated into the system. This allows the system to be complementary to existing watermarking techniques, embracing them as part of the system.

In order to improve the detection response time and make best use of the available computing and bandwidth resources, the searches are split between broad sweeps of the Internet or network and narrowly focused surveys of known media hosting sites.

Intelligence for targeting these searches will be gathered from Internet categorisation databases, feedback from clients and industry associations, tip-offs from users, responses from a network of home users in return for incentives and from the results of previous searches.

The search results reported to the system users are prioritised according to the importance of the media and the hosting bandwidth and level of public accessibility to the infringing host. Using network statistic databases, the system can distinguish between sites that present an immediate threat and those that can only support a few simultaneous users. This information will dictate whether immediate contact is made with the client or whether the result should be recorded as part of a regular report.
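The decision between immediate contact and a regular report could be sketched as below. All names and numbers are assumptions: the class `ResultPrioritiser`, the 1 to 5 importance scale, the per-user stream rate used to estimate simultaneous users, and both thresholds are invented for illustration.

```java
/** Hypothetical rule for deciding how urgently a confirmed match is reported. */
final class ResultPrioritiser {
    enum Action { CONTACT_CLIENT_IMMEDIATELY, INCLUDE_IN_REGULAR_REPORT }

    private static final int MIN_IMPORTANCE = 4;          // on an assumed scale of 1 to 5
    private static final long MIN_SIMULTANEOUS_USERS = 100; // assumed threat threshold

    private ResultPrioritiser() {}

    /**
     * @param mediaImportance   client-assigned importance, 1 (low) to 5 (high)
     * @param hostBandwidthKbps hosting bandwidth taken from a network statistics database
     * @param perUserKbps       bandwidth one user needs to fetch or stream the media
     */
    static Action decide(int mediaImportance, long hostBandwidthKbps, long perUserKbps) {
        if (perUserKbps <= 0) {
            throw new IllegalArgumentException("perUserKbps must be positive");
        }
        // Rough estimate of how many users the infringing host can serve at once.
        long simultaneousUsers = hostBandwidthKbps / perUserKbps;
        boolean immediateThreat = mediaImportance >= MIN_IMPORTANCE
                && simultaneousUsers >= MIN_SIMULTANEOUS_USERS;
        return immediateThreat
                ? Action.CONTACT_CLIENT_IMMEDIATELY
                : Action.INCLUDE_IN_REGULAR_REPORT;
    }
}
```

Under these assumed thresholds, important media on a host that can serve hundreds of users at once triggers immediate contact, while the same media on a host limited to a handful of users is left for the regular report.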

Sample source code is set out on the following pages for implementing the spiders, media analysers and session bean in one embodiment of the invention.

Example 1: Spider Enterprise Java Bean Source Code

import java.rmi.RemoteException;
import javax.ejb.SessionBean;
import javax.ejb.SessionContext;
import javax.naming.InitialContext;
import javax.rmi.PortableRemoteObject;
import javax.ejb.DuplicateKeyException;
import javax.ejb.CreateException;
import javax.ejb.FinderException;
import javax.ejb.EJBException;
import java.util.Date;
import java.util.Vector;
import java.util.Enumeration;

public class EJBSpiderBean implements SessionBean {

    public Integer enterScan(Integer DomainID, Date dateStart, String ScanStatus)
            throws DuplicateKeyException, CreateException, RemoteException {
        tblScan theEntry = null;
        tblScanHome hometblScan = gettblScanHome();
        try {
            theEntry = hometblScan.create(DomainID, dateStart, ScanStatus);
        } catch (java.rmi.RemoteException e) {
            throw new EJBException("EJBSpiderBean, enterScan :" + e.getMessage());
        }
        return theEntry.getScanID();
    }

    public void setDateFinish(Integer ScanID, Date dateFinish) throws RemoteException {
        tblScanHome hometblScan = gettblScanHome();
        tblScan ts = null;
        try {
            ts = hometblScan.findByPrimaryKey(ScanID);
        } catch (FinderException fe) {
            System.out.println("FinderException at EJBSpiderBean.setDateFinish");
            fe.printStackTrace();
            try {
                enterError(ScanID, fe.getClass().getName(), fe.getMessage(),
                        "SpiderBean.setDateFinish(ScanID" + ScanID.intValue() + ")", "Medium", false);
            } catch (Exception re) {
                System.out.println("Exception entering error in spiderbean, setDateFinish(scan.findbyPrimaryKey)");
            }
        }
        ts.setDateFinish(dateFinish);
    }

    public void setScanStatus(Integer ScanID, String ScanStatus) throws RemoteException {
        tblScanHome hometblScan = gettblScanHome();
        tblScan ts = null;
        try {
            ts = hometblScan.findByPrimaryKey(ScanID);
        } catch (FinderException fe) {
            System.out.println("FinderException at EJBSpiderBean.setScanStatus");
            fe.printStackTrace();
            try {
                enterError(ScanID, fe.getClass().getName(), fe.getMessage(),
                        "SpiderBean.setScanStatus(ScanID" + ScanID.intValue() + ")", "Medium", false);
            } catch (Exception re) {
                System.out.println("Exception entering error in spiderbean, setScanStatus(scan.findbyPrimaryKey)");
            }
        }
        ts.setScanStatus(ScanStatus);
    }

    public void enterError(Integer ScanID, String ErrorType, String ErrorMessage,
            String ErrorSource, String Priority, boolean SuccessIndex)
            throws DuplicateKeyException, CreateException {
        tblError theEntry = null;
        tblErrorHome hometblError = gettblErrorHome();
        try {
            theEntry = hometblError.create(ScanID, ErrorType, ErrorMessage, ErrorSource, Priority, SuccessIndex);
        } catch (java.rmi.RemoteException e) {
            throw new EJBException("EJBSpiderBean, enterError : " + e.getMessage());
        }
    }

    public Integer enterSourcePage(Integer ScanID, String FileName, String Path, String URL)
            throws RemoteException, DuplicateKeyException, CreateException {
        tblSourcePage theEntry = null;
        System.out.println("Scan ID" + ScanID);
        System.out.println("FileName" + FileName);
        System.out.println("Path" + Path);
        System.out.println("URL" + URL);
        tblSourcePageHome hometblSourcePage = gettblSourcePageHome();
        try {
            theEntry = hometblSourcePage.create(ScanID, FileName, Path, URL);
        } catch (java.rmi.RemoteException e) {
            throw new EJBException("EJBSpiderBean, enterSourcePage : " + e.getMessage());
        }
        return theEntry.getSourcePageID();
    }

    public void enterFoundMedia(Integer SourcePageID, String FileName, String Path,
            String MediaType, String TestStatus)
            throws DuplicateKeyException, CreateException {
        tblFoundMedia theEntry = null;
        tblFoundMediaHome hometblFoundMedia = gettblFoundMediaHome();
        try {
            theEntry = hometblFoundMedia.create(SourcePageID, FileName, Path, MediaType, TestStatus);
        } catch (java.rmi.RemoteException e) {
            throw new EJBException("EJBSpiderBean, enterFoundMedia : " + e.getMessage());
        }
    }

    public Vector getDomain() {
        tblDomainHome home = gettblDomainHome();
        Enumeration enum = null;
        Vector dom = new Vector();
        tblDomain next;
        Date domNextScan = null;
        try {
            enum = home.findDomains();
            System.out.println("finddomains returned" + enum.toString());
        } catch (Exception ex) {
            System.out.println("Exception in SpiderBean getDOmain, from findDomains");
            ex.printStackTrace();
            // TODO 23/8/00 Enter errors in DB
        }
        if (enum != null) {
            next = (tblDomain) enum.nextElement();
            try {
                domNextScan = next.getNextScanTime();
            } catch (RemoteException re) {
                System.out.println("Remote exception getting next scan time");
            }
            while (enum.hasMoreElements()) {
                tblDomain td = (tblDomain) enum.nextElement();
                Date d = null;
                try {
                    d = td.getNextScanTime();
                } catch (RemoteException re) {
                    System.out.println("Remote exception getting next scan time");
                }
                if (d.before(domNextScan)) {
                    next = td;
                    domNextScan = d;
                }
            }
            try {
                dom.add(next.getDomainID());
                dom.add(next.getDomainName());
                next.setDateLastScan(new Date());
            } catch (RemoteException re) {
                System.out.println("Remote Ex at SpiderBean.getDomain");
                re.printStackTrace();
            }
        }
        return dom;
    }

    private tblDomainHome gettblDomainHome() {
        tblDomainHome hometblDomain = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblDomainHome");
            hometblDomain = (tblDomainHome) PortableRemoteObject.narrow(objref, tblDomainHome.class);
        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }
        return hometblDomain;
    }

    private tblScanHome gettblScanHome() {
        tblScanHome hometblScan = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblScanHome");
            hometblScan = (tblScanHome) PortableRemoteObject.narrow(objref, tblScanHome.class);
        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }
        return hometblScan;
    }

    private tblErrorHome gettblErrorHome() {
        tblErrorHome hometblError = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblErrorHome");
            hometblError = (tblErrorHome) PortableRemoteObject.narrow(objref, tblErrorHome.class);
        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }
        return hometblError;
    }

    private tblSourcePageHome gettblSourcePageHome() {
        tblSourcePageHome hometblSourcePage = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblSourcePageHome");
            hometblSourcePage = (tblSourcePageHome) PortableRemoteObject.narrow(objref, tblSourcePageHome.class);
        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }
        return hometblSourcePage;
    }

    private tblFoundMediaHome gettblFoundMediaHome() {
        tblFoundMediaHome hometblFoundMedia = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblFoundMediaHome");
            hometblFoundMedia = (tblFoundMediaHome) PortableRemoteObject.narrow(objref, tblFoundMediaHome.class);
        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }
        return hometblFoundMedia;
    }

    public void setNoPages(Integer DomainID, Integer NoPages) throws RemoteException {
        tblDomainHome hometblDomain = gettblDomainHome();
        tblDomain td = null;
        try {
            td = hometblDomain.findByPrimaryKey(DomainID);
        } catch (FinderException fe) {
            System.out.println("FinderException at EJBSpiderBean.setNoPages");
            fe.printStackTrace();
        }
        td.setNoPages(NoPages);
    }

    public void setNoDownloads(Integer DomainID, Integer NoDownloads) throws RemoteException {
        tblDomainHome hometblDomain = gettblDomainHome();
        tblDomain td = null;
        try {
            td = hometblDomain.findByPrimaryKey(DomainID);
        } catch (FinderException fe) {
            System.out.println("FinderException at EJBSpiderBean.setNoDownloads");
            fe.printStackTrace();
        }
        td.setNoDownloads(NoDownloads);
    }

    public void ejbCreate() {}
    public void setSessionContext(SessionContext context) {}
    public void ejbRemove() {}
    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void ejbLoad() {}
    public void ejbStore() {}
}

Example 2: Media Analyser Source Code

import java.rmi.RemoteException;
import javax.ejb.SessionBean;

import javax.ejb.SessionContext;
import javax.naming.InitialContext;
import javax.rmi.PortableRemoteObject;
import javax.ejb.DuplicateKeyException;
import javax.ejb.CreateException;
import javax.ejb.FinderException;
import javax.ejb.EJBException;
import java.util.Date;
import java.util.Vector;
import java.util.Enumeration;
import java.util.Arrays;

public class EJBMediaAnalyserBean implements SessionBean { public Vector getUnscannedMedia () { //DODO 29/8/00 Find by scanID, go through children I guess..... tblFoundMediaHome hometblFoundMedia = gettblFoundMediaHome () ; Enumeration enum = null; Vector vec = new Vector () ; try {

enum = hometblFoundMedia. findUnscannedMedia () ; while (enum. hasMoreElements ()) { tblFoundMedia theEntry = (tblFoundMedia) enum. nextElement () ; vec. add ( (Integer) theEntry. getPrimaryKey ()) ; } } catch (Exception e) { throw new EJBException ("EJBMediaAnalyserBean, getUnscannedMedia:

"+e. getMessage ()) ; } return vec ; } public String getMediaFileName (Integer FoundMediaID) { tblFoundMediaHome hometblFoundMedia = gettblFoundMediaHome () ; tblFoundMedia theEntry = null; String s ="No such FoundMediaID" ; try { theEntry = hometblFoundMedia. findByPrimaryKey (FoundMediaID) ; s = theEntry. getFileName () ; } catch (Exception ex) {

throw new EJBException ("EJBMediaAnalyserBean, getMediaPath : "+ex. getMessage ()) ; } return s ; } public boolean compareFingerPrints (Integer FoundMediaID, byte [] array) { //TODO 24/8/00 Can move this into a finder method ? Can I pass enum from entity to session to Client ? //Eventually will need to pass pattern ID to narrow search... tblFingerPrintDataHome hometblFingerPrintData = gettblFingerPrintDataHome () ; boolean match = false; try { Enumeration enum = hometblFingerPrintData. findAllFingerPrintData () ; if (enum! =null) {

while (enum. hasMoreElements ()) { tblFingerPrintData theEntry = (tblFingerPrintData) enum. nextElement () ; if (Arrays. equals (theEntry. getBlobData (), array) ) { match = true; //TODO 22/8/00 Similarity Index insertAssumedMatch (FoundMediaID, theEntry. getMediaID (), new

Integer (0)) ; } } } } catch (Exception e) { throw new EJBException ("EJBMediaAnalyserBean, compareFingerPrints : "+e. getMessage ()) ; } try if (match) { setTested (FoundMediaID) ; } else { deleteFoundMedia (FoundMediaID) ; } } catch (RemoteException re) { throw new EJBException ("EJBMediaAnalyserBean, compareFingerPrints (2ndEx) :"+re. getMessage ()) ; } return match ; } private void insertAssumedMatch (Integer FoundMediaID, Integer MediaID, Integer SimilarityIndex) { tblAssumedMatch theEntry = null; tblAssumedMatchHome hometblAssumedMatch = gettblAssumedMatchHome () ;

        try {
            theEntry = hometblAssumedMatch.create(FoundMediaID, MediaID, SimilarityIndex);
        } catch (Exception e) {
            throw new EJBException("EJBMediaAnalyserBean, insertAssumedMatch :" + e.getMessage());
        }
    }

    public void insertFPD(Integer MediaID, byte[] Data, Integer Size) {
        tblFingerPrintData theEntry = null;
        tblFingerPrintDataHome hometblFPD = gettblFingerPrintDataHome();
        try {
            theEntry = hometblFPD.create(MediaID, Data, Size);
        } catch (Exception e) {
            throw new EJBException("EJBMediaAnalyserBean, insertFPD : " + e.getMessage());
        }
    }

    private void deleteFoundMedia(Integer FoundMediaID) throws RemoteException {
        tblFoundMedia theEntry = null;
        tblFoundMediaHome hometblFoundMedia = gettblFoundMediaHome();
        try {
            theEntry = hometblFoundMedia.findByPrimaryKey(FoundMediaID);
            theEntry.remove();
        } catch (Exception e) {
            throw new EJBException("EJBMediaAnalyserBean, deleteFoundMedia : " + e.getMessage());
        }
    }

    private void setTested(Integer FoundMediaID) throws RemoteException {
        tblFoundMedia theEntry = null;
        tblFoundMediaHome hometblFoundMedia = gettblFoundMediaHome();
        try {
            theEntry = hometblFoundMedia.findByPrimaryKey(FoundMediaID);
            theEntry.setTestStatus("Analysed");
        } catch (Exception e) {
            throw new EJBException("EJBMediaAnalyserBean, setTested : " + e.getMessage());
        }
    }

    /*
    public void enterError(Integer ScanID, String ErrorType, String ErrorMessage, String ErrorSource,
            String Priority, String SuccessIndex) throws DuplicateKeyException, CreateException {
        tblError theEntry = null;
        tblErrorHome hometblError = gettblErrorHome();
        try {
            theEntry = hometblError.create(ScanID, ErrorType, ErrorMessage, ErrorSource, Priority, SuccessIndex);
        } catch (java.rmi.RemoteException e) {
            throw new EJBException("EJBMediaAnalyserBean, enterError : " + e.getMessage());
        }
    }
    */

    private tblAssumedMatchHome gettblAssumedMatchHome() {

        tblAssumedMatchHome hometblAssumedMatch = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblAssumedMatchHome");
            hometblAssumedMatch = (tblAssumedMatchHome) PortableRemoteObject.narrow(objref, tblAssumedMatchHome.class);
        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }
        return hometblAssumedMatch;
    }

    private tblFoundMediaHome gettblFoundMediaHome() {
        tblFoundMediaHome hometblFoundMedia = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblFoundMediaHome");
            hometblFoundMedia = (tblFoundMediaHome) PortableRemoteObject.narrow(objref, tblFoundMediaHome.class);
        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }
        return hometblFoundMedia;
    }

    private tblFingerPrintDataHome gettblFingerPrintDataHome() {
        tblFingerPrintDataHome hometblFingerPrintData = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblFingerPrintDataHome");
            hometblFingerPrintData = (tblFingerPrintDataHome) PortableRemoteObject.narrow(objref, tblFingerPrintDataHome.class);
        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }
        return hometblFingerPrintData;
    }

    private tblErrorHome gettblErrorHome() {
        tblErrorHome hometblError = null;
        try {
            InitialContext ctx = new InitialContext();
            Object objref = ctx.lookup("ejb.tblErrorHome");
            hometblError = (tblErrorHome) PortableRemoteObject.narrow(objref, tblErrorHome.class);
        } catch (Exception NamingException) {
            NamingException.printStackTrace();
        }
        return hometblError;
    }

    public void ejbCreate() {}
    public void setSessionContext(SessionContext context) {}
    public void ejbRemove() {}
    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void ejbLoad() {}
    public void ejbStore() {}
}

Example 3: Spider Source Code

import java.io.*;
import java.net.*;
import java.util.*;
import java.rmi.RemoteException;
import javax.rmi.PortableRemoteObject;

class Spider {
    private String m_startURL;
    private String m_strBaseDirectory;

    private String m_strStartingBaseDirectory;
    private BufferedReader m_bufferedReader;
    private Hashtable m_mediaPathVector;
    private Hashtable m_foundLinks;
    private Hashtable m_servedLinks;
    private int m_maxPage = 10;
    private Hashtable m_externalLinks;
    private Integer m_ScanID;
    private Integer m_DomainID;
    private int noDownloads;
    private EJBSpider spider;

    public Spider(EJBSpiderHome ejbSH) {
        try {
            spider = (EJBSpider) PortableRemoteObject.narrow(ejbSH.create(), EJBSpider.class);
        } catch (Exception ex) {
            System.out.println("Cannot create EJBSpider Session bean");
            ex.printStackTrace();
            System.exit(1);
        }
        try {
            m_bufferedReader = new BufferedReader(new FileReader("Spider.class"));
        } catch (FileNotFoundException e) {
            System.out.println("Error : " + e.toString() + "" + e.getMessage());
        }
        m_mediaPathVector = new Hashtable();
        m_foundLinks = new Hashtable();
        m_servedLinks = new Hashtable();
        m_externalLinks = new Hashtable();
        noDownloads = 0;
    }

    private void getBasicDirectory(String strUrl) {
        if ((strUrl == null) || (strUrl.length() == 0))
            System.out.println("error !!!... netsearch won't work without a proper base directory...");
        if (strUrl.endsWith("/")) // url ends with '/' --> do not want to have this one (-:
            strUrl = strUrl.substring(0, (strUrl.length() - 1));
        int lastOccurence = strUrl.lastIndexOf("/");
        // starting after http://
        if (lastOccurence > 8)
            m_strBaseDirectory = strUrl.substring(0, lastOccurence /*+ 1*/);
        else
            m_strBaseDirectory = strUrl;
        if (!m_strBaseDirectory.endsWith("/"))
            m_strBaseDirectory = m_strBaseDirectory.concat("/");
        System.out.println("@m_strBaseDirectory :" + m_strBaseDirectory);
    }

    public boolean search() {
        boolean back = false;
        try {
            Vector v = spider.getDomain();
            m_DomainID = (Integer) v.elementAt(0);

            // m_DomainID = new Integer(2);
            // m_startURL = "http://www.microsoft.com";
            m_startURL = (String) v.elementAt(1);
            System.out.println("DomainID" + m_DomainID);
            System.out.println("StartURL:" + m_startURL);
            m_ScanID = spider.enterScan(m_DomainID, new Date(), "Started");
        } catch (Exception ex) {
            System.out.println("Ex at spider getDomain or enterScan");
            ex.printStackTrace();
            try {
                spider.enterError(m_ScanID, ex.getClass().getName(), ex.getMessage(),
                        "Spider.search (getDom/enterScan)", "Medium", false);
            } catch (Exception re) {
                System.out.println("Exception entering error in search (getDom/enterScan)");
            }
            System.exit(1);
        }
        getBasicDirectory(m_startURL);
        m_strStartingBaseDirectory = m_strBaseDirectory;
        System.out.println("getting started..." + "\n");
        m_foundLinks.clear();
        m_servedLinks.clear();
        String[] s = downloadHTMLPage(m_strStartingBaseDirectory);
        try {
            // Limit length of filename to 215 chars, otherwise get strange SQL Exception
            if (((String) s[0]).length() > 215)
                s[0] = ((String) s[0]).substring(0, 215);
            Integer spID = spider.enterSourcePage(m_ScanID, s[0], s[1], s[2]);
            Vector ar = new Vector();
            ar.add(m_strStartingBaseDirectory);
            ar.add(spID);
            m_foundLinks.put(new Integer(m_foundLinks.size() + 1), ar); // first url (-:
        } catch (Exception re) {
            System.out.println("Unable to enter 1st SourcePage");
            re.printStackTrace();
            try {
                spider.enterError(m_ScanID, re.getClass().getName(), re.getMessage(),
                        "Spider.search (Sourcepage)", "Medium", false);
            } catch (Exception e) {
                System.out.println("Exception entering error in search (enterSourcePage)");
            }
            System.exit(1);
        }
        System.out.println("get links to all media at comlete homepage..." + "\n");
        if (getAllLinks()) {
            System.out.println("downloading htmlpages\n");
            // downloadHtml_Pages();
            System.out.println("downloading media" + "\n");
            downloadMedia();
            back = true;
            try {
                System.out.println("SCANSTATUS" + m_ScanID + "Scan completed");
                spider.setScanStatus(m_ScanID, "Scan completed");
                spider.setDateFinish(m_ScanID, new Date());
            } catch (RemoteException re) {
                try {
                    spider.enterError(m_ScanID, re.getClass().getName(), re.getMessage(),
                            "Spider.search (setScanStatus/setDateFinish)", "Medium", false);
                } catch (Exception e) {
                    System.out.println("Exception entering error in search (setScanStatus/setDateFinish)");
                }
                System.out.println("RemoteException at setScanStatus/setDateFinish");
                re.printStackTrace();
            }
        } else
            back = false;
        for (int i = 1; i < m_externalLinks.size() + 1; i++)
            System.out.println("m_externalLinks: " + (String) m_externalLinks.get(new Integer(i)));
        System.out.println("m_externalLinks.size() :" + m_externalLinks.size());
        System.out.println("m_servedLinks.size() :" + m_servedLinks.size());
        System.out.println("m_foundLinks.size() : " + m_foundLinks.size());
        System.out.println("m_mediaPathVector.size() :" + m_mediaPathVector.size());

        System.out.println("\nreturned :" + back);
        return back;
    }

    private boolean getAllLinks() {
        if (m_foundLinks.isEmpty())
            return false; // does not make much sense... (-:
        boolean back = true;
        int counter = 0;
        while (!(m_foundLinks.isEmpty()) && back && (counter < m_maxPage)) {
            back = false; // want a proper end !!!!!
            // get link out of m_foundLinks
            Enumeration keys = m_foundLinks.keys();
            String pageURL = new String("");
            Integer sourcePageID = null;
            Integer key = new Integer(0);
            if (keys.hasMoreElements()) {
                key = (Integer) keys.nextElement();
                Vector a = (Vector) m_foundLinks.get(key);
                pageURL = (String) a.elementAt(0);
                sourcePageID = (Integer) a.elementAt(1);
            }
            // test link
            if ((pageURL != null) && sourcePageID != null && (testIfWebpage(pageURL))) {
                System.out.println("*** pageURL:" + pageURL);
                getBasicDirectory(pageURL); // update base directory
                try {
                    URL strURL = new URL(pageURL);
                    // System.out.println(strURL.toString() + ";;;");
                    m_bufferedReader = new BufferedReader(new InputStreamReader(strURL.openStream()));
                    String inputLine = new String();
                    // get next Line http-code
                    while ((inputLine = getNextLineHTTPCode()) != null) {
                        // System.out.println("inputLine : " + inputLine);
                        // contains inputLine an important tag?
                        int indexImportantTag = -1;
                        while ((indexImportantTag = getImportantTag(inputLine)) != -1) {
                            System.out.println("indexImportantTag :" + indexImportantTag);
                            String pathName = new String();
                            // which tag is it???
                            if ((inputLine.substring(indexImportantTag)).startsWith("href=")
                                    || (inputLine.substring(indexImportantTag)).startsWith("HREF=")) {
                                // link to another page
                                inputLine = inputLine.substring(indexImportantTag + "href=".length());
                                pathName = getPathName(inputLine);
                                // System.out.println("pathName: " + pathName);
                                if (pathName != null)
                                    dealWithPageLink(pathName, sourcePageID);
                            } else // so only "SCR" and "scr" stays over
                            {
                                // link to another page or media ???
                                inputLine = inputLine.substring(indexImportantTag + "scr=".length());
                                pathName = getPathName(inputLine);
                                // System.out.println("pathName :" + pathName);

                                if ((pathName.endsWith(".html")) || (pathName.endsWith(".HTML"))
                                        || (pathName.endsWith(".htm")) || (pathName.endsWith(".HTM"))
                                        || (pathName.endsWith(".shtm")) || (pathName.endsWith(".SHTM"))
                                        || (pathName.endsWith(".cgi")) || (pathName.endsWith(".CGI"))
                                        || (pathName.endsWith(".asp")) || (pathName.endsWith(".ASP"))
                                        || (pathName.endsWith(".cfm")) || (pathName.endsWith(".CFM"))
                                        || (pathName.endsWith(".jsp")) || (pathName.endsWith(".JSP"))
                                        || (pathName.endsWith(".sml")) || (pathName.endsWith(".SML"))) {
                                    // link to another page
                                    if (pathName != null)
                                        dealWithPageLink(pathName, sourcePageID);
                                } else if ((pathName.endsWith(".gif")) || (pathName.endsWith(".GIF"))
                                        || (pathName.endsWith(".jpeg")) || (pathName.endsWith(".JPEG"))
                                        || (pathName.endsWith(".jpe")) || (pathName.endsWith(".JPE"))
                                        || (pathName.endsWith(".jpg")) || (pathName.endsWith(".JPG"))
                                        || (pathName.endsWith(".bmp")) || (pathName.endsWith(".BMP"))
                                        || (pathName.endsWith(".tif")) || (pathName.endsWith(".TIF"))) {
                                    // link to media
                                    if (pathName != null)

                                        dealWithMediaLink(pathName, sourcePageID);
                                } else if ((pathName != null) && (pathName != ""))
                                    m_externalLinks.put(new Integer(m_externalLinks.size()), pathName);
                            }
                        } // while important tag
                    } // while read line from url
                    m_bufferedReader.close();
                } catch (IOException e) {
                    System.out.println("Error:" + e.toString() + "" + e.getMessage());
                    e.printStackTrace();
                    // counter--;
                    back = false;
                    // m_out.println("S2" + e.toString()); // tell server about it
                    try {
                        spider.enterError(m_ScanID, e.getClass().getName(), e.getMessage(),
                                "Spider.getAllLinks", "Medium", false);
                    } catch (Exception re) {
                        System.out.println("Exception entering error in getAllLinks");
                    }
                }
                // completely read page gets moved from found to served vector
                if (saveLinkAsServed(pageURL))
                    if (m_foundLinks.remove(key) != null)
                        back = true; // proper end
                counter++;
            } // end while m_found is empty?...
            if (pageURL != null) {
                if (m_foundLinks.remove(key) != null) {
                    // counter [operator illegible in the source listing]
                    back = true; // link is no webpage... delete it and just get the next one...
                    System.out.println("... i am here + back..." + back);
                }
            }
        }
        try {
            spider.setNoPages(m_DomainID, new Integer(counter));
            spider.setScanStatus(m_ScanID, "Got All Links");
        } catch (RemoteException re) {
            System.out.println("RemoteException at setNoPages/setScanStatus");
            re.printStackTrace();
            try {
                spider.enterError(m_ScanID, re.getClass().getName(), re.getMessage(),
                        "Spider.getAllLinks (setNoPages/setScanStatus)", "Medium", false);
            } catch (Exception e) {
                System.out.println("Exception entering error in getAllLinks (setNoPages/setScanStatus)");
            }
        }
        System.out.println("getAllLinks () returned:" + back);
        return back;
    }

    private String getNextLineHTTPCode() {
        // System.out.println("... i am here
        String inputLine = new String();
        try {
            inputLine = m_bufferedReader.readLine();

            if (inputLine != null) {
                // inputLine = inputLine.toLowerCase(); // webservers are case-sensitive !!!
                int index = inputLine.indexOf(" ");
                while (inputLine.indexOf(" ") > -1) {
                    StringBuffer strBuf = new StringBuffer(inputLine);
                    strBuf.deleteCharAt(index);
                    inputLine = strBuf.toString();
                    index = inputLine.indexOf(" ");
                }
            }
        } catch (IOException e) {
            System.out.println("Error :" + e.toString() + "" + e.getMessage());
            e.printStackTrace();
            try {
                spider.enterError(m_ScanID, e.getClass().getName(), e.getMessage(),
                        "Spider.getNextLineHTTPCode", "Medium", false);
            } catch (Exception re) {
                System.out.println("Exception entering error in getNextLineHTTPCode");
            }
        }
        return inputLine;
    }

    private boolean testIfWebpage(String pageLink) {
        boolean back = false;
        if (pageLink.endsWith("/"))
            pageLink = pageLink.substring(0, (pageLink.length() - 1));
        int lastOc = pageLink.lastIndexOf(".");
        String strEnd = new String();
        if (lastOc != -1)
            strEnd = pageLink.substring(lastOc + 1, pageLink.length());
        if ((strEnd.compareToIgnoreCase("html") == 0) || (strEnd.compareToIgnoreCase("HTML") == 0) ||
            (strEnd.compareToIgnoreCase("htm") == 0) || (strEnd.compareToIgnoreCase("HTM") == 0) ||
            (strEnd.compareToIgnoreCase("shtm") == 0) || (strEnd.compareToIgnoreCase("SHTM") == 0) ||
            (strEnd.compareToIgnoreCase("sml") == 0) || (strEnd.compareToIgnoreCase("SML") == 0) ||
            // (strEnd.compareToIgnoreCase("cgi") == 0) || (strEnd.compareToIgnoreCase("CGI") == 0) ||
            // (strEnd.compareToIgnoreCase("asp") == 0) || (strEnd.compareToIgnoreCase("ASP") == 0) ||
            (strEnd.compareToIgnoreCase("cfm") == 0) || (strEnd.compareToIgnoreCase("CFM") == 0) ||
            (strEnd.compareToIgnoreCase("jsp") == 0) || (strEnd.compareToIgnoreCase("JSP") == 0) ||
            (strEnd.compareToIgnoreCase("com") == 0) || (strEnd.compareToIgnoreCase("net") == 0) ||
            (strEnd.compareToIgnoreCase("de") == 0) || (strEnd.compareToIgnoreCase("uk") == 0))
            back = true;
        System.out.println(pageLink + "testIfWebpage returned :" + back);
        return back;
    }

    private int getImportantTag(String inputLine) {
        System.out.println("getImportantTag#inputLine :" + inputLine);
        int array[] = new int[4];
        array[0] = inputLine.indexOf("href=\"");
        array[1] = inputLine.indexOf("HREF=\"");
        array[2] = inputLine.indexOf("src=\"");
        array[3] = inputLine.indexOf("SRC=\"");
        Arrays.sort(array);
        int back = -1;
        for (int i = 0; i < array.length; i++) {
            if (array[i] != -1) {
                back = array[i];
                i = 369;
            }
        }
        System.out.println("getImportantTag#back :" + back);
        return back;
    }

    private String getPathName(String inputLine) {
        System.out.println("getPathName#in :" + inputLine);
        String pathName = new String();
        // pathName has to be BEFORE next ImportantTag
        // (as important Tag for this path has been removed before calling this method)
        int nextIndex = getImportantTag(inputLine);

        System.out.println("nextIndex : " + nextIndex);
        if (nextIndex == 0)
            return null; // must not be - important tag is supposed to be cut
        else {
            int pathStartIndex = inputLine.indexOf("\"");
            int pathEndIndex = inputLine.indexOf("\"", pathStartIndex + 1);
            if (pathStartIndex == -1) {
                // path is in next line --> get it !
                inputLine = getNextLineHTTPCode();
                // new search for path
                pathStartIndex = inputLine.indexOf("\"");
                pathEndIndex = inputLine.indexOf("\"", pathStartIndex);
                nextIndex = getImportantTag(inputLine);
                if (pathStartIndex == -1)
                    return null; // must not be !!! (something went wrong !!!)
            }
            if (pathEndIndex == -1)
                pathEndIndex = inputLine.length(); // last " got lost (-:
            // System.out.println("pathStartIndex : " + pathStartIndex + "pathEndIndex : " + pathEndIndex);
            // get pathname - finally (-:
            pathName = inputLine.substring(pathStartIndex + 1, pathEndIndex);
        }
        // System.out.println("getPathName#back : " + pathName);
        return pathName;
    }

    private void dealWithPageLink(String foundpathName, Integer sourcePageID) {
        System.out.println("dealWithPageLink#begin : " + foundpathName);
        if ((foundpathName.endsWith(".mp3")) || (foundpathName.endsWith(".MP3"))
                || (foundpathName.endsWith(".avi")) || (foundpathName.endsWith(".AVI"))
                || (foundpathName.endsWith(".mov")) || (foundpathName.endsWith(".MOV"))
                || (foundpathName.endsWith(".wav")) || (foundpathName.endsWith(".WAV"))
                || (foundpathName.endsWith(".mp2")) || (foundpathName.endsWith(".MP2"))) {
            dealWithMediaLink(foundpathName, sourcePageID);
            return;
        }

        if (foundpathName == "" || foundpathName.toLowerCase().indexOf("mailto:") != -1)
            return;
        // pre-preparations
        String temp = m_strBaseDirectory;
        foundpathName = foundpathName.trim();
        if (foundpathName.endsWith("/"))
            foundpathName = foundpathName.substring(0, foundpathName.length() - 1);
        if (foundpathName.startsWith("/")) {
            foundpathName = foundpathName.substring(1);
            getBasicDirectory(m_strStartingBaseDirectory);
            foundpathName = m_strBaseDirectory + foundpathName;
            System.out.println("i am here" + foundpathName);
        }
        while (foundpathName.indexOf("../") != -1) // go back in directories
        {
            StringBuffer strBuf = new StringBuffer(foundpathName);
            getBasicDirectory(m_strBaseDirectory);
            int oc = foundpathName.indexOf("../");
            strBuf = strBuf.replace(0, (oc + 3), m_strBaseDirectory);
            foundpathName = strBuf.toString();
        }
        if (((lastDirectory(m_strBaseDirectory)).compareToIgnoreCase(firstDirectory(foundpathName))) == 0) {
            // to avoid: http://.../dir1/dir1/example.htm
            getBasicDirectory(m_strBaseDirectory);
            foundpathName = m_strBaseDirectory.concat(foundpathName);
        }
        if ((foundpathName.startsWith(m_strBaseDirectory)) || (foundpathName.startsWith(m_strStartingBaseDirectory))) {
            // already formated TODO Can sort this out better when use String hash function because
            // Then we know where the path hashs to.
            Enumeration enum = m_foundLinks.elements();
            boolean indexA = false;
            while (enum.hasMoreElements() && !indexA) {
                Vector v = (Vector) enum.nextElement();
                if (((String) v.elementAt(0)).equalsIgnoreCase(foundpathName))
                    indexA = true;
            }
            /*
            enum = m_serverLinks.elements();
            boolean indexB = false;
            while (enum.hasMoreElements() && !indexB) {
                Object[] obj = enum.nextElement();
                if ((String) obj[0].equalsIgnoreCase(foundpathName))
                    indexB = true;
            }
            */
            boolean indexB = m_servedLinks.contains(foundpathName);
            if (!indexA && !indexB) {
                int key = m_foundLinks.size();
                while (m_foundLinks.containsKey(new Integer(key)))
                    key++;
                String[] s = downloadHTMLPage(foundpathName);
                try {
                    // Limit length of filename to 215 chars, otherwise get strange SQL Exception
                    if (((String) s[0]).length() > 215)
                        s[0] = ((String) s[0]).substring(0, 215);
                    Integer spID = spider.enterSourcePage(m_ScanID, s[0], s[1], s[2]);
                    Vector ar = new Vector();
                    ar.add(foundpathName);
                    ar.add(spID);
                    m_foundLinks.put(new Integer(key), ar);
                } catch (Exception re) {
                    System.out.println("Unable to enter SourcePage");
                    re.printStackTrace();
                    try {
                        spider.enterError(m_ScanID, re.getClass().getName(), re.getMessage(),
                                "Spider.dealWithPageLink (enterSourcePage) ", "Medium", false);
                    } catch (Exception e) {
                        System.out.println("Exception entering error in dealWithPageLink (enterSourcePage) ");
                    }
                }
            }
        }

        else {
            if ((foundpathName.startsWith("http://")) || (foundpathName.startsWith("HTTP://"))
                    || (foundpathName.startsWith("ftp://")) || (foundpathName.startsWith("FTP://"))
                    || (foundpathName.startsWith("www.")) || (foundpathName.startsWith("WWW."))) {
                // found link away from baseDirectory
                m_externalLinks.put(new Integer(m_externalLinks.size() + 1), foundpathName);
            } else {
                if (!(foundpathName.startsWith(m_strBaseDirectory))) { // relative link
                    foundpathName = m_strBaseDirectory.concat(foundpathName);
                    Enumeration enum = m_foundLinks.elements();
                    boolean indexA = false;
                    while (enum.hasMoreElements() && !indexA) {
                        Vector v = (Vector) enum.nextElement();
                        if (((String) v.elementAt(0)).equalsIgnoreCase(foundpathName))
                            indexA = true;
                    }
                    boolean indexB = m_servedLinks.contains(foundpathName);
                    /*
                    enum = m_serverLinks.elements();
                    boolean indexB = false;
                    while (enum.hasMoreElements() && !indexB) {
                        Object[] obj = enum.nextElement();
                        if ((String) obj[0].equalsIgnoreCase(foundpathName))
                            indexB = true;
                    }
                    */
                    if (!indexA && !indexB) {
                        int key = m_foundLinks.size();
                        while (m_foundLinks.containsKey(new Integer(key)))
                            key++;
                        String[] s = downloadHTMLPage(foundpathName);
                        try {
                            // Limit length of filename to 215 chars, otherwise get strange SQL Exception
                            if (((String) s[0]).length() > 215)
                                s[0] = ((String) s[0]).substring(0, 215);
                            Integer spID = spider.enterSourcePage(m_ScanID, s[0], s[1], s[2]);
                            Vector v = new Vector();
                            v.add(foundpathName);
                            v.add(spID);
                            m_foundLinks.put(new Integer(key), v);
                        } catch (Exception re) {
                            System.out.println("Unable to enter SourcePage");
                            re.printStackTrace();
                            try {
                                spider.enterError(m_ScanID, re.getClass().getName(), re.getMessage(),
                                        "Spider.dealWithPageLink (enterSourcePage2)", "Medium", false);
                            } catch (Exception e) {
                                System.out.println("Exception entering error in dealWithPageLink (enterSourcePage2)");
                            }
                        }

                    }
                }
            }
        }
        m_strBaseDirectory = temp;
        System.out.println("dealWithPageLink#foundpathName :" + foundpathName);
    }

    private void dealWithMediaLink(String foundpathName, Integer sourcePageID) {
        // System.out.println("dealWithMediaLink#foundpathName :" + foundpathName);
        if (foundpathName == "")
            return;
        // save media link
        String temp = m_strBaseDirectory;
        if (foundpathName.endsWith("/"))
            foundpathName = foundpathName.substring(0, foundpathName.length() - 1);
        if (foundpathName.startsWith("/")) {
            foundpathName = foundpathName.substring(1);
            getBasicDirectory(m_strStartingBaseDirectory);
            foundpathName = m_strBaseDirectory + foundpathName;
            // System.out.println("i am here" + foundpathName);
        }
        while (foundpathName.indexOf("../") != -1) {
            StringBuffer strBuf = new StringBuffer(foundpathName);
            getBasicDirectory(m_strBaseDirectory);
            int oc = foundpathName.indexOf("../");
            strBuf = strBuf.replace(0, (oc + 3), m_strBaseDirectory);
            foundpathName = strBuf.toString();
        }
        if ((lastDirectory(m_strBaseDirectory)).compareToIgnoreCase(firstDirectory((foundpathName))) == 0) {
            // to avoid: http://.../dir1/dir1/example.gif
            getBasicDirectory(m_strBaseDirectory);
            foundpathName = m_strBaseDirectory.concat(foundpathName);
        }
        if (foundpathName.startsWith("/"))
            foundpathName = foundpathName.substring(1);
        if ((foundpathName.startsWith(m_strBaseDirectory)) || (foundpathName.startsWith(m_strStartingBaseDirectory))) {
            // TODO sort out hashTable keys
            Enumeration enum = m_mediaPathVector.elements();
            boolean inVector = false;
            while (enum.hasMoreElements() && !inVector) {
                Vector v = (Vector) enum.nextElement();
                if (((String) v.elementAt(0)).equals(foundpathName))
                    inVector = true;
            }
            if (!inVector) {
                Vector vec = new Vector();
                vec.add(foundpathName);
                vec.add(sourcePageID);
                m_mediaPathVector.put(new Integer(m_mediaPathVector.size() + 1), vec); // readily formated link
            }
        } else {
            if (!foundpathName.startsWith(m_strBaseDirectory)) {
                foundpathName = m_strBaseDirectory.concat(foundpathName); // relative link
                Enumeration enum = m_mediaPathVector.elements();
                boolean inVector = false;
                while (enum.hasMoreElements() && !inVector) {
                    Vector v = (Vector) enum.nextElement();
                    if (((String) v.elementAt(0)).equals(foundpathName))
                        inVector = true;
                }
                if (!inVector) {
                    Vector vec = new Vector();
                    vec.add(foundpathName);
                    vec.add(sourcePageID);
                    m_mediaPathVector.put(new Integer(m_mediaPathVector.size() + 1), vec); // readily formated link
                }
            }
        }
        m_strBaseDirectory = temp;
        // System.out.println("dealWithMediaLink#foundpathName : " + foundpathName);
    }

    private String firstDirectory(String link) // returns ONLY the name (no " or /)
    {
        System.out.println("firstDirectory#link :" + link);
        int firstOc = -1;
        if (link.startsWith("/"))
            firstOc = 1;
        else
            firstOc = 0;
        int nextOc = link.indexOf("/", firstOc);
        String back = new String();
        if ((nextOc != -1) && (firstOc != -1))
            back = link.substring(firstOc, nextOc);
        else
            back = link;
        System.out.println("firstDirectory#back :" + back);
        return back;
    }

    private String lastDirectory(String link) // returns ONLY the name (no " or /)
    {
        System.out.println("lastDirectory#link :" + link);
        if (link.endsWith("/"))
            link = link.substring(0, link.length() - 1);
        else if ((link.lastIndexOf("/")) != -1)
            link = link.substring(link.lastIndexOf("/"), link.length() - 1);
        int nextOc = link.lastIndexOf("/");
        String back = new String();
        if (nextOc != -1)
            back = link.substring(nextOc + 1, link.length());
        else
            back = link;
        System.out.println("lastDirectory#back: " + back);
        return back;
    }

    private boolean saveLinkAsServed(String link) {
        if (link == "")
            return false;
        int key = m_servedLinks.size();
        while (m_servedLinks.containsKey(new Integer(key)))
            key++;
        m_servedLinks.put(new Integer(key), link);
        return true;
    }

    private void downloadMedia() {
        // downloading media
        for (int i = 1; i < m_mediaPathVector.size() /*+1*/; i++) {
            // getting just the names
            Vector vec = (Vector) m_mediaPathVector.get(new Integer(i));
            String fileName = (String) vec.elementAt(0);
            System.out.println(fileName);
            int lastOc = fileName.lastIndexOf("/");
            if (lastOc != -1)
                fileName = fileName.substring(lastOc + 1);
            // Limit length of filename to 215 chars
            if (fileName.length() > 215)
                fileName = (fileName).substring(0, 215);
            // Check for chars not allowed by windows.
            fileName = fileName.replace(':', '@');
            fileName = fileName.replace('*', '@');
            fileName = fileName.replace('?', '@');
            fileName = fileName.replace('"', '@');
            fileName = fileName.replace('<', '@');
            fileName = fileName.replace('>', '@');
            fileName = fileName.replace('|', '@');
            /*
            String fileName = ((String) (m_mediaPathVector.get(new Integer(i))));
            fileName = fileName.replace
            fileName = fileName.replace(':', '*');
            */
            try {
                URL mediaURL = new URL((String) vec.elementAt(0));
                InputStream mediaIn = mediaURL.openStream();
                // *** preparation for check
                // save media in file --> found a db match !!!
                File file = new File("c:/netsertion/spider/downloadMedia/" + fileName);
                FileOutputStream fileOut = new FileOutputStream(file);
                byte mediaInData[] = new byte[500];
                int sumReading = 0;
                int reading = mediaIn.read(mediaInData, 0, 500);
                sumReading = reading;
                while (reading != -1) {
                    fileOut.write(mediaInData, 0, reading);
                    reading = mediaIn.read(mediaInData, 0, 500);
                    sumReading += reading;
                }
                spider.enterFoundMedia((Integer) vec.elementAt(1), fileName,
                        "c:/netsertion/spider/downloadMedia", "TODO", "Unscanned");
                noDownloads++; // counting how many downloads has been done
            } catch (RemoteException re) {
                System.out.println("RemoteException at downloadMedia");
                re.printStackTrace();
            } catch (Exception e) {
                System.out.println("Error :" + e.toString() + "" + e.getMessage());
                e.printStackTrace();
                try {
                    spider.enterError(m_ScanID, e.getClass().getName(), e.getMessage(),
                            "Spider.downloadMedia", "Medium", false);
                } catch (Exception re) {
                    System.out.println("Exception entering error in downloadMedia");
                }
            }
        }
        try {
            spider.setNoDownloads(m_DomainID, new Integer(noDownloads));
        } catch (RemoteException re) {

            System.out.println("RemoteException setting NoDownloads");
        }
    }

    private void downloadHtml_Pages() {
        // downloading media
        for (int i = 1; i < m_servedLinks.size() /*+1*/; i++) {
            System.out.println("###" + ((String) (m_servedLinks.get(new Integer(i)))).toString());
            // getting just the names
            String fileName = ((String) (m_servedLinks.get(new Integer(i))));
            int lastOc = fileName.lastIndexOf("/");
            if (lastOc != -1)
                fileName = fileName.substring(lastOc + 1);
            // Limit length of filename to 215 chars
            if (fileName.length() > 215)
                fileName = (fileName).substring(0, 215);
            // Check for chars not allowed by windows.
            fileName = fileName.replace(':', '@');
            fileName = fileName.replace('*', '@');
            fileName = fileName.replace('?', '@');
            fileName = fileName.replace('"', '@');
            fileName = fileName.replace('<', '@');
            fileName = fileName.replace('>', '@');
            fileName = fileName.replace('|', '@');
            /*
            String fileName = ((String) (m_servedLinks.get(new Integer(i))));
            fileName = fileName.replace(
            fileName = fileName.replace(':', '*');
            */
            String inputLine = new String();
            try {
                Integer a = new Integer(i);
                String s = (String) m_servedLinks.get(a);
                URL httpURL = new URL(s);
                BufferedReader reader = new BufferedReader(new InputStreamReader(httpURL.openStream()));
                File file = new File("html-pages/" + fileName);
                FileWriter writer = new FileWriter(file);
                while ((inputLine = reader.readLine()) != null) {
                    // System.out.println("inputLine :" + inputLine);
                    writer.write(inputLine + "\n");
                }
                reader.close();
                writer.close();
            } catch (Exception e) {
                System.out.println("Error :" + e.toString() + "" + e.getMessage());
                e.printStackTrace();
                try {
                    spider.enterError(m_ScanID, e.getClass().getName(), e.getMessage(),
                            "Spider.downloadHtml_Pages", "Medium", false);
                } catch (Exception re) {
                    System.out.println("Exception entering error in downloadHTMLPages");

                }
            }
        }
    }

    private String[] downloadHTMLPage(String page) {
        String fileName = page;
        if (fileName.endsWith("/"))
            fileName = fileName.substring(0, fileName.length() - 1);
        int lastOc = fileName.lastIndexOf("/");
        if (lastOc != -1)
            fileName = fileName.substring(lastOc + 1);
        // Limit length of filename to 215 chars
        if (fileName.length() > 215)
            fileName = (fileName).substring(0, 215);
        // Check for chars not allowed by windows.
        fileName = fileName.replace(':', '@');
        fileName = fileName.replace('*', '@');
        fileName = fileName.replace('?', '@');
        fileName = fileName.replace('"', '@');
        fileName = fileName.replace('<', '@');
        fileName = fileName.replace('>', '@');
        fileName = fileName.replace('|', '@');
        System.out.println("FILENAME:" + fileName);
        String inputLine = new String();
        try {
            URL httpURL = new URL(page);
            BufferedReader reader = new BufferedReader(new InputStreamReader(httpURL.openStream()));
            File file = new File("c:/netsertion/spider/html-pages/" + fileName);
            FileWriter writer = new FileWriter(file);
            while ((inputLine = reader.readLine()) != null) {
                // System.out.println("inputLine : " + inputLine);
                writer.write(inputLine + "\n");
            }
            reader.close();
            writer.close();
        } catch (Exception e) {
            System.out.println("Error :" + e.toString() + "" + e.getMessage());
            e.printStackTrace();
            try {
                spider.enterError(m_ScanID, e.getClass().getName(), e.getMessage(),
                        "Spider.downloadHTMLPage", "Medium", false);
            } catch (Exception re) {
                System.out.println("Exception entering error in downloadHTMLPage");
            }
        }
        String[] s = { fileName, "c:/netsertion/spider/html-pages/", page };
        return s;
    }
}
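
The spider listing above resolves relative links by hand (stripping "../" segments, comparing directory names) and scans each line for href=" and src=" attributes before deriving a Windows-safe file name. Purely as an illustrative sketch, and not the method of the listing, the same three steps can be expressed with the standard java.net.URI and java.util.regex classes. The class name LinkSketch and its method names are hypothetical; the media extensions, the "mailto:" exclusion, the 215-character limit and the '@' substitution are taken from the listing.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the patent listing.
final class LinkSketch {
    // Matches href="..." or src="..." in any letter case; group 1 is the quoted path.
    private static final Pattern LINK =
            Pattern.compile("(?:href|src)\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    private static final String[] MEDIA_EXTENSIONS = {
        ".gif", ".jpeg", ".jpe", ".jpg", ".bmp", ".tif", ".mp3", ".avi", ".mov", ".wav", ".mp2"
    };

    // Returns every link found in html, resolved to an absolute URL against baseUrl.
    static List<String> extractLinks(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            String raw = m.group(1).trim();
            if (raw.isEmpty() || raw.toLowerCase(Locale.ROOT).startsWith("mailto:")) {
                continue; // the listing also ignores empty and mailto links
            }
            try {
                // resolve() handles "../", leading "/" and already-absolute URLs
                links.add(base.resolve(raw).normalize().toString());
            } catch (IllegalArgumentException e) {
                // malformed link text: skip it rather than abort the scan
            }
        }
        return links;
    }

    // True if the URL ends with one of the media extensions used in the listing.
    static boolean isMedia(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        for (String ext : MEDIA_EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    // Last path segment, cut to 215 characters, with : * ? " < > | replaced by '@'.
    static String safeFileName(String url) {
        String name = url.substring(url.lastIndexOf('/') + 1);
        if (name.length() > 215) {
            name = name.substring(0, 215);
        }
        return name.replaceAll("[:*?\"<>|]", "@");
    }
}
```

For a page at http://example.com/a/b/index.html, a link written as ../img/x.gif would resolve to http://example.com/a/img/x.gif, which isMedia would then classify as media to be downloaded.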

Claims

CLAIMS:
1. A method of protecting intellectual property rights in a user's media on a network of computers, the method comprising the steps of: a) receiving user media in which the user enjoys intellectual property rights; b) generating at least one user digital identification signature from each item of said user media; c) searching said network of computers for potentially infringing media; d) generating at least one suspect digital identification signature for each item of said potentially infringing media; e) comparing said user and suspect digital identification signatures to determine their degree of similarity; and f) producing a notification if said degree of similarity exceeds a predetermined level.
2. A method as claimed in claim 1, which further comprises the steps of : a) generating at least one user key, consisting of between one and ten integers, from the user's media; b) generating a suspect key, consisting of between one and ten integers, from each item of potentially infringing media; and c) before carrying out step (e) of claim 1, comparing said user and suspect keys for each item of potentially infringing media to determine their degree of similarity, and carrying out step (e) of claim 1 only if the degree of similarity of the keys exceeds a predetermined value.
3. A method as claimed in claim 1 or 2, which further includes the step of human operators comparing items of user media against items of potentially infringing media for which said notification has been produced.
4. A method as claimed in any preceding claim in which said searching step is carried out by a number of search spiders.
5. A method as claimed in claim 4, wherein media retrieved by said search spiders is stored in a file cache prior to any of said comparing steps being carried out.
6. A method as claimed in any preceding claim, wherein said user media, potentially infringing media and digital identification signatures are stored in a central database.
7. A method as claimed in claim 6, wherein copies of said database are provided at a number of hosting bunkers.
8. A method as claimed in claim 7, wherein at least some of said hosting bunkers reside in different countries.
9. A method as claimed in any one of claims 6 to 8, when also dependent directly or indirectly on claim 4 or 5, wherein each bunker is provided with said search spiders.
10. A method as claimed in any preceding claim, wherein said network of computers is the Internet.
11. A method as claimed in any preceding claim, wherein at least 5 user digital identification signatures are used for each item of said user media.
12. A method as claimed in any preceding claim, wherein at least 5 user digital identification signatures are used for each item of said potentially infringing media.
13. A method as claimed in any preceding claim, wherein each digital identification signature is stored in a digital identification signature file.
14. A system for carrying out the method of any preceding claim, the system comprising at least one computer programmed to generate said user and suspect digital identification signatures, and to compare said signatures to determine their degree of similarity.