GB2379290A - Information classification system - Google Patents

Info

Publication number
GB2379290A
Authority
GB
United Kingdom
Prior art keywords
information
code
classification code
classification
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0120823A
Other versions
GB0120823D0 (en
Inventor
Nicholas Frearson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FREAR WEB SOLUTIONS Ltd
Original Assignee
FREAR WEB SOLUTIONS Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FREAR WEB SOLUTIONS Ltd
Priority to GB0120823A
Publication of GB0120823D0
Priority to PCT/GB2002/003830 (WO2003021481A2)
Publication of GB2379290A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/353: Clustering; Classification into predefined classes

Abstract

A classification code generation system generates classification code for information. A user interface provides selectable hierarchical classifications for information. Received user selections of the hierarchical classifications are used to generate classification code and the classification code is stored in association with the information and/or in a database with identification information identifying the information.

Description

INFORMATION CLASSIFICATION SYSTEM

The present invention generally relates to an information classification method and apparatus for generating classification code for information for storage in association with the information and/or in a searchable database.
The infrastructure, content and consumer usage of the Internet have been growing both organically and exponentially since its origin in the 1980s. Many consumers, businesses as well as individuals, are still wary of using the Internet, primarily because of the perceived difficulty of finding specific, and particularly local, information on it. This problem is being tackled in part by some companies who are generating local directories of local Internet content. However, the Internet is a global phenomenon that requires a global solution.
Currently, websites can be found in many varied guises, generated from a multitude of independent sources. Practically the only constant governing their generation has been the use of the HyperText Markup Language (HTML) as the framework to which the many other website development languages available today have been attached.
To find a website, the interested observer must either know the Uniform Resource Locator (URL) for a particular website, or use one of the many Search Engines which exist to help them find the information that they require. The term Search Engine is used to describe a number of methods for acquiring, storing and retrieving information, all of which are transparent to the normal user. All Internet Search Engines maintain a database of websites that they categorize in some nominal way so as to facilitate finding information from them. Search Engines range from the general purpose, cataloguing information on everyday subjects across a broad spectrum, to the very specific, targeting the cataloguing of information on narrow and specialist subjects.
The problem arises with how the Search Engine retrieves information from a website or database in the first instance and how the collected information is catalogued accurately and consistently to facilitate quick and accurate searching by the consumer. For example, there is no standard for categorizing the geographical location of a website, an increasingly necessary requirement as the Internet evolves from primarily an information provider into a more commercially exploitable market place with a local market as well as a global one.
Currently search engine companies use two main methods for collecting and collating website information.
The first method involves website owners or maintainers registering their website or websites with a Search Engine company. The company reviews or indexes the website by hand and at some point adds this index to their register or database. The problem with this method is that the quality of the information that is stored about the website or database is only as good as the reviewer or the reviewer's remit. There is no universal standard. Consequently, the owner of the website or database has no control over the content of the references made to their website and, most importantly, over how visible that website will be to the user trying to find it.
The second method involves the Search Engine company utilizing a Spider, or search program, to scour the Internet looking for websites to index. This method does not, in theory, require you to register your website at all, although there is no guarantee as to when, or if, the Spider will find and index your website. In fact this type of Search Engine company also recommends that you register your website with them so that they can target it to be indexed by the Spider in some nominal timeframe. When the Spider visits the website it has a number of alternative methods to choose from to help it to gather information. For instance, it may look for the occurrence of key words or phrases in the text or special tags in the HTML code, or may index whole pages, checking the number of times that words appear and classifying them according to some pre-defined scheme. This information is later processed by the Search Engine company and stored in a database. The method of processing and subsequent retrieval of this data depends on which company is operating a particular Search Engine, the search criteria and retrieval method invariably being different from company to company.
The problem arising with either method of searching and indexing described above is that, since both have evolved organically along with the growth of the Internet, there is no one clear-cut method of webpage or database indexing in use today. Also, because of the way that the system has evolved, the information that the Search Engine companies index about a particular website may be incomplete or inaccurate. The website designer currently has little or no control over this. As a result a whole industry has grown up whose sole task is to advise website developers how to design their websites so that they have the best chance of being accurately indexed by specific Search Engine companies. This knowledge often costs a substantial amount of money, with no guaranteed outcome. In addition to giving the Search Engine companies great power over the Internet designer, this means that the ability of the interested observer to find that vital piece of information they are looking for can only ever be as good as the indexing capabilities and the database management of the Search Engine companies and the targeting of the original data by the website designer.
All of this has led to the surprising situation which we see today, whereby finding the information that one is looking for on the Internet is not an exact science, and carrying out essentially simple search tasks, such as listing all websites for furniture shops in Cambridge, England, produces varied, poor and sometimes wildly inaccurate results.
The present invention provides a method and apparatus for generating a classification code for information which can be stored in association with the information and/or in a searchable database. A classification code is generated by providing a user with a user interface comprising selectable classifications for the information. Thus a user such as the owner of the information, or an administrator can select the appropriate classifications for information using a menu system. In this way the classifications comprise constrained sets which can be arranged with some hierarchy. The classification codes are generated in response to the user selections and are associated with the data. This association can be by storing the classification codes in association with the information, or storing the classification codes linked to the information in a data store such as a database.
The present invention thus provides a simple means by which a user can manually classify data without knowledge of the classification system or the classification codes used.
Although the present invention is particularly suited to the classification of web pages (HTML code and other web page content) over a network such as the Internet, the present invention is applicable to any information stored electronically in a computer system. For example, the information can comprise audio, video and text, and even computer programs and computer code.
In one embodiment of the present invention, the classification code is stored in association with the information and in a database along with identification information identifying the information that is input by a user. Thus the database contains identification information identifying the user and the classification code and is thus searchable using the classification code in order to identify required information.
The storing of the classification code in association with the information enables the information itself to be searched using the classification code. This enables search engines to search for information and build up their own database. The identification information can comprise any pointer to the information. For example, it can comprise a file name, a logical address, or a uniform resource locator (URL).
In one embodiment, classification code is stored in a logical location linked to the logical location of the information. For example, the classification code can be stored in a file linked to the file containing the information.
In another embodiment of the present invention, the classification code is added to the information. This embodiment of the present invention is particularly suited to information in the form of a markup language wherein the classification code can be added to the information as a tagged code. In particular, in web pages that comprise hypertext mark-up language, the tagged code comprises one or more meta tags. Thus this embodiment of the present invention enables a search engine to perform conventional parsing of the hypertext mark-up language in order to identify relevant parameters to be stored in the database. The meta tag containing the classification code can be identified during the parsing operation and it can be stored for the web page.
Thus search engines can build up a database which can be searched using the classification code.
Another aspect of the present invention provides an information search method and apparatus for use with the database of classification codes generated on the basis of the user selections. A user inputs a classification code that is used to look up corresponding identification information in the database, and this is then output. Thus in accordance with this aspect of the present invention, the use of the classification codes provides a structured search strategy for information classified in accordance with the classification code.
In accordance with another aspect of the present invention, there is provided a method and apparatus for generating a searchable database using a database of classification codes and information identifiers. The information identifiers for information are read and the information is then located and retrieved using the identification information.
Searchable parameters are then extracted from the information and stored in the database in association with the corresponding classification codes and identification information.
The present invention is particularly suited for implementation using a computer system comprising one or a number of network computers. The present invention can however be implemented in dedicated hardware or a combination of dedicated hardware and software. Thus the present invention can be implemented using a general-purpose computer suitably programmed. A computer program for controlling a general-purpose computer thus comes within the scope of the present invention and the present invention encompasses a carrier medium carrying computer readable code. The carrier medium can comprise a transient medium such as an electrical, optical, microwave, acoustic, or radio frequency signal. An example of such a signal is a TCP/IP signal carrying computer code over an IP network such as the Internet. The carrier medium can also comprise a storage medium such as a floppy disk, CD-ROM, magnetic tape device, or solid state memory device.
Embodiments of the present invention will now be described with reference to the accompanying drawings, in which:
Figure 1 is a schematic diagram of a generalized embodiment of the present invention;
Figure 2 is a flow diagram illustrating the method of operation of the embodiment of Figure 1;
Figure 3 is a schematic diagram of a specific embodiment of the present invention;
Figure 4 is a flow diagram illustrating the generation of the classification code;
Figure 5 is a flow diagram illustrating the spidering operation to complete the data in the classification database;
Figures 6a and 6b illustrate the user interface for use by a user to select the classification data and to enter personal information;
Figure 7a is a table illustrating the main character blocks of the classification code;
Figure 7b is a table illustrating the component blocks of the location string;
Figure 7c is a table illustrating the component blocks of the subject type string;
Figure 7d is a table illustrating the component block of the vector string;
Figure 8 is a diagram of a user interface providing an information request form for performing a search;
Figures 9a to 9e illustrate the operations of a user interface during a search;
Figure 10 is a schematic diagram of a second specific embodiment of the present invention;
Figure 11 is a flow diagram illustrating the operation of the embodiment of Figure 10;
Figure 12 is a schematic diagram of a third specific embodiment of the present invention; and
Figure 13 is a flow diagram illustrating the operation of the embodiment of Figure 12.
A generalized embodiment of the present invention will now be described with reference to Figures 1 and 2.
Figure 1 schematically illustrates a generalized classification system. Data 2 is accessed by a classifier 1 and classification data 3 is generated which is associated with the corresponding data 2. A classification database 4 is also formed in which is stored the classification data and data identifiers identifying the corresponding data.
The classifier 1 includes a user interface 1a for generating an interface to a user to enable a user to make classification selections from a menu based interface. The selections made by the user are received by a classification data generator 1b within the classifier 1 for the generation of the classification data for the data. The classification data 3 is then input for association with the data 2. If the user inputs identification data using the user interface 1a, the classification data and the identification data can be entered into the classification database 4.
Referring to Figure 2, in step S1, a user enters identification data (an ID) for the data to be classified. This ID can be any pointer to a logical location of the data. For example, it can be a filename, or for web pages it can be a uniform resource locator (URL).
Using the ID, the classification database 4 can be accessed to look-up to see whether the data has already been classified. If the look-up operation determines that the data has not already been classified (step S3), a new record is made in the database with the ID (step S4).
If the look-up operation in the classification database determines that there is already classification data for the ID (step S3), a list of IDs is displayed and a request is output to the user to identify whether to reclassify or add a classification to the ID (step S5). A user can select an ID and select to reclassify or add a classification to the classification database (step S6). If a user wishes to reclassify the data, the user can select which classification data to reclassify (step S7) if there is more than one and the record identifier is then opened (step S8). If a user chooses to add a new classification, a new entry is made in the database with the ID selected (step S9).
Thus from step S4, step S8, or step S9, a record is opened. A user can then enter personal data using the interface 1a (step S10). This data is entered into the classification database in the record. Classification menus are then displayed (step S11) and a user makes selections of the classification criteria (step S12). The classification criteria can comprise a bounded set having some hierarchical structure. This constrains the classification by the user to within the structure of the classification provided. Once a user has entered the selections, classification code is generated by the classification data generator 1b (step S13) and the generated code and other data entered by the user is entered in the record and the record in the classification database is closed (step S14).
Classification code is then output as classification data 3 for association with the data 2 (step S15). The classification data 3 can be associated with the data 2 and be external to the data, or it can be inserted into the data itself.
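By way of illustration only, the record-handling flow of Figure 2 can be sketched as follows. This is a minimal sketch under stated assumptions: the class and function names (Record, ClassificationDatabase, classify) and the in-memory dictionary standing in for the classification database 4 are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    data_id: str                                   # pointer to the data, e.g. a filename or URL
    personal_data: dict = field(default_factory=dict)
    classification_code: str = ""


class ClassificationDatabase:
    """Hypothetical in-memory stand-in for classification database 4."""

    def __init__(self) -> None:
        self._records: dict[str, list[Record]] = {}

    def lookup(self, data_id: str) -> list[Record]:
        """Return any existing records for an ID (the check made at step S3)."""
        return list(self._records.get(data_id, []))

    def open_record(self, data_id: str, reclassify: Optional[int] = None) -> Record:
        """Open an existing record (steps S7, S8) or create a new one (step S4 or S9)."""
        records = self._records.setdefault(data_id, [])
        if reclassify is not None:
            return records[reclassify]
        record = Record(data_id)
        records.append(record)
        return record


def classify(db: ClassificationDatabase, data_id: str, personal_data: dict,
             code: str, reclassify: Optional[int] = None) -> Record:
    """Enter personal data (step S10) and the generated code (steps S13, S14)."""
    record = db.open_record(data_id, reclassify)
    record.personal_data = dict(personal_data)
    record.classification_code = code
    return record                                  # the code is then output (step S15)
```

Calling classify twice with the same ID adds a second classification, whereas passing reclassify=0 overwrites the first existing record, mirroring the choice offered to the user at step S6.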
A first specific embodiment of the present invention will now be described with reference to Figures 3 to 9.
Figure 3 is a schematic diagram of a network of computers. In this embodiment a network 10 comprises an Internet Protocol (IP) network which can comprise the Internet, a local area network, or a wide area network. A classifier server 20 is connected to the network 10 and comprises a web server 21 for serving classifier pages 22 to a web browser over the network 10. A classifier application 23 is provided to perform the classification operation and for storing classification data in a classification database 26. A search engine 24 is provided interfaced to the web server 21 to allow the classification database 26 to be searched in a conventional web search manner. A spider application 25 is also provided for performing a spidering operation to access web pages having classification data entries in the classification database 26 in order to extract additional information and add it into the classification database 26.
A user's computer 40 comprises a web browser 42 for accessing the web server 21 at the classifier server 20. A File Transfer Protocol (FTP) client 41 is provided for uploading the web pages to a web host 50. An offline copy of the user's web pages 43 is stored in the user's computer and can be edited by an HTML editor 44. Thus a user is able to modify their web pages and upload them using the FTP client 41 to a web host 50 over the network 10. At the web host 50 a web server 52 is provided to serve the user's web pages 53 to any web browsers wishing to access them. An FTP server 51 is provided to be responsive to the FTP client 41 to allow the uploading of the user's web pages 43.
A search engine server 30 is provided which operates in accordance with the principles of a conventional web search engine. A web server 31 is provided to provide the interface and a search engine 32 is provided for searching a classification database 34.
The search engine server also includes a spider application 33 for performing a spidering operation to search for web pages to be entered into the classification database 34. The search engine 32 differs from conventional search engines in that one search criterion that can be entered by a user is the classification code. The classification database 34 contains entries identifying web pages and, where applicable, the classification code for the web page. The spider application 33 performs a spidering operation in order to identify web pages, parse the HTML and enter the data into the classification database 34.
The principles of operation of this embodiment of the present invention in order to generate the classification database 26 will now be described with reference to the flow diagram of Figure 4.
In order to classify a web site, or a web page, a user uses their web browser 42 on user's computer 40 to access the web server 21 at the classifier server 20 (step S20). The web server 21 returns the classifier pages 22 and the user enters personal details and the URL of the web page or web site to be classified, and selects classifying criteria from the displayed menus (step S21). The classifier application 23 implemented at the classifier server 20 receives the entered details and user selections from the web server 21 and generates classification code and HTML code. The HTML code is returned to the web server 21 for display in a window on the web page (step S22). The classifier application 23 enters the classification code, the URL, and the other data entered by the user as a record in the classification database 26 (step S23).
The user is thus able to use the HTML code output in the window on the web page and they can use their HTML editor 44 to appropriately modify the user's web pages 43.
The modified HTML can then be uploaded using the FTP client 41 via the FTP server 51. In this way the user's web pages 53 at the web host 50 are updated to include the classification code as an HTML tag (step S24).
So far the classification database 26 thus has entered in it a classification code, corresponding URL, and other data entered by the user. The spider application 25 at the classifier server 20 can thus use the URLs in the classification database 26 in order to perform a conventional spidering operation in order to add further data by retrieving and processing the relevant web pages (step S25 and step S26). The classification database is then complete and accessible over the network. The classification database 26 can be searched by the search engine 24 using the classification codes (step S27).
This embodiment of the present invention also enables third party search engines to operate using the classification codes. A search engine 32 in a search engine server 30 must be of modified form to enable searching by classification codes. A spider application 33 must also be of modified form to enable the classification codes stored within the tags in the web pages to be identified during the conventional HTML parsing operation (step S30). The URL, classification code and other data obtained by the parsing operation can then be entered in the classification database (step S31). The classification database 34 is thus updated and accessible over the network 10 and is searchable by classifications for web pages that have been identified by the spider application which carry the classification code. If the classification system is adopted widely across the Internet, the number of web pages containing the classification code will be large and thus the classification system provides a simple method for accurate searching for information.
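As an illustrative sketch only, the identification of classification codes during HTML parsing (step S30) might look like the following. It assumes that the codes are carried in meta tags named "descriptor1", "descriptor2" and so on, as set out later in this description; the class and function names are hypothetical, and the parser used is Python's standard html.parser rather than anything specified in the text.

```python
import re
from html.parser import HTMLParser


class DescriptorExtractor(HTMLParser):
    """Collects the content of meta tags named descriptor1, descriptor2, ..."""

    def __init__(self) -> None:
        super().__init__()
        self.descriptors: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        meta_name = attr_map.get("name", "").lower()
        if re.fullmatch(r"descriptor\d+", meta_name):
            self.descriptors[meta_name] = attr_map.get("content", "")


def extract_descriptors(page_html: str) -> dict[str, str]:
    """Return a mapping of descriptor tag name to classification code."""
    parser = DescriptorExtractor()
    parser.feed(page_html)
    parser.close()
    return parser.descriptors
```

The returned mapping, together with the URL of the page, would correspond to the data entered in the classification database at step S31.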
Figures 6a and 6b are a diagram illustrating the user interface allowing the user to enter data, the URL, and to make classification selections. Figures 6a and 6b comprise a user form that is made available as one of the classifier pages 22 served by the web server 21.
In this embodiment of the present invention, the classification code defines the type of subject and its geographic location in the world. The user form will guide the user through the various stages of defining the type of subject and its geographical location in the world. This is done through a number of drop-down menus that provide options for each location category. The categories in both the location and description groups cascade from wider limits to narrower limits each time a category is selected.
Therefore, it is possible to download a new page from the server 21 defining the next category in the chain each time a selection is made. In this way, the apparent overhead incurred while downloading the data for each category at the start (which may be quite large) will be minimized since data will only be downloaded when it is needed.
Figure 6a illustrates the part of the user form for entering personal data. In Figure 6b, the part of the user form for entering the URL and for making the classification selections is illustrated. The geographic selections are based on country, county, city and town. Also, selections can be made to indicate the proximity of the business or subject to a city and town. The subject of the website can be classified in accordance with a major category and a minor category. Thus in this way the subject and location related to the website can be selected in accordance with a classification hierarchy.
Figures 7a to 7d illustrate the classification code structure in accordance with embodiments of the present invention. The code comprises a number or numbers that can be incorporated into the HTML header code in web pages. The code uniquely identifies the subject associated with a website or web page both by content and by location. More than one code can be used to identify different aspects of the subject matter. The code is derived from specific information requested from the website originator and developer, or maintainer by means of the electronic form described hereinabove.
The classification code in this embodiment of the present invention comprises a DESCRIPTOR word formed from a single 24-character string. Figure 7a is a table illustrating the sub-components of the string. The first 12 characters comprise a location string used to define the geographical location of the subject associated with the content of the website or web page. Figure 7b is a table illustrating the breakdown of the location string into a 4-character block country code which can have a range in hexadecimal of $0000 to $FFFF. It is thus possible to define 65,536 countries in accordance with individual country codes. A state or county code comprises a 2-character block having a hexadecimal range of $00 to $FF. This enables 256 states or counties to be coded. A city code comprises a 3-character block having a hexadecimal range of $000 to $FFF which enables 4,096 cities to be coded uniquely. A town code comprises a 3-character block with a hexadecimal range of $000 to $FFF which enables 4,096 possible towns to be uniquely encoded.
When a user wishes to locate a subject, business, or business type associated with a web page or web site, they can use a combination of the location codes as filters to target the search within the database as will be described in more detail hereinafter.
The subject type string in the classification code illustrated in the table of Figure 7a is used to define the type of subject. The structure of the subject type string is illustrated in the table of Figure 7c and comprises a major category and a minor category as a subgroup. The major category comprises a 4-character hexadecimal block with the range $0000 to $FFFF which enables 65,536 possible major categories to be encoded.
The minor category coding comprises a 4-character hexadecimal block with the range $0000 to $FFFF which enables 65,536 possible minor categories of subject types to be encoded.
When a user wishes to locate a subject, business or business type, they can use a combination of the subject type codes as filters to target the search within the database as will be described in more detail hereinafter.
The vector string comprises a 4-character block as illustrated in the table of Figure 7d that defines the geographical distance of a subject from a town or city. This is a useful search parameter that can be used by a user wishing to find all the instances of a subject type within a given distance of a town or city.
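The layout described above (a 12-character location string, an 8-character subject type string and a 4-character vector string, 24 hexadecimal characters in total) can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the field names and function names are hypothetical, only the block widths are taken from the description, and the vector block is treated as an opaque number because its internal encoding is not set out in the text.

```python
# (field name, width in hexadecimal characters), in the order described.
FIELDS = [
    ("country", 4),           # $0000 to $FFFF, 65,536 values
    ("state_or_county", 2),   # $00 to $FF, 256 values
    ("city", 3),              # $000 to $FFF, 4,096 values
    ("town", 3),              # $000 to $FFF, 4,096 values
    ("major_category", 4),    # $0000 to $FFFF
    ("minor_category", 4),    # $0000 to $FFFF
    ("vector", 4),            # distance block; encoding assumed opaque here
]


def encode_descriptor(**values: int) -> str:
    """Pack integer field values into a 24-character hexadecimal string."""
    parts = []
    for name, width in FIELDS:
        value = values.get(name, 0)          # unspecified fields default to zero
        if not 0 <= value < 16 ** width:
            raise ValueError(f"{name} does not fit in {width} hex characters")
        parts.append(f"{value:0{width}X}")
    return "".join(parts)


def decode_descriptor(code: str) -> dict[str, int]:
    """Split a 24-character descriptor back into its integer fields."""
    if len(code) != sum(width for _, width in FIELDS):
        raise ValueError("descriptor must be 24 characters long")
    fields, position = {}, 0
    for name, width in FIELDS:
        fields[name] = int(code[position:position + width], 16)
        position += width
    return fields
```

Because each block has a fixed width and the blocks run from the widest geographical unit to the narrowest, any leading portion of the string identifies a region at some level of the hierarchy, which is what allows the codes to be used as search filters.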
Thus in the user form illustrated in Figures 6a and 6b, the location and subject selections can be based hierarchically on the classification codes. For example, when a user selects the UK as a country, the classifier will present counties within the UK as the next selectable classification. In this way the menu displayed from which a user can select is dependent upon a previous selection. Thus the classification codes are based on user selections of hierarchical classifications.
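The dependence of each menu on the previous selection can be illustrated with a small sketch. The nested dictionary below is sample data invented for the illustration (only the UK, Cambridgeshire and Cambridge appear in this description), and the function name next_options is hypothetical.

```python
# Invented sample hierarchy: country -> state/county -> city -> town.
LOCATION_TREE = {
    "UK": {
        "Cambridgeshire": {"Cambridge": {}, "Peterborough": {}},
        "Suffolk": {"Ipswich": {}},
    },
    "France": {},
}


def next_options(tree: dict, selections: list[str]) -> list[str]:
    """Return the choices for the next menu, given the selections made so far."""
    node = tree
    for choice in selections:
        if choice not in node:
            raise KeyError(f"invalid selection: {choice!r}")
        node = node[choice]
    return sorted(node)
```

With no selections the function returns the list of countries; after selecting "UK" it returns only the counties held under the UK, so each menu can be fetched from the server when it is needed rather than all at once.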
The generated descriptor code in the form of a 24-character hexadecimal string will be incorporated into the HTML header code embedded in the HTML source code as shown below.
<HTML>
<HEAD>
<TITLE>A website or webpage that is very easy to find........</TITLE>
<META NAME="description" content="webpage description..........">
<META NAME="keywords" content="lots of key words">
<META NAME="descriptor1" content="xxxxxxxxxxxxxxxxxxxxxxxx">
...........
...........
...........
</HEAD>
As can be seen, a meta tag termed "descriptor1" is used to hold the classification code as a descriptor word. It is possible for more than one descriptor word to be used to enable the web page or website to be classified more than once. Thus the meta tags "descriptor2", "descriptor3", etc., can be used for multiple descriptor words.
In this embodiment of the present invention, the HTML code to be inserted in the user's web page is output in a window on a web page to enable a user to manually insert the code in the HTML source code on the user's computer 40. Suitable code comprises:
<META NAME="descriptor1" content="xxxxxxxxxxxxxxxxxxxxxxxx">
In addition to outputting the code, instructions can be given to a user to inform them where to place the code in their HTML source code.
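For illustration, the generation of such a meta tag from a classification code could be sketched as below. The function name descriptor_meta_tag is hypothetical, and the validation step (24 hexadecimal characters) is an assumption based on the code format described above rather than a stated requirement.

```python
import re


def descriptor_meta_tag(code: str, index: int = 1) -> str:
    """Build the meta tag holding one descriptor word.

    index selects descriptor1, descriptor2, ... for pages classified more than once.
    """
    if not re.fullmatch(r"[0-9A-Fa-f]{24}", code):
        raise ValueError("classification code must be 24 hexadecimal characters")
    if index < 1:
        raise ValueError("descriptor index starts at 1")
    return f'<META NAME="descriptor{index}" content="{code.upper()}">'
```

The returned string is what would be shown in the output window for the user to paste into the header of their HTML source code.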
A method of searching for web pages using the classification database 26 and the search engine 24 will now be described with reference to Figures 8 and 9. This method is also applicable to searching using the search engine 32 and the classification database 34 where classification codes are available for web pages on third party search engine servers.
Figure 8 is a diagram of the search window generated by the web server 21. A search window allows a user to enter the type of subject or business that is to be searched for.
The geographical location can then be selected hierarchically by selecting the country, then the state or county, then the city, and then the town. Also areas of the search within a range of the city or town can be selected.
Figures 9a to 9e illustrate the process in more detail with regard to a specific search. If the user wishes to find all furniture shops within a 10 km radius of Cambridge, England, the user uses a web browser 42 to access the search page provided by the search engine 24 and the web server 21. In the search window the user types in "furniture shops" (Figure 9a). The user then selects one or more of the location fields on the form to narrow the search using the drop-down menus provided (Figure 9b). The user clicks on the drop-down menu for the "country" selector and chooses "UK". The user then clicks on the drop-down menu for the "state/county" selector and chooses "Cambridgeshire". The user then clicks on the drop-down menu for the "city" selector and chooses "Cambridge". The user then clicks on the drop-down menu for the "area of search" selector and chooses "10", thereby limiting the search to within 10 km of the centre of Cambridge. In this example the user does not enter anything into the "town" field.
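The "area of search" restriction can be sketched as a distance filter. The text does not say how distance is computed or how locations are stored, so the sketch below makes two loud assumptions: each record carries hypothetical `lat` and `lon` fields in decimal degrees, and distance is the great-circle (haversine) distance. Neither is taken from the described system.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(records, centre, radius_km):
    """Keep the records lying within radius_km of the centre (lat, lon)."""
    lat0, lon0 = centre
    return [
        record
        for record in records
        if distance_km(lat0, lon0, record["lat"], record["lon"]) <= radius_km
    ]


if __name__ == "__main__":
    # Approximate coordinates, assumed for illustration only.
    cambridge_centre = (52.2053, 0.1218)
    shops = [
        {"name": "shop near the centre", "lat": 52.20, "lon": 0.13},
        {"name": "shop further away", "lat": 52.40, "lon": 0.26},
    ]
    for shop in within_radius(shops, cambridge_centre, 10):
        print(shop["name"])
```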
The search engine then finds the best match with the categories in the classification database 26 and returns the display illustrated in Figure 9c. Using the text string typed in the search window, the search engine 24 has located the five subject categories closest to the input subject. The user then selects the closest subject area (furniture repairers in this instance) and a number of records are returned (Figure 9d).
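The step of locating the five closest subject categories can be sketched with a generic string-similarity match. The text does not specify the matching method, so the use of Python's `difflib` here is a stand-in, and the function name `closest_categories` and the category list are hypothetical.

```python
from difflib import get_close_matches


def closest_categories(query, categories, limit=5):
    """Return up to `limit` category names most similar to the query text.

    Matching is case-insensitive. A cutoff of 0.0 means the best `limit`
    candidates are returned even when none is a close match.
    """
    lookup = {category.lower(): category for category in categories}
    matches = get_close_matches(query.lower(), list(lookup), n=limit, cutoff=0.0)
    return [lookup[match] for match in matches]


if __name__ == "__main__":
    # Made-up category names, for illustration only.
    categories = [
        "Furniture shops",
        "Furniture repairers",
        "Furniture removers",
        "Florists",
        "Garden furniture",
        "Office furniture",
        "Solicitors",
    ]
    for name in closest_categories("furniture shops", categories):
        print(name)
```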
To obtain more information, the user can make a selection, as illustrated in Figure 9e. This further information can be the additional information entered by a website administrator using the form illustrated in Figure 6b.
The second specific embodiment of the present invention will now be described with reference to Figures 10 and 11.
In this embodiment of the present invention many of the components are the same as those of the first specific embodiment of the present invention illustrated in Figure 3, and thus like reference numerals are used. For brevity, only the features that differ will be described.
In this embodiment of the present invention, the classifier server 20 is not provided with a classifier application. Instead classifier code 29 is available and downloadable from the web server 21. When the code is downloaded by a user's computer 40, a classifier application 45 can be implemented at the user's computer 40 to perform the classification operation. The classification code can then be passed back to the web server 21 and to the classification database 26.
The operation of this embodiment of the present invention will now be described with reference to the flow diagram of Figure 11. A user uses the web browser 42 to access the web pages 27 served by the web server 21 in the classifier server 20 (step S40). The classifier code 28 is downloaded and installed onto the user's computer 40 (step S41).
The classifier application 45 is run at the user's computer 40 (step S42) and the user enters details (URL and additional data) and selects the classification criteria using the user interface generated by the classifier application (step S43). The classifier application then generates classification code (step S44) and HTML code is generated as a wrapper around the classification code (step S45). The classifier application then automatically amends the HTML code 43 for the user's web pages to include the HTML code wrapper. This step can alternatively be performed manually. Updated web pages are then uploaded to the web host 50. This can either be performed manually using the FTP client, or the classifier application 45 can perform this automatically.
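The automatic amendment of the user's HTML code can be sketched as inserting the wrapper just before the closing HEAD tag. This is a minimal sketch under assumptions: the function name `add_to_head` is hypothetical, and the text does not say where in the header the classifier application 45 places the wrapper.

```python
import re


def add_to_head(html_source, wrapper):
    """Insert the wrapper line immediately before the closing HEAD tag."""
    match = re.search(r"</\s*head\s*>", html_source, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no closing HEAD tag found")
    position = match.start()
    return html_source[:position] + wrapper + "\n" + html_source[position:]


if __name__ == "__main__":
    page = "<HTML>\n<HEAD>\n<TITLE>Example</TITLE>\n</HEAD>\n<BODY></BODY>\n</HTML>"
    wrapper = '<META NAME="descriptor1" content="0123456789abcdef01234567">'
    print(add_to_head(page, wrapper))
```

Raising an error when no header is found leaves the page untouched, so the user can fall back to the manual insertion that the text also allows.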
When the classification code is generated by the classifier application 45, the classifier application 45 also sends the classification code, URL, and other data entered by the user to the web server 21 (step S48). The classification code, URL and other data entered by the user are added as a record to the classification database 26 (step S49).
The spider application 25 then uses the URL in the classification database 26 to retrieve and process the web page to generate other data (step S50). The other data is then added to the classification database (step S51). The classification database is then complete and accessible over the network (step S52).
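The spidering steps (S50 and S51) can be sketched as extracting searchable parameters from a retrieved page and adding them to the existing record. The text does not specify which parameters the spider application 25 extracts, so the choice of title, description and keywords is an assumption, as are the record field names and the names `PageParameters` and `complete_record`. Retrieval of the page over the network is left out; the sketch starts from HTML source that has already been fetched.

```python
from html.parser import HTMLParser


class PageParameters(HTMLParser):
    """Collect the page title and the named META tags from HTML source."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name"):
            name = attributes["name"].strip().lower()
            self.meta[name] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def complete_record(record, html_source):
    """Return a copy of the record with the extracted parameters added."""
    parser = PageParameters()
    parser.feed(html_source)
    parser.close()
    completed = dict(record)
    completed["title"] = parser.title.strip()
    completed["description"] = parser.meta.get("description", "")
    keywords = parser.meta.get("keywords", "")
    completed["keywords"] = [k.strip() for k in keywords.split(",") if k.strip()]
    return completed


if __name__ == "__main__":
    # Made-up record and page, for illustration only.
    record = {"url": "http://www.example.com/", "code": "0123456789abcdef01234567"}
    page = (
        "<HTML><HEAD><TITLE>Oak Furniture</TITLE>"
        '<META NAME="description" content="Handmade oak tables">'
        '<META NAME="keywords" content="oak, tables, chairs">'
        "</HEAD><BODY></BODY></HTML>"
    )
    print(complete_record(record, page))
```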
Thus in this embodiment of the present invention, the classifier is implemented not at the classifier server, but instead at the user's computer.
A third specific embodiment of the present invention will now be described with reference to Figures 12 and 13.
This embodiment of the present invention has many similar features to the first specific embodiment of the invention and thus like reference numerals have been used for like features.
This embodiment of the present invention differs from the first embodiment in that an applet 46 is downloaded to the user's computer 40 to generate HTML code and add the generated HTML code to the web page code 43.
The operation of this embodiment of the present invention will now be described with reference to the flow diagram of Figure 13.
A user uses the web browser 42 to access the classifier pages 22 served by the web server 21 (step S60). A user enters details (URL and other data) and selects classification criteria as described hereinabove for previous embodiments (step S61).
The data is received by the web server 21 and passed to the classifier application 23 that generates classification code in accordance with the selected classification criteria (step S62). The generated classification code, the URL and other data entered by the user are added to a record in the classification database 26 (step S63). The web server 21 then downloads an applet 46 via the web browser 42 to the user's computer 40. The applet generates HTML code containing the classification code and adds it to the user's web pages 43 (step S65). The user can then upload the new web pages to the host using the FTP client 41 (step S66). The spider application 25 in the classification server can use the URLs in the classification database 26 to retrieve and process web pages to generate other data (step S67). The other data that is generated by the spidering process is added to the classification database 26 (step S68) to complete the data and the classification database 26 is then accessible over the network and searchable by classifications (step S69).
It can thus be seen that in this embodiment of the present invention the classifier application generates classification code for addition to the classification database and for addition to the web pages but the applet 46 performs the actual addition process.
The process is thus split between the user's computer 40 and the classifier server 20.
It is thus apparent from the embodiments of the present invention described hereinabove that the classification process can be implemented either partially or completely on any machine.
Although the present invention has been described hereinabove with reference to specific embodiments, it will be apparent to a person skilled in the art that modifications lie within the spirit and scope of the present invention.

Claims (1)

    1. A method of generating classification code for information, the method comprising: generating a user interface providing selectable hierarchical classifications for the information; receiving user selections of the hierarchical classifications; generating classification code on the basis of the user selections; and storing the classification code in association with the information.
    2. A method according to claim 1, including receiving identification information identifying the information, and submitting the classification code and identification information to a database to allow the information to be searched for and identified using the classification code.
    3. A method according to claim 1 or claim 2, wherein storing of the classification code comprises adding the classification code to the information.
    4. A method according to claim 3, wherein the information comprises information in the form of a mark-up language and the classification code is added to the information as tagged code.
    5. A method according to claim 4 wherein the mark-up language comprises hypertext mark-up language and the tagged code comprises a meta tag.
    6. A method according to claim 1 or claim 2, wherein the classification code is stored in a logical location linked to a logical location of the information.
    7. A method according to any preceding claim, wherein the classification code includes geographic code identifying a geographic location related to the information and content code identifying content of the information.
    8. A method according to claim 7, wherein the information comprises information on a business or entity and the geographic code identifies the geographic location of the business or entity.
    9. Apparatus for generating classification code for information, the apparatus comprising: user interface means for generating a user interface providing selectable hierarchical classifications for the information; receiving means for receiving user selections of the hierarchical classifications; generating means for generating classification code on the basis of the user selections; and storing means for storing the classification code in association with the information.
    10. Apparatus according to claim 9, wherein said receiving means is adapted to receive identification information identifying the information, the apparatus including means for submitting the classification code and identification code to a database to allow the information to be searched for and identified using the classification code.
    12. Apparatus according to claim 9 or claim 10, wherein said storing means is adapted to store the classification code by adding the classification code to the information.
    13. Apparatus according to claim 12, wherein the information comprises information in the form of a mark-up language and said storing means is adapted to store the classification code by adding the classification code to the information as tagged code.
    14. Apparatus according to claim 13 wherein the mark-up language comprises hypertext mark-up language and the tagged code comprises a meta tag.
    15. Apparatus according to claim 9 or claim 10, wherein said storing means is adapted to store the classification code in a logical location linked to a logical location of the information.
    16. Apparatus according to any one of claims 9 to 15, wherein the classification code includes geographic code identifying a geographic location related to the information and content code identifying content of the information.
    17. Apparatus according to claim 16, wherein the information comprises information on a business or entity and the geographic code identifies the geographic location of the business or entity.
    18. A method of generating classification code for information, the method comprising: generating a user interface providing selectable hierarchical classifications for the information; receiving user selections of the hierarchical classifications and identification information identifying the information; generating classification code on the basis of the user selections; and storing the classification code and identification information in a database to allow the information to be searched for and identified using the classification code.
    19. A method according to claim 18, including outputting the classification code for storage in association with the data.
    20. A method according to claim 19, wherein the information comprises information in the form of a mark-up language and the classification code is output as tagged code for addition to the information.
    21. A method according to claim 20 wherein the mark-up language comprises hypertext mark-up language and the tagged code comprises a meta tag.
    22. A method according to any one of claims 19 to 21, wherein the classification code is output to a logical location linked to a logical location of the information.
    23. A method according to any one of claims 18 to 22, wherein the classification code includes geographic code identifying a geographic location related to the information and content code identifying content of the information.
    24. A method according to claim 23, wherein the information comprises information on a business or entity and the geographic code identifies the geographic location of the business or entity.
    25. Apparatus for generating classification code for information, the apparatus comprising: generating means for generating a user interface providing selectable hierarchical classifications for the information; receiving means for receiving user selections of the hierarchical classifications and identification information identifying the information; generating means for generating classification code on the basis of the user selections; and storing means for storing the classification code and identification information in a database to allow the information to be searched for and identified using the classification code.
    26. Apparatus according to claim 25, including outputting means for outputting the classification code for storage in association with the data.
    27. Apparatus according to claim 26, wherein the information comprises information in the form of a mark-up language and said outputting means is adapted to output the classification code as tagged code for addition to the information.
    28. Apparatus according to claim 27 wherein the mark-up language comprises hypertext mark-up language and the tagged code comprises a meta tag.
    29. Apparatus according to any one of claims 26 to 28, wherein said outputting means is adapted to output the classification code to a logical location linked to a logical location of the information.
    30. Apparatus according to any one of claims 25 to 29, wherein the classification code includes geographic code identifying a geographic location related to the information and content code identifying content of the information.
    31. Apparatus according to claim 30, wherein the information comprises information on a business or entity and the geographic code identifies the geographic location of the business or entity.
    32. A method of generating classification code for information, the method comprising: generating a user interface providing selectable hierarchical classifications for the information; receiving user selections of the hierarchical classifications; generating classification code on the basis of the user selections; and outputting the classification code for storage in association with the information.
    33. A method according to claim 32, including receiving identification information identifying the information, and submitting the classification code and identification information to a database to allow the information to be searched for and identified using the classification code.
    34. A method according to claim 32 or claim 33, wherein output classification code is for adding to the information.
    35. A method according to claim 34, wherein the information comprises information in the form of a mark-up language and the classification code is output as tagged code for adding to the information.
    36. A method according to claim 35 wherein the mark-up language comprises hypertext mark-up language and the tagged code comprises a meta tag.
    37. A method according to claim 32 or claim 33, wherein the classification code is output for storing in a logical location linked to a logical location of the information.
    38. A method according to any one of claims 32 to 37, wherein the classification code includes geographic code identifying a geographic location related to the information and content code identifying content of the information.
    39. A method according to claim 38, wherein the information comprises information on a business or entity and the geographic code identifies the geographic location of the business or entity.
    40. Apparatus for generating classification code for information, the apparatus comprising: user interface means for generating a user interface providing selectable hierarchical classifications for the information; receiving means for receiving user selections of the hierarchical classifications; generating means for generating classification code on the basis of the user selections; and outputting means for outputting the classification code for storage in association with the information.
    41. Apparatus according to claim 40, wherein said receiving means is adapted to receive identification information identifying the information, the apparatus including means for submitting the classification code and identification code to a database to allow the information to be searched for and identified using the classification code.
    42. Apparatus according to claim 40 or claim 41, wherein said outputting means is adapted to output the classification code for adding to the information.
    43. Apparatus according to claim 42, wherein the information comprises information in the form of a mark-up language and said outputting means is adapted to output the classification code as tagged code.
    44. Apparatus according to claim 43 wherein the mark-up language comprises hypertext mark-up language and the tagged code comprises a meta tag.
    45. Apparatus according to claim 40 or claim 41, wherein said outputting means is adapted to output the classification code for storing in a logical location linked to a logical location of the information.
    46. Apparatus according to any one of claims 40 to 45, wherein the classification code includes geographic code identifying a geographic location related to the information and content code identifying content of the information.
    47. Apparatus according to claim 46, wherein the information comprises information on a business or entity and the geographic code identifies the geographic location of the business or entity.
    48. A method of generating classification code for information, the method comprising: generating a user interface providing selectable hierarchical classifications for the information; receiving user selections of the hierarchical classifications; outputting the received user selections; receiving classification code generated on the basis of the user selections; and storing the classification code in association with the information.
    49. A method according to claim 48, including receiving identification information identifying the information, and outputting the identification information for storage in a database to allow the information to be searched for and identified using the classification code.
    50. A method according to claim 48 or claim 49, wherein storing of the classification code comprises adding the classification code to the information.
    51. A method according to claim 50, wherein the information comprises information in the form of a mark-up language and the classification code is added to the information as tagged code.
    52. A method according to claim 51 wherein the mark-up language comprises hypertext mark-up language and the tagged code comprises a meta tag.
    53. A method according to claim 48 or claim 49, wherein the classification code is stored in a logical location linked to a logical location of the information.
    54. A method according to any one of claims 48 to 53, wherein the classification code includes geographic code identifying a geographic location related to the information and content code identifying content of the information.
    55. A method according to claim 54, wherein the information comprises information on a business or entity and the geographic code identifies the geographic location of the business or entity.
    56. Apparatus for generating classification code for information, the apparatus comprising: user interface means for generating a user interface providing selectable hierarchical classifications for the information; receiving means for receiving user selections of the hierarchical classifications; outputting means for outputting the user selections; receiving means for receiving classification code generated on the basis of the user selections; and storing means for storing the classification code in association with the information.
    57. Apparatus according to claim 56, wherein said receiving means is adapted to receive identification information identifying the information, and the outputting means is adapted to output the identification code for storage in a database to allow the information to be searched for and identified using the classification code.
    58. Apparatus according to claim 56 or claim 57, wherein said storing means is adapted to store the classification code by adding the classification code to the information.
    59. Apparatus according to claim 58, wherein the information comprises information in the form of a mark-up language and said storing means is adapted to store the classification code by adding the classification code to the information as tagged code.
    60. Apparatus according to claim 59 wherein the mark-up language comprises hypertext mark-up language and the tagged code comprises a meta tag.
    61. Apparatus according to claim 56 or claim 57, wherein said storing means is adapted to store the classification code in a logical location linked to a logical location of the information.
    62. Apparatus according to any one of claims 56 to 61, wherein the classification code includes geographic code identifying a geographic location related to the information and content code identifying content of the information.
    63. Apparatus according to claim 62, wherein the information comprises information on a business or entity and the geographic code identifies the geographic location of the business or entity.
    64. Apparatus for generating a searchable database for web sites, the apparatus comprising: a database of classification codes and universal resource locators for a plurality of web sites; and spider means for reading the universal resource locators for the web sites, for retrieving the hypertext mark-up language, for extracting searchable parameters, and for storing the extracted parameters in the database in association with corresponding classification codes and universal resource locators.
    65. A method of generating a searchable database for web sites using a database of classification codes and universal resource locators for a plurality of web sites, the method comprising: reading the universal resource locators for the web sites; retrieving the hypertext mark-up language; extracting searchable parameters; and storing the extracted parameters in the database in association with corresponding classification codes and universal resource locators.
    66. An information search method for use with a database of classification code generated in accordance with any one of claims 18 to 24, the method comprising: receiving a classification code; looking up corresponding identification information in the database using the received classification code; and outputting the corresponding identification information.
    67. An information search apparatus for use with a database of classification code generated using the apparatus of any one of claims 25 to 31, the method comprising: receiving means for receiving a classification code; means for looking up corresponding identification information in the database using the received classification code; and outputting means for outputting the corresponding identification information.
    68. Apparatus for generating classification code for information comprising: an instruction memory storing processor implementable instructions; and a processor operable to read the instructions stored in the instruction memory; wherein the instructions stored in the instruction memory comprise instructions for controlling the processor to implement the method of any one of claims 1 to 8, 18 to 24, 32 to 39 or 48 to 55.
    69. Apparatus for generating a searchable database for web sites using a database of classification codes and universal resource locators for a plurality of web sites, the apparatus comprising: an instruction memory storing processor implementable instructions; and a processor operable to read the instructions stored in the instruction memory; wherein the instructions stored in the instruction memory comprise instructions for controlling the processor to implement the method of claim 65.
    70. Information search apparatus for use with a database of classification code generated in accordance with any one of claims 18 to 24, the apparatus comprising: an instruction memory storing processor implementable instructions; and a processor operable to read the instructions stored in the instruction memory; wherein the instructions stored in the instruction memory comprise instructions for controlling the processor to implement the method of claim 66.
    71. A carrier medium carrying computer readable instructions for controlling a computer to carry out the method of any one of claims 1 to 8, 18 to 24, 32 to 39, 48 to 55, 65 or 66.
GB0120823A 2001-08-28 2001-08-28 Information classification system Withdrawn GB2379290A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB0120823A GB2379290A (en) 2001-08-28 2001-08-28 Information classification system
PCT/GB2002/003830 WO2003021481A2 (en) 2001-08-28 2002-08-21 Information classification system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0120823A GB2379290A (en) 2001-08-28 2001-08-28 Information classification system

Publications (2)

Publication Number Publication Date
GB0120823D0 GB0120823D0 (en) 2001-10-17
GB2379290A true GB2379290A (en) 2003-03-05

Family

ID=9921085

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0120823A Withdrawn GB2379290A (en) 2001-08-28 2001-08-28 Information classification system

Country Status (2)

Country Link
GB (1) GB2379290A (en)
WO (1) WO2003021481A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2399198A (en) * 2003-03-03 2004-09-08 Richard Percy Classification system for accessing information by means of a compound code
US8849830B1 (en) * 2005-10-14 2014-09-30 Wal-Mart Stores, Inc. Delivering search results

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6243699B1 (en) * 1998-08-03 2001-06-05 Robert D. Fish Systems and methods of indexing and retrieving data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001027805A2 (en) * 1999-10-14 2001-04-19 360 Powered Corporation Index cards on network hosts for searching, rating, and ranking

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yahoo! Suggesting Sites to Yahoo!, URLs: uk.docs.yahoo.com/info/howto/chapters/10/1.html *

Also Published As

Publication number Publication date
WO2003021481A3 (en) 2004-05-06
WO2003021481A2 (en) 2003-03-13
GB0120823D0 (en) 2001-10-17


Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)