GB2469909A - Method for updating a database - Google Patents

Method for updating a database

Info

Publication number
GB2469909A
Authority
GB
United Kingdom
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1006839A
Other versions
GB201006839D0 (en)
Inventor
William George Imlah
Original Assignee
William George Imlah
Priority to GB0907188A (GB2472562A)
Priority to GB0909974A (GB0909974D0)
Priority to GB0916296A (GB0916296D0)
Priority to GBGB1001401.7A (GB201001401D0)
Priority to GBGB1003037.7A (GB201003037D0)
Application filed by William George Imlah
Publication of GB201006839D0
Publication of GB2469909A
Application status is Withdrawn


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566: URL specific, e.g. using aliases, detecting broken or misspelled links

Abstract

A method for creating and maintaining a database of timely information, of a type suitable for analysis of trends over time or of situations at a given point in time, in relation to a set of objects of a given class or type, such as weather patterns, statuses of people or objects, economic situations, emerging markets, population movement or growth, or other objects susceptible to review or analysis, and for the purposes of identifying social, geophysical, economic or climate trends, or for other purposes, said method comprising: making an HTTP request to a website for a web page using a first URL; receiving a response to said request comprising a relocation status code and a second URL indicative of a location at which said page may be requested; accessing information dependent on said response, where said information is dependent on said website; identifying a data value indicative of content available in said web page requested in said request, where the data value identified is dependent on said information; and updating said database using said data value.

Description

DESCRIPTION

TECHNICAL FIELD OF THE INVENTION

The invention described below relates to apparatus and methods for efficiently creating and maintaining a database of up-to-date information available from a plurality of sources.

BACKGROUND OF THE INVENTION

Maintaining a database of up-to-date information from a variety of sources can be a time-consuming process. For example a database of professional practitioners, such as will be described in the embodiment, may be kept up to date by an organization which wishes to make sure that the people on its list are properly registered or accredited with the appropriate bodies. This may involve individuals in that organization regularly checking the websites of registering or accrediting organizations to ensure that the individuals on its database continue to maintain proper, valid and up-to-date membership. For example the website http://www.counsellor-directory.co.uk lists about six thousand practitioners of counselling and psychotherapy in the UK and Ireland, and for each of these UKIDCP carries out checks on over one hundred relevant accrediting or membership organizations for evidence of each practitioner's qualifications and professional status, by checking the websites of the relevant organizations. In the prior art, such checks typically involve an operator looking up a web page on the site of a relevant organization, identifying details confirming the professional status of the practitioner, and updating the database by recording the current date (i.e. the latest date on which the practitioner's details were found or re-found) as well as the URL or other information needed to support the re-checking of these details on a regular basis; or, where the details have been removed from the relevant organisation's website, recording the date on which evidence of the practitioner's continuing professional status became unavailable.
However, in order to keep thousands of such records up to date on a daily, or even weekly, basis, a substantial number of checks need to be done: in the above example, up to thirty thousand per day for any one website to keep the database up to date on a daily basis, and up to four thousand per website per day to keep the database up to date on a weekly basis. This can be very time-consuming.

Moreover, since the process of making thousands of such lookups to relevant organisations' websites is identifiable by such organizations as an unusual and resource-intensive use of their websites, such activity can be perceived by them as undesirable. The operator may then find that their requests to such a website have been blocked by the relevant organization, which may see allowing such usage as an unacceptable overhead, or even misinterpret it as a sign of malicious activity such as may be found in the lead-up to, or during, a Denial of Service (DoS) attack. In such cases the operator may find their requests either completely blocked by the website, or time-rationed. Time-rationing of website resources is well known in the prior art. Examples of time-rationed resources can be found at: http://www.ip-adress.com/iptracer/203.192.74.47, which provides a utility to trace the geographical origin of IP addresses and which, after a second request has been submitted the same day, displays a message to say that a limit of one lookup per day has been reached; and the postcode lookup service http://postcode.royalmail.com/portal/rmtpostcodefinder, which prohibits the operator from looking up further postcodes once a daily limit of fifteen lookups has been reached.

SUMMARY OF THE INVENTION

A purpose of the invention is to provide a solution to the above problems by providing a method of creating and maintaining a database of timely information that significantly reduces the overheads involved, by reducing the time and the resources used to identify timely changes to the data to be stored on the database. A second purpose of the invention is to avoid the triggering of mechanisms to block or ration access to websites, by reducing the overheads normally necessary to access and retrieve relevant data from such sites.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects of this invention will be readily apparent from the description below and the appended drawings, which are meant to illustrate and not to limit the invention, and in which: FIG. 1 is a diagram showing the steps to be carried out in one embodiment of the present invention.

FIG 2 describes the processing to be carried out for each practitioner (or other type of individual or object for further embodiments described herein).

FIG 3 describes the processing to be carried out for checking the details of a particular practitioner (or other type of individual or object for further embodiments described herein) on a particular website.

FIG. 4 is a block diagram showing the components in one embodiment of the present invention.

FIG. 5 is a flowchart illustrating the operation of the selection component 420.

FIG. 6 is a flowchart illustrating the operation of the weighting component 430.

FIGS 7, 8 and 9 are flowcharts illustrating the operation of the data collection component 425.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

For the purposes of clarity, the description of the embodiment uses an example that creates and maintains a database of information related to practitioners of counselling and psychotherapy, but it should be understood that the invention covers the efficient creation and maintenance of databases of information in a timely way which can be used for a variety of purposes such as to analyse over time trends which are economic, social, geophysical or climate related, or for any other purpose. To help explicate this, alternative embodiments are described or referred to.

The preferred embodiment will be described in terms of a system for creating and maintaining a database related to individual practitioners of counselling and psychotherapy for whom details of professional qualification, membership, accreditation and status may be found on a plurality of web pages related to recognised organizations in the field of counselling and psychotherapy.

Referring now to the drawings, FIG. 1 shows the steps to carry out in the process of maintaining the database of practitioner details. In the drawings the term "object" is used, which for the purpose of this embodiment should be interpreted as meaning "practitioner". At step 110 a review is made of the database of practitioner details, hereafter referred to in the description of this embodiment as the practitioner database. In addition to practitioner contact information such as first name, surname and phone number, the database lists, for each practitioner, a plurality of URLs identifying pages on organization websites where information may be found to identify the practitioner's professional status in terms of registration, accreditation or qualifications in relation to a recognised organization in the field of counselling and psychotherapy. For each practitioner, each of the plurality of URLs is stored in a record identifying the practitioner, the said URL (hereafter referred to in the description of this embodiment as a practitioner-URL), the name of the organization listed as accrediting, registering or conferring relevant qualifications on the practitioner, and a date field specifying the date on which these details were most recently retrieved. At step 10, the list of practitioners and URLs is reviewed and a list is collated of the practitioner records and practitioner-URL records for those practitioners listed on the database for which there is no practitioner-URL record whose date field indicates that the details were last retrieved less than seven days ago.
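The selection at step 110 is essentially a staleness filter. A minimal Python sketch, assuming the practitioner-URL records are available in memory with a `last_retrieved` timestamp; the class `PractitionerURL` and the function `practitioners_needing_check` are hypothetical names, and only the seven-day window comes from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class PractitionerURL:
    # Hypothetical shape of a practitioner-URL record; field names are illustrative.
    practitioner_id: str
    url: str
    organization: str
    last_retrieved: Optional[datetime]  # None if the details were never retrieved


def practitioners_needing_check(
    practitioner_ids: Iterable[str],
    url_records: Iterable[PractitionerURL],
    now: datetime,
    max_age: timedelta = timedelta(days=7),
) -> List[str]:
    """Step 110 (sketch): keep practitioners that have no practitioner-URL
    record retrieved less than max_age ago."""
    recently_checked = {
        r.practitioner_id
        for r in url_records
        if r.last_retrieved is not None and now - r.last_retrieved < max_age
    }
    return [pid for pid in practitioner_ids if pid not in recently_checked]
```

Passing `max_age=timedelta(hours=1)` gives the one-hour staleness test used in the second embodiment.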

Should the list be empty, no more processing need be done. However, if, at step there is at least one practitioner listed on the list, the listed practitioner is removed from the list, and the practitioner's details are processed at step 140 in accordance with Figures 2 and 3. At step 120 a check is made as to whether there are any more practitioners listed, and the process is repeated for each accordingly. At the end of the process, the database of practitioners is thereby updated with any available more up-to-date information related to the professional status of the practitioners listed thereon. In order to indicate the advantageous aspects of the invention as described in this embodiment, step 140 is described in greater detail in Figure 2, which shows the process to be carried out for each practitioner on the above list.

Referring to Figure 2, at step 220 a check is made as to whether a practitioner-URL is listed for the practitioner. If the list of practitioner-URLs is non-empty, at step 240 an HTTP HEAD request is made, using the URL, to the website referenced in the URL. On receiving an HTTP response from the server to which the request was made, the response is processed at step 260, after which step 220 is again carried out. The processing at step 260 is further described in Figure 3, which outlines the processing to be carried out for each URL which is so checked.
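The HEAD request at step 240 only pays off if the client does not follow redirects on its own, because the status code and `Location` header are exactly what the RI database interprets. A HEAD response carries the status line and headers but no message body. The Python sketch below uses the standard library's `http.client`, which never follows redirects; the function name `head_status` is hypothetical.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def head_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request (step 240) and return (status code, Location header).

    http.client does not follow redirects, so a 301/302/307 reply and its
    Location value are returned exactly as the server sent them.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```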

Referring to Figure 3, at step 310 the Response Interpretation Database, hereafter referred to in the description of this embodiment as the RI database, is searched. This is a database storing practitioner-URLs, or templates for sets of practitioner-URLs, for which the system can make a context-dependent interpretation of the response to the HEAD request that will yield information about the practitioner which is normally returned in the corresponding web page, should the corresponding web page be available (e.g. by a GET or POST request, possibly redirected). By "context-dependent" it should be understood that the interpretation is not one defined in general for the type of response returned, but is specific to a website, web page or set of web pages only, being an inapplicable interpretation for similar or identical HTTP responses from a different website. For each such URL or URL template, the database stores information on particular responses that can be made by the website server and which are specific to that website and URL, rather than the general, non-site-specific interpretation which is defined in the HTTP protocol definition. It should be noted that for the purposes of the invention it is not necessary that a HEAD request be made. For example, a GET or POST request may also result in a response indicating further action to be taken to complete the request, such as a response indicating redirection.

The RI Database stores a set of responses and their non-standard interpretations.

In this embodiment, the RI Database contains a table in the following form:

RI Database Table Schema:

URLtemplate: a template corresponding to a URL with N parameterised slots in it, each encoded in the form <ITEMi>, where ITEMi is the name of a variable part of the URL of the form ITEM1, ITEM2 and so on, corresponding to the first, second (and so on) variable part of the URL, and where N is zero or more.

URLitems: a list of N items corresponding to data item names on the practitioner database.

ResponseStatus: one of the set of possible response codes that could be returned from an HTTP request to the particular website which matches the URL in URLtemplate, and for which a non-standard, website-specific interpretation is defined in ResponseInterpretation.

ResponseDataTemplate: a template corresponding to a URL returned with a redirection response (e.g. 302, 303, 307) with M slots in it, each encoded in the form <RETURNi>, where RETURNi is the name of a parameterised (variable) part of a URL of the form RETURN1, RETURN2 and so on, corresponding to the first, second (and so on) variable part of the URL, and where M is zero or more.

ResponseDataItems: a list of M items corresponding to data items existing on the database or to be created on the database.

ResponseInterpretation: instructions on how to update the practitioner database, comprising (i) a description of how the URLitems map to search terms to identify a record in the practitioner database, and (ii) a description of how the ResponseDataItems should be used to update the so-identified record.
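One way to hold this schema in code, assuming Python and regular expressions for slot matching (the text does not say how templates are matched): each `<ITEMi>` or `<RETURNi>` slot becomes a named capture group. The class `RIRecord`, the function `template_to_regex`, and the rule that a slot value cannot contain `&`, `/` or `?` are all assumptions of this sketch.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Matches slot markers such as <ITEM1> or <RETURN2>.
_SLOT = re.compile(r"<((?:ITEM|RETURN)\d+)>")


@dataclass
class RIRecord:
    # One row of the RI Database Table Schema (Python field names are illustrative).
    url_template: str
    url_items: List[str]
    response_status: int
    response_data_template: str = "<ANY>"
    response_data_items: List[str] = field(default_factory=list)
    response_interpretation: str = ""


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Compile a URLtemplate or ResponseDataTemplate into a regex.

    Literal text is escaped; each slot becomes a named group matching one or
    more characters other than '&', '/' or '?'. "<ANY>" matches any string.
    Use the result with .fullmatch() so the whole URL must fit the template.
    """
    if template.strip().upper() == "<ANY>":
        return re.compile(r".*", re.DOTALL)
    pieces, pos = [], 0
    for m in _SLOT.finditer(template):
        pieces.append(re.escape(template[pos:m.start()]))
        pieces.append(f"(?P<{m.group(1)}>[^&/?]+)")
        pos = m.end()
    pieces.append(re.escape(template[pos:]))
    return re.compile("".join(pieces))
```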

The RI Database contains the following records:

RI Database Table Record 1:
URLtemplate: http://ukcounsellorregistration.org.uk/find?id=<ITEM1>
URLitems: practitionerID
ResponseStatus: 301
ResponseDataTemplate: <ANY>
ResponseDataItems:
ResponseInterpretation: status.practitionerID(<ITEM1>) = "registered"

RI Database Record 2:
URLtemplate: http://ukcounsellorregistration.org.uk/find?id=<ITEM1>
URLitems: practitionerID
ResponseStatus: 302
ResponseDataTemplate: <ANY>
ResponseDataItems:
ResponseInterpretation: status.practitionerID(<ITEM1>) = "lapsed"

RI Database Record 3:
URLtemplate: http://ukcounsellorregistration.org.uk/find?id=<ITEM1>
URLitems: practitionerID
ResponseStatus: 307
ResponseDataTemplate: <ANY>
ResponseDataItems:
ResponseInterpretation: status.practitionerID(<ITEM1>) = "retired"

In this example, a status code of 301, 302 or 307 results in a successful lookup of the RI Database, and processing can proceed to step 330 to update the records for the practitioner to indicate that the practitioner is still registered, has a lapsed registration or has retired, respectively, noting on the record the date and time at which the HTTP response that allowed this deduction about the practitioner was returned. It should be noted that the determination of the meaning of the response is website-specific, rather than being a general interpretation applicable regardless of which website or web page is being accessed. The term "website-specific" should be understood to mean that the determination of the meaning of the response is specific to the particular website returning the response, is specific to the particular page or pages on that website that match the URL templates found in the RI database, and is an interpretation not defined in an existing generic protocol which provides standard ways of interpreting response messages regardless of the particular web servers, IP address or domain name involved in the HTTP request or response.
By "specific to the particular website" it should be understood that a given type of interpretation is not necessarily unique to a particular page or a particular website, but that a particular interpretation is defined for each RI database record which is defined for a particular URL request, or template for a URL request, and matching response.
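Under HTTP itself, 301, 302 and 307 are generic redirections (Moved Permanently, Found, Temporary Redirect); "registered", "lapsed" and "retired" are readings that hold only for the hypothetical ukcounsellorregistration.org.uk pages. The Python sketch below shows the step 310 lookup for the three records above, reduced to tuples for brevity; `compile_template`, `PRACTITIONER_RI` and `interpret` are illustrative names, not part of the described system.

```python
import re
from typing import Dict, List, Optional, Tuple


def compile_template(template: str) -> "re.Pattern[str]":
    # Escape literal text; turn each <ITEMi> slot into a named capture group.
    pieces, pos = [], 0
    for m in re.finditer(r"<(ITEM\d+)>", template):
        pieces.append(re.escape(template[pos:m.start()]))
        pieces.append(f"(?P<{m.group(1)}>[^&/?]+)")
        pos = m.end()
    pieces.append(re.escape(template[pos:]))
    return re.compile("".join(pieces))


# Records 1-3 above, reduced to (URLtemplate, ResponseStatus, value assigned to status).
PRACTITIONER_RI: List[Tuple[str, int, str]] = [
    ("http://ukcounsellorregistration.org.uk/find?id=<ITEM1>", 301, "registered"),
    ("http://ukcounsellorregistration.org.uk/find?id=<ITEM1>", 302, "lapsed"),
    ("http://ukcounsellorregistration.org.uk/find?id=<ITEM1>", 307, "retired"),
]


def interpret(request_url: str, status: int) -> Optional[Dict[str, str]]:
    """Step 310 (sketch): return the site-specific reading of (URL, status), or None."""
    for template, record_status, value in PRACTITIONER_RI:
        if status != record_status:
            continue
        m = compile_template(template).fullmatch(request_url)
        if m is not None:
            return {"practitionerID": m.group("ITEM1"), "status": value}
    return None  # no RI record matches: the page itself would have to be fetched
```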

In this way, a large proportion of the accesses which would normally have to be made to this website to return web pages with content indicating the above information about the practitioners can be avoided. Instead the relevant parts of the content of the given web pages are deduced using the interpretation of the response as defined in the ResponseInterpretation field of the RI Database record which matches the response information.

By avoiding the requesting and downloading of the entire web page in this way, for such web pages, the load on the website server is greatly reduced, and the response time for each request is decreased. This allows quicker processing of each practitioner's details in these cases, and reduces the size of the "footprint" that is made in accessing the given website: interpreting the information from the given responses, which involve the transfer of a relatively small amount of data, avoids requests which would otherwise result in the website server processing and sending much greater quantities of web page data, and sending many more web pages, with a concomitant significant increase in load on the server and on the network of which the server is a part. Thus the invention greatly speeds up this process, reduces the resources required, and helps avoid the danger of the operator of the invention finding access to the required data source being restricted or removed. Where the websites involved place automatic restrictions on the number of pages actually delivered in responses to the same IP address or agent, the invention may completely side-step any mechanisms which would otherwise prevent the retrieval of information in relation to a large number of records in a given time period.

In the above examples the practitioner database is automatically updated in accordance with these interpretations, so that the record with practitioner ID <ITEM1> would be updated so that the field on the record called "status" was set to the value "registered", "lapsed" or "retired" respectively, dependent on the matched response, and the field on the practitioner record that indicated when the information in the status field was last updated would be set to the date and time when the matched response was received from the website server for the given request.
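Step 330 for the practitioner example reduces to a single-row update. The sketch below uses Python's built-in `sqlite3`; the table name `practitioner` and the columns `practitioner_id`, `status` and `status_updated` are hypothetical stand-ins for the practitioner database described above.

```python
import sqlite3
from datetime import datetime


def apply_status(
    conn: sqlite3.Connection, practitioner_id: str, status: str, received_at: datetime
) -> int:
    """Step 330 (sketch): set status and its last-updated time; return rows changed."""
    cur = conn.execute(
        "UPDATE practitioner SET status = ?, status_updated = ? "
        "WHERE practitioner_id = ?",
        (status, received_at.isoformat(), practitioner_id),
    )
    conn.commit()
    return cur.rowcount  # 0 means no record had that practitioner ID
```

Storing the time at which the matched response was received, rather than the time the update ran, keeps the record faithful to when the deduction was actually made.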

The invention is particularly useful in the case where the information that would have been returned in the page content, but which is instead deduced using a non-standard, non-general, website-page-specific interpretation of the HTTP response, would have been in a form that is easily human-readable but not readily machine-readable: for instance, graphical images of text, icons, or images mixing text and image, such as a photo of the practitioner with an indication of the practitioner's accreditation status embedded in the image in the form of text, an icon or another visual marker; or text which is readable to a human being only when shown in a normal browser, being assembled from fragments by JavaScript routines in a way which obscures the meaning of the message until the JavaScript code interacts with page layout routines in the browser to display contiguous text, possibly originally encoded for non-readability.

For the purposes of this example, it should be noted that the domain name ukcounsellorregistration.org.uk is not, at the time of writing, a registered domain name with Nominet (the Internet registry for .uk domain names), and that this domain name is used here solely as a hypothetical website domain name for explanation purposes. Nominet is a registered trademark of Nominet UK.

A second embodiment is described below, which also has the advantage of creating a reduction in processing and network resources required to carry out the process described, by using the invention, and also thus reduces the risk of being blocked or restricted by the websites concerned.

The second embodiment is a system that creates a database by retrieving information from websites on the internet which indicate the location of vehicles fitted with tracking devices. The embodiment collects such information from a number of such websites and creates a database of vehicle records, hereafter referred to in the description of this embodiment as the vehicle database, indicating the current position of each vehicle, or information related to the status of such information or of the vehicle itself. In the drawings the term "object" is used, which for the purpose of this second embodiment should be interpreted as meaning "vehicle". Differences from the above-described embodiment for practitioner details are described below.

At a first timepoint, the system, at step 110, first scans the vehicle database for vehicles for which the status or location information is more than one hour old.

Having created a list of such vehicles, the system automatically carries out the process in accordance with Figures 1, 2 and 3. For the purposes of this example, the vehicle database is assumed to contain a record that was created for a vehicle with ID 1001, and a vehicle-URL record is assumed to exist, of the form http://locator123.org.uk/find?id=1001, which can be used to retrieve information and update the database in relation to that vehicle. At step 240 the system makes a HEAD request to the corresponding website server using the URL so stored on the database for that vehicle ID.

At step 250, the system retrieves a response from the website server and at step 310 searches the RI database to find a record with a value of URLtemplate that matches the above URL and with a value of ResponseStatus that matches the HTTP response code returned by the web server. If a value is defined in the RI database record for ResponseDataTemplate, then this is also matched. It should be noted that for some applications a value for ResponseDataTemplate may not be needed (it does not encode any non-standard information to be used to maintain the vehicle database), and so for some embodiments a match on ResponseDataTemplate will not be part of the processing.

The RI Database in this example contains the following records:

RI Database Table Record 1:
URLtemplate: http://locator123.org.uk/find?id=<ITEM1>
URLitems: vehicleID
ResponseStatus: 302
ResponseDataTemplate: <ANY>
ResponseDataItems:
ResponseInterpretation: status.vehicleID(<ITEM1>) = "uncontactable"

RI Database Record 2:
URLtemplate: http://locator123.org.uk/find?id=<ITEM1>
URLitems: vehicleID
ResponseStatus: 301
ResponseDataTemplate: <ANY>
ResponseDataItems:
ResponseInterpretation: status.vehicleID(<ITEM1>) = "decommissioned"

RI Database Record 3:
URLtemplate: http://locator123.org.uk/report?id=<ITEM1>
URLitems: vehicleID
ResponseStatus: 307
ResponseDataTemplate: http://locator123.org.uk/report?id=<ITEM1>&x=<ITEM2>&y=<ITEM3>
ResponseDataItems: vehicleID, , Ycoord
ResponseInterpretation: location.vehicleID(<ITEM1>) = (<ITEM2>, <ITEM3>)

In this example, the vehicle database is derived from pages from a plurality of websites, including the website http://locator123.org.uk, and includes a vehicle identified on that site as 1001, with a set of records for that vehicle indicating its locations at various times. In order to bring the database up to date in relation to vehicle 1001, the system carries out the process detailed in FIGS. 1-3 as described above.
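Record 3 differs from the others in that the redirect target itself carries the data. A Python sketch of matching it, assuming the target arrives in the `Location` header of the 307 response and that x and y are integers; the names below are hypothetical, and the check that both URLs carry the same vehicle ID is an addition of this sketch rather than something the text requires.

```python
import re
from typing import Dict, Optional, Tuple


def compile_template(template: str) -> "re.Pattern[str]":
    # Escape literal text; turn each <ITEMi> slot into a named capture group.
    pieces, pos = [], 0
    for m in re.finditer(r"<(ITEM\d+)>", template):
        pieces.append(re.escape(template[pos:m.start()]))
        pieces.append(f"(?P<{m.group(1)}>[^&/?]+)")
        pos = m.end()
    pieces.append(re.escape(template[pos:]))
    return re.compile("".join(pieces))


# Record 3 above: (URLtemplate, ResponseStatus, ResponseDataTemplate).
RECORD_3: Tuple[str, int, str] = (
    "http://locator123.org.uk/report?id=<ITEM1>",
    307,
    "http://locator123.org.uk/report?id=<ITEM1>&x=<ITEM2>&y=<ITEM3>",
)


def interpret_location(
    request_url: str, status: int, location: Optional[str]
) -> Optional[Dict[str, object]]:
    """Return {"vehicleID": ..., "location": (x, y)} if Record 3 applies, else None."""
    url_template, record_status, data_template = RECORD_3
    if status != record_status or location is None:
        return None
    request_match = compile_template(url_template).fullmatch(request_url)
    data_match = compile_template(data_template).fullmatch(location)
    if request_match is None or data_match is None:
        return None
    # Extra sanity check (not in the source): both URLs name the same vehicle.
    if request_match.group("ITEM1") != data_match.group("ITEM1"):
        return None
    try:  # assumption: coordinates are integers
        xy = (int(data_match.group("ITEM2")), int(data_match.group("ITEM3")))
    except ValueError:
        return None
    return {"vehicleID": request_match.group("ITEM1"), "location": xy}
```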

Similarly, at a second timepoint, the same request to the website server for vehicle 1001 returns a response with status code 301. In this instance, the RI record matched is Record 2, and so at step 330 the vehicle database is updated by adding a record to a vehicle location record table for vehicle 1001 with a location-status field of "decommissioned", a vehicle-id field of 1001 and a timestamp indicating the time and date of the second timepoint, indicating that this vehicle was determined at that timepoint to have been decommissioned and is no longer available for tracking. Thus the database of vehicles is updated without the need to retrieve the corresponding web page which would have served up the information about the decommissioning of the vehicle. On checking the location-status field of the corresponding vehicle record, the processor is now able to ascertain that it need not make any further requests over time to the website in relation to this vehicle.

Alternatively, the website server may have returned a status code of 307, indicating that the requesting client should make a further request to a different URL, and it may also in this case have returned the URL http://locator123.org.uk/report?id=1001&x=3423423&y=24349233, indicating that the page on the server containing the information about the vehicle's location should instead be retrieved using the returned URL. In this case, the processor matches the request URL and response data to Record 3 on the RI Database, and yields the following interpretation of the HTTP response: location.vehicleID(1001) = (3423423, 24349233). This indicates that a record for vehicle-location should be created in the vehicle database with a vehicleID field with value "1001", a value for the vehicle-location field of "(3423423, 24349233)" representing a known location, and a timestamp field recording the time at which the HTTP response was received by the processor. Thus the database of vehicles is once again updated at step 330, without the need to retrieve the corresponding web page which would have served up the information about the locatability of the vehicle.

In the case where this data is to be stored as real-time or almost-real-time data (only a few seconds or minutes old), the invention thus drastically reduces the overheads needed to create and maintain a database for which a large number of records need to be intensively updated, and, by reducing such overheads, can avoid the triggering of automatic restraints on the volume of data that can be requested by a given agent, or from a single IP address, in a given period of time.

In the case where no match is found, processing proceeds to step 350 and the process retrieves and processes the page contents and updates the database accordingly. In one embodiment, where there is no match on the RI Database and the response indicates that a page is not available to retrieve, or an attempt to retrieve the page fails, the record for that vehicle is updated by recording the status code of the unsuccessful attempt at page-requesting, and a timestamp to indicate when the unsuccessful attempt took place.
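The fall-back path can be sketched as a small dispatcher. This is a hypothetical Python outline, not the described system: `interpret` stands for the RI lookup, `fetch_page` for the step 350 retrieval and parsing (returning the page request's status code and a parsed update, or `None` on failure), and treating any status of 400 or above as "page not available" is an assumption of the sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Optional, Tuple, Union

Update = Dict[str, object]


@dataclass
class FetchFailure:
    # Hypothetical record of an unsuccessful page request: status code plus time.
    status: int
    attempted_at: datetime


def process_response(
    status: int,
    interpret: Callable[[int], Optional[Update]],
    fetch_page: Callable[[], Tuple[int, Optional[Update]]],
    now: Callable[[], datetime] = datetime.now,
) -> Union[Update, FetchFailure]:
    """Prefer the RI interpretation; otherwise retrieve the page (step 350)."""
    update = interpret(status)
    if update is not None:
        return update  # RI match: no page download needed (step 330)
    if status >= 400:
        # Assumed rule: a 4xx/5xx reply means the page is not available.
        return FetchFailure(status, now())
    page_status, page_update = fetch_page()
    if page_update is None:
        return FetchFailure(page_status, now())  # retrieval or parsing failed
    return page_update
```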

In a third embodiment, the invention is used to maintain a database of information about used cars. For the purposes of clarity, the description of the third embodiment uses an example that creates and maintains a database of information related to vehicles running on both fossil fuels and renewable sources, over time, but it should be understood that the invention covers the efficient creation and maintenance of databases of information in a timely way which can be used to analyse over time trends which are economic, social, geophysical or climate related, or for any other purpose. To help explicate this, alternative versions of the third embodiment are described or referred to.

In the third embodiment, the word "object" is used throughout to describe any specific instance of a physical or abstract object for which information is available about the object or the context in which it exists or is available. Likewise the phrase "Class-exemplar" is used throughout to describe an exemplar of a class of objects under consideration for which trends are to be identified. The word "object" may be used to refer to anything that it may be possible to evaluate as part of, or not part of, a class represented by a Class-exemplar; for example, in the case of vehicles, whether the object runs on fossil fuels or is powered from renewable energy sources.

Likewise the word "item" is used to describe the object under consideration for which an assessment of attributes related to that object is to be made using statistical information. Both the words "item" and "object" may be used to refer to anything for which it may be possible to assess or estimate a characteristic or data value using statistical information, or, in the case of objects for which information about condition, context or other circumstances includes hypothetical information about the object (such as the hypothesised condition and circumstances of lost or damaged items had they not been lost or damaged), to use information about, or assess, a quantitative or qualitative value of an attribute of said objects or items in relation to the hypothetical circumstances or condition, where the hypothesised condition is dependent on aspects of the object not discernible from a general product description (such as mileage or date of purchase), or other type of usage information not listed here. The terms "usage information", "usage-related information" and "information about usage" are used throughout the description of the third embodiment to mean any information related to the usage or history of an object, for example mileage or information related to wear and tear, information about pre-existing damage, and information indicative of the likelihood or otherwise of corrosion or of any other type of deterioration of material or components. In addition the words "item" and "object" may refer to a set of distinct objects that form a logical unit, in the same way that a "dinner set" or a "football team" is made up of distinct items of crockery or a distinct set of individuals respectively, but can be perceived and assessed as a unit with sub-parts in order to make assessments or evaluations of the overall value or worth of the unit by assessing either its attributes as a unit or the attributes of one or more of its sub-parts, or both.

In the description of the third embodiment the word "attribute" is used throughout to describe any characteristic of an object that may be relevant to assessing its inclusion in, or exclusion from, a particular Class, such as physical and other inherent characteristics of the object, and characteristics of the context or circumstances related to the object (for example, its physical location).

The third embodiment will be described in terms of a system for creating and maintaining a database related to individual motor vehicles running on fossil fuels, renewable energy sources, or both (in the case of hybrid vehicles) advertised on the internet on a plurality of web pages.

Referring now to the drawings, FIG. 4 shows a block diagram of the components of the system in the third embodiment. An input component 410 receives Class-exemplar information describing the attributes of the Class-exemplar that is the subject of assessment. In the third embodiment such attributes may include features expected to be relevant to grouping vehicles into sets suitable for trend analysis, such as make, model, body type, fuel category (e.g. renewable, fossil, hybrid) and type (e.g. petrol, diesel, electric), and engine size, together with features likely to be relevant to the classification of vehicles, such as, but not limited to, recorded mileage, year of manufacture and registration period (e.g. 03, T, V, 55). In addition, features relevant to identifying social, geographical or economic trends, such as the location or postcode recorded for the vehicle for which information is being gathered, may be included in the information provided to the Class-exemplar input component 410.

The Class-exemplar information is then passed to a selection component 420, which uses it to identify matching records in a database of object records 435 (hereafter, in the description of this embodiment, the object record database), where the recorded information includes attributes of specific used vehicles currently or recently for sale. The selection component 420 selects a set of such records by matching the attribute values of the vehicles in the object record database against one or more of the attributes provided to the Class-exemplar input component 410.

For some attributes the match may be required to be exact. For example, where the Class-exemplar has the value "Ford" for the attribute "make" and "Focus" for the attribute "model", only object records with those same make and model values should be matched. "Ford" and "Ford Focus" are registered trademarks of Ford Motor Company. In the third embodiment, an exact match is required for make, model, body type, transmission type, number of doors, subtype, fuel category and type, and either calendar year or registration year, except that no distinction is made between upper- and lower-case letters.

For other attributes, a match may be considered to exist if the value of the attribute on the database record falls within a certain range defined in the information for the Class-exemplar being analysed. An example of such an attribute is mileage: records whose values fall within a certain range might be considered to correspond to a given Class-exemplar, for example vehicles whose recorded mileage is not more than 20% lower and not more than 3% higher than the recorded mileage of the Class-exemplar being analysed. In this way a subset of vehicle records can be identified from the object record database that are likely to be acceptable as representatives of the Class-exemplar for which the trend is being analysed. For the set of vehicles considered to match, an ordering is calculated using an ordering function. The ordering may rank two or more vehicles equally, in which case those vehicles are placed in an arbitrary order relative to each other. In the third embodiment the ordering function is a weighting function, whereby the values used to indicate the order of objects also serve as weightings for the objects.

It should be noted that in the third embodiment a lower weighting indicates greater suitability of the corresponding object as a member of the class represented by the given Class-exemplar, and a higher weighting indicates lesser suitability. For example, a value of the "mileage" attribute that is significantly higher or lower than that of the Class-exemplar will increase the weighting and therefore decrease the object's suitability as an exemplar of that Class.

An inverse relationship of this kind is not necessary for practising the invention, but it should be borne in mind for a correct understanding of the description that follows.

In the third embodiment, the proximity between the location associated with the Class-exemplar selected for statistical analysis and the location of the object whose weighting is being processed is calculated from the postcode information for the Class-exemplar and the object. Methods of calculating distances between postcodes based on location are well known.
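The text leaves the distance method open, so the following is only one possible sketch. It assumes each postcode has already been mapped to a latitude/longitude pair (the lookup table and the placeholder keys "PC1" and "PC2" are hypothetical, not real postcode data) and applies the standard haversine great-circle formula, returning miles.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius in miles


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing the argument just above 1.0
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def postcode_distance(pc_a: str, pc_b: str,
                      coords: Dict[str, Tuple[float, float]]) -> float:
    """Proximity in miles between two postcodes via a hypothetical
    postcode -> (latitude, longitude) lookup table."""
    (lat1, lon1), (lat2, lon2) = coords[pc_a], coords[pc_b]
    return haversine_miles(lat1, lon1, lat2, lon2)


if __name__ == "__main__":
    coords = {"PC1": (51.50, -0.12), "PC2": (51.60, -0.12)}  # placeholder values
    print(round(postcode_distance("PC1", "PC2", coords), 1))  # 6.9
```

Straight-line distance is a simplification; road distance or a postcode-centroid dataset could be substituted without changing the rest of the pipeline.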

In the third embodiment some attributes which are not used by the selection component 420 in the process of identifying matching records may be used by the weighting component 430 during the process of calculating the weighting of the object in relation to the Class-exemplar selected for analysis.

The processing of weighting component 430 in the third embodiment is described in more detail below. In general it may be configured to decrease the weighting where a comparison is considered favourable (for instance, decreasing the weighting by 25% where the recorded mileage of the object being assessed is not more than 0.5% higher than that of the Class-exemplar) and to increase the weighting where a comparison is considered unfavourable (for instance, where the distance between the postcode of the Class-exemplar and the location recorded in the object record is ten miles or more).

In addition to the weighting information calculated by weighting component 430, in one embodiment the system also calculates summary information, such as statistics about the set of vehicles selected as representatives of the given Class or as meeting some criterion of similarity to the Class-exemplar. In another embodiment, the output component displays other information indicative of political, financial or social trends, such as the rise or fall in the percentage of vehicles running on fossil or non-fossil fuels over a period of time, or the correlation between the rise or fall in the number of luxury cars sold and economic indices over the same period.
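As an illustration of the fuel-type trend mentioned above, the sketch below computes, for each reporting period, the percentage of advertised vehicles in the fossil-fuel category and the change between consecutive periods. The record fields `period` and `fuel_category` are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable


def fossil_share_by_period(records: Iterable[dict]) -> Dict[str, float]:
    """Percentage of records per period whose fuel_category is 'fossil'.

    Each record is assumed to carry a sortable 'period' label (e.g. an ISO
    week string) and a 'fuel_category' of 'fossil', 'renewable' or 'hybrid'.
    """
    totals: Dict[str, int] = defaultdict(int)
    fossil: Dict[str, int] = defaultdict(int)
    for rec in records:
        totals[rec["period"]] += 1
        if rec["fuel_category"] == "fossil":
            fossil[rec["period"]] += 1
    return {p: 100.0 * fossil[p] / totals[p] for p in sorted(totals)}


def period_changes(shares: Dict[str, float]) -> Dict[str, float]:
    """Percentage-point change from each period to the next."""
    periods = list(shares)
    return {cur: shares[cur] - shares[prev]
            for prev, cur in zip(periods, periods[1:])}
```

A falling series from `period_changes` would correspond to the "fall ... in percentage of vehicles running on fossil fuels" described in the text.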

For the purpose of evaluating, or "sanity checking", how effectively the system creates coherent and sensible sets of objects corresponding to the Class intended by the Class-exemplar description input to the system, the third embodiment provides a display function. This allows those analysing trends related to the Class in question to confirm that the groupings are valid and sound with respect to the "common-sense description" of the class represented by the set of objects identified using a Class-exemplar. In the third embodiment, information about the objects identified by results component 440 is displayed by presentation component 450 via a browser web page. By providing a visual representation of up-to-date information, this facilitates the early formation of hypotheses about emerging trends. In one embodiment this up-to-date information is graphed and compared with information for similarly identified groups of objects corresponding to the same Class-exemplar at different time points, for example to show weekly trends.

To ensure that emerging trends can be identified and assessed in a timely way, in one embodiment, as part of the process of assessing the Class-exemplar, a data collection component checks, for each vehicle in the set of vehicles, whether the advertisement for that vehicle is accessible and up to date by attempting to retrieve it, and modifies the weightings or removes the vehicle from the set according to the current information available from the website hosting the advertisement. The browser requests this information from the application via an AJAX mechanism so that it can be updated while the user is displaying, reviewing and refining the selection of vehicles. In another embodiment, on receiving information from the information component, the system instructs the data collection component to make retrieval requests specific to the Class-exemplar under consideration (e.g. search requests for vehicles similar to the Class-exemplar), identifies and retrieves advertisements for vehicles similar to the Class-exemplar, and updates the object record store, so that subsequent processing, and the results displayed to the user, use information that is as timely as possible.

In another embodiment, the link provided to allow viewing of an advertisement for a used vehicle is a URL that identifies a representation of the web page. The term "representation of a web page" may refer to the page as retrieved directly from the website hosting it; to a cached copy of the page, retrieved from the website before the link was displayed and possibly modified so that it can be displayed in a form as close as possible to the original; or to a page served by a proxy server, which may or may not cache copies of the page. In this case, when the person reviewing the trend-related or object information clicks on such a link to view a particular advertisement in the set of results, the data collection component first makes one or more http requests for that page to the website that hosts it, to check whether the page is accessible and whether any detail identified as significant has changed since the page was last retrieved. If the page is inaccessible, the system instead serves the user its cached version of the page. If it is accessible and has not changed, the user is served the current version of the page. In this way, the user can see information about a wider set of source data than may currently be publicly available on the given websites.
In the case that the page is accessible but key information has changed (in the third embodiment, key information being any information about the vehicle currently displayed to the user alongside the link), the system is configured to display, together with the current advertisement, a message indicating that the page has recently changed and the nature of the change. In this way the person carrying out the analysis can have confidence in the data provided by the system despite apparent inconsistencies between the stored information and the current version of the advertisement.
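The three outcomes when a user clicks an advertisement link (page inaccessible, page unchanged, page changed in a key detail) can be sketched as a single decision function. The `StoredAdvert` and `LivePage` types and the injected `fetch` callable are hypothetical stand-ins for the object record database and the data collection component's http request; `fetch` returning `None` is assumed to mean "inaccessible".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class StoredAdvert:
    url: str
    cached_html: str
    key_details: Dict[str, str]  # details shown to the user next to the link


@dataclass
class LivePage:
    html: str
    key_details: Dict[str, str]  # the same details as extracted from the live page


def serve_representation(
        stored: StoredAdvert,
        fetch: Callable[[str], Optional[LivePage]]) -> Tuple[str, Optional[str]]:
    """Return (html_to_serve, change_message_or_None) for a clicked link."""
    live = fetch(stored.url)
    if live is None:
        # Page no longer accessible: fall back to the cached copy.
        return stored.cached_html, None
    changed = {k: (old, live.key_details.get(k))
               for k, old in stored.key_details.items()
               if live.key_details.get(k) != old}
    if not changed:
        return live.html, None
    # Accessible but a key detail changed: serve it with an explanatory message.
    message = "This page has changed recently: " + "; ".join(
        f"{k} was {old!r}, now {new!r}" for k, (old, new) in sorted(changed.items()))
    return live.html, message
```

Passing `fetch` in as a parameter keeps the decision logic testable without network access.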

THE DATA COLLECTION COMPONENT

In the third embodiment, the data stored on the object record database 435 is collected by retrieving pages from vehicle sales websites and automatically extracting the information available in the content or markup of those pages.

Techniques and tools to automatically retrieve such information are well known.

In one embodiment the data is collected using site-specific routines built with the CPAN LWP and CPAN XML modules, which are available for download via the site http://search.cpan.org/. In the third embodiment, for each website to be searched, a parameterized template is stored for the URLs of search results on that website, along with information about how to instantiate the template to retrieve the corresponding search pages. For example, a template of the form might be stored along with the corresponding page range 1:120 to indicate that the parameter %s should be replaced by the numbers 1 through to 620 consecutively to yield a set of URLs corresponding to the search pages to be retrieved. Where search result pages are retrieved in this way, URLs are extracted from each retrieved search page by identifying the relevant URLs either by form (e.g. searching for URLs containing the substring "&AdvertId=", indicating that the URL identifies a page advertising a used vehicle on the website) or by context (e.g. recognizing that the URL following the markup substring class="advert" identifies a page advertising a used vehicle on the website).
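The template instantiation and the two URL-extraction strategies (by form and by context) can be sketched as below. This is not the CPAN-based implementation the text refers to; it is a Python illustration in which the regular expression, the `form_marker`/`context_marker` parameters and the example markers are assumptions, and "context" is simplified to an attribute on the anchor tag itself.

```python
import re
from html import unescape
from typing import List, Optional


def instantiate_template(template: str, first: int, last: int) -> List[str]:
    """Replace the %s parameter with each page number from first to last inclusive."""
    return [template.replace("%s", str(n)) for n in range(first, last + 1)]


# Captures the attributes before href, the href value, and the attributes after it.
ANCHOR_RE = re.compile(r'<a\b([^>]*?)\bhref="([^"]*)"([^>]*)>', re.IGNORECASE)


def extract_advert_urls(page: str, form_marker: Optional[str] = None,
                        context_marker: Optional[str] = None) -> List[str]:
    """Return advertisement URLs found on a search results page, in page order.

    By form: the href contains form_marker.
    By context: the <a> tag's other attributes contain context_marker.
    """
    urls: List[str] = []
    for before, href, after in ANCHOR_RE.findall(page):
        href = unescape(href)  # markup encodes "&" as "&amp;"
        by_form = form_marker is not None and form_marker in href
        by_context = context_marker is not None and context_marker in before + after
        if (by_form or by_context) and href not in urls:
            urls.append(href)
    return urls
```

For example, `instantiate_template("http://example.com/search?page=%s", 1, 3)` yields three search-page URLs (example.com is a placeholder). A real HTML parser would be more robust than a regular expression for anything beyond simple, regular markup.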

In the third embodiment the data collection component queries the websites daily to determine whether any of the known pages have expired or been withdrawn, re-retrieving pages where necessary, and records this information on the database, updating specific object attributes if necessary. Since there may be up to a million such advertisement pages to re-check daily, the optimization adopted in this embodiment is as follows: when a URL identifying an advertisement page is found on a search results page, as described above, the system identifies the database record corresponding to that URL (if the page was previously retrieved) and updates that record to indicate that the page was re-found at the time the search results page containing the URL was found.

In this way, the majority of re-checking for the continuing presence of advertisement pages on a website can be done on a daily basis without the need to make any requests or retrievals directly related to the said advertisement pages.
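A minimal sketch of this optimization, assuming (hypothetically) that the object record database is a dict keyed by advertisement URL and that each record carries a `last_seen` timestamp. URLs found on a search results page refresh `last_seen` without any request to the advertisement page itself; the `stale_records` helper is one possible follow-up, not something the text specifies, for listing pages that would still need a direct check.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def mark_refound(db: Dict[str, dict], urls_on_search_page: Iterable[str],
                 seen_at: datetime) -> List[str]:
    """Refresh last_seen for known adverts; return URLs not yet on the database."""
    new_urls: List[str] = []
    for url in urls_on_search_page:
        record = db.get(url)
        if record is None:
            new_urls.append(url)  # never retrieved: schedule a full retrieval
        else:
            record["last_seen"] = seen_at  # re-found without fetching the page
    return new_urls


def stale_records(db: Dict[str, dict], cutoff: datetime) -> List[str]:
    """URLs whose adverts have not been re-found since cutoff."""
    return [url for url, rec in db.items()
            if rec.get("last_seen") is None or rec["last_seen"] < cutoff]
```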

In one embodiment, the websites from which information is retrieved identify each object advertisement page uniquely by an identifier that reflects the order in which advertisements were processed on the website concerned (for example, an integer that is incremented to create an id for each new advertisement), so the identifier gives an indication, or rough indication, of the temporal order in which advertisements were created or published on the website. In this case, the identifier can be used as an attribute value that helps define the temporal ordering of selected object records. It can also be used together with information about the dates or times at which objects were first advertised to establish a temporal ordering, or partial temporal ordering, of objects whose information was obtained from different websites, for example by identifying two identifiers from different websites whose objects share a common value of the attribute "date first advertised".

In another embodiment, a standard search results page is retrieved from the public pages on the site which indicates the most recent advertisements on the site and, by identifying links on this page, the URLs to corresponding advertisement pages of objects and the identifiers used on that website for each of those object advertisement pages are extracted, for example by identifying strings that occur in the search page only in URLs for advertisement pages, and by identifying strings in said URLs that identify an advertisement id within the URL.

The data collection component then attempts to retrieve pages for a plurality of such identifiers, using a template URL to create a page link specific to each identifier. In that embodiment, the data collection component first generates a URL corresponding to the highest identifier integer discovered, and then, by successively decrementing the identifier, attempts to retrieve the web page for each identifier value in the range between that identifier and the closest identifier value already on the database of retrieved data, using URLs generated from the template for those particular identifiers. For example, a template of the form "http://abcdeautos.co.uk/vehicles.pl?id=%s" may be converted into a URL for a website advertising vehicles by replacing the string "%s" with a particular advertisement id, "987123" (or a hypothesised id where the page access is made speculatively), to give the URL "http://abcdeautos.co.uk/vehicles.pl?id=987123". In another embodiment the data collection component attempts to retrieve pages corresponding to specific identifiers not yet encountered, for example to fill "gaps" in its set of records, or to investigate whether new pages not identified on the site in other ways may be available (by speculatively making page requests with URLs encoding identifiers close to the values of known identifiers, where the request is likely to succeed if such a page exists).

In an embodiment where web pages are retrieved, the data collection routine scans or parses the page to identify regularities in the page structure or content that are indicative of information that can be extracted. For example, one such website may include in its pages a URL with argument information that encodes the identifier and attributes of a particular object. For example the relative URL "/page.php?id=123894231232&year=2002&fuel=Petrol&trans=Automatic&drs=5" may be used to identify that the vehicle identified on that site as 12389423123 has attribute values of "2002" for year, "Petrol" for fuel type, "Automatic" for transmission and has five doors, for example by parsing the URL and extracting parameter values for particular known parameter names.
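Parsing such a URL needs only the standard library. In this sketch the mapping from a site's parameter names to attribute names (`PARAM_MAP`) is a hypothetical per-site configuration, and the id in the example is a placeholder.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit

# Hypothetical per-site mapping: query-parameter name -> attribute name.
PARAM_MAP = {"id": "site_id", "year": "year", "fuel": "fuel_type",
             "trans": "transmission", "drs": "doors"}


def attributes_from_url(url: str, param_map: Dict[str, str] = PARAM_MAP) -> Dict[str, str]:
    """Extract attribute values from the known parameters of a (relative) URL."""
    query = parse_qs(urlsplit(url).query)  # each value is a list of strings
    return {attr: query[param][0]
            for param, attr in param_map.items() if param in query}
```

`attributes_from_url("/page.php?id=555&year=2002&fuel=Petrol&trans=Automatic&drs=5")` returns the year, fuel type, transmission and door count as strings; converting them to typed values would be a separate step.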

In another example, the presence of a particular string of characters such as "<li>Engine Size<" may indicate that information about engine size is present near that string, for example as the content of the immediately following HTML element, or as the content of the line immediately following the identified line in the file caching the web page.

In another example, the form of the data may indicate its type: a string of characters of the form ">[number] miles<" may indicate that [number] represents the mileage of that vehicle (where [number] stands for a particular number, for example ">12,300 miles<").
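The two heuristics above (a marker string followed by the value, and a value recognised by its form) might look like this. The regular expressions are assumptions for illustration; only the "next element" variant of the marker rule is implemented, not the "next line" variant.

```python
import re
from typing import Optional

# Form rule: ">[number] miles<", allowing thousands separators.
MILES_RE = re.compile(r">\s*([\d,]+)\s*miles\s*<", re.IGNORECASE)


def mileage_from_form(page: str) -> Optional[int]:
    """Return the first number written as '>N miles<', or None."""
    m = MILES_RE.search(page)
    return int(m.group(1).replace(",", "")) if m else None


def value_after_marker(page: str, marker: str) -> Optional[str]:
    """Marker rule: text content of the first element opened after marker."""
    idx = page.find(marker)
    if idx < 0:
        return None
    rest = page[idx + len(marker):]
    # First opening (non-closing) tag, then the text up to the next tag.
    m = re.search(r"<[^/>][^>]*>([^<]*)<", rest)
    return m.group(1).strip() if m else None
```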

In another example, the values of one attribute or set of attributes on the page may indicate the meaning of other attributes. For example, the determination that the make of the object is "Ford" and the model is "Focus" may be used in a context-dependent rule of the form: if the make and model are known to be "Ford" and "Focus" respectively, then the occurrence of a substring "Ford Focus LX" in the <title> element of the markup on the advertisement page indicates that the vehicle has a subtype attribute value of "LX".

In another example, the occurrence of a known value of a known attribute can be ascertained from its form or context. For example, the character string "<li>Red</li>" may be used in certain contexts (such as the web page for a vehicle on a particular website) to ascertain that the value of the attribute "colour" for this object is "Red".
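A sketch of the two context-dependent rules: the subtype is read from the `<title>` only once make and model are known, and a colour is recognised when a list item's text is a known colour value. The `KNOWN_COLOURS` set and the "first word after make and model" convention are hypothetical per-site assumptions.

```python
import re
from typing import Iterable, Optional

KNOWN_COLOURS = {"Red", "Blue", "Silver", "Black", "White"}  # hypothetical value list


def subtype_from_title(page: str, make: str, model: str) -> Optional[str]:
    """If '<make> <model> X' appears in <title>, return X as the subtype."""
    m = re.search(r"<title>(.*?)</title>", page, re.IGNORECASE | re.DOTALL)
    if not m:
        return None
    title, prefix = m.group(1), f"{make} {model} "
    pos = title.find(prefix)
    if pos < 0:
        return None
    words = title[pos + len(prefix):].split()
    return words[0] if words else None


def colour_from_list_items(page: str,
                           colours: Iterable[str] = KNOWN_COLOURS) -> Optional[str]:
    """Return the first <li> whose entire text is a known colour value."""
    allowed = set(colours)
    for item in re.findall(r"<li>([^<]*)</li>", page):
        if item.strip() in allowed:
            return item.strip()
    return None
```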

The resource-saving mechanism in the third embodiment is again based on the observation that, when a page is requested from a website using a specific URL, the web server sometimes replies with a response that does not serve the desired web page but instead returns a status code indicating a further action the requestor must take to complete the page retrieval. Examples of such responses are the http status codes 302, 303 and 307, all of which formally encode different conditions under which the requestor is given an alternative URL to use to request the page, and http status code 100, which indicates that the server to which the request was made has received the request headers and that the client should proceed to send the request body.

Such codes have formal definitions, following agreed or de facto standards, which indicate, for example, whether the redirection of a request is temporary or permanent. However, as well as formally encoding information such as "temporarily moved" or "permanently moved", such responses may also be decoded, in relation to particular websites and their usage of such responses, to yield information that is not part of the formal or de facto standard.

The invention exploits this ability to encode information in non-recognised ways (in effect 'subverting' the meaning or interpretation of the response so that it carries information not formally recognised in the protocol) in relation to http status codes and responses. The response thereby allows a requestor to deduce certain information about the entities described or detailed on the requested page, based on the context in which the request is made and the response given in that context.

Referring to FIG. 9, at step 910 the Data Collection Component makes an http HEAD request using a URL and, at step 920, receives a response to that HEAD request from the web server corresponding to that URL. At step 930 the Data Collection Component then searches a Response Interpretation Database. This database stores URLs, and templates for sets of URLs, for which the system can make a context-dependent interpretation of the response that yields information about content that would normally be returned in the corresponding web page, were that page returned (e.g. in response to a GET or POST request, possibly redirected). For each such URL or URL template, the database stores the particular responses the server can make and the interpretation that can be made of each for that website and URL. It should be noted that it is not necessary for the purposes of the invention that a HEAD request be made: a GET or POST request may also result in a response indicating a further action to be taken to complete the request, such as a response indicating redirection.

In the case where no match is found, step 950 is bypassed and the Data Collection Component either retrieves and processes the page contents (if the response indicates that the page is available) or updates the database record with the status code of the unsuccessful page request and a timestamp indicating when the unsuccessful attempt took place.
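The FIG. 9 flow can be sketched in two parts: a HEAD request that does not follow redirects, and a lookup against a Response Interpretation Database. Here the database is modelled, as an assumption, as a list of rules `(URL regex, status code, Location regex or None, interpretation)`; the example rule, which reads a redirect to a "/sold" URL as "vehicle sold", is a hypothetical site convention, not one documented in the text.

```python
import http.client
import re
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical rule: (URL regex, status code, Location regex or None, meaning)
Rule = Tuple[str, int, Optional[str], str]


def head_request(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Steps 910/920: send a HEAD request; return (status, Location header).

    http.client does not follow redirects, so 302/303/307 responses are seen as-is.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def interpret(url: str, status: int, location: Optional[str],
              rules: List[Rule]) -> Optional[str]:
    """Step 930: find a site-specific meaning for this response, if any."""
    for url_re, code, location_re, meaning in rules:
        if status != code or not re.fullmatch(url_re, url):
            continue
        if location_re is None or (location and re.search(location_re, location)):
            return meaning
    return None  # no match: retrieve the page, or record status and timestamp
```

Usage would be `status, loc = head_request(url)` followed by `interpret(url, status, loc, rules)`; a `None` result corresponds to the "no match" branch described above.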

THE SELECTION COMPONENT

In the third embodiment, the selection component 420 identifies, from the information available for the Class-exemplar being assessed, the information relevant to selecting a subset of object records from the object record database.

The processing for selection component 420 is given in flowchart form in FIG. 5.

The selection component utilizes a comparison function to extract from the object record database 435 the records that match the information about the Class-exemplar, as follows.

An exact match is required in the comparison at step 570 for the following:

make
model
body type
transmission type
number of doors
fuel type
subtype
either calendar year or registration year

An approximate match is required in the comparison at step 570 as follows:

Engine size: within 100 cc of the Class-exemplar engine size
Postcode: proximity calculation yields a value less than 15 miles
Mileage: must be within 5% of the Class-exemplar mileage

From the records available in the object record database, the selection component records at step 590 information indicative of the set of records that match the above specification. In the third embodiment the set of records thus identified is passed at step 5100 to the data collection component, which makes any necessary updates to the set of records in accordance with the processing described in figures 7 and 8, before the updated set of records is passed to the weighting component.
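The step 570 comparison can be sketched as a predicate over dict-shaped records. Field names are hypothetical, and two readings are assumptions: "either of calendar year or registration year" is taken to mean "compare the first of the two fields present on both records", and case-insensitive matching (stated for the exact fields in the text) is done with `str.casefold`. The `distance_miles` argument could be a postcode-distance function such as the haversine sketch earlier.

```python
from typing import Callable, Iterable, List

EXACT_FIELDS = ("make", "model", "body_type", "transmission", "doors",
                "fuel_type", "subtype")


def _same(a, b) -> bool:
    """Exact match, ignoring letter case for strings."""
    if isinstance(a, str) and isinstance(b, str):
        return a.casefold() == b.casefold()
    return a == b


def _year_match(record: dict, exemplar: dict) -> bool:
    """Match on calendar year if both have it, else on registration year."""
    for field in ("year", "reg_year"):
        a, b = record.get(field), exemplar.get(field)
        if a is not None and b is not None:
            return _same(a, b)
    return False


def matches(record: dict, exemplar: dict,
            distance_miles: Callable[[str, str], float]) -> bool:
    if not all(_same(record.get(f), exemplar.get(f)) for f in EXACT_FIELDS):
        return False
    if not _year_match(record, exemplar):
        return False
    if abs(record["engine_cc"] - exemplar["engine_cc"]) > 100:    # within 100 cc
        return False
    if distance_miles(record["postcode"], exemplar["postcode"]) >= 15:  # < 15 miles
        return False
    return abs(record["mileage"] - exemplar["mileage"]) <= 0.05 * exemplar["mileage"]


def select(records: Iterable[dict], exemplar: dict,
           distance_miles: Callable[[str, str], float]) -> List[dict]:
    """Step 590: the set of records matching the specification."""
    return [r for r in records if matches(r, exemplar, distance_miles)]
```

In a real database the exact-match fields would more likely be pushed into the query (an indexed WHERE clause) and only the approximate tests applied in code.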

THE WEIGHTING COMPONENT

In the third embodiment, the weighting component 430 accesses the information provided by the selection component 420 and calculates a weighting value for each of the objects in the set indicated by the selection component.

The processing for weighting component 430 is given in flowchart form in FIG. 6.

The following algorithm is used at step 620 to calculate the weighting for each object.

1. Set the weighting to the recorded selling price of the vehicle.

2. If the proximity value is greater than 10.0, multiply the weighting by 1.05.

3. If the mileage is more than 3% higher than the Class-exemplar mileage, multiply the weighting by 1.5.

4. If the paint type on both the object and Class-exemplar records is "metallic", multiply the weighting by 0.95.

5. If the mileage is more than 4% lower than the Class-exemplar mileage, multiply the weighting by 0.98.

6. If the advertisement for this object was placed more than six weeks ago, multiply the weighting by 1.2.

At step 630 a further test is carried out to filter out low-grade results caused by unreliable object data, by rejecting any weighting less than 25 or greater than 500,000.

If test 630 is successful the information about the weighting of the object is stored at step 640.

The weighting component 430 then makes the information about the weighting of all the objects for which the weighting was in range available at step 650 to the results component 440.
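Steps 1 to 6 and the range test at step 630 translate directly into a function. The record keys (`price`, `mileage`, `paint`, `first_advertised`) and passing `proximity` and `today` as arguments are assumptions of this sketch; returning `None` stands for "rejected at step 630".

```python
from datetime import date, timedelta
from typing import Optional


def weighting(obj: dict, exemplar: dict, proximity: float,
              today: date) -> Optional[float]:
    """Steps 620/630: weighting for one object (lower means more suitable)."""
    w = float(obj["price"])                                     # 1. start from price
    if proximity > 10.0:                                        # 2. more than 10 miles away
        w *= 1.05
    if obj["mileage"] > exemplar["mileage"] * 1.03:             # 3. >3% higher mileage
        w *= 1.5
    if obj.get("paint") == "metallic" and exemplar.get("paint") == "metallic":
        w *= 0.95                                               # 4. both metallic
    if obj["mileage"] < exemplar["mileage"] * 0.96:             # 5. >4% lower mileage
        w *= 0.98
    if today - obj["first_advertised"] > timedelta(weeks=6):    # 6. older than six weeks
        w *= 1.2
    # Step 630: reject weightings outside [25, 500,000] as unreliable data.
    return w if 25 <= w <= 500_000 else None
```

For an object priced at 5000 with 4% higher mileage, more than 10 miles away, metallic paint on both records and an advert 59 days old, the weighting is 5000 × 1.05 × 1.5 × 0.95 × 1.2 = 8977.5.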

In this embodiment the weighting function and the comparison function are described separately, although it will be recognized that the invention can also be practiced in a configuration that combines the weighting function and the comparison function into a single function which can calculate weightings for objects in the process of making a selection in relation to the object.

THE RESULTS COMPONENT

The results component 440 accesses the information provided by the weighting component 430 and the object record database 435 and makes a selection of information to provide to presentation component 450.

In the third embodiment, the results component sorts the weighted object records by weighting and selects the ten records with the lowest weighting, representing those members of the set of vehicles similar to the exemplar that most closely resemble the Class-exemplar, according to the criteria used for that assessment.

If there are fewer than ten records in the set, all the records in the set are selected.

The results component 440 then ranks the records by a measurable attribute such as mileage, or proximity to a given location, with the object with the lowest attribute value first in the list. Where two objects have the same attribute value, the one with the lower weighting is listed first. Where both objects have the same weighting and attribute value, their relative rank is treated as arbitrary.

The results component 440 then selects the first five objects in the ranking and makes the information about these vehicles and their ranking available to the presentation component 450. Where there are fewer than five vehicles in the ranking, all the vehicles in the ranking are made available to the presentation component.
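The results component's two-stage selection (ten lowest weightings, re-ranked by an attribute with weighting as tie-breaker, then the first five) can be sketched as below. Representing the input as `(weighting, record)` pairs is an assumption; Python's stable sort leaves fully tied objects in input order, which satisfies the "arbitrary" rule.

```python
from typing import List, Tuple


def results(weighted: List[Tuple[float, dict]], rank_attr: str = "mileage",
            shortlist: int = 10, shown: int = 5) -> List[Tuple[float, dict]]:
    """Keep the `shortlist` lowest weightings, rank them by rank_attr (ties
    broken by lower weighting), and return the first `shown` entries."""
    best = sorted(weighted, key=lambda pair: pair[0])[:shortlist]
    ranked = sorted(best, key=lambda pair: (pair[1][rank_attr], pair[0]))
    return ranked[:shown]  # slicing handles sets smaller than 10 or 5
```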

THE PRESENTATION COMPONENT

The presentation component accesses the information made available by the results component 440 and information on the object record database 435 and makes it available in the form of a web page with a table listing the vehicles in rank order and indicating, where available, for each one:

Make
Model
Subtype
Body type
Transmission
Doors
Fuel type
Year
Registration
Mileage
Proximity
Date first advertised
URLs of any web pages identified as advertising this vehicle

Where the attribute values are common to all records (e.g. make, model, transmission), the display module lists these attributes once only, above the list of ranked vehicles.
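Listing common attributes once, above the ranked table, amounts to splitting columns into those with an identical (non-missing) value in every row and those that vary. This sketch renders minimal HTML; the column names and markup are illustrative only.

```python
from html import escape
from typing import Dict, List, Tuple


def split_common(rows: List[dict], columns: List[str]) -> Tuple[Dict[str, object], List[str]]:
    """Separate columns whose non-missing value is identical in every row."""
    common = {c: rows[0][c] for c in columns
              if rows and rows[0].get(c) is not None
              and all(r.get(c) == rows[0][c] for r in rows)}
    varying = [c for c in columns if c not in common]
    return common, varying


def render_table(rows: List[dict], columns: List[str]) -> str:
    """Common attributes in a summary line, varying ones as table columns."""
    common, varying = split_common(rows, columns)
    out = ["<p>" + escape(", ".join(f"{c}: {v}" for c, v in common.items())) + "</p>",
           "<table>",
           "<tr>" + "".join(f"<th>{escape(c)}</th>" for c in varying) + "</tr>"]
    for r in rows:  # rows are assumed to arrive already in rank order
        cells = ("" if r.get(c) is None else str(r[c]) for c in varying)
        out.append("<tr>" + "".join(f"<td>{escape(v)}</td>" for v in cells) + "</tr>")
    out.append("</table>")
    return "\n".join(out)
```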

In the third embodiment, information indicative of the value of the Class-exemplar is presented in the form of a table of vehicles, listing a set of vehicles that are identified by the weighting function as most similar to the Class-exemplar in question, and for each one listing significant information for the vehicle. In another embodiment, as well as the list of values of a significant detail of similar vehicles being indicative of the value of that detail for the Class-exemplar, a single value is also calculated which is the average of the set of values, and is displayed.

For purposes of clarity, it should be explained that information indicative of a value of the Class-exemplar may take a variety of forms including but not limited to: a set of values related to objects identified as similar to the Class-exemplar (for example the above table of vehicles details); and as a single value (for example the above average of the set of values, or the value related to an exemplar vehicle from the set of vehicles).

ALTERNATIVE EMBODIMENTS IN VARIATION OF THE THIRD EMBODIMENT

In another embodiment, further statistical information is also made available in the presentation component, to aid sanity-checking of groupings and to help identify emerging trends, such as the highest and lowest engine sizes found within given proximities (for example within five miles, fifteen miles and one hundred miles), mileage ranges (for example within 10, 50 and 1000 miles of the Class-exemplar's mileage) or a range of dates of publication of the advertisement for the object (for example within one week, two weeks and one month).

In one embodiment, the data collection component requests a search result page or set of pages from a website and extracts from them the URLs corresponding to advertisement pages, for example by recognizing that they have a specific form (e.g. containing the parameter "advertID" as an argument) or by recognizing them from context (e.g. by identifying that the string "Class-exemplaradvertlink" identifies style instructions in the markup that occur only in relation to advertisement links), and then retrieves any advertisement pages not already retrieved.

In one embodiment, the data collection component re-retrieves web pages daily to check whether they are still publicly available and to identify any significant changes in details such as location. As well as using this information to update or delete records from the database, it stores information indicating the nature of any updates, such as changes to vehicle, owner or location details, and uses this information to enable statistical reporting of trends, e.g. to identify changes in the average values of attributes that might help an end user identify market or social trends early.

In one embodiment, the interface used to control the processing related to the Class-exemplar under consideration is loaded into a user browser in the form of a web page or set of web pages configured to allow the submission to a server of information about the Class-exemplar sufficient to identify the values of the attributes required in order to carry out the selection, weighting and ordering of vehicle records corresponding to vehicles identified as similar to the Class-exemplar, and to allow the information related to the vehicles to be displayed to the user, including a clickable link or links for one or more of the vehicles, where clicking on the link causes the browser to retrieve a representation of a web page advertising the vehicle, where the representation is either a web page retrieved directly from the site on which the advertisement originated, in the case where the page is accessible, or a cached version of the web page, in the case where the page is no longer accessible on the site at which it originated.

Claims (108)

  1. Apparatus for maintaining a database, comprising a processor operable to: make an HTTP request for a web page using a first URL; receive a response to said request comprising a relocation status code and a second URL indicative of a location at which said page may be requested; access information dependent on said response, where said information is dependent on said first URL; identify a data value indicative of content available in said web page requested in said request, where the data value identified is dependent on said information, and said data value is dependent on said response and said first URL; update said database using said data value.
  2. Apparatus in accordance to claim 1 where said processor is operable to: access information related to a plurality of website pages containing listings of professional qualifications; store in said database said information related to the plurality of website pages.
  3. Apparatus in accordance to claim 1 for the assessment of an item which is a used vehicle, where said processor is operable to: access information related to a plurality of website pages containing advertisements for used vehicles; store in said database said information related to the plurality of website pages; access usage-related and specification-related information for said item; compare said usage-related and specification-related information for said item with usage-related and specification-related information for said used vehicles, where the usage-related and specification-related information for the used vehicles is derived from the information related to the plurality of website pages; select information identifying a plurality of vehicles identified in the comparison as having similar usage-related information and similar specification-related information to said item; output information identifying links enabling access to representations of website pages that advertise vehicles identified in the selected information; and output information related to the plurality of vehicles indicative of an assessment of said item where said information related to advertisements is derived from said information related to the plurality of website pages.
  4. Apparatus in accordance to claim 3 where the processor is operable to: make requests for search result pages to a website hosting pages in the plurality of website pages, said search result pages containing information identifying website pages advertising used vehicles; retrieve said search result pages; extract from the search result pages information identifying website pages advertising used vehicles; make requests to retrieve said website pages advertising used vehicles; extract information related to vehicles from said website pages advertising used vehicles where the information related to vehicles includes usage-related information and specification-related information.
  5. Apparatus in accordance to claim 3 where the processor is operable to: access first information from said database identifying a plurality of previously-accessed website pages, said pages being previously accessed according to claim 3; access second information related to said plurality of previously-accessed website pages, the access of said first information being prior to the access of said second information; update said database using said second information to reflect changes in advertisements in said website pages.
  6. Apparatus in accordance to claim 3 where the processor is biased to prefer vehicles for which information indicative of a quantitative value related to the vehicle indicates a lower quantitative value compared to a plurality of other vehicles.
  7. Apparatus in accordance to claim 3 where the processing of the processor is dependent on time-related information relevant to advertisements for used vehicles.
  8. Apparatus in accordance to claim 3 where the processor comprises a plurality of devices connected via a computer network.
  9. Apparatus in accordance to claim 3 where the processor utilizes information related to the plurality of used vehicles to calculate a weighting for a vehicle using a weighting function.
  10. Apparatus in accordance to claim 3 where the processor is operable to: accept from a user information identifying a subset of said plurality of vehicles; and output a signal indicative of a quantitative value responsive to information related to the subset of the set of used vehicles, where the quantitative value related to the subset of the plurality of vehicles differs from the quantitative value related to the plurality of vehicles in accordance with differences between said plurality and said subset.
  11. Apparatus in accordance to claim 3 configured to support refinement of a quantitative assessment where the processor is operable to: output to a user a signal indicative of a first specification for a selection of a set of used vehicles; output to the user a signal indicative of quantitative information related to said item dependent on the first specification; accept from a user information related to a second specification for a selection of a set of used vehicles; output a signal indicative of a second set of used vehicles related to the second specification; and output a signal indicative of quantitative information related to said item dependent on information related to the second set of used vehicles.
  12. Apparatus in accordance to claim 3 where the processor is operable to: access time-dependent information about advertisements identified in said plurality of website pages; and utilize the time-dependent information about advertisements in the process of selecting said information identifying a plurality of vehicles.
  13. Apparatus in accordance to claim 3 where the processor is operable to: access time-dependent information about advertisements identified in said plurality of website pages; and utilize the time-dependent information about advertisements in making the comparison.
  14. Apparatus in accordance to claim 3 where the pages in the plurality of website pages are located on a plurality of websites.
  15. Apparatus in accordance to claim 3 where said selection is dependent on an ordering function and the information output related to the plurality of vehicles indicates an ordering of vehicles distinct from an ordering of vehicles indicated by the ordering function.
  16. Apparatus in accordance to claim 3 where the processor is operable to output a signal indicative of a level of confidence associated with an assessment.
  17. Apparatus in accordance to claim 3 where the processor is operable to: compare for each of a plurality of vehicles first information related to an advertisement for the vehicle at a first time point and second information related to the advertisement at a second time point; identify differences between said first information and said second information; output a signal dependent on said differences indicative of a statistic related to said plurality of vehicles.
  18. Apparatus in accordance to claim 3 where the processor is operable to: make a request for website pages advertising used vehicles, said website pages being identified in said information identifying a plurality of vehicles; retrieve said requested website pages; and update said information identifying a plurality of vehicles using information retrieved in said retrieval.
  19. Apparatus in accordance to claim 3 where the processor is operable to output a signal indicative of statistics relevant to a class of objects to which said item belongs.
  20. Apparatus in accordance to claim 3 where the processor is operable to output a signal indicative of statistics related to the assessed quantitative value of said item.
  21. Apparatus in accordance to claim 3 where the processor is operable to output a signal indicative of statistics related to the advertisement of used vehicles.
  22. Apparatus in accordance to claim 3 where the selection of information identifying a plurality of said vehicles is responsive to weightings calculated by a weighting function.
  23. Apparatus in accordance to claim 3 where the comparison is responsive to weightings calculated by a weighting function.
  24. Apparatus in accordance to claim 3 where the comparison involves engine-related information.
  25. Apparatus in accordance to claim 24 where the processor is biased to prefer vehicles for which engine-related information is similar to engine-related information related to said item.
  26. Apparatus in accordance to claim 3 where the comparison involves information related to said item relevant to distance between locations.
  27. Apparatus in accordance to claim 26 where the processor is biased to prefer vehicles for which information relevant to distance between locations is indicative of a lower distance between locations compared to information relevant to distance between locations relevant to other vehicles.
  28. Apparatus in accordance to claim 3 where the comparison involves mileage-related information.
  29. Apparatus in accordance to claim 28 where the processor is biased to prefer vehicles for which mileage-related information is similar to mileage-related information related to said item.
  30. Apparatus in accordance to claim 3 configured to assess an assessed quantitative value of said item where the processor is operable to: identify first information indicative of a first assessment for said item in accordance to claim 3; access second information indicative of a second assessment for said item; compare the first information and the second information; and output a signal dependent on the closeness of match between the first assessed quantitative value and the second assessed quantitative value.
  31. Apparatus in accordance to claim 30 where the processor is operable to output a signal indicative of a need to review an assessed quantitative value.
  32. Apparatus in accordance to claim 3 where the processor is operable to: identify information related to a first advertisement for a used vehicle; identify information related to a second advertisement for a used vehicle; compare vehicle attribute values in the first advertisement with vehicle attributes in the second advertisement; and output a signal evaluative of whether the advertisements refer to the same used vehicle.
  33. Apparatus in accordance to claim 32 where the processor is configured to omit information related to the first advertisement from said information identifying a plurality of vehicles in the case that information related to the second advertisement is included in the selected information identifying a plurality of vehicles.
  34. Apparatus in accordance to claim 3 where the comparison involves time-related information.
  35. Apparatus in accordance to claim 34 where the processor is biased to prefer vehicles for which time-related information is similar to time-related information related to said item.
  36. A user browser application operable to provide information suitable for the assessment of an item which is a used vehicle by interacting with apparatus operating within a computer network, said apparatus executing a server-based application and interacting with at least said user browser application, said server-based application comprising a processor operable to: access information related to a plurality of website pages containing advertisements for used vehicles; store in said database said information related to the plurality of website pages; access usage-related and specification-related information for said item from the browser; compare said usage-related and specification-related information for said item with usage-related and specification-related information for said used vehicles, where the usage-related and specification-related information for the used vehicles is derived from the information related to the plurality of website pages; select information identifying a plurality of vehicles identified in the comparison as having similar usage-related information and similar specification-related information to said item; output to the browser information identifying links enabling access to representations of website pages that advertise vehicles identified in the selected information; and output to the browser information related to the plurality of vehicles indicative of an assessment of said item where said information related to advertisements is derived from said information related to the plurality of website pages.
  37. Apparatus in accordance to claim 36 where the processor is operable to: make requests for search result pages to a website hosting pages in the plurality of website pages, said search result pages containing information identifying website pages advertising used vehicles; retrieve said search result pages; extract from the search result pages information identifying website pages advertising used vehicles; make requests to retrieve said website pages advertising used vehicles; extract information related to vehicles from said website pages advertising used vehicles where the information related to vehicles includes usage-related information and specification-related information.
  38. Apparatus in accordance to claim 36 where the processor is operable to: access first information from said database identifying a plurality of previously-accessed website pages, said pages being previously accessed according to claim 36; access second information related to said plurality of previously-accessed website pages, the access of said first information being prior to the access of said second information; update said database using said second information to reflect changes in advertisements in said website pages.
  39. Apparatus in accordance to claim 36 where the processor is biased to prefer vehicles for which information indicative of a quantitative value related to the vehicle indicates a lower quantitative value compared to a plurality of other vehicles.
  40. Apparatus in accordance to claim 36 where the processing of the processor is dependent on time-related information relevant to advertisements for used vehicles.
  41. Apparatus in accordance to claim 36 where the processor comprises a plurality of devices connected via a computer network.
  42. Apparatus in accordance to claim 36 where the processor utilizes information related to the plurality of used vehicles to calculate a weighting for a vehicle using a weighting function.
  43. Apparatus in accordance to claim 36 where the processor is operable to: accept from a user information identifying a subset of said plurality of vehicles; and output a signal indicative of a quantitative value responsive to information related to the subset of the set of used vehicles, where the quantitative value related to the subset of the plurality of vehicles differs from the quantitative value related to the plurality of vehicles in accordance with differences between said plurality and said subset.
  44. Apparatus in accordance to claim 36 configured to support refinement of a quantitative assessment where the processor is operable to: output to a user a signal indicative of a first specification for a selection of a set of used vehicles; output to the user a signal indicative of quantitative information related to said item dependent on the first specification; accept from a user information related to a second specification for a selection of a set of used vehicles; output a signal indicative of a second set of used vehicles related to the second specification; and output a signal indicative of quantitative information related to said item dependent on information related to the second set of used vehicles.
  45. Apparatus in accordance to claim 36 where the processor is operable to: access time-dependent information about advertisements identified in said plurality of website pages; and utilize the time-dependent information about advertisements in the process of selecting said information identifying a plurality of vehicles.
  46. Apparatus in accordance to claim 36 where the processor is operable to: access time-dependent information about advertisements identified in said plurality of website pages; and utilize the time-dependent information about advertisements in making the comparison.
  47. Apparatus in accordance to claim 36 where the pages in the plurality of website pages are located on a plurality of websites.
  48. Apparatus in accordance to claim 36 where said selection is dependent on an ordering function and the information output related to the plurality of vehicles indicates an ordering of vehicles distinct from an ordering of vehicles indicated by the ordering function.
  49. Apparatus in accordance to claim 36 where the processor is operable to output a signal indicative of a level of confidence associated with an assessment.
  50. Apparatus in accordance to claim 36 where the processor is operable to: compare for each of a plurality of vehicles first information related to an advertisement for the vehicle at a first time point and second information related to the advertisement at a second time point; identify differences between said first information and said second information; output a signal dependent on said differences indicative of a statistic related to said plurality of vehicles.
  51. Apparatus in accordance to claim 36 where the processor is operable to: make a request for website pages advertising used vehicles, said website pages being identified in said information identifying a plurality of vehicles; retrieve said requested website pages; and update said information identifying a plurality of vehicles using information retrieved in said retrieval.
  52. Apparatus in accordance to claim 36 where the processor is operable to output a signal indicative of statistics relevant to a class of objects to which said item belongs.
  53. Apparatus in accordance to claim 36 where the processor is operable to output a signal indicative of statistics related to the assessed quantitative value of said item.
  54. Apparatus in accordance to claim 36 where the processor is operable to output a signal indicative of statistics related to the advertisement of used vehicles.
  55. Apparatus in accordance to claim 36 where the selection of information identifying a plurality of said vehicles is responsive to weightings calculated by a weighting function.
  56. Apparatus in accordance to claim 36 where the comparison is responsive to weightings calculated by a weighting function.
  57. Apparatus in accordance to claim 36 configured to support refinement of an assessed quantitative value where the browser application is operable to: access a first set of information related to vehicles related to said item; make a first calculation using the first set of information; output a first signal related to the first calculation indicative of a first assessment of said item; access a second set of information related to vehicles related to said item indicative of a modification of the first information; make a second calculation using the second set of information; and output a second signal related to the second calculation indicative of a second assessment of said item.
  58. Apparatus in accordance to claim 36 where the browser application lists vehicles in the plurality of vehicles in an order distinct from an ordering indicated by the processor.
  59. Apparatus in accordance to claim 36 where the comparison involves engine-related information.
  60. Apparatus in accordance to claim 59 where the processor is biased to prefer vehicles for which engine-related information is similar to engine-related information related to said item.
  61. Apparatus in accordance to claim 36 where the comparison involves information related to said item relevant to distance between locations.
  62. Apparatus in accordance to claim 61 where the processor is biased to prefer vehicles for which information relevant to distance between locations is indicative of a lower distance between locations compared to information relevant to distance between locations relevant to other vehicles.
  63. Apparatus in accordance to claim 36 where the comparison involves mileage-related information.
  64. Apparatus in accordance to claim 63 where the processor is biased to prefer vehicles for which mileage-related information is similar to mileage-related information related to said item.
  65. Apparatus in accordance to claim 36 configured to assess an assessed quantitative value of said item where the processor is operable to: identify first information indicative of a first assessment for said item in accordance to claim 36; access second information indicative of a second assessment for said item; compare the first information and the second information; and output a signal dependent on the closeness of match between the first assessed quantitative value and the second assessed quantitative value.
  66. Apparatus in accordance to claim 65 where the processor is operable to output a signal indicative of a need to review an assessed quantitative value.
  67. Apparatus in accordance to claim 36 where the processor is operable to: identify information related to a first advertisement for a used vehicle; identify information related to a second advertisement for a used vehicle; compare vehicle attribute values in the first advertisement with vehicle attributes in the second advertisement; and output a signal evaluative of whether the advertisements refer to the same used vehicle.
  68. Apparatus in accordance to claim 67 where the processor is configured to omit information related to the first advertisement from said information identifying a plurality of vehicles in the case that information related to the second advertisement is included in the selected information identifying a plurality of vehicles.
  69. Apparatus in accordance to claim 36 where the comparison involves time-related information.
  70. Apparatus in accordance to claim 69 where the processor is biased to prefer vehicles for which time-related information is similar to time-related information related to said item.
  71. A method for maintaining a database, the method comprising processor steps of: making an HTTP request to a website for a web page using a first URL; receiving a response to said request comprising a relocation status code and a second URL indicative of a location at which said page may be requested; accessing information dependent on said response, where said information is dependent on said website; identifying a data value indicative of content available in said web page requested in said request, where the data value identified is dependent on said information; updating said database using said data value.
  72. A method in accordance to claim 71 for the assessment of an item which is a used vehicle, where said processor means carries out the steps of: accessing information related to a plurality of website pages containing listings of professional qualifications; storing in said database said information related to the plurality of website pages.
  73. A method in accordance to claim 71 for the assessment of an item which is a used vehicle, where said processor means carries out the steps of: accessing information related to a plurality of website pages containing advertisements for used vehicles; storing in said database said information related to the plurality of website pages; accessing usage-related and specification-related information for said item; comparing said usage-related and specification-related information for said item with usage-related and specification-related information for said used vehicles, where the usage-related and specification-related information for the used vehicles is derived from the information related to the plurality of website pages; selecting information identifying a plurality of vehicles identified in the comparison as having similar usage-related information and similar specification-related information to said item; outputting information identifying links enabling access to representations of website pages that advertise vehicles identified in the selected information; and outputting information related to the plurality of vehicles indicative of an assessment of said item where said information related to advertisements is derived from said information related to the plurality of website pages.
  74. A method in accordance to claim 73 where the processor means carries out the step of accessing information related to the plurality of website pages by carrying out the steps of: making requests for search result pages to a website hosting pages in the plurality of website pages, said search result pages containing information identifying website pages advertising used vehicles; retrieving said search result pages; extracting from the search result pages information identifying website pages advertising used vehicles; making requests to retrieve said website pages advertising used vehicles; extracting information related to vehicles from said website pages advertising used vehicles where the information related to vehicles includes usage-related information and specification-related information.
  75. A method in accordance to claim 73 where the processor means carries out the further steps of: accessing first information from said database identifying a plurality of previously-accessed website pages, said pages being previously accessed according to claim 73; accessing second information related to said plurality of previously-accessed website pages, the access of said first information being prior to the access of said second information; updating said database using said second information to reflect changes in advertisements in said website pages.
  76. A method in accordance to claim 73 where the processor means prefers vehicles for which information indicative of a quantitative value related to the vehicle indicates a lower quantitative value compared to a plurality of other vehicles.
  77. A method in accordance to claim 73 where the processing of the processor means is dependent on time-related information relevant to advertisements for used vehicles.
  78. A method in accordance to claim 73 where the processor means comprises a plurality of devices connected via a computer network.
  79. A method in accordance to claim 73 where the processor means utilizes information related to the plurality of used vehicles to calculate a weighting for a vehicle using a weighting function.
  80. A method in accordance to claim 73 where the processor means carries out the steps of: accepting from a user information identifying a subset of said plurality of vehicles; and outputting a signal indicative of a quantitative value responsive to information related to the subset of the set of used vehicles, where the quantitative value related to the subset of the plurality of vehicles differs from the quantitative value related to the plurality of vehicles in accordance with differences between said plurality and said subset.
  81. A method in accordance to claim 73 to support refinement of a quantitative assessment where the processor means carries out the steps of: outputting to a user a signal indicative of a first specification for a selection of a set of used vehicles; outputting to the user a signal indicative of quantitative information related to said item dependent on the first specification; accepting from a user information related to a second specification for a selection of a set of used vehicles; outputting a signal indicative of a second set of used vehicles related to the second specification; and outputting a signal indicative of quantitative information related to said item dependent on information related to the second set of used vehicles.
  82. A method in accordance to claim 73 where the processor means carries out the steps of: accessing time-dependent information about advertisements identified in said plurality of website pages; and utilizing the time-dependent information about advertisements in the process of selecting said information identifying a plurality of vehicles.
  83. A method in accordance to claim 73 where the processor means carries out the steps of: accessing time-dependent information about advertisements identified in said plurality of website pages; and utilizing the time-dependent information about advertisements in making the comparison.
  84. A method in accordance to claim 73 where the pages in the plurality of website pages are located on a plurality of websites.
  85. A method in accordance to claim 73 where said selection is dependent on an ordering function and the information output related to the plurality of vehicles indicates an ordering of vehicles distinct from an ordering of vehicles indicated by the ordering function.
  86. A method in accordance to claim 73 where the processor means carries out the step of outputting a signal indicative of a level of confidence associated with an assessment.
  87. A method in accordance to claim 73 where the processor means carries out the further steps of: comparing for each of a plurality of vehicles first information related to an advertisement for the vehicle at a first time point and second information related to the advertisement at a second time point; identifying differences between said first information and said second information; outputting a signal dependent on said differences indicative of a statistic related to said plurality of vehicles.
  88. A method in accordance to claim 73 where the processor means carries out the steps of: making a request for website pages advertising used vehicles, said website pages being identified in said information identifying a plurality of vehicles; retrieving said requested website pages; and updating said information identifying a plurality of vehicles using information retrieved in said retrieval.
  89. A method in accordance to claim 73 where the processor means carries out the step of outputting a signal indicative of statistics relevant to a class of objects to which said item belongs.
  90. A method in accordance to claim 73 where the processor means carries out the step of outputting a signal indicative of statistics related to the assessed quantitative value of said item.
  91. A method in accordance to claim 73 where the processor means carries out the step of outputting a signal indicative of statistics related to the advertisement of used vehicles.
  92. A method in accordance to claim 73 where the selection of information identifying a plurality of said vehicles is responsive to weightings calculated by a weighting function.
  93. 93. A method in accordance to claim 73 where the comparison is responsive to weightings calculated by a weighting function.
  94. 94. A method in accordance to claim 73 where the comparison involves engine-related information.
  95. 95. A method in accordance to claim 94 where the processor means prefers vehicles for which engine-related information is similar to engine-related information related to said item.
  96. 96. A method in accordance to claim 73 where the comparison involves information related to said item relevant to distance between locations.
  97. 97. A method in accordance to claim 96 where the processor means prefers vehicles for which information relevant to distance between locations is indicative of a lower distance between locations compared to information relevant to distance between locations relevant to other vehicles.
  98. 98. A method in accordance to claim 73 where the comparison involves mileage-related information.
  99. 99. A method in accordance to claim 98 where the processor means prefers vehicles for which mileage-related information is similar to mileage-related information related to said item.
  100. 100. A method in accordance to claim 73 to assess an assessed quantitative value of said ftem where the processor means carnes out the steps of: identifying first information indcative of a first assessment for said item in accordance to daim 73; accessing second information indicative of a second assessment for said item; comparing the first information and the second information; and outputting a signal dependent on the closeness of match between the first assessed quantitative value and the second assessed quantftative value.
  101. 101. A method in accordance to claim 100 where the processor means carries out the step of outputting a signal indicative of a need to review an assessed quantitative value.
  102. 102. A method in accordance to claim 73 where the processor means carries out the steps of: identifying information related to a first advertisement for a used vehicle; identifying information related to a second advertisement for a used vehicle; comparing vehicle attribute values in the first advertisement with vehicle attributes in the the second advertisement; and outputting a signal evaluative of whether the advertisements refer to the same used vehicle.
  103. 103. A method in accordance to claim 102 where the processor means carries out the step of omitting information related to the first advertisement from said information identifying a pluraUty of vehicles in the case that information related to the second advertisement is included in the selected information identifying a plurality of vehicles.
  104. 104. A method in accordance to daim 73 where the comparison involves time-related information.
  105. 105. A method in accordance to daim 104 where the processor means prefers vehicles for which time-related information is simUar to time-re'ated information related to said item.
  106. 106. Program instructions for programming processor means to carry out a method in accordance to claim 71.
  107. 107. A storage medium for storing program instructions in accordance to claim 73.
  108. 108. A signal carrying program instructions in accordance to claim 73.
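Claims 92 to 99, 104 and 105 describe comparing an item with advertised vehicles on engine-related, mileage-related, time-related and distance-related information, with weightings from a weighting function. The sketch below is only one possible illustration, not the applicant's implementation. It assumes that the comparison is used to rank candidate vehicles by a weighted dissimilarity score and keep the closest few. The `Vehicle` fields, `WEIGHTS`, `SCALES`, the use of model year as the time-related information, and the haversine distance are all hypothetical choices. The fixed weights merely stand in for the weighting function, whose form the claims do not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class Vehicle:
    ident: str
    engine_cc: float  # engine-related information
    mileage: float    # mileage-related information
    year: int         # time-related information (model year, an assumption)
    lat: float        # location, used for distance between locations
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Hypothetical weights and normalising scales; the claims give no values.
WEIGHTS = {"engine": 1.0, "mileage": 1.0, "time": 1.0, "distance": 0.5}
SCALES = {"engine": 500.0, "mileage": 20000.0, "time": 2.0, "distance": 100.0}


def score(item: Vehicle, candidate: Vehicle, weights: dict = WEIGHTS) -> float:
    """Weighted dissimilarity; a lower score means a more comparable vehicle.

    Similar engine, mileage and year, and a shorter distance, all lower the score,
    so closer and more similar vehicles are preferred.
    """
    diffs = {
        "engine": abs(item.engine_cc - candidate.engine_cc),
        "mileage": abs(item.mileage - candidate.mileage),
        "time": abs(item.year - candidate.year),
        "distance": haversine_km(item.lat, item.lon, candidate.lat, candidate.lon),
    }
    return sum(weights[key] * diffs[key] / SCALES[key] for key in diffs)


def select_comparables(item: Vehicle, candidates: list, k: int = 3,
                       weights: dict = WEIGHTS) -> list:
    """Return the k candidates with the lowest weighted dissimilarity to the item."""
    return sorted(candidates, key=lambda c: score(item, c, weights))[:k]
```

Passing a different `weights` dictionary changes which attributes dominate the ranking. For example, a distance weight of 0 ignores location entirely.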
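Claims 100 and 101 compare a first and a second assessed quantitative value for the same item, output a signal dependent on how closely they match, and flag a need for review. The sketch below illustrates that step under stated assumptions. It measures closeness as the absolute difference relative to the larger value and signals review above a threshold. The function names and the 10% `tolerance` are hypothetical; the claims do not define a closeness measure or a threshold.

```python
def relative_difference(first_value: float, second_value: float) -> float:
    """Closeness of match between two assessed values, as a fraction of the larger one.

    0.0 means identical; larger numbers mean a poorer match.
    """
    larger = max(abs(first_value), abs(second_value))
    if larger == 0:
        return 0.0
    return abs(first_value - second_value) / larger


def needs_review(first_value: float, second_value: float, tolerance: float = 0.10) -> bool:
    """Flag an assessed value for review when two assessments disagree.

    `tolerance` is a hypothetical threshold: 0.10 lets the assessments differ by
    up to 10% of the larger value before review is signalled.
    """
    return relative_difference(first_value, second_value) > tolerance
```

With these assumptions, assessments of 5000 and 5200 fall within tolerance, while 5000 and 6000 differ by about 17% and would be flagged.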
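Claims 102 and 103 compare attribute values in two advertisements, evaluate whether they refer to the same used vehicle, and omit one of them from the selected vehicles when the other is already included. The sketch below is a hypothetical illustration of that idea, not the claimed method. The `Advert` fields, the choice of `MATCH_FIELDS`, the case-insensitive string comparison, and the `MILEAGE_TOLERANCE` allowing for a re-listed car are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advert:
    advert_id: str
    make: str
    model: str
    year: int
    mileage: int
    colour: str


# Hypothetical attributes that must agree, and a mileage tolerance allowing for
# the same car being re-advertised after a few more miles.
MATCH_FIELDS = ("make", "model", "year", "colour")
MILEAGE_TOLERANCE = 500


def _norm(value):
    """Normalise strings so that 'Ford ' and 'ford' compare equal."""
    return value.strip().lower() if isinstance(value, str) else value


def same_vehicle(first: Advert, second: Advert) -> bool:
    """Evaluate whether two adverts appear to describe the same used vehicle."""
    for name in MATCH_FIELDS:
        if _norm(getattr(first, name)) != _norm(getattr(second, name)):
            return False
    return abs(first.mileage - second.mileage) <= MILEAGE_TOLERANCE


def deduplicate(adverts: list) -> list:
    """Omit an advert when a matching advert is already in the selection."""
    selected = []
    for advert in adverts:
        if not any(same_vehicle(advert, kept) for kept in selected):
            selected.append(advert)
    return selected
```

This keeps whichever matching advert is encountered first. A real system might instead prefer the most recent or most complete listing.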
GB1006839A 2009-02-16 2010-04-26 Method for updating a database Withdrawn GB2469909A (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
GB0907188A GB2472562A (en) 2009-02-16 2009-04-27 Assessing the value of an item
GB0909974A GB0909974D0 (en) 2009-02-16 2009-06-10 Method and apparatus for assessing the value of an item
GB0916296A GB0916296D0 (en) 2009-09-17 2009-09-17 Method and apparatus for assessing the value of an item
GBGB1001401.7A GB201001401D0 (en) 2010-01-28 2010-01-28 Method and apparatus for creating a database
GBGB1003037.7A GB201003037D0 (en) 2010-02-23 2010-02-23 Method and apparatus for creating a database

Publications (2)

Publication Number Publication Date
GB201006839D0 GB201006839D0 (en) 2010-06-09
GB2469909A true GB2469909A (en) 2010-11-03

Family

ID=42270753

Family Applications (2)

Application Number Title Priority Date Filing Date
GB1006839A Withdrawn GB2469909A (en) 2009-02-16 2010-04-26 Method for updating a database
GBGB1013224.9A Ceased GB201013224D0 (en) 2009-09-17 2010-08-06 Method and apparatus for creating a database

Family Applications After (1)

Application Number Title Priority Date Filing Date
GBGB1013224.9A Ceased GB201013224D0 (en) 2009-09-17 2010-08-06 Method and apparatus for creating a database

Country Status (1)

Country Link
GB (2) GB2469909A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130186963A1 (en) * 2000-07-18 2013-07-25 Scott C. Harris Barcode Device

Also Published As

Publication number Publication date
GB201006839D0 (en) 2010-06-09
GB201013224D0 (en) 2010-09-22

Similar Documents

Publication Publication Date Title
US7822757B2 (en) System and method for providing enhanced information
US6397219B2 (en) Network based classified information systems
CN101288046B (en) Identifying alternative spellings of search strings by analyzing self-corrective searching behaviors of users
US7640239B2 (en) Methods and apparatus for real-time business visibility using persistent schema-less data storage
US6564210B1 (en) System and method for searching databases employing user profiles
JP5302351B2 (en) Systems and methods for search result relevance for automatic optimization
US7840538B2 (en) Discovering query intent from search queries and concept networks
KR101312190B1 (en) Search systems and methods with integration of user annotations
US7822751B2 (en) Scoring local search results based on location prominence
US7860872B2 (en) Automated media analysis and document management system
US20080189254A1 (en) Presenting web site analytics
CN101739467B (en) Personalized network searching method and system
US20040019688A1 (en) Providing substantially real-time access to collected information concerning user interaction with a web page of a website
KR101380936B1 (en) Time series search engine
US8099406B2 (en) Method for human editing of information in search results
KR101063364B1 (en) A system and method for prioritizing Web site for the Web crawl process
US8234266B2 (en) Mobile SiteMaps
JP5778255B2 (en) The method of queries based on vertical search, system, and device
US20100299290A1 (en) Web Query Classification
US20070174255A1 (en) Analyzing content to determine context and serving relevant content based on the context
US8117207B2 (en) System and methods for evaluating feature opinions for products, services, and entities
US10269024B2 (en) Systems and methods for identifying and measuring trends in consumer content demand within vertically associated websites and related content
US20030212648A1 (en) Use of extensible markup language in a system and method for influencing a position on a search result list generated by a computer network search engine
US20020013729A1 (en) Advertisement presentation system
US20070239528A1 (en) Dynamic proxy method and apparatus for an online marketing campaign

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)