WO2015120968A1 - Increasing search result validity - Google Patents

Increasing search result validity

Info

Publication number
WO2015120968A1
WO2015120968A1 PCT/EP2015/000249 EP2015000249W WO2015120968A1 WO 2015120968 A1 WO2015120968 A1 WO 2015120968A1 EP 2015000249 W EP2015000249 W EP 2015000249W WO 2015120968 A1 WO2015120968 A1 WO 2015120968A1
Authority
WO
WIPO (PCT)
Prior art keywords
search
collected
search results
client
platform
Prior art date
Application number
PCT/EP2015/000249
Other languages
French (fr)
Inventor
Guillaume Legrand
Charles-Antoine Robelin
Luc Isnardy
François LABURTHE
Original Assignee
Amadeus S.A.S.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/179,815 external-priority patent/US9984165B2/en
Priority claimed from EP14290034.9A external-priority patent/EP2908255B1/en
Application filed by Amadeus S.A.S. filed Critical Amadeus S.A.S.
Publication of WO2015120968A1 publication Critical patent/WO2015120968A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management

Definitions

  • the present invention generally relates to database technology. More specifically, the present invention is directed to a mechanism to increase validity or confidence of search results retrieved from a pool of pre-computed search results.
  • a common object in database technology is to ensure short response times to database queries which require processing large volumes of data. For example, such computing-power consuming processing has to be performed in response to so-called "open queries" which contain only little input information (e.g. only one or two parameters out of a dozen possible parameters are specified and/or the specified value ranges of the parameters are broad). Consequently, such open queries lead to a large number of results in general. Possibilities to speed up data processing by increasing hardware performance are limited. Thus, attention is drawn to improving the mechanisms underlying the processing of large data volumes.
  • One approach to shorten query times is to pre-compute or pre-collect expected results to queries and to maintain the corresponding query results in a pool of pre-computed or pre-collected results. Queries are then not processed against the large, distributed and/or complex-to-calculate underlying data, but are directed to the pool.
  • this approach is employed by Internet search engines which utilize automated robots or crawlers to collect content of web servers and store this pre-collected content in a search engine repository. Internet search queries are then answered on the basis of the repository instead of retrieving the web servers' primary content at search query time.
  • One solution to remedy this issue is directed to improving the validity or correctness of the pre-computed or pre-collected query results by optimizing the re-computation or re-collection strategy, for example, by re-computing or re-collecting these query results with priority which are likely outdated.
  • Such strategies are, for example, described by International patent application PCT/EP2013/002390, EP 2541473 A1 and US 2009/0234682 A1.
  • a 100% all-time validity or correctness of the pre-computed or pre-collected query results is unachievable.
  • US 7,562,027 B1 discloses a travel planning system dealing with seat availability.
  • the system is implemented by a computer system.
  • the system includes a scheduling process and an availability process.
  • the scheduling process provides a set of instances of transportation that satisfies a user query.
  • the availability process accesses seat availability information from multiple information sources.
  • the availability process determines quality properties such as confidence, precision and/or validity of availability information retrieved from a first information source and determines whether the first information source is reliable. If the first information source is not reliable, the availability process executes a second set of seat availability queries to the first information source or a different information source to provide a second set of instances of transportation for which a seat is available.
  • some sources of availability data include measures of availability confidence of the results (e.g., "a seat of Q is available with 80% certainty"). This certainty information, if present, is passed to the client for processing and display for informative purposes to the user.
  • the client can also be programmed to filter out seats and flights with less than a specified probability of being available.
  • the present invention takes a different approach from strategies that increase the validity of pre-computed or pre-collected search results by optimizing the re-computation or re-collection process, which is performed asynchronously to the occurrence of search queries.
  • the present invention proposes a mechanism to estimate the validity of pre-computed or pre-collected search results and to utilize this validity estimation in order to return to the client pre-computed or pre-collected search results which are probably valid.
  • a method of handling queries in a database system is provided.
  • the database system has at least one client and at least one search platform.
  • the search platform maintains pre-computed search results which are associated with confidence factors.
  • a confidence factor indicates a probability of the associated search result being valid.
  • a query indicating at least one search criterion is received by the search platform.
  • the search platform utilizes the confidence factors associated with the identified pre-collected search results to increase the mean probability that the pre-collected search results returned to the client are valid.
  • the search platform utilizes the confidence factors associated with the identified pre-collected search results by returning only pre-computed search results which are associated with confidence factors having values exceeding a given threshold.
  • a respective search platform for handling queries in a database system as described above is provided.
  • a database system including a search platform and a client for handling queries as described above is provided.
  • a non-transitory computer readable storage medium is provided, having computer program instructions stored therein which, when executed on a computer system, perform the method as described above.
  • Fig. 1 gives an overview of a system including at least one client and a server maintaining pre-computed search results.
  • Fig. 2 visualizes the effect of the likelihood decreasing over time that pre-computed search results kept in the server are valid.
  • Fig. 3 shows a search platform maintaining pre-computed search results and associated confidence factor values.
  • Fig. 4 shows a first example of the search platform internally utilizing a filter on the basis of the confidence factor threshold.
  • Fig. 5 is a message sequence chart relating to the example of Fig. 4.
  • Fig. 6 illustrates a second example according to which the search platform employs a confidence factor threshold as an additional search criterion.
  • Fig. 7 depicts a message flow of the example of Fig. 6.
  • Fig. 8 explains a third example according to which the confidence factor threshold is employed to perform a further query in a primary data source in response to the server's query results.
  • Fig. 9 presents a message sequence of the example of Fig. 8.
  • Fig. 10 shows a fourth example according to which the search platform performs a further query to a primary data source.
  • Fig. 11 is a message sequence chart relating to the example of Fig. 10.
  • Fig. 12 shows a fifth example according to which the search platform first returns pre-computed search results to the client and performs re-validation subsequently.
  • Fig. 13 is a message sequence chart relating to the example of Fig. 12.
  • Fig. 14 depicts an exemplary architecture of a distributed travel-related database environment.
  • Fig. 15 is an exemplary schematic view of the internal architecture of the search platform and/or the client.
  • the present invention generally relates to handling search queries in a database system maintaining pre-computed or pre-collected search results.
  • An exemplary database system 1 is shown by Figure 1.
  • the database system 1 includes at least one, but generally a plurality of clients 4 and at least one search platform 2.
  • a plurality of search platforms 2 may be present.
  • the at least one search platform 2 maintains pre-computed or pre-collected search results in order to decrease response times to answer search queries received from the clients 4.
  • pre-collected is used to cover any sort of pre-collection and pre-computation, such as simple Internet crawlers collecting or copying the content of Internet web servers, but also complex and time-intensive computations of search results on the basis of underlying data, as described e.g. for priced travel recommendations by PCT/EP2013/002390 and EP 2541473 A1.
  • database is meant to encompass any types of structured information storage system such as standard stand-alone databases like SQL server or Oracle databases as well as complex, distributed and/or proprietary storage systems, relational databases including database management systems or object-oriented database systems.
  • the client 4 directs search queries to the search platform 2 including one or more search criteria or parameters.
  • a search query is an Internet search.
  • the search query might carry a search string, search text or search phrase as search criteria.
  • a further search criterion may be the language of websites to be searched or an indication of a point of time of the first availability of the requested search string, search text or search phrase.
  • the search query is a database request for a product or service offered by a service provider platform such as an Internet book store or a travel provider.
  • the search query might include e.g. an upper price limit or a price range for the service or product and desired characteristics of the product/service such as book title, travel origin and destination, etc.
  • the search platform 2 processes a search query received from the client 4 and performs a database search within the pre-collected search results. In turn, search platform 2 responds with one or more pre-collected search results fulfilling the search criteria included in the search query. The client 4 receives this response and presents the search results to the user.
  • the pre-collection of search results is performed by using computation/collection platform 3.
  • search platform 2 or another control entity employs an appropriate re-collection strategy in order to update the pre-collected search results stored by search platform 2.
  • search platform 2 or the other control entity generates and transmits re-collection orders to computation/collection platform 3.
  • Computation/collection platform 3 executes the re-computation or re-collection, e.g. by requesting original data corresponding to the pre-collected search results from primary data sources.
  • any suitable re-collection strategy for updating the pre-collected search results may be employed, e.g. the update strategies as described by PCT/EP2013/002390 which is incorporated by reference herein.
  • the present invention does not focus on improving the validity or correctness of pre-collected search results returned to client 4 in response to search queries by a particular re-collection strategy. Rather, the invention focuses on improving the validity or correctness of pre-collected search results which are actually returned to the client 4 at the time of an incoming search query. In essence, it is proposed to return to the client 4 only those pre-collected search results which have a certain likelihood of being valid, and to refrain from returning pre-collected search results which have a certain likelihood of being invalid.
  • the pre-collected search results maintained by search platform 2 are associated with confidence factors.
  • each pre-collected search result stored by search platform 2 has a corresponding confidence factor.
  • one confidence factor may be associated with a plurality of pre-collected search results.
  • a confidence factor indicates a probability of the associated pre-collected search result(s) being valid.
  • confidence factors associated with pre-collected search results are utilized in order to decide which pre-collected search results are returned to the client 4 in response to a search query and which pre-collected search results are not returned to the client 4 and/or are returned to the client 4 in a specific way.
  • the present invention utilizes the confidence factors in order to generally provide the client 4 with pre-collected search results having a higher probability of being valid than pre-collected search results which would have been returned to the client without utilizing the confidence factors.
  • the confidence factors may be utilized in different ways in order to provide the client 4 with potentially more valid pre-collected search results, as will be described below.
  • a confidence threshold is employed. This confidence threshold is, for example, prescribed by the client 4.
  • the client 4 includes a threshold value (such as "at least 85%" or "at least 0,9" or "high", the latter being likewise defined as "at least 0,9") in the search query when requesting search results from the search platform 2.
  • the client 4 may also send dedicated asynchronous messages indicating a desired confidence threshold to the search platform 2.
  • Search platform 2 stores these client-specific confidence threshold prescriptions and employs them each time a search query is received from client 4.
  • the confidence threshold is set by a third party such as the operator of the search platform 2. In this case, a single confidence threshold value may be applicable for all search queries received from all clients 4.
  • the clients 4 may not have an influence on the confidence threshold employed by the search platform 2.
  • the confidence threshold pre-set by the third party may act as a default value and clients 4 may be able to override this default value by an own client-specific prescription.
  • the client 4 is provided with search results which are associated with a confidence factor value exceeding the confidence threshold, wherein "exceeding" may also include the case that the confidence factor value equals the confidence threshold.
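The threshold handling described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the numeric default of 0.9 and the mapping of "high" to 0.9 as a lookup table are hypothetical choices made for the example (the description itself only gives "at least 85%", "at least 0,9" and "high" as possible prescriptions). The sketch resolves the threshold from the query, then from a stored client-specific prescription, then from an operator default, and treats a confidence factor equal to the threshold as meeting it.

```python
from typing import Dict, Optional, Union

# Hypothetical operator-wide default; the description does not fix a value.
DEFAULT_THRESHOLD = 0.9

# Hypothetical named levels; the description mentions "high" as "at least 0,9".
NAMED_LEVELS = {"high": 0.9}


def parse_threshold(value: Union[str, float]) -> float:
    """Turn a prescription such as 0.9, "0,9", "85%" or "high" into a probability."""
    if isinstance(value, (int, float)):
        result = float(value)
    else:
        text = value.strip().lower()
        if text in NAMED_LEVELS:
            result = NAMED_LEVELS[text]
        elif text.endswith("%"):
            result = float(text[:-1]) / 100.0
        else:
            # Accept a decimal comma as well as a decimal point.
            result = float(text.replace(",", "."))
    if not 0.0 <= result <= 1.0:
        raise ValueError(f"threshold out of range: {value!r}")
    return result


def resolve_threshold(
    query_threshold: Optional[Union[str, float]],
    client_id: str,
    stored_prescriptions: Dict[str, float],
    default: float = DEFAULT_THRESHOLD,
) -> float:
    """Query value first, then the stored client-specific value, then the default."""
    if query_threshold is not None:
        return parse_threshold(query_threshold)
    if client_id in stored_prescriptions:
        return stored_prescriptions[client_id]
    return default


def meets_threshold(confidence: float, threshold: float) -> bool:
    """'Exceeding' includes equality, as stated in the description."""
    return confidence >= threshold
```

Whether a client may override the default at all is a policy decision of the operator; the sketch assumes the permissive variant.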
  • the client 4 is only provided with pre-collected search results exceeding the confidence threshold.
  • the client 4 may initially also be provided with pre-collected search results below the confidence threshold, while the search platform performs a validation of these pre-collected search results below the threshold and updates the tentatively returned pre-collected search results below the threshold with the corresponding validated search results.
  • US 7,562,027 B1, on the one hand, classifies different data sources with different levels of confidence or reliability.
  • the confidence measures of US 7,562,027 B1 ("a seat in Q is available with 80% certainty") are not confidence factors in the sense of the present invention because they do not indicate an estimation of the validity of pre-collected search results (i.e. the likelihood that the pre-collected search results still correspond to the original search results), but an estimation of the actual seat availability.
  • the confidence measures of US 7,562,027 B1 are part of the user data, while the present confidence factors are control data e.g. maintained on the basis of a probabilistic model.
  • Although US 7,562,027 B1 describes these kinds of confidence measures included in search results, this information is simply passed on to the client and potentially displayed to the user. As such, this does not provide the client with search results having a higher probability of being valid, as is achieved by the mechanisms disclosed herein.
  • US 7,562,027 B1 proposes to filter out potentially invalid search results only at the client side, while the present invention advantageously utilizes the confidence factors downstream of the client 4.
  • the function of the confidence factor to indicate a validity probability of pre-collected search results according to the present invention is exemplarily implemented by a probabilistic model utilizing the following parameters:
  • the age $t_i$ of a pre-collected search result i refers to the time passed since the last re-computation or re-collection of this pre-collected search result by the computation/collection platform 3.
  • the validity rate $\lambda_i$ of the pre-collected search result i is a measure of how long the pre-collected search result i remains valid or how fast the pre-collected search result i becomes invalid due to changes of the underlying original data.
  • This validity rate of a given pre-computed search result i is, for example, statistically derived from the occurrence and the outcomes of past (re-)computations or (re-)collections and comparisons of the re-collected search result with its previous state or values.
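One possible way to derive such a rate statistically is sketched below. It is an assumption-laden illustration, not the method of the application: it presumes that past re-collections happened at a fixed interval, that each re-collection was compared with the previous state, and that validity decays exponentially over time (the model the description uses for the confidence factor). The function name and parameters are hypothetical.

```python
import math


def estimate_validity_rate(unchanged: int, total: int, interval_hours: float) -> float:
    """Estimate a validity rate (per hour) from past re-collections.

    unchanged:      number of re-collections whose result equalled the previous state
    total:          number of re-collections observed
    interval_hours: fixed time between two successive re-collections (assumption)
    """
    if total <= 0 or interval_hours <= 0:
        raise ValueError("total and interval_hours must be positive")
    fraction_valid = unchanged / total
    if fraction_valid <= 0.0:
        raise ValueError("no unchanged observations; the rate cannot be estimated")
    # Under exponential decay, fraction_valid = exp(-rate * interval_hours).
    return -math.log(fraction_valid) / interval_hours
```

For instance, if 70 of 100 re-collections performed 35 hours apart found the result unchanged, the estimate is about 0.0102 per hour.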
  • a particular pre-collected search result i has a validity rate of 10% per hour meaning that the probability of i being valid decreases by 10% every hour.
  • the upper function represents a pre-collected search result which potentially remains more accurate (or, more correctly, stays at a higher probability of being valid over time) than another pre-computed search result associated with the lower function.
  • the pre-computed search result represented by the upper function has 70% probability of being still valid at 35 hours after its last re-collection, while the other pre-computed search result characterized by the lower function is only valid up to about 50% at 35 hours after its latest re-collection.
  • Both functions may also represent whole sets of pre-collected search results and accordingly indicate proportions of the sets of pre-collected search results likely being valid at a time passed since the last re-collection of the set.
  • the confidence factor values are derived from such a probabilistic model modelling the validity of pre-collected search results over time. More specifically, in some embodiments, the probability of a pre-collected search result i being valid at a time $t_i$ after a previous collection of the pre-collected search result i is given by $e^{-\lambda_i t_i}$. As outlined before, $\lambda_i$ denotes a rate of the pre-collected search result i becoming invalid.
  • the confidence factors $e^{-\lambda_i t_i}$ associated with the pre-collected search results are maintained by the search platform 2 (or another entity) in form of stored values of the pre-collected search results' validity rate $\lambda_i$ and the timestamps $TS_i$ of the last re-collection or re-computation of the pre-collected search results.
  • search platform 2 stores the validity rate $\lambda_i$ and the timestamp $TS_i$. These values do not change over time, but are constant until the next re-collection of i. At search time, i.e.
  • the confidence factor value $e^{-\lambda_i t_i}$ of pre-collected search result i is calculated by using $\lambda_i$ and $TS_i$, wherein $t_i$ in $e^{-\lambda_i t_i}$ results from $TS_S - TS_i$, $TS_S$ referring to the time of the search query's arrival at search platform 2.
  • the confidence factor is associated with the pre-collected search results by having the values of $\lambda_i$ and $TS_i$ stored for each pre-collected search result (or sets of pre-collected search results).
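The calculation at search time can be illustrated with a short sketch. It assumes that the validity rate is expressed per hour and that both timestamps are available as `datetime` values; the function and parameter names are hypothetical and only mirror the stored validity rate, the timestamp of the last re-collection and the arrival time of the search query.

```python
import math
from datetime import datetime, timedelta


def confidence_factor(
    validity_rate_per_hour: float,
    last_collected: datetime,
    query_time: datetime,
) -> float:
    """Return exp(-rate * age), with age = query_time - last_collected in hours."""
    age_hours = (query_time - last_collected).total_seconds() / 3600.0
    if age_hours < 0:
        raise ValueError("query_time precedes the last re-collection")
    return math.exp(-validity_rate_per_hour * age_hours)


# Usage: a rate chosen so that the result is 70% likely to be valid after 35 hours.
rate = -math.log(0.7) / 35.0
collected = datetime(2014, 1, 1, 0, 0)
print(round(confidence_factor(rate, collected, collected + timedelta(hours=35)), 3))
```

Because only the rate and the timestamp are stored, nothing has to be rewritten as a result ages; the confidence factor is recomputed for each incoming query.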
  • A basic setting of client 4 and search platform 2 is shown by Figure 3.
  • Search platform 2 runs a database with pre-collected search results.
  • the pre-collected search results include e.g. an index (visualized by "#" in Figure 3), the search result data (indicated by "Data" in Figure 3) including e.g. data fields which are defined as primary key values and secondary key values, as well as confidence factor values (referred to as "Conf. Factor" in Figure 3).
  • the confidence factor values are stored in the database in form of the values $\lambda_i$ and $TS_i$ being associated with each of the pre-collected search results.
  • Client 4 directs a search query 10 to search platform 2.
  • Search platform 2 processes the search query 10 and performs a search in the database in order to determine pre-collected search results fulfilling search criteria transmitted with the search query.
  • Search platform 2 generally returns, by message 11, pre-collected search results which meet the confidence threshold.
  • the confidence factors being associated with the pre-collected search results stored by the search platform 2 are not stored in the same database tables, partition or in the same database as the pre-collected search results, but are maintained in a separate database or station and retrieved from there by the search platform 2 at the time of processing a search query (cf. also Figure 12 discussed further below).
  • Figures 4 and 5 illustrate a first example according to which search platform 2 ensures that client 4 receives only search results fulfilling the confidence threshold by filtering out pre-collected search results with confidence factor values below the confidence threshold.
  • the client 4 generates and transmits a search query 10 to search platform 2.
  • Search query 10 includes one or more search criteria such as a search string for an Internet search.
  • Search platform 2 performs a database lookup in the database of pre-collected search results and retrieves pre-collected search results fulfilling the at least one search criterion passed over with the search query 10.
  • the search platform 2 filters out those pre-collected search results from the set of pre-collected search results resulting from the database lookup which do not meet the confidence threshold, i.e. which have confidence factor values below the confidence threshold. These filtered pre-collected search results are not returned to the client 4. Rather, the search platform 2 only returns those pre-collected search results uncovered by the database lookup which have confidence factor values at or above the confidence threshold. These results are returned to the client 4 by message 20 (Figure 4).
  • Figure 5 is a message sequence chart visualizing the message flow of this first implementation example.
  • the search query includes one or more search criteria, for example, in the exemplary case of a travel-related query, parameters for the travel the user is interested in such as an origin and destination pair and a timeframe for the travel (Figure 5: "criterion A", optional "criterion B").
  • search query 10 also includes a value for the confidence threshold (Figure 5: "confidence threshold").
  • search platform 2 utilizes a predetermined confidence threshold in an autonomous manner, i.e. without receiving a confidence threshold in the search query 10.
  • search platform 2 is provided with a default value for the confidence threshold prior to receiving the search query 10.
  • Search platform 2 then performs the database lookup in its pool of pre-collected search results on the basis of the one or more search criteria which were received as content of the search query 10. Subsequently, search platform 2 filters out those search results with confidence factor values below the confidence threshold and returns, by message 20, only those pre-collected search results with confidence factor values at or above the confidence threshold.
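The two-step processing of this first example (lookup on the search criteria, then filtering on the confidence threshold) can be sketched as follows. The record layout, the in-memory pool and all names are assumptions made for illustration; a real search platform would run the lookup against its database of pre-collected search results.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PreCollectedResult:
    key: str                 # primary key of the pre-collected search result
    data: Dict[str, str]     # search result data fields
    confidence: float        # confidence factor value at query time


def lookup(pool: List[PreCollectedResult], criteria: Dict[str, str]) -> List[PreCollectedResult]:
    """Step 1: database lookup using only the search criteria of the query."""
    return [r for r in pool if all(r.data.get(k) == v for k, v in criteria.items())]


def handle_query_with_filter(
    pool: List[PreCollectedResult],
    criteria: Dict[str, str],
    threshold: float,
) -> List[PreCollectedResult]:
    """Step 2: drop matches whose confidence factor is below the threshold."""
    matches = lookup(pool, criteria)
    return [r for r in matches if r.confidence >= threshold]


pool = [
    PreCollectedResult("r1", {"origin": "MUC", "destination": "PAR"}, 0.95),
    PreCollectedResult("r2", {"origin": "MUC", "destination": "PAR"}, 0.60),
    PreCollectedResult("r3", {"origin": "NCE", "destination": "NYC"}, 0.99),
]
returned = handle_query_with_filter(pool, {"origin": "MUC", "destination": "PAR"}, 0.9)
print([r.key for r in returned])
```

Here `r2` matches the search criteria but is filtered out, which is the kind of gap in the returned result set that this example accepts.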
  • client 4 is provided with search results in a similarly fast manner as with a normal query to search platform 2 without utilization of the confidence factor as presented herein.
  • the filter activity by search platform 2 may result in "holes" in the set of pre-collected search results produced by the search platform's database lookup.
  • a substantial part of potential search results the user is interested in may be missed due to the filter activity and not returned to client 4. This may nevertheless be acceptable for particular applications, for example, the retrieval of advertisement banners which are of potential interest to the user on web pages.
  • the search platform 2 utilizes the confidence threshold as an additional search criterion (Figures 6 and 7). In this second example, the confidence threshold is thus applied in addition to the at least one search criterion included in the search query 10.
  • Figure 6 visualizes the interaction between client 4 and search platform 2, while Figure 7 shows the message sequence between both entities.
  • Client 4 generates and transmits search query 10 to the search platform 2 in a similar manner as described in the first implementation example before.
  • After the search platform 2 has received the search query 10, the search platform 2 performs a lookup in its database of pre-collected search results.
  • this database lookup is based on the at least one search criterion included in the search query 10 as in the first implementation example.
  • the database lookup is also based on the confidence threshold which is either prescribed by the client (e.g. by being included in the search query 10) or is available internally in the search platform 2.
  • the confidence threshold functions as an additional search criterion, i.e. the database lookup only retrieves such pre-collected search results which have associated confidence factors with values being at or above the confidence threshold.
  • pre-collected search results which fulfil the at least one search criterion included in the search query 10, but not the confidence threshold, are not returned by the database lookup.
  • the pre-collected search results uncovered by the database lookup are returned by the search platform 2 to the client with message 12 (cf. Figures 6 and 7).
  • the search query 10 is directed to find the cheapest flights from Munich to Paris within a three day time interval.
  • the search platform 2 may generally operate in a manner that for each of the three days, the five cheapest flights stored as pre-collected data records are returned to the client 4. It may be the case that all of the five cheapest flights e.g.
  • the search platform 2 determines the five cheapest flights on the third day which also fulfil the confidence threshold.
  • client 4 is provided with search results for all three days satisfying the given confidence requirements.
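The difference between filtering after the lookup and using the threshold as a search criterion can be made concrete with a small sketch. It assumes a hypothetical scenario in which the five cheapest flights of one day all have confidence factor values below the threshold; the `Flight` record, the function names and the sample data are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Flight:
    day: str
    price: float
    confidence: float


def cheapest_per_day(flights: List[Flight], threshold: float, per_day: int = 5) -> Dict[str, List[Flight]]:
    """Threshold as an additional search criterion: select first, then rank."""
    by_day: Dict[str, List[Flight]] = defaultdict(list)
    for f in flights:
        if f.confidence >= threshold:
            by_day[f.day].append(f)
    return {day: sorted(fs, key=lambda f: f.price)[:per_day] for day, fs in by_day.items()}


def cheapest_then_filter(flights: List[Flight], threshold: float, per_day: int = 5) -> Dict[str, List[Flight]]:
    """Filter after the lookup (first example): rank first, then drop low confidence."""
    by_day: Dict[str, List[Flight]] = defaultdict(list)
    for f in flights:
        by_day[f.day].append(f)
    return {
        day: [f for f in sorted(fs, key=lambda f: f.price)[:per_day] if f.confidence >= threshold]
        for day, fs in by_day.items()
    }


# Day 3: the five cheapest flights are likely outdated, five dearer ones are not.
flights = (
    [Flight("day1", 60.0 + i, 0.95) for i in range(5)]
    + [Flight("day3", 50.0 + i, 0.50) for i in range(5)]
    + [Flight("day3", 100.0 + i, 0.95) for i in range(5)]
)
as_criterion = cheapest_per_day(flights, threshold=0.9)
filtered_after = cheapest_then_filter(flights, threshold=0.9)
print(len(as_criterion["day3"]), len(filtered_after["day3"]))
```

With the threshold as a criterion, day 3 still yields five flights (the cheapest ones that meet the threshold), whereas filtering afterwards leaves day 3 empty.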
  • Figures 8 and 9 present a third example according to which client 4 only receives search results satisfying the confidence threshold.
  • search platform 2 includes a search platform server 2a as well as an entity located upstream of the search platform server 2a, i.e. an intermediate element between the client 4 and the server 2a. This entity is herein referred to as switch 6.
  • the client 4 directs its search query to server 2a.
  • the search query is transmitted via switch 6 to the server 2a. More specifically, switch 6 receives search query 10 from client 4 and relays search query 10 to server 2a in form of message 13.
  • the server 2a then performs a database search on the basis of the stored pre-collected search results in accordance with the search criteria included in the search query 10.
  • the server 2a then, by message 14, returns pre-collected search results fulfilling the search criteria together with the confidence factor values associated with these pre-collected search results.
  • the database search conducted by the server 2a is not limited to any pre-collected search results being associated with a certain confidence factor threshold. Rather, the server 2a returns pre-collected search results to the switch 6 irrespective of their associated confidence factor values.
  • the threshold may either be set by the client e.g. by including a threshold value in the search query 10 (or into any other message transmitted asynchronously to search query 10) or, alternatively, be autonomously set by switch 6, e.g. by utilizing a given default value.
  • the switch 6 evaluates the confidence factor values of the pre-collected search results received from the server 2a by message 14. Pre- collected search results being associated with confidence factor values at or above the threshold are forwarded unchanged to the client 4 by message 15. Pre-collected search results having confidence factor values below the threshold are not forwarded to the client 4.
  • the switch initiates a secondary database search at a primary data source 5 by messages 16 and 17.
  • the primary data source may maintain original data which is not pre-collected.
  • This secondary database search thus validates the pre-collected search results received from the server 2a with confidence factor values below the threshold.
  • the validated search results received by switch 6 from the primary data source 5 with message 17 are thus 100% valid.
  • the switch 6 then returns the search results to the client 4 by messages 15 and 18.
  • message 15 may be sent to the client 4 immediately after the respective pre-collected search results received by switch 6 from server 2a with message 14 have been recognized to be associated with confidence factor values at or above the threshold, while message 18 is only sent after the secondary database search with the primary data source 5 has been performed.
  • messages 15 and 18 are sent separately at different points of time.
  • message 15 may be held back by switch 6 until the validated search results have been received from the primary data source 5 with message 17.
  • messages 15 and 18 are sent at substantially the same point of time. They may also be sent as a single combined message.
  • switch 6 is able to provide the client 4 with pre-collected search results having confidence factor values above the threshold and/or search results validated with the primary data source 5 in an incremental manner. Accordingly, client 4 might be arranged to display incrementally arriving search results in an incremental manner to the user.
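The two delivery variants (separate messages 15 and 18 versus a single combined message) can be sketched with a generator. This is a simplified illustration: results are represented by plain strings, `validate` stands in for the round trip to the primary data source 5, and all names are hypothetical.

```python
from typing import Callable, Iterator, List


def deliver(
    confident: List[str],
    to_validate: List[str],
    validate: Callable[[List[str]], List[str]],
    incremental: bool = True,
) -> Iterator[List[str]]:
    """Yield the messages sent to the client, in sending order."""
    if incremental:
        if confident:
            yield confident                    # message 15, sent immediately
        if to_validate:
            yield validate(to_validate)        # message 18, sent after validation
    else:
        validated = validate(to_validate) if to_validate else []
        yield confident + validated            # held back and sent as one message


# Stand-in for the primary data source: marks each result as validated.
def fake_validate(keys: List[str]) -> List[str]:
    return [k + " (validated)" for k in keys]


print(list(deliver(["r1"], ["r2"], fake_validate)))
print(list(deliver(["r1"], ["r2"], fake_validate, incremental=False)))
```

The incremental variant lets a client display the first results while validation is still running; the combined variant trades that for a single response.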
  • the validated search results may not only be forwarded to client 4, but also to server 2a for including the validated search results in the database of the server 2a.
  • the revalidation of the pre-collected search results below the confidence threshold is leveraged for future search queries, as these results then have confidence factor values above the threshold and, thus, may be returned to client 4 without further revalidation.
  • the switch 6 and server 2a forming the search platform 2 may be implemented as an integrated entity or as separate elements or modules.
  • switch 6 may be implemented as a software module on the same hardware station as server 2a.
  • switch 6 is implemented by separate hardware.
  • switch 6 may serve more than one server 2a and may therefore operate as a unified interface for a plurality of servers 2a.
  • search query 10 issued by client 4.
  • the search query 10 includes at least one search criterion ("criterion A").
  • search query 10 will contain more than one search criterion, as indicated by italic "criterion B".
  • search query 10 might include e.g. the four search criteria or search parameters origin city (e.g. Nice), destination city (e.g. New York), outbound date (e.g. 27 December 2013) and return date (e.g. 6 January 2014).
  • search query 10 includes a value for the confidence threshold which switch 6 is going to apply.
  • Switch 6 receives the search query 10 and relays search query 10 to server 2a by message 13.
  • server 2a performs a database lookup in the pool of pre-collected search results by using the search criteria included in search query 10 and message 13.
  • server 2a returns the retrieved pre-collected search results fulfilling the search criteria.
  • These pre-collected search results include the associated confidence factor values.
  • Switch 6 receives the pre-collected search results with message 14 from server 2a, analyses the associated confidence factor values and compares them with the confidence threshold. Switch 6 forwards pre-collected search results with confidence factor values at or above the confidence threshold to the client 4 by return message 15. On the other hand, switch 6 requests validation of pre-collected search results having confidence factor values below the confidence threshold with primary data source 5. To this end, switch 6 sends request message 16 to primary data source 5. Request message 16 might contain the primary key values of the pre-collected search results to be validated in order to specifically request the pre-collected search results to be validated from the primary data source 5. Primary data source 5 looks up the requested search results and returns the original and therefore valid search results to switch 6 with message 17. Finally, switch 6 forwards the validated search results to client 4 by message 18.
  • the primary data source 5 may actually include more than one data source, e.g. a plurality of databases, web servers, computation platforms, etc.
  • messages 16 and 17 may be decomposed into several sub-messages which are sent to the plurality of primary data sources.
  • Messages 16 and 17 may also be formed by a plurality of sub-messages if the primary data source 5 is a single data source, e.g. in order to realize an incremental validation as explained next.
  • the switch 6 is additionally arranged to control the validation of the pre-collected search results below the confidence threshold in a more sophisticated manner. For example, switch 6 requests validation of only a subset of the pre-collected search results below the confidence threshold, while other pre-collected search results below the confidence threshold are not validated (and are, in this example, not forwarded to the client 4 - in other examples, pre-collected search results with confidence factor values below the confidence threshold may also be forwarded to the client 4 although they have not been part of the validated subset).
  • the subset is, for example, determined by an available time for validation.
  • switch 6 performs the validation in an incremental way (e.g. a single request message 16 is decomposed into a plurality of validation requests which are serially sent to the primary data source 5 for every pre-collected search result to be validated) and stops sending requests 16 to the primary data source 5 after a given period of time.
  • this incremental validation is performed in an ordered manner, e.g. starting with the pre-collected search result(s) having the lowest confidence factor value and continuing with pre-collected search result(s) having greater confidence factor values. In this way, the validation controlled by switch 6 times out.
  • switch 6 indicates the time available for validation to the primary data source 5 and it is the primary data source 5 which stops the validation activity after the time has elapsed.
  • the subset is, additionally or alternatively, determined by a limit on the number of pre-collected search results to be requested from the primary data source 5 or by computation resources available at the primary data source 5.
  • switch 6 may be arranged to decide to only validate a given number of pre-collected search results (for example, 20 pre-collected search results) and request validation of that given number from the primary data source 5, while pre-collected search results below the threshold exceeding the given number may be discarded by the switch 6 or, in other embodiments, may be forwarded to the client 4.
  • a fourth example is given by Figures 10 and 11.
  • This fourth example is more general than the third example of Figures 8 and 9 in that it is the search platform 2 which performs the revalidation of pre-collected search results with confidence factor values below the confidence threshold with the primary data source 5.
  • the same principles as explained in the third example apply to the fourth example.
  • search platform 2 receives a search query 10 from client 4 (Figures 10 and 11). Search platform 2 then performs a search in the database of pre-collected search results for search results corresponding to the search criteria included in the search query 10. Pre-collected search results having confidence factor values below the threshold are re-validated by search platform 2 with primary data source 5 by message 16. The search platform receives the validated search results from primary data source 5 with message 17 and, for example, consolidates the re-validated search results received from primary data source 5 with the pre-collected search results having confidence factors at or above the confidence threshold. Search platform 2 then transmits the consolidated search results to client 4 by message 19.
  • message 19 might be a single message including all search results to be returned to client 4 or message 19 might be split up into several messages, e.g. into messages 19a (Figure 11) carrying the pre-collected search results at or above the confidence threshold (as they are available earlier than the re-validated pre-collected search results below the threshold) and further messages 19b (Figure 11) carrying the search results re-validated with primary data source 5 (as they are available only at a later point of time).
  • a fifth example is given by Figures 12 and 13.
  • the fifth example is a further variation of the third and fourth examples.
  • the search platform 2 first returns the pre-collected search results complying with the at least one search criterion included in the search query 10 and only validates those pre-collected search results which are below the confidence threshold, in parallel and/or subsequently.
  • the search platform 2 then returns the validated search results to the client 4, thereby updating the initially returned pre-collected search results below the confidence threshold with the corresponding validated search results and, thus, increasing the probability of these search results being valid.
  • search platform 2 receives a search query 10 from client 4 (Figures 12 and 13). Search platform 2 then performs a search in the database of pre-collected search results for search results corresponding to the search criteria included in the search query 10. Search platform 2 then returns all pre-collected search results, irrespective of the pre-collected search results' confidence factor values (below, at, or above the threshold), to the client 4 by message 20 (again, message 20 may include one or more individual sub-messages). The pre-collected search results below the confidence threshold are, however, re-validated by search platform 2 with primary data source 5 by message 16 in a similar manner as in the third example or in the fourth example.
  • the search platform 2 receives the validated search results from primary data source 5 with message 17. Search platform 2 then transmits the validated search results to client 4 by message 21. Client 4 processes the validated search results and updates the corresponding pre-collected search results below the confidence threshold initially received from the search platform 2 with the validated search results (e.g. by overwriting the pre-collected search results below the confidence threshold initially received from the search platform 2 with the validated search results and displaying the updated search results to the user).
  • the validation processes formed by messages 16, 17 and 21 may occur incrementally and in parallel or subsequently to returning the initial pre-collected search results by message 20.
  • messages 16, 17 and 21 may be subdivided into a plurality of sub-messages as already explained above with reference to Figures 8 and 9.
  • a validation control as also described with reference to Figures 8 and 9 may be employed, e.g. the validation process of messages 16, 17 and 21 may be capped to a given amount of validation time or computation / collection resources.
  • the validation control may employ an ordered validation process, e.g. by validating the pre-collected search results below the confidence threshold in ascending order in terms of the confidence values of the pre-collected search results.
  • Figure 14 shows an application example of the database system 1.
  • This application example relates to a database system used in the travel industry. More specifically, in this embodiment, the computation platform 3 maintains data on air travel offers.
  • a plurality of search platforms 2 store prices related to these air travel offers which the computation platform 3 calculates on the basis of calculation rules, in particular flight fares and their associated calculation rules.
  • the computation platform 3 may be a Massive Computation Platform (MCP) as disclosed by EP 2521074 A1.
  • the search platforms 2 and the MCP 3 are coupled via communication links which are utilized to transmit pre-computed priced travel recommendations from the MCP 3 to the search platforms 2.
  • the database system 1 includes a re-computation controller 7 which is responsible for monitoring the validity of the pre-computed priced travel recommendations stored in the search platforms 2 and for deciding which pre-computed priced travel recommendations are to be re-computed by MCP 3.
  • the re-computation controller 7 employs a probabilistic model for tracking the validity probabilities of the pre-computed priced travel recommendations stored in the search platforms 2.
  • the probabilistic model may be based on the parameters as described above with reference to Figure 3.
  • re-computation controller 7 is equipped with several communication interfaces in order to input statistical data for estimating change rates of travel recommendations as well as to recognize external events such as fare changes, customer promotions and flight availability changes.
  • the confidence factor values of the pre-computed priced travel recommendations are maintained centrally by the re-computation controller 7 for all search platforms 2.
  • search platform 2 requests confidence factor values associated with the pre-computed priced travel recommendations fulfilling the search criteria included in the search queries from re-computation controller 7 via interface 30.
  • re-computation controller 7 performs the appropriate processing (for example, calculates the confidence factor value for each requested pre-collected search result on the basis of the respective values of the validity rate and the associated timestamps) and returns the requested confidence factor values to the search platform 2.
  • each search platform 2 may maintain the confidence factor values associated with the stored pre-computed priced travel recommendations, for example as shown by Figure 3 and described above.
  • the search platforms 2 are equipped with the aforementioned communication interfaces in order to maintain the probabilistic model by themselves.
  • the confidence factor values stored in the search platforms 2 may also be updated by the re-computation controller 7 e.g. on a periodic basis.
  • the search platforms 2 may implement various applications. For example, a pre-shopping application serves as a non-binding information platform by which the clients 4 can obtain information about flight routes, flight schedules and prices, hotel room availability, rental car services, etc. without having to make an actual reservation.
  • Another application may be an advertisement banner application which provides data for travel advertisement banners to Internet websites subscribed to such banner advertisement.
  • the banner content is dynamically loaded from a banner application search platform 2 in response to banner search queries automatically generated by client 4.
  • the dynamically loaded banner content may depend on interests of the user determined e.g. by cookies or browsing history data of client 4.
  • the first example of filtering pre-collected search results as described with reference to Figures 4 and 5 may be suitable because it may not be necessary to avoid holes in the priced travel advertisements. Rather, short response times of the advertisement banner may be more important.
  • the confidence factor associated with the pre-collected search results that are actually returned to the client 4 may be transmitted to client 4 along with the actual search results.
  • confidence factor values of 100% may be returned to client 4.
  • Client 4 may be arranged to process the confidence factor values (which are all at or above the confidence threshold), for example, to indicate the varying confidence of the various search results to the user. This indication may, for example, be realized by the client 4 by grouping the received search results into classes of different confidence intervals, e.g. search results with a confidence factor value of 100% (i.e. the revalidated search results), search results with a confidence factor between 95% and 100% and further search results with a confidence factor below 95%, but still above the confidence threshold.
  • Fig. 15 is a diagrammatic representation of a computer system which provides the functionality of the search platform 2.
  • the search platform 2 includes a processor 101, a main memory 102 and a network interface device 103, which communicate with each other via a bus 104.
  • the search platform 2 may further include a static memory 105 and a disk-drive unit 106.
  • a video display 107, an alpha-numeric input device 108 and a cursor control device 109 may form a distribution list navigator user interface.
  • the network interface device 103 is a wired and/or wireless interface which connects the data search platform 2 to the computation/collection platform 3, the sources of statistical data needed to fill up the probabilistic model such as a statistics search platform, the Internet and/or any other network.
  • the network interface device 103 utilizes standard communication protocols such as the HTTP/TCP/IP protocol stack or IEEE 802.11 and/or proprietary communication protocols.
  • a set of instructions (i.e. software) 110 embodying any one, or all, of the methodologies described above, resides completely, or at least partially, in or on a machine-readable medium, e.g. the main memory 102 and/or the processor 101.
  • the instructions may implement the search platform's capabilities to process incoming search queries 10, to perform database lookups among the pre-collected search results and to generate and transmit messages like response messages 11, 12, 14 and 20 as well as request message 16.
  • a machine-readable medium on which the software 110 resides may also be a non-volatile data carrier 111 (e.g. a non-removable magnetic hard disk or an optical or magnetic removable disk) which is part of disk drive unit 106.
  • the software 110 may further be transmitted or received as a propagated signal 112 via the Internet through the network interface device 103.
  • Client 4 may reside in a stationary computer or a mobile device such as a smartphone, a cell phone, a laptop, a tablet computer or the like which may be of a similar structure as shown by Figure 15. Accordingly, the instructions 110 embodied in the processor/memory implement the client's functionality to generate and transmit search query 10, to receive and process response messages 11, 12, 14, 19 and 20, and to display the search results received from search platform 2 and/or switch 6.
  • switch 6 may be included in the search platform 2 or may be provided as a separate hardware entity. In the latter case, switch 6 may also be of similar structure as shown by Figure 15.
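
The validation control of the third example (Figures 8 and 9) might be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of switch 6: the names `Result`, `split_by_threshold`, `validate_capped` and `fetch_primary` are hypothetical, the price field is an invented payload, and the primary data source 5 is stood in for by a plain callable. Results at or above the threshold are separated for immediate return, while results below it are validated in ascending order of confidence until an optional count limit or time budget is reached.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Result:
    key: str           # primary key of the pre-collected search result
    price: float       # illustrative payload (assumption, not from the text)
    confidence: float  # probability of being valid, 0.0 to 1.0


def split_by_threshold(results: List[Result],
                       threshold: float) -> Tuple[List[Result], List[Result]]:
    """Separate results at or above the threshold from those below it."""
    trusted = [r for r in results if r.confidence >= threshold]
    doubtful = [r for r in results if r.confidence < threshold]
    return trusted, doubtful


def validate_capped(doubtful: List[Result],
                    fetch_primary: Callable[[str], float],
                    max_results: Optional[int] = None,
                    time_budget_s: Optional[float] = None) -> List[Result]:
    """Validate lowest-confidence results first; stop at a count or time limit."""
    deadline = None if time_budget_s is None else time.monotonic() + time_budget_s
    validated: List[Result] = []
    for r in sorted(doubtful, key=lambda item: item.confidence):
        if max_results is not None and len(validated) >= max_results:
            break  # limit on the number of results to validate
        if deadline is not None and time.monotonic() >= deadline:
            break  # validation times out
        fresh_price = fetch_primary(r.key)  # stands in for messages 16 and 17
        validated.append(Result(r.key, fresh_price, 1.0))  # validated: 100%
    return validated
```

In this sketch, results left unvalidated when a limit is hit are simply not returned by `validate_capped`; whether they are discarded or forwarded to the client 4 anyway is left to the caller, mirroring the alternatives described above.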

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method, a search platform, a system and a storage medium for handling queries in a database system are disclosed. The database system includes at least one client and at least one search platform. The search platform maintains pre-collected search results which are associated with confidence factors. A confidence factor indicates a probability of the associated pre-collected search result being valid. The search platform receives a query indicating at least one search criterion. The confidence factors associated with the identified pre-collected search results are utilized to increase the mean probability that the pre-collected search results returned to the client are valid. For example, pre-collected search results complying with the at least one search criterion and being associated with confidence factors having values exceeding a given threshold are returned to the client.

Description

INCREASING SEARCH RESULT VALIDITY
FIELD OF THE INVENTION
The present invention generally relates to database technology. More specifically, the present invention is directed to a mechanism to increase validity or confidence of search results retrieved from a pool of pre-computed search results.
BACKGROUND
A common object in database technology is to ensure short response times to database queries which require processing large volumes of data. For example, such computing-power consuming processing has to be performed in response to so-called "open queries" which contain only little input information (e.g. only one or two parameters out of a dozen possible parameters are specified and/or the specified value ranges of the parameters are broad). Consequently, such open queries lead to a large number of results in general. Possibilities to speed up data processing by increasing hardware performance are limited. Thus, attention is drawn to improving the mechanisms underlying the processing of large data volumes.
One approach to shorten query times is to pre-compute or pre-collect expected results to queries and to maintain the corresponding query results in a pool of pre-computed or pre-collected results. Queries are then actually not processed on the large and distributed and/or complex-to-calculate data basis, but are directed to the pool. For example, this approach is employed by Internet search engines which utilize automated robots or crawlers to collect content of web servers and store this pre-collected content in a search engine repository. Internet search queries are then answered on the basis of the repository instead of retrieving the web servers' primary content at search query time.
A disadvantage of this approach is, however, that the pre-computed or pre-collected query results get outdated if the underlying primary data changes. In this case, the pool of pre-computed or pre-collected results returns incorrect results to the inquiring client.
One solution to remedy this issue is directed to improving the validity or correctness of the pre-computed or pre-collected query results by optimizing the re-computation or re-collection strategy, for example, by re-computing or re-collecting with priority those query results which are likely outdated. Such strategies are, for example, described by International patent application PCT/EP2013/002390, EP 2541473 A1 and US 2009/0234682 A1. However, a 100% all-time validity or correctness of the pre-computed or pre-collected query results is unachievable.
US 7,562,027 B1 discloses a travel planning system dealing with seat availability. In terms of hardware, the system is implemented by a computer system. In terms of software, the system includes a scheduling process and an availability process. The scheduling process provides a set of instances of transportation that satisfies a user query. The availability process accesses seat availability information from multiple information sources. The availability process determines quality properties such as confidence, precision and/or validity of availability information retrieved from a first information source and determines whether the first information source is reliable. If the first information source is not reliable, the availability process executes a second set of seat availability queries to the first information source or a different information source to provide a second set of instances of transportation for which a seat is available. According to another embodiment of US 7,562,027 B1, some sources of availability data include measures of availability confidence of the results (e.g., "a seat of Q is available with 80% certainty"). This certainty information, if present, is passed to the client for processing and display for informative purposes to the user. The client can also be programmed to filter out seats and flights with less than a specified probability of being available.
SUMMARY OF THE INVENTION
It is an object of the present invention to increase the validity of search results which have been pre-collected or pre-computed and are returned to a client in response to a search query. Generally, the present invention takes a different approach than the strategies of increasing the validity of pre-computed or pre-collected search results by optimizing the re-computation or re-collection process which is performed asynchronously to the occurrence of search queries. Rather, the present invention proposes a mechanism to estimate the validity of pre-computed or pre-collected search results and to utilize this validity estimation in order to return pre-computed or pre-collected search results to the client which are probably valid. According to the present invention, a method of handling queries in a database system is provided. The database system has at least one client and at least one search platform. The search platform maintains pre-computed search results which are associated with confidence factors. A confidence factor indicates a probability of the associated search result being valid. A query indicating at least one search criterion is received by the search platform. The search platform utilizes the confidence factors associated with the identified pre-collected search results to increase the mean probability that the pre-collected search results returned to the client are valid.
In some embodiments, the search platform utilizes the confidence factors associated with the identified pre-collected search results by returning only pre-computed search results which are associated with confidence factors having values exceeding a given threshold.
According to another aspect, a respective search platform for handling queries in a database system as described above is provided.
According to still another aspect, a database system including a search platform and a client for handling queries as described above is provided.
According to yet a further aspect, a non-transitory computer readable storage medium is provided having computer program instructions stored therein which, when executed on a computer system, perform the method as described above.
Further aspects are set forth in the dependent claims.
BRIEF DESCRIPTION OF THE FIGURES
The present invention will be described with reference to the accompanying figures. Similar reference numbers generally indicate identical or functionally similar elements.
Fig. 1 gives an overview of a system including at least one client and a server maintaining pre-computed search results.
Fig. 2 visualizes the effect of the likelihood decreasing over time that pre-computed search results kept in the server are valid.
Fig. 3 shows a search platform maintaining pre-computed search results and associated confidence factor values.
Fig. 4 shows a first example of the search platform internally utilizing a filter on the basis of the confidence factor threshold.
Fig. 5 is a message sequence chart relating to the example of Fig. 4.
Fig. 6 illustrates a second example according to which the search platform employs a confidence factor threshold as an additional search criterion.
Fig. 7 depicts a message flow of the example of Fig. 6.
Fig. 8 explains a third example according to which the confidence factor threshold is employed to perform a further query in a primary data source in response to the server's query results.
Fig. 9 presents a message sequence of the example of Fig. 8.
Fig. 10 shows a fourth example according to which the search platform performs a further query to a primary data source.
Fig. 11 is a message sequence chart relating to the example of Fig. 10.
Fig. 12 shows a fifth example according to which the search platform first returns pre-computed search results to the client and performs re-validation subsequently.
Fig. 13 is a message sequence chart relating to the example of Fig. 12.
Fig. 14 depicts an exemplary architecture of a distributed travel-related database environment.
Fig. 15 is an exemplary schematic view of the internal architecture of the search platform and/or the client.
DETAILED DESCRIPTION
Before turning to the detailed description with reference to Figures 4 to 15, some general aspects will be set forth first on the basis of Figures 1 to 3.
The present invention generally relates to handling search queries in a database system maintaining pre-computed or pre-collected search results. An exemplary database system 1 is shown by Figure 1. The database system 1 includes at least one, but generally a plurality of clients 4 and at least one search platform 2. To increase failure safety or performance, a plurality of search platforms 2 may be present. The at least one search platform 2 maintains pre-computed or pre-collected search results in order to decrease response times to answer search queries received from the clients 4. Hereinafter, the term "pre-collected" is used to cover any sort of pre-collection and pre-computation such as simple Internet crawlers collecting or copying the content of Internet web servers, but also complex and time-intensive computations of search results on the basis of underlying data as it is e.g. described for priced travel recommendations by PCT/EP2013/002390 and EP 2541473 A1. The term "database" is meant to encompass any type of structured information storage system such as standard stand-alone databases like SQL server or Oracle databases as well as complex, distributed and/or proprietary storage systems, relational databases including database management systems or object-oriented database systems.
The client 4 directs search queries to the search platform 2 including one or more search criteria or parameters. For example, if a search query is an Internet search, the search query might carry a search string, search text or search phrase as search criteria. A further search criterion may be the language of websites to be searched or an indication of a point of time of the first availability of the requested search string, search text or search phrase. According to another example, the search query is a database request for a product or service offered by a service provider platform such as an Internet book store or a travel provider. In that case, the search query might include e.g. an upper price limit or a price range for the service or product and desired characteristics of the product/service such as book title, travel origin and destination, etc.
The search platform 2 processes a search query received from the client 4 and performs a database search within the pre-collected search results. In turn, search platform 2 responds with one or more pre-collected search results fulfilling the search criteria included in the search query. The client 4 receives this response and presents the search results to the user.
The pre-collection of search results is performed by using computation/collection platform 3. Generally, search platform 2 or another control entity (cf. Figure 14) employs an appropriate re-collection strategy in order to update the pre-collected search results stored by search platform 2. To this end, search platform 2 or the other control entity generates and transmits re-collection orders to computation/collection platform 3. Computation/collection platform 3 executes the re-computation or re-collection, e.g. by requesting original data corresponding to the pre-collected search results from primary data sources. For the purpose of the present invention any suitable re-collection strategy for updating the pre-collected search results may be employed, e.g. the update strategies as described by PCT/EP2013/002390 which is incorporated by reference herein.
The present invention does not focus on improving the validity or correctness of pre-collected search results returned to client 4 in response to search queries by a particular re-collection strategy. Rather, the invention focuses on improving the validity or correctness of pre-collected search results which are actually returned to the client 4 at the time of an incoming search query. In essence, it is proposed to return only such pre-collected search results to the client 4 which have a certain likelihood of being valid, while refraining from returning pre-collected search results to client 4 which have a certain likelihood of being invalid.
To this end, the pre-collected search results maintained by search platform 2 are associated with confidence factors. For example, each pre-collected search result stored by search platform 2 has a corresponding confidence factor. Alternatively, one confidence factor may be associated with a plurality of pre-collected search results. A confidence factor indicates a probability of the associated pre-collected search result(s) being valid. In general, confidence factors associated with pre-collected search results are utilized in order to decide which pre-collected search results are returned to the client 4 in response to a search query and which pre-collected search results are not returned to the client 4 and/or are returned to the client 4 in a specific way.
In order to make this decision, the present invention utilizes the confidence factors in order to generally provide the client 4 with pre-collected search results having a higher probability of being valid than pre-collected search results which would have been returned to the client without utilizing the confidence factors. The confidence factors may be utilized in different ways in order to provide the client 4 with potentially more valid pre-collected search results, as will be described below.
In some embodiments, a confidence threshold is employed. In some of these embodiments, the confidence threshold is prescribed by the client 4. For example, the client 4 includes a threshold value (such as "at least 85%" or "at least 0.9" or "high" being likewise defined as "at least 0.9") in the search query when requesting search results from the search platform 2. The client 4 may also send dedicated asynchronous messages indicating a desired confidence threshold to the search platform 2. Search platform 2 stores these client-specific confidence threshold prescriptions and employs them each time a search query is received from client 4. Alternatively, in other embodiments, the confidence threshold is set by a third party such as the operator of the search platform 2. In this case, a single confidence threshold value may be applicable for all search queries received from all clients 4. The clients 4 may not have an influence on the confidence threshold employed by the search platform 2. Alternatively, the confidence threshold pre-set by the third party may act as a default value and clients 4 may be able to override this default value with their own client-specific prescription.
Generally, in some embodiments, irrespective of the way the confidence threshold is set and whether or not the confidence threshold is client-specific, the client 4 is provided with search results which are associated with a confidence factor value exceeding the confidence threshold, wherein "exceeding" may also include the case that the confidence factor value equals the confidence threshold. In some embodiments, the client 4 is only provided with pre-collected search results exceeding the confidence threshold. In other embodiments, the client 4 may initially also be provided with pre-collected search results below the confidence threshold, while the search platform performs a validation of these pre-collected search results below the threshold and updates the tentatively returned pre-collected search results below the threshold with the corresponding validated search results. These mechanisms have the effect that pre-collected search results at the search platform 2 which have a higher likelihood of being invalid are either not returned to the client at all or are updated with search results having a higher likelihood of being valid, thereby increasing the accuracy of the search results for the clients 4 while still maintaining the advantage of short response times due to the pre-collection of search results. Particular examples of arrangements for providing the client with pre-collected search results exceeding the confidence threshold are given further below with reference to Figures 4 to 13.
This solution substantially differs from US 7,562,027 B1. On the one hand, US 7,562,027 B1 classifies different data sources with different levels of confidence or reliability. On the other hand, the confidence measures of US 7,562,027 B1 ("a seat in Q is available with 80% certainty") are not confidence factors in the sense of the present invention because the confidence measures of US 7,562,027 B1 do not indicate an estimation of the validity of pre-collected search results (i.e. the likelihood that the pre-collected search results still correspond to the original search results), but an estimation of the actual seat availability. Thus, the confidence measures of US 7,562,027 B1 are part of the user data, while the present confidence factors are control data, e.g. maintained on the basis of a probabilistic model. As far as US 7,562,027 B1 describes these kinds of measures of confidence included in search results, this information is simply passed on to the client and potentially displayed to the user which, as such, does not result in the client being provided with search results having a higher probability of being valid as achieved by the mechanisms disclosed herein. To increase the potential validity of search results displayed to the user, US 7,562,027 B1 proposes to filter out potentially invalid search results only at the client side, while the present invention advantageously utilizes the confidence factors downstream of the client 4.
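The threshold options described above can be illustrated by a minimal Python sketch. It is not taken from the disclosure: the function names, the operator default of 0.8 and the precedence (value in the query first, then a stored client-specific value, then the operator default) are assumptions made for illustration, as is the acceptance of both a decimal point and a decimal comma.

```python
from typing import Optional, Union

DEFAULT_THRESHOLD = 0.8        # hypothetical operator default, not from the source
NAMED_LEVELS = {"high": 0.9}   # "high" is defined as "at least 0.9" in the text


def parse_threshold(value: Union[str, float]) -> float:
    """Turn a prescription such as "at least 85%" or "high" into a value in [0, 1]."""
    if isinstance(value, (int, float)):
        threshold = float(value)
    else:
        text = value.strip().lower()
        if text.startswith("at least"):
            text = text[len("at least"):].strip()
        if text in NAMED_LEVELS:
            threshold = NAMED_LEVELS[text]
        elif text.endswith("%"):
            threshold = float(text[:-1]) / 100.0
        else:
            # Accept a decimal comma as well as a decimal point (assumption).
            threshold = float(text.replace(",", "."))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"confidence threshold out of range: {value!r}")
    return threshold


def resolve_threshold(query_value: Optional[Union[str, float]],
                      stored_client_value: Optional[Union[str, float]],
                      default: float = DEFAULT_THRESHOLD) -> float:
    """Assumed precedence: search query, then stored client prescription, then default."""
    for candidate in (query_value, stored_client_value):
        if candidate is not None:
            return parse_threshold(candidate)
    return default
```

Under these assumptions, `resolve_threshold("at least 85%", "high")` yields 0.85, while `resolve_threshold(None, None)` falls back to the operator default.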
The function of the confidence factor to indicate a validity probability of pre-collected search results according to the present invention is exemplarily implemented by a probabilistic model utilizing the following parameters:
The age $t_i$ of a pre-collected search result i refers to the time passed since the last re-computation or re-collection of this pre-collected search result by the computation/collection platform 3. The validity rate $\lambda_i$ of the pre-collected search result i is a measure of how long the pre-collected search result i remains valid or how fast the pre-collected search result i becomes invalid due to changes of the underlying original data. This validity rate of a given pre-collected search result i is, for example, statistically derived from the occurrence and the outcomes of past (re-)computations or (re-)collections and comparisons of the re-collected search result with its previous state or values. For example, it has been determined that a particular pre-collected search result i has a validity rate of 10% per hour, meaning that the probability of i being valid decreases by 10% every hour. At the time of its (re-)collection or (re-)computation, i is generally 100% valid. After one hour, i is valid with a probability of 90%. After two hours, the validity of i is 81% (= 90% decreased by another 10%). After three hours, i's probable validity is at 72.9%, and so on.
The validity rate $\lambda_i$ may be employed to provide an estimate of the probability P for a pre-collected search result to stay valid after a given time t: $P(\text{unchanged after } t) = e^{-\lambda_i t}$.
This is also referred to as the expected accuracy $acc_i^t = e^{-\lambda_i t_i}$ or, more generally, as the probability of a pre-collected search result being still valid.
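The "10% per hour" example and the exponential form can be related by a short Python sketch. The function names are illustrative assumptions; the conversion uses the fact that a loss of 10% per hour corresponds to a rate of -ln(0.9) per hour in the exponential model.

```python
import math


def validity_probability(rate_per_hour: float, age_hours: float) -> float:
    """Probability that a result is still valid after age_hours: exp(-rate * age)."""
    return math.exp(-rate_per_hour * age_hours)


def rate_from_hourly_loss(loss_fraction: float) -> float:
    """Convert 'validity drops by this fraction every hour' into the rate lambda."""
    return -math.log(1.0 - loss_fraction)


lam = rate_from_hourly_loss(0.10)  # 10% per hour, about 0.105 per hour
probabilities = [validity_probability(lam, t) for t in range(4)]
# Approximately 1.0, 0.9, 0.81 and 0.729 for ages of 0, 1, 2 and 3 hours.
```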
Two exemplary functions of this probable validity or accuracy decreasing over time are depicted by Figure 2. The upper function represents a pre-collected search result which potentially remains more accurate (or, more correctly, stays at a higher probability of being valid over time) than another pre-computed search result associated with the lower function. For example, the pre-computed search result represented by the upper function has 70% probability of being still valid at 35 hours after its last re-collection, while the other pre-computed search result characterized by the lower function is only valid up to about 50% at 35 hours after its latest re-collection. Both functions may also represent whole sets of pre-collected search results and accordingly indicate proportions of the sets of pre-collected search results likely being valid at a time passed since the last re-collection of the set.
In some embodiments, the confidence factor values are derived from such a probabilistic model modelling the validity of pre-collected search results over time. More specifically, in some embodiments, the probability of a pre-collected search result i being valid at a time $t_i$ after a previous collection of the pre-collected search result i is given by $e^{-\lambda_i t_i}$. As outlined before, $\lambda_i$ denotes a rate of the pre-collected search result i becoming invalid.
In some embodiments, the confidence factors $e^{-\lambda_i t_i}$ associated with the pre-collected search results are maintained by the search platform 2 (or another entity) in the form of stored values of the pre-collected search results' validity rate $\lambda$ and the timestamps TS of the last re-collection or re-computation of the pre-collected search results. Thus, for a particular pre-collected search result i, search platform 2 stores the validity rate $\lambda_i$ and the timestamp $TS_i$. These values do not change over time, but are constant until the next re-collection of i. At search time, i.e. when a search query is received by search platform 2, the confidence factor value $e^{-\lambda_i t_i}$ of pre-collected search result i is calculated by using $\lambda_i$ and $TS_i$, wherein $t_i$ in $e^{-\lambda_i t_i}$ results from $TS_S - TS_i$, $TS_S$ referring to the time of the search query's arrival at search platform 2. Thus, in these embodiments, the confidence factor is associated with the pre-collected search results by having the values of $\lambda$ and TS stored for each pre-collected search result (or sets of pre-collected search results).
A basic setting of client 4 and search platform 2 is shown by Figure 3. Search platform 2 runs a database with pre-collected search results. As shown by Figure 3, the pre-collected search results include e.g. an index (visualized by "#" in Figure 3), the search result data (indicated by "Data" in Figure 3) including e.g. data fields which are defined as primary key values and secondary key values, as well as confidence factor values (referred to as "Conf. Factor" in Figure 3). As outlined before, in some embodiments, the confidence factor values are stored in the database in the form of the values $\lambda$ and TS being associated with each of the pre-collected search results. Client 4 directs a search query 10 to search platform 2. Search platform 2 processes the search query 10 and performs a search in the database in order to determine pre-collected search results fulfilling search criteria transmitted with the search query. Search platform 2 generally returns, by message 11, pre-collected search results which meet the confidence threshold. In other embodiments, the confidence factors being associated with the pre-collected search results stored by the search platform 2 are not stored in the same database tables, partition or in the same database as the pre-collected search results, but are maintained in a separate database or station and retrieved from there by the search platform 2 at the time of processing a search query (cf. also Figure 12 discussed further below).
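The search-time calculation described above can be sketched in Python as follows. This is a minimal illustration, assuming timestamps and rates expressed in hours; the record layout and all names are hypothetical and not prescribed by the text.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredResult:
    key: str
    data: dict
    validity_rate: float  # lambda_i, per hour (constant until the next re-collection)
    collected_at: float   # TS_i, in hours on an arbitrary clock (assumption)


def confidence_factor(result: StoredResult, query_time: float) -> float:
    """Compute exp(-lambda_i * t_i) with t_i = TS_S - TS_i at search time."""
    age = max(0.0, query_time - result.collected_at)
    return math.exp(-result.validity_rate * age)


record = StoredResult("MUC-PAR-1", {"price": 120}, validity_rate=0.05, collected_at=10.0)
cf = confidence_factor(record, query_time=30.0)  # age of 20 hours, exp(-1.0)
```

Storing only the rate and the timestamp means nothing has to be rewritten as a result ages; the confidence factor is derived on demand when a query arrives.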
Now referring to the more detailed description of exemplary implementations of the mechanisms described in a more general manner above, Figures 4 and 5 illustrate a first example according to which search platform 2 ensures that client 4 receives only search results fulfilling the confidence threshold by filtering out pre-collected search results with confidence factor values below the confidence threshold. As previously described with reference to Figure 3, the client 4 generates and transmits a search query 10 to search platform 2. Search query 10 includes one or more search criteria such as a search string for an Internet search. Search platform 2 performs a database lookup in the database of pre-collected search results and retrieves pre-collected search results fulfilling the at least one search criterion passed over with the search query 10. As a sort of post-processing activity, the search platform 2 then filters out those pre-collected search results from the set of pre-collected search results resulting from the database lookup which do not meet the confidence threshold, i.e. which have confidence factor values below the confidence threshold. These filtered pre-collected search results are not returned to the client 4. Rather, the search platform 2 only returns those pre-collected search results uncovered by the database lookup which have confidence factor values at or above the confidence threshold. These results are returned to the client 4 by message 20 (Figure 4).
Figure 5 is a message sequence chart visualizing the message flow of this first implementation example. As indicated by Figure 5, the search query includes one or more search criteria, for example, in the exemplary case of a travel-related query, parameters for the travel the user is interested in such as an origin and destination pair and a timeframe for the travel (Figure 5: "criterion A", optional "criterion B"). Optionally, search query 10 also includes a value for the confidence threshold (Figure 5: "confidence threshold"). Alternatively, search platform 2 utilizes a predetermined confidence threshold in an autonomous manner, i.e. without receiving a confidence threshold in the search query 10. For example, search platform 2 is provided with a default value for the confidence threshold prior to receiving the search query 10.
Search platform 2 then performs the database lookup in its pool of pre-collected search results on the basis of the one or more search criteria which were received as content of the search query 10. Subsequently, search platform 2 filters out those search results with confidence factor values below the confidence threshold and returns, by message 20, only those pre-collected search results with confidence factor values at or above the confidence threshold.
As an advantage of this implementation example, client 4 is provided with search results in a similarly fast manner as with a normal query to search platform 2 without a utilization of the confidence factor as presented herein. On the other hand, the filter activity by search platform 2 may result in "holes" in the set of pre-collected search results produced by the search platform's database lookup. Depending on the value of the confidence threshold and the values of confidence factors of the retrieved pre-collected search results, a substantial part of potential search results the user is interested in may be missed due to the filter activity and not returned to client 4. This may nevertheless be acceptable for particular applications, for example, the retrieval of advertisement banners which are of potential interest to the user on web pages.
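The lookup followed by a post-processing filter may be sketched in Python as below. The in-memory list standing in for the database, the record layout and the function names are assumptions chosen for illustration only.

```python
def lookup(database, criteria):
    """Database lookup: keep records whose data match every search criterion."""
    return [
        record for record in database
        if all(record["data"].get(name) == value for name, value in criteria.items())
    ]


def search_with_filter(database, criteria, threshold):
    """First example: look up by criteria, then drop results below the threshold."""
    candidates = lookup(database, criteria)
    return [record for record in candidates if record["confidence"] >= threshold]


database = [
    {"data": {"origin": "NCE", "destination": "NYC", "price": 540}, "confidence": 0.95},
    {"data": {"origin": "NCE", "destination": "NYC", "price": 510}, "confidence": 0.60},
    {"data": {"origin": "MUC", "destination": "PAR", "price": 120}, "confidence": 0.99},
]
message_20 = search_with_filter(
    database, {"origin": "NCE", "destination": "NYC"}, threshold=0.9
)
```

Here only the record with confidence 0.95 is returned; the cheaper record with confidence 0.60 matches the criteria but is filtered out.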
According to a second example, the search platform 2 utilizes the confidence threshold as an additional search criterion (Figures 6 and 7). Basically, in this second example, search platform 2 utilizes the confidence threshold as a search criterion in addition to the at least one search criterion included in the search query 10. Figure 6 visualizes the interaction between client 4 and search platform 2, while Figure 7 shows the message sequence between both entities.
Client 4 generates and transmits search query 10 to the search platform 2 in a similar manner as described in the first implementation example before.
After the search platform 2 has received the search query 10, the search platform 2 performs a lookup in its database of pre-collected search results. On the one hand, this database lookup is based on the at least one search criterion included in the search query 10 as in the first implementation example. On the other hand, however, the database lookup is also based on the confidence threshold which is either prescribed by the client (e.g. by being included in the search query 10) or is available internally in the search platform 2. In essence, the confidence threshold functions as an additional search criterion, i.e. the database lookup only retrieves those pre-collected search results which have associated confidence factors with values at or above the confidence threshold. Other pre-collected search results which fulfil the at least one search criterion included in the search query 10, but not the confidence threshold, are not returned by the database lookup. The pre-collected search results uncovered by the database lookup are returned by the search platform 2 to the client with message 12 (cf. Figures 6 and 7).
This second example of utilizing the confidence threshold as an additional search criterion has the effect that "holes" in the set of resulting pre-collected search results, as they may occur in the first example described above with reference to Figures 4 and 5, can be avoided. For example, the search query 10 is directed to find the cheapest flights from Munich to Paris within a three-day time interval. The search platform 2 may generally operate in a manner that for each of the three days, the five cheapest flights stored as pre-collected data records are returned to the client 4. It may be the case that all of the five cheapest flights e.g. on the third day have a confidence factor value below the confidence threshold, with the effect that they are not returned to the client 4 by the first example according to which the search platform 2 is arranged to filter out those pre-collected search results not fulfilling the confidence threshold. If, however, the confidence threshold is used as an additional search criterion, the search platform 2 determines the five cheapest flights on the third day which also fulfil the confidence threshold. Thus, client 4 is provided with search results for all three days satisfying the given confidence requirements.
Figures 8 and 9 present a third example according to which client 4 only receives search results satisfying the confidence threshold. According to this third example, search platform 2 includes a search platform server 2a as well as an entity located upstream of the search platform server 2a, i.e. an intermediate element between the client 4 and the server 2a. This entity is herein referred to as switch 6. The client 4 directs its search query to server 2a. However, the search query is transmitted via switch 6 to the server 2a. More specifically, switch 6 receives search query 10 from client 4 and relays search query 10 to server 2a in the form of message 13.
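The difference between filtering after the lookup and using the threshold as a search criterion can be made concrete with a small Python sketch. The flight records, the limit of two flights per day (instead of five, for brevity) and the function names are assumptions for illustration.

```python
from collections import defaultdict


def cheapest_then_filter(flights, threshold, n):
    """First example: take the n cheapest per day, then drop low-confidence ones."""
    by_day = defaultdict(list)
    for flight in flights:
        by_day[flight["day"]].append(flight)
    result = {}
    for day, items in by_day.items():
        top = sorted(items, key=lambda f: f["price"])[:n]
        result[day] = [f for f in top if f["confidence"] >= threshold]
    return result


def cheapest_with_threshold_criterion(flights, threshold, n):
    """Second example: the threshold restricts the lookup itself."""
    by_day = defaultdict(list)
    for flight in flights:
        if flight["confidence"] >= threshold:
            by_day[flight["day"]].append(flight)
    return {day: sorted(items, key=lambda f: f["price"])[:n]
            for day, items in by_day.items()}


flights = [
    {"day": 1, "price": 100, "confidence": 0.95},
    {"day": 1, "price": 110, "confidence": 0.92},
    {"day": 1, "price": 150, "confidence": 0.99},
    {"day": 3, "price": 90, "confidence": 0.50},
    {"day": 3, "price": 95, "confidence": 0.60},
    {"day": 3, "price": 130, "confidence": 0.93},
    {"day": 3, "price": 140, "confidence": 0.91},
]
filtered = cheapest_then_filter(flights, threshold=0.9, n=2)
as_criterion = cheapest_with_threshold_criterion(flights, threshold=0.9, n=2)
```

With this data, the filter variant returns nothing for day 3 because both of its cheapest flights fall below the threshold, whereas the criterion variant returns the flights priced 130 and 140 for that day.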
The server 2a then performs a database search on the basis of the stored pre-collected search results in accordance with the search criteria included in the search query 10. The server 2a then, by message 14, returns pre-collected search results fulfilling the search criteria together with the confidence factor values associated with these pre-collected search results. It is noted that the database search conducted by the server 2a is not limited to any pre-collected search results being associated with a certain confidence factor threshold. Rather, the server 2a returns pre-collected search results to the switch 6 irrespective of their associated confidence factor values.
It is then a function of switch 6 (forming a logical part of the search platform 2) to utilize the confidence factor threshold. As outlined above, the threshold may either be set by the client, e.g. by including a threshold value in the search query 10 (or in any other message transmitted asynchronously to search query 10) or, alternatively, be autonomously set by switch 6, e.g. by utilizing a given default value. The switch 6 evaluates the confidence factor values of the pre-collected search results received from the server 2a by message 14. Pre-collected search results being associated with confidence factor values at or above the threshold are forwarded unchanged to the client 4 by message 15. Pre-collected search results having confidence factor values below the threshold are not forwarded to the client 4. Rather, the switch initiates a secondary database search at a primary data source 5 by messages 16 and 17. The primary data source may maintain original data which is not pre-collected. This secondary database search thus validates the pre-collected search results received from the server 2a with confidence factor values below the threshold. The validated search results received by switch 6 from the primary data source 5 with message 17 are thus 100% valid.
The switch 6 then returns the search results to the client 4 by messages 15 and 18. Note that message 15 may be sent to the client 4 immediately after the respective pre-collected search results received by switch 6 from server 2a with message 14 have been recognized to be associated with confidence factor values at or above the threshold, while message 18 is only sent after the secondary database search with the primary data source 5 has been performed. Thus, in this setting, messages 15 and 18 are sent separately at different points of time. In another setting, message 15 may be held back by switch 6 until the validated search results have been received from the primary data source 5 with message 17. In this case, messages 15 and 18 are sent at substantially the same point of time. They may also be sent as a single combined message.
It is also possible to subdivide messages 15 and 18 into smaller messages, for example atomic messages each conveying a single search result. In this way, switch 6 is able to provide the client 4 with pre-collected search results having confidence factor values above the threshold and/or search results validated with the primary data source 5 in an incremental manner. Accordingly, client 4 might be arranged to display incrementally arriving search results in an incremental manner to the user.
Optionally, the validated search results may not only be forwarded to client 4, but also to server 2a for including the validated search results in the database of the server 2a. In this manner, the revalidation of the pre-collected search results below the confidence threshold is leveraged for future search queries, as these results may then not require re-validation, but may have confidence factor values above the threshold and, thus, may be returned to client 4 without revalidation. The switch 6 and server 2a forming the search platform 2 may be implemented as an integrated entity or as separate elements or modules. For example, switch 6 may be implemented as a software module on the same hardware station as server 2a. In some embodiments, switch 6 is implemented by separate hardware. In this case, switch 6 may serve more than one server 2a and may therefore operate as a unified interface for a plurality of servers 2a.
A chronological sequence of messages and activities by the various entities is visualized by Figure 9. The process starts with search query 10 issued by client 4. The search query 10 includes at least one search criterion ("criterion A"). Generally, search query 10 will contain more than one search criterion, as indicated by italic "criterion B". For example, if search query 10 is a travel-related request such as a request for flight connections directed to server 2a being a travel recommendation search platform, search query 10 might include e.g. the four search criteria or search parameters origin city (e.g. Nice), destination city (e.g. New York), outbound date (e.g. 27 December 2013) and return date (e.g. 6 January 2014). Optionally, search query 10 includes a value for the confidence threshold which switch 6 is going to apply.
Switch 6 receives the search query 10 and relays search query 10 to server 2a by message 13. In response to receiving message 13, server 2a performs a database lookup in the pool of pre-collected search results by using the search criteria included in search query 10 and message 13. By message 14, server 2a returns the retrieved pre-collected search results fulfilling the search criteria. These pre-collected search results include the associated confidence factor values.
Switch 6 receives the pre-collected search results with message 14 from server 2a, analyses the associated confidence factor values and compares them with the confidence threshold. Switch 6 forwards pre-collected search results with confidence factor values at or above the confidence threshold to the client 4 by return message 15. On the other hand, switch 6 requests validation of pre-collected search results having confidence factor values below the confidence threshold with primary data source 5. To this end, switch 6 sends request message 16 to primary data source 5. Request message 16 might contain the primary key values of the pre-collected search results to be validated in order to specifically request the pre-collected search results to be validated from the primary data source 5. Primary data source 5 looks up the requested search results and returns the original and therefore valid search results to switch 6 with message 17. Finally, switch 6 forwards the validated search results to client 4 by message 18.
Note that the primary data source 5 may actually include more than one data source, e.g. a plurality of databases, web servers, computation platforms, etc. Thus, messages 16 and 17 may be decomposed into several sub-messages which are sent to the plurality of primary data sources. Messages 16 and 17 may also be formed by a plurality of sub-messages if the primary data source 5 is a single data source, e.g. in order to realize an incremental validation as explained next.
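The role of switch 6 in this message sequence may be sketched in Python as follows. Server 2a and primary data source 5 are represented by plain callables; these stand-ins, the record layout and the function names are hypothetical and serve only to illustrate the split at the confidence threshold.

```python
def switch_handle(query, server_lookup, primary_validate, threshold):
    """Relay the query, forward confident results, validate the remaining ones."""
    results = server_lookup(query)                        # messages 13 and 14
    message_15 = [r for r in results if r["confidence"] >= threshold]
    doubtful = [r for r in results if r["confidence"] < threshold]
    keys = [r["key"] for r in doubtful]                   # content of message 16
    validated = primary_validate(keys) if keys else []    # message 17
    # Validated results come from the original data and are treated as 100% valid.
    message_18 = [dict(v, confidence=1.0) for v in validated]
    return message_15, message_18


def fake_server(query):
    """Stand-in for server 2a returning results with confidence factor values."""
    return [
        {"key": "A", "price": 100, "confidence": 0.95},
        {"key": "B", "price": 120, "confidence": 0.40},
    ]


def fake_primary(keys):
    """Stand-in for primary data source 5 returning the original data by key."""
    fresh_prices = {"A": 100, "B": 135}
    return [{"key": k, "price": fresh_prices[k]} for k in keys]


msg15, msg18 = switch_handle({"origin": "NCE"}, fake_server, fake_primary, threshold=0.9)
```

In this sketch result A is forwarded unchanged, while result B is replaced by the validated record with the price obtained from the primary data source.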
Optionally, in some embodiments, the switch 6 is additionally arranged to control the validation of the pre-collected search results below the confidence threshold in a more sophisticated manner. For example, switch 6 requests validation of only a subset of the pre-collected search results below the confidence threshold, while other pre-collected search results below the confidence threshold are not validated (and are, in this example, not forwarded to the client 4 - in other examples, pre-collected search results with confidence factor values below the confidence threshold may also be forwarded to the client 4 although they have not been part of the validated subset). The subset is, for example, formed by an available time for validation. Thus, for example, switch 6 performs the validation in an incremental way (e.g. a single request message 16 is decomposed into a plurality of validation requests which are serially sent to the primary data source 5 for every pre-collected search result to be validated) and stops sending requests 16 to the primary data source 5 after a given period of time. In some embodiments, this incremental validation is performed in an ordered manner, e.g. starting with the pre-collected search result(s) having the lowest confidence factor value and continuing with pre-collected search result(s) having greater confidence factor values. In this way, the validation controlled by switch 6 times out. In other embodiments, switch 6 indicates the time available for validation to the primary data source 5 and it is the primary data source 5 which stops the validation activity after the time has elapsed. The subset is, additionally or alternatively, formed by a limit on the number of pre-collected search results to be requested from the primary data source 5 or by computation resources available at the primary data source 5.
For example, switch 6 may be arranged to decide to only validate a given number of pre-collected search results (for example, 20 pre-collected search results) and request validation of that given number from the primary data source 5, while pre-collected search results below the threshold exceeding the given number may be discarded by the switch 6 or, in other embodiments, may be forwarded to the client 4.
A fourth example is given by Figures 10 and 11. This fourth example is more general than the third example of Figures 8 and 9 in that it is the search platform 2 which performs the revalidation of pre-collected search results with confidence factor values below the confidence threshold with the primary data source 5. Apart from that, the same principles as explained in the third example apply to the fourth example.
Similar to the third example, search platform 2 receives a search query 10 from client 4 (Figures 10 and 11). Search platform 2 then performs a search in the database of pre-collected search results for search results corresponding to the search criteria included in the search query 10. Pre-collected search results having confidence factor values below the threshold are re-validated by search platform 2 with primary data source 5 by message 16. The search platform receives the validated search results from primary data source 5 with message 17 and, for example, consolidates the re-validated search results received from primary data source 5 with the pre-collected search results having confidence factors at or above the confidence threshold. Search platform 2 then transmits the consolidated search results to client 4 by message 19.
As described above for the third example, message 19 might be a single message including all search results to be returned to client 4 or message 19 might be split up into several messages, e.g. into messages 19a (Figure 11) carrying the pre-collected search results at or above the confidence threshold (as they are available earlier than the re-validated pre-collected search results below the threshold) and further messages 19b (Figure 11) carrying the search results re-validated with primary data source 5 (as they are available only at a later point of time).
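The validation control described above (ordering by ascending confidence, a cap on the number of results, and a time budget) may be sketched in Python as follows. The function and parameter names are assumptions, and the injectable clock is a design choice made here so that the time-out behaviour can be demonstrated deterministically.

```python
import itertools
import time


def validate_with_budget(doubtful, validate_one, max_count=None,
                         time_budget_s=None, clock=time.monotonic):
    """Validate low-confidence results one by one, least trustworthy first."""
    ordered = sorted(doubtful, key=lambda r: r["confidence"])
    if max_count is not None:
        ordered = ordered[:max_count]        # cap on the number of validations
    deadline = None if time_budget_s is None else clock() + time_budget_s
    validated = []
    for record in ordered:
        if deadline is not None and clock() >= deadline:
            break                            # the validation times out
        validated.append(validate_one(record["key"]))
    return validated


doubtful = [
    {"key": "A", "confidence": 0.7},
    {"key": "B", "confidence": 0.2},
    {"key": "C", "confidence": 0.5},
]
# Fake clock advancing one second per call: 0, 1, 2, 3, ...
ticks = itertools.count()
timed_out = validate_with_budget(doubtful, lambda key: key, time_budget_s=2.5,
                                 clock=lambda: next(ticks))
capped = validate_with_budget(doubtful, lambda key: key, max_count=1)
```

With the fake clock, the budget of 2.5 seconds allows two validations (B, then C) before the loop stops; with `max_count=1` only B, the result with the lowest confidence factor value, is validated.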
A fifth example is given by Figures 12 and 13. The fifth example is a further variation of the third and fourth examples. According to this fifth example, the search platform 2 first returns the pre-collected search results complying with the at least one search criterion included in the search query 10 and only validates those pre-collected search results below the confidence threshold in parallel and/or subsequently. The search platform 2 then returns the validated search results to the client 4, thereby updating the initially returned pre-collected search results below the confidence threshold with the corresponding validated search results and, thus, increasing the probability of these search results being valid.
Hence, similar to the third example and to the fourth example, search platform 2 receives a search query 10 from client 4 (Figures 12 and 13). Search platform 2 then performs a search in the database of pre-collected search results for search results corresponding to the search criteria included in the search query 10. Search platform 2 then returns all pre-collected search results, irrespective of the pre-collected search results' confidence factor values (below, at, or above the threshold), to the client 4 by message 20 (again, message 20 may include one or more individual sub-messages). The pre-collected search results below the confidence threshold are, however, re-validated by search platform 2 with primary data source 5 by message 16 in a similar manner as in the third example or in the fourth example. The search platform 2 receives the validated search results from primary data source 5 with message 17. Search platform 2 then transmits the validated search results to client 4 by message 21. Client 4 processes the validated search results and updates the corresponding pre-collected search results below the confidence threshold initially received from the search platform 2 with the validated search results (e.g. by overwriting the pre-collected search results below the confidence threshold initially received from the search platform 2 with the validated search results and displaying the updated search results to the user).
The validation processes formed by messages 16, 17 and 21 may occur incrementally and in parallel or subsequently to returning the initial pre-collected search results by message 20. To this end, messages 16, 17 and 21 may be subdivided into a plurality of sub-messages as already explained above with reference to Figures 8 and 9. In addition, a validation control as also described with reference to Figures 8 and 9 may be employed, e.g. the validation process of messages 16, 17 and 21 may be capped to a given amount of validation time or computation / collection resources. Furthermore, the validation control may employ an ordered validation process, e.g. by validating the pre-collected search results below the confidence threshold in ascending order in terms of the confidence values of the pre-collected search results.
Figure 14 shows an application example of the database system 1. This application example relates to a database system used in the travel industry. More specifically, in this embodiment, the computation platform 3 maintains data on air travel offers. A plurality of search platforms 2 store prices related to these air travel offers which the computation platform 3 calculates on the basis of calculation rules, in particular flight fares and their associated calculation rules. In the example of Figure 14, the computation platform 3 may be a Massive Computation Platform (MCP) as disclosed by EP 2521074 A1. The search platforms 2 and the MCP 3 are coupled via communication links which are utilized to transmit pre-computed priced travel recommendations from the MCP 3 to the search platforms 2.
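The client-side update step, in which results received with message 20 are overwritten by validated results arriving with message 21, may be sketched in Python as below. Matching the results by a `key` field and marking validated results with a confidence of 1.0 are assumptions made for illustration.

```python
def apply_validated_updates(displayed, validated):
    """Overwrite initially displayed results with their validated counterparts."""
    by_key = {record["key"]: record for record in displayed}
    for fresh in validated:
        if fresh["key"] in by_key:
            by_key[fresh["key"]] = dict(fresh, confidence=1.0)
    # Keep the original display order.
    return [by_key[record["key"]] for record in displayed]


message_20 = [
    {"key": "A", "price": 100, "confidence": 0.95},
    {"key": "B", "price": 120, "confidence": 0.40},
]
message_21 = [{"key": "B", "price": 135}]
updated = apply_validated_updates(message_20, message_21)
```

The user initially sees both results; once message 21 arrives, result B is shown with the validated price 135 while result A stays as it was.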
Furthermore, the database system 1 includes a re-computation controller 7 which is responsible for monitoring the validity of the pre-computed priced travel recommendations stored in the search platforms 2 and for deciding which pre-computed priced travel recommendations are to be re-computed by MCP 3. In the example of Figure 14, the re-computation controller 7 employs a probabilistic model for tracking the validity probabilities of the pre-computed priced travel recommendations stored in the search platforms 2. The probabilistic model may be based on the parameters as described above with reference to Figure 3. To this end, re-computation controller 7 is equipped with several communication interfaces in order to input statistical data for estimating change rates of travel recommendations as well as to recognize external events such as fare changes, customer promotions and flight availability changes. In the example of Figure 14, the confidence factor values of the pre-computed priced travel recommendations are maintained centrally by the re-computation controller 7 for all search platforms 2. In the course of processing search queries from clients 4, search platform 2 requests confidence factor values associated with the pre-computed priced travel recommendations fulfilling the search criteria included in the search queries from re-computation controller 7 via interface 30. In response to this request, re-computation controller 7 performs the appropriate processing (for example, calculates $e^{-\lambda_i t_i}$ for each requested pre-collected search result on the basis of the respective values of the validity rate λ, the timestamp TS and TSS) and returns the requested confidence factor values to the search platform 2. Alternatively, each search platform 2 may maintain the confidence factor values associated with the stored pre-computed priced travel recommendations, for example as shown by Figure 3 and described above.
In this case, the search platforms 2 are themselves equipped with the aforementioned communication interfaces in order to maintain the probabilistic model. In other embodiments, the confidence factor values stored in the search platforms 2 may also be updated by the re-computation controller 7, e.g. on a periodic basis.
As also indicated by Figure 14, the search platforms 2 may implement various applications. For example, a pre-shopping application serves as a non-binding information platform by which the clients 4 can obtain information about flight routes, flight schedules and prices, hotel room availability, rental car services, etc. without having to make an actual reservation. Another application may be an advertisement banner application which provides data for travel advertisement banners to Internet websites subscribed to such banner advertisement. Whenever a client 4 retrieves an Internet website hosting advertisement banners, the banner content is dynamically loaded from a banner application search platform 2 in response to banner search queries automatically generated by client 4. The dynamically loaded banner content may depend on interests of the user determined e.g. by cookies or browsing history data of client 4. For this application, the first example of filtering pre-collected search results as described with reference to Figures 4 and 5 may be suitable because it may not be necessary to avoid holes in the priced travel advertisements. Rather, short response times of the advertisement banner may be more important.
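As a rough illustration of how confidence factor values might be computed at query time and then used for filtering as in the first example, consider the following Python sketch. It assumes the exponential form e^(-λ·t), with t being the time elapsed since the last collection of a result; the `Stored` record, the choice of hours as time unit, and the function names are hypothetical. Results at or above the threshold are kept, which is one possible reading of the threshold comparison.

```python
import math
from typing import Iterable, List, NamedTuple


class Stored(NamedTuple):
    key: str             # hypothetical identifier of a stored recommendation
    rate: float          # decrease rate lambda, per hour (assumed unit)
    collected_at: float  # time of the last collection, in hours (assumed unit)


def confidence(entry: Stored, now: float) -> float:
    """Probability that the entry is still valid: exp(-lambda * elapsed)."""
    elapsed = max(0.0, now - entry.collected_at)
    return math.exp(-entry.rate * elapsed)


def filter_by_confidence(
    entries: Iterable[Stored], now: float, threshold: float
) -> List[Stored]:
    """Keep only entries whose confidence is at or above the threshold."""
    return [e for e in entries if confidence(e, now) >= threshold]
```

Because only the rate and the collection time are stored, nothing has to be rewritten as results age; the confidence value is derived on demand from the current time. Filtering never contacts the primary data source, which fits applications such as advertisement banners where short response times matter more than a complete result set.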
As another additional optional functionality, the confidence factor associated with the pre-collected search results that are actually returned to the client 4 may be transmitted to client 4 along with the actual search results. In the case of re-validated search results returned to the client 4 (third, fourth and fifth examples, Figures 8 to 13), confidence factor values of 100% may be returned to client 4. Client 4 may be arranged to process the confidence factor values (which are all at or above the confidence threshold), for example, to indicate the varying confidence of the various search results to the user. This indication may, for example, be realized by the client 4 by grouping the received search results into classes of different confidence intervals, e.g. search results with a confidence factor value of 100% (i.e. the revalidated search results), search results with a confidence factor between 95% and 100% and further search results with a confidence factor below 95%, but still above the confidence threshold.
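The client-side grouping into confidence classes could look like the following Python sketch. The class labels and the function name are hypothetical, results are plain `(key, confidence)` pairs with confidence expressed as a fraction, and a value of exactly 95% is assigned to the middle class, which is an assumption since the text leaves the boundary open.

```python
from typing import Dict, List, Tuple


def group_by_confidence(results: List[Tuple[str, float]]) -> Dict[str, List[str]]:
    """Group received search results into three confidence classes."""
    groups: Dict[str, List[str]] = {"validated": [], "high": [], "other": []}
    for key, conf in results:
        if conf >= 1.0:
            groups["validated"].append(key)  # re-validated results (100%)
        elif conf >= 0.95:
            groups["high"].append(key)       # between 95% and 100%
        else:
            groups["other"].append(key)      # below 95%, above the threshold
    return groups
```

A user interface could then render each class differently, for instance by marking only the "other" class as subject to change.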
Finally, Fig. 15 is a diagrammatic representation of a computer system which provides the functionality of the search platform 2. Within the search platform 2 a set of instructions, to cause the computer system to perform any of the methods performed by the search platform as discussed herein, may be executed. The search platform 2 includes a processor 101, a main memory 102 and a network interface device 103, which communicate with each other via a bus 104. Optionally, the search platform 2 may further include a static memory 105 and a disk-drive unit 106. A video display 107, an alpha-numeric input device 108 and a cursor control device 109 may form a distribution list navigator user interface. The network interface device 103 is a wired and/or wireless interface which connects the data search platform 2 to the computation/collection platform 3, the sources of statistical data needed to fill up the probabilistic model such as a statistics search platform, the Internet and/or any other network. The network interface device 103 utilizes either standard communication protocols such as the HTTP/TCP/IP protocol stack, IEEE 802.11 and/or proprietary communication protocols. A set of instructions (i.e. software) 110 embodying any one, or all, of the methodologies described above, resides completely, or at least partially, in or on a machine-readable medium, e.g. the main memory 102 and/or the processor 101. Among others, the instructions may implement the search platform's capabilities to process incoming search queries 10, to perform database lookups among the pre-collected search results and to generate and transmit messages like response messages 11, 12, 14 and 20 as well as request message 17. A machine-readable medium on which the software 110 resides may also be a non-volatile data carrier 111 (e.g. a non-removable magnetic hard disk or an optical or magnetic removable disk) which is part of disk drive unit 106. The software 110 may further be transmitted or received as a propagated signal 112 via the Internet through the network interface device 103.
Client 4 may reside in a stationary computer or a mobile device such as a smartphone, a cell phone, a laptop, a tablet computer or the like which may be of a similar structure as shown by Figure 15. Accordingly, the instructions 110 embodied in the processor/memory implement the client's functionality to generate and transmit search query 10, to receive and process response messages 11, 12, 14, 19 and 20, and to display the search results received from search platform 2 and/or switch 6.
As described above, switch 6 may be included in the search platform 2 or may be provided as a separate hardware entity. In the latter case, switch 6 may also be of similar structure as shown by Figure 15.
The present approach of utilizing confidence factor values associated with pre-collected search results and confidence thresholds allows increasing the reliability of pre-collected/pre-computed search results provided to clients at search time. It can be advantageously combined with an improved strategy of re-computing/re-collecting the pre-computed/pre-collected search results as, for example, described by PCT/EP2013/002390. Although certain products and methods constructed in accordance with the teachings of the invention have been described herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all embodiments of the teachings of the invention fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents.

Claims

1. A method of handling queries in a database system, the database system comprising at least one client and at least one search platform, the search platform maintaining pre-collected search results being associated with confidence factors, wherein a confidence factor indicates a probability of the associated pre-collected search result being valid, the method comprising:
receiving, at the at least one search platform, a query indicating at least one search criterion;
utilizing, with the at least one search platform, the confidence factors associated with the identified pre-collected search results to increase the mean probability of pre-collected search results returned to the client of being valid, the returned pre-collected search results complying with the at least one search criterion.
2. The method of claim 1, wherein utilizing the confidence factors comprises:
identifying, at the at least one search platform, pre-collected search results complying with the at least one search criterion and being associated with confidence factors having values exceeding a given threshold.
3. The method of claim 1 or claim 2, wherein the confidence factor values are derived from a probabilistic model modelling a probability of validity of pre-collected search results over time.
4. The method of any of claims 1 to 3, wherein the probability of a pre-collected search result i being valid at a time t after a previous collection of the pre-collected search result i is given by $e^{-\lambda_i t_i}$, wherein $\lambda_i$ denotes a decrease rate of the probability of the pre-collected search result i being valid and $t_i$ denotes a time since a last recollection of the pre-collected search result i.
5. The method of claim 4, wherein $e^{-\lambda_i t_i}$ is calculated in response to the query based on a stored value of $\lambda_i$ and a difference between a stored value of a time of the last recollection of the pre-collected search result i and a current time.
6. The method of claim 4 or claim 5, wherein a value of $\lambda_i$ is derived from past recollections of the pre-collected search result i.
7. The method of any of claims 2 to 6, wherein the threshold is prescribed by the client.
8. The method of any of claims 2 to 6, wherein the threshold is set autonomously by the search platform.
9. The method of any of claims 2 to 8, wherein the pre-collected search results complying with the at least one search criterion and being associated with confidence factors having values exceeding the given threshold are identified by applying, at the search platform, the threshold as a further search criterion in addition to the at least one search criterion indicated by the query.
10. The method of any of claims 2 to 8, wherein utilizing the confidence factors further comprises:
validating pre-collected search results complying with the at least one search criterion and being associated with confidence factor values below the threshold with a primary data source returning more valid database query results;
returning the validated pre-collected search results corresponding to the query to the client.
11. The method of claim 10, further comprising:
returning the pre-collected search results complying with the at least one search criterion and being associated with confidence factor values below the threshold to the client before validating the pre-collected search results complying with the at least one search criterion and being associated with confidence factor values below the threshold with the primary data source; and
updating, at the client, the pre-collected search results complying with the at least one search criterion and being associated with confidence factor values below the threshold returned to the client with the validated pre-collected search results corresponding to the query.
12. The method of claim 9 or claim 10, wherein the pre-collected search results complying with the at least one search criterion and being associated with confidence factor values below the threshold are validated in an ascending order in terms of the confidence factor values.
13. The method of any of claims 1 to 8, wherein utilizing the confidence factors comprises filtering out pre-collected search results complying with the at least one search criterion and being associated with confidence factor values below the threshold.
14. A search platform within a database system, the search platform being arranged to:
maintain pre-collected search results being associated with confidence factors, wherein a confidence factor indicates a probability of the associated search result being valid,
receive a query from a client, the query indicating at least one search criterion,
utilize the confidence factors associated with the identified pre-collected search results to increase the mean probability of pre-collected search results returned to the client of being valid, the returned pre-collected search results complying with the at least one search criterion.
15. The search platform of claim 13 being arranged to identify pre-collected search results complying with the at least one search criterion and being associated with confidence factors having values exceeding a given threshold.
16. The search platform of claim 14 being arranged to apply the threshold as a further search criterion in addition to the at least one search criterion indicated by the query.
17. The search platform of claim 14 or claim 15 being arranged to:
validate pre-collected search results complying with the at least one search criterion and being associated with confidence factor values below the threshold with a primary data source returning more valid database query results;
return the validated pre-collected search results corresponding to the query to the client.
18. The search platform of claim 14 or claim 15 being arranged to filter out pre-collected search results complying with the at least one search criterion and being associated with confidence factor values below the threshold.
19. A database system comprising at least one client and at least one search platform,
the client being arranged to transmit a query indicating at least one search criterion to the at least one search platform,
the at least one search platform being arranged to
maintain pre-collected search results being associated with confidence factors, wherein a confidence factor indicates a probability of the associated search result being valid, and to
utilize the confidence factors associated with the identified pre-collected search results to increase the mean probability of pre-collected search results returned to the client of being valid, the returned pre-collected search results complying with the at least one search criterion.
20. Non-transitory computer readable storage medium having computer program instructions stored therein, which when executed on a computer system cause the computer system to
maintain pre-collected search results being associated with confidence factors, wherein a confidence factor indicates a probability of the associated search result being valid,
receive a query from a client, the query indicating at least one search criterion,
utilize the confidence factors associated with the identified pre-collected search results to increase the mean probability of pre-collected search results returned to the client of being valid, the returned pre-collected search results complying with the at least one search criterion.
PCT/EP2015/000249 2014-02-13 2015-02-06 Increasing search result validity WO2015120968A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US14/179,815 2014-02-13
US14/179,815 US9984165B2 (en) 2014-02-13 2014-02-13 Increasing search result validity
EP14290034.9 2014-02-13
EP14290034.9A EP2908255B1 (en) 2014-02-13 2014-02-13 Increasing search result validity

Publications (1)

Publication Number Publication Date
WO2015120968A1 (en) 2015-08-20

Family

ID=52468966

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2015/000249 WO2015120968A1 (en) 2014-02-13 2015-02-06 Increasing search result validity

Country Status (1)

Country Link
WO (1) WO2015120968A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7562027B1 (en) * 1999-11-01 2009-07-14 Ita Software, Inc. Availability processing in a travel planning system
EP2249261A1 (en) * 2009-05-08 2010-11-10 Comcast Interactive Media, LLC Recommendation method and system
EP2541473A1 (en) * 2011-06-27 2013-01-02 Amadeus S.A.S. Method and system for a pre-shopping reservation system with increased search efficiency
WO2013160721A1 (en) * 2012-04-26 2013-10-31 Amadeus S.A.S. Database system using batch-oriented computation
WO2014026753A1 (en) * 2012-08-14 2014-02-20 Amadeus S.A.S. Updating cached database query results

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Modeling the Internet and the Web: Probabilistic Methods and Algorithms", 1 January 2003, WILEY, ISBN: 978-0-47-084906-4, article BALDI ET AL: "Modeling the Internet and the Web: Probabilistic Methods and Algorithms", XP055028469 *
MICHAEL FRASCA ET AL: "Can models of scientific software-hardware interactions be predictive?", PROCEDIA COMPUTER SCIENCE, vol. 4, 14 May 2011 (2011-05-14), pages 322 - 331, XP028269543, ISSN: 1877-0509, [retrieved on 20110514], DOI: 10.1016/J.PROCS.2011.04.034 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110928788A (en) * 2019-11-22 2020-03-27 泰康保险集团股份有限公司 Service verification method and device
CN110928788B (en) * 2019-11-22 2023-09-19 泰康保险集团股份有限公司 Service verification method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15703876; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 15703876; Country of ref document: EP; Kind code of ref document: A1)