WO2015124275A1 - Long-term validity of pre-computed search results - Google Patents

Long-term validity of pre-computed search results

Info

Publication number
WO2015124275A1
Authority
WO
WIPO (PCT)
Prior art keywords
computation
computed search
computed
search results
search result
Prior art date
Application number
PCT/EP2015/000287
Other languages
English (en)
Inventor
Guillaume Legrand
Damien Ciabrini
Original Assignee
Amadeus S.A.S.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from EP14290040.6A (external priority, patent EP2911070B1)
Priority claimed from US14/183,911 (external priority, patent US9582536B2)
Application filed by Amadeus S.A.S.
Publication of WO2015124275A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2453: Query optimisation
    • G06F16/24534: Query rewriting; Transformation
    • G06F16/24539: Query rewriting; Transformation using cached or materialised query results
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2455: Query execution
    • G06F16/24552: Database cache management

Definitions

  • The present invention generally relates to database technology. More specifically, it is directed to a strategy for keeping pre-computed search results, stored as database records in a database, up-to-date.
  • A common problem in database technology is to ensure short response times to database queries which require processing large volumes of data. For example, such computationally expensive processing has to be performed in response to so-called "open queries" which contain only little input information (e.g. only one or two parameters out of a dozen possible parameters are specified and/or the specified value ranges of the parameters are broad) and, consequently, generally lead to a large number of results. Possibilities to speed up data processing by increasing hardware performance are limited. Thus, attention is drawn to improving the mechanisms underlying the processing of large data volumes.
  • One general approach to shorten query times is to pre-compute expected queries and to maintain the corresponding query results in a cache system. Queries are then actually not processed on the large data basis, but are directed to the cache system.
  • WO 01/33472 concerns an availability system used in a travel planning system.
  • The system includes a cache having entries of availability information regarding airline seats.
  • A cache manager manages entry information in the cache in order to keep information in the cache correct, current, complete or otherwise as useful as possible.
  • The cache manager determines if a stored answer is stale and, if this is the case, sends an availability query to a source of availability information. Cache entries to be modified are obtained by asynchronous notifications from external systems and determined by a deterministic, predictive or statistical model.
  • WO 02/25557 pertains to an information retrieval system wherein information received from information sources is cached for future use, such as for future client requests.
  • Proactive queries can be generated to populate a cache and/or to update presently cached information.
  • Proactive queries are ordered on the basis of statistics or predictive indications such as nearness of departure time, the age of cached data, remaining seats in an aircraft, holidays or special events or equipment type.
  • Updates are received by external notifications from airlines such as AVS messages.
  • WO 99/22315 describes a mechanism for automatically refreshing documents in a cache by using a statistics-based probabilistic model. For each document, the cache determines a probability $P_{si}(t)$ that a cached object i is stale at a particular time t (i.e. the server has changed that object) and a probability $P_{ri}(h)$ that object i is requested by a user by request time h. The cache refreshes those objects with the highest product $P_i = P_{si}(t) \times P_{ri}(h)$, i.e. the probability that an outdated object is returned to the user with the next request. To maintain these probability values, the cache maintains and tracks historical statistics for the cached objects such as an estimated mean interval between server updates EUI. EUI of an object is e.g. updated when the object itself is updated by the server or the object is not updated after its estimated mean refresh time has elapsed.
  • The present invention provides a method of re-computing pre-computed search results performed in a database environment.
  • The database environment comprises at least one search platform maintaining pre-computed search results, a re-computation controller and a computation platform.
  • Long-term accuracy of the pre-computed search results is provided by the following activities:
  • The re-computation controller assigns a re-computation indicator to any of the pre-computed search results.
  • The re-computation indicator for a pre-computed search result i is based on at least the following factors:
  • The computation platform re-computes pre-computed search results having a re-computation indicator indicating the highest need for re-computation.
  • The number of pre-computed search results re-computed by the computation platform is limited by the computation platform's computation resources available for the re-computation within the given time interval.
  • A re-computation controller for employment in a database environment, the database environment comprising a search platform maintaining pre-computed search results and a computation platform.
  • The re-computation controller provides long-term accuracy of the pre-computed search results by being arranged to:
  • A non-transitory computer readable storage medium which has computer program instructions stored therein, which when executed on a computer system cause the computer system to perform these activities.
  • The computation resources needed to re-compute pre-computed search result i depend on whether or not other pre-computed search results related to the pre-computed search result i are re-computed during the given time interval. The computation resources needed to re-compute pre-computed search result i are dynamically estimated depending on which other pre-computed search results related to the pre-computed search result i are selected for re-computation during the given time interval.
  • The pre-computed search results for re-computation by the computation platform within the given time interval are iteratively selected.
  • This iterative selection includes estimating the re-computation resources needed to re-compute the pre-computed search results, which in turn comprises:
  • step d) proceeding with step b) if less than 100% of the computation platform's overall computation resources available for re-computation within the given time interval is exhausted.
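The iterative selection with dynamically estimated costs might be sketched as follows. Steps a) to c) are not reproduced in this text, so the loop is only an assumption about how such a selection could work: each candidate carries a hypothetical `value` (its need for re-computation), an `own_cost`, and a `shared_cost` that is charged only once per `group` of related results. All field names and numbers are illustrative, and `own_cost` is assumed to be positive.

```python
def marginal_cost(candidate, paid_groups):
    """Cost of adding one candidate, given the groups already selected.

    The shared part is charged only for the first member of a group
    (assumption: related results share one common sub-computation).
    """
    shared = 0.0 if candidate["group"] in paid_groups else candidate["shared_cost"]
    return candidate["own_cost"] + shared


def select_with_shared_costs(candidates, budget):
    """Iteratively pick candidates until the resource budget is exhausted."""
    selected, used, paid_groups = [], 0.0, set()
    remaining = list(candidates)
    while remaining:
        # Re-estimate costs in every iteration: they depend on the selection so far.
        affordable = [c for c in remaining
                      if used + marginal_cost(c, paid_groups) <= budget]
        if not affordable:
            break
        best = max(affordable,
                   key=lambda c: c["value"] / marginal_cost(c, paid_groups))
        used += marginal_cost(best, paid_groups)
        paid_groups.add(best["group"])
        selected.append(best["key"])
        remaining.remove(best)
    return selected, used


# Illustrative data only.
candidates = [
    {"key": "A", "group": "g1", "value": 10.0, "own_cost": 1.0, "shared_cost": 4.0},
    {"key": "B", "group": "g1", "value": 4.0, "own_cost": 1.0, "shared_cost": 4.0},
    {"key": "C", "group": "g2", "value": 6.0, "own_cost": 1.0, "shared_cost": 4.0},
]
chosen, used = select_with_shared_costs(candidates, budget=7.0)
```

In this toy run, B becomes cheap once A from the same group is selected, so it fits into the budget although its stand-alone cost would not.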
  • Figure 1 schematically shows a distributed database environment.
  • Figure 2 illustrates a probabilistic model predicting decreasing accuracy of a pre-computed search result over time.
  • Figures 3a, 3b and 3c visualize the effects of a re-computation strategy focusing on recomputing volatile pre-computed search results.
  • Figures 4a and 4b visualize the effects of a re-computation strategy taking into account re-computation frequencies, re-computation costs for re-computing pre-computed search results and their search popularity.
  • Figure 5 depicts re-computation costs for different types of pre-computed search result sets.
  • Figure 6 shows an example of a database environment implementing the methods presented herein.
  • Figure 7 presents a view on an exemplary inner structure of the re-computation controller.
  • Figure 8 is an exemplary schematic view of the internal architecture of the query processing server.
  • Search results corresponding to expected queries are generally pre-computed and stored as database records in a database.
  • This database is queried by requesting entities (such as clients, applications, browsers installed on user terminals, etc.) in the course of a search and pre-computed search results fulfilling search criteria indicated by the query are returned to the client in response to the query.
  • search results is used as a general term including any type of information retrieval requests such as transactional queries, requests for batch computations and other forms.
  • Figure 1 illustrates such a database environment 1 on an abstract level.
  • Basic data, hereinafter also referred to as "calculation data", is kept in a computation platform 3 which is connected to a re-computation controller 2.
  • The latter issues re-computation orders to the computation platform 3 which, in turn, transmits the corresponding results to the search platform 4 and, in addition, to the re-computation controller 2 which also maintains the pre-computed search results for reasons of re-computation control.
  • End users 5 such as applications on user terminals access the pre-computed search results from the search platform 4.
  • One or several search platforms 4 may be present in environment 1.
  • The pre-computed search results may be maintained in a distributed manner over the several search platforms 4 and re-computation controller 2 may control the re-computation of all pre-computed search results distributed over search platforms 4.
  • The search platforms 4 may also offer a heterogeneous set of pre-computed search requests, e.g. some search platforms 4 maintain pre-computed search requests relating to air travel, other search platforms 4 store pre-computed search requests related to insurance and other search platforms 4 keep pre-computed (or pre-crawled) search requests related to Internet websites.
  • Such a heterogeneous environment may be controlled by one single re-computation controller 2 or by a plurality of re-computation controllers 2.
  • The plurality of search platforms 4 may be utilized to mirror the same pre-computed search results, for example, for reasons of redundancy.
  • The approach of pre-computing search results and storing them in the search platform accessible to querying clients leads to the general situation that the calculation data may change over time and, thus, the pre-computed search results get outdated or invalid (both terms are used synonymously herein).
  • Pre-computed search results which are still up-to-date, i.e. which match the corresponding real-time computation equivalents (results which would be actually computed on demand without having pre-computed search results available), are called "accurate" pre-computed search results hereinafter.
  • As long as the search platform keeping the pre-computed search results correctly represents the current state of the data domain underlying the cached query results, i.e. the calculation data, the pre-computed search results stored in the search platform are - in general - accurate.
  • Metrics are defined to evaluate how "necessary" or "unnecessary" a re-computation is. For instance, it is not worth repeating an entire massive pre-computation every day if less than half of the computed query results turn out to be outdated. On the other hand, if particular classes of query results are known to change frequently, re-computing them several times per day might be beneficial for the accuracy. Consequently, an effective way of assessing or estimating search result accuracy is needed, generally taking into account both the associated gain in accuracy and the cost of re-computation.
  • Re-computations of pre-computed search results are decided based on a predictive model which yields estimations of the accuracy of the pre-computed search results kept in the search platform.
  • This predictive model models the discrepancies between the pre-computed search results and presumed actual search results, i.e. it approximates the accuracy or inaccuracy of any pre-computed search result.
  • The model models, for example, the probable validity of the pre-computed search results over time. Presumptions on the pre-computed results' validity are concluded and extrapolated from past real-world experiences on the subject-matter of the respective data domain.
  • The underlying calculation data may belong to the domain of air travel and contain information on flights such as departure and destination airport, airline, departure and return dates, fares, booking classes and the like.
  • This air-travel related data is kept in the computation platform and is queried by customers in order to get knowledge of e.g. availability and prices of air flights or any other priced travel products/services.
  • Computing e.g. prices on the basis of the basic flight data is resource- and time-consuming.
  • The actual prices are pre-computed and stored in the search platform.
  • The probabilistic model models the validity of the travel recommendation prices over time. The required knowledge to build such a model can be taken from real-world experiences on the behavior and development of e.g. travel recommendation prices prior to the departure date.
  • The age $t_i$ of the pre-computed search result i: the time since the last computation of this pre-computed search result by the computation platform 3.
  • The validity rate $\lambda_i$ of the pre-collected search result i is a measure of how long the pre-collected search result i remains valid or how fast the pre-collected search result i becomes invalid due to changes of the underlying original data.
  • This validity rate of a given pre-computed search result i is, for example, statistically derived from the occurrence and the outcomes of past (re-)computations or (re-)collections and comparisons of the re-collected search result with its previous state or values. For example, it has been determined that a particular pre-collected search result i has a validity rate $\lambda_i$ of 10% per hour, meaning that the probability of i being valid decreases by 10% every hour. At the time of its (re-)collection or (re-)computation, i is generally
  • Function 10 represents a pre-computed search result which potentially remains more accurate (or, more correctly, stays at a higher probability of being valid over time) than another pre-computed search result associated with function 11.
  • The pre-computed search result represented by function 10 has a 70% probability of being still valid at 35 hours after its last re-computation, while the other pre-computed search result characterized by function 11 is only valid up to about 50% at 35 hours after its latest re-computation.
  • Functions 10 and 11 may also represent whole sets of pre-computed search results and then indicate proportions of the sets of pre-computed search results likely being valid at a time passed since the last re-computation of the set.
  • Pre-computed search results generated by the computation platform 3 may not necessarily be accurate even at computation time if e.g. the computation platform 3 itself bases its computations on cached (and therefore outdated) data. This leads to additional discrepancies between pre-computed search results computed by the computation platform 3 and computation results which would hypothetically have been generated from accurate underlying data. This discrepancy may be measured if respective feedback is available. It can be inferred e.g. from the previous computations that pre-computed search result i has a probability $a_i$ to be accurate at the time of computation by computation platform 3. It means that the probability for a pre-computed search result to be accurate after a given time t is $a_i e^{-\lambda_i t}$.
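The probabilistic model described above can be illustrated with a minimal sketch, assuming the exponential form $a_i e^{-\lambda_i t}$. The function names are hypothetical, and the two rates are merely back-calculated from the 70% and 50% figures quoted for functions 10 and 11 at 35 hours; they are not values stated in the text.

```python
import math


def validity_probability(age_hours, validity_rate, initial_accuracy=1.0):
    """Probability that a pre-computed result is still valid: a_i * exp(-lambda_i * t)."""
    return initial_accuracy * math.exp(-validity_rate * age_hours)


def rate_from_observation(probability, age_hours):
    """Invert the model: the rate at which `probability` is reached after `age_hours`.

    Assumes an initial accuracy of 1.0.
    """
    return -math.log(probability) / age_hours


# Rates derived from the figures quoted for functions 10 and 11 (assumption).
rate_10 = rate_from_observation(0.70, 35.0)
rate_11 = rate_from_observation(0.50, 35.0)

print(validity_probability(35.0, rate_10))
print(validity_probability(35.0, rate_11))
```

A larger rate means a faster loss of validity, so the result described by function 11 decays faster than the one described by function 10.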
  • The accuracy of the overall pre-computed search results kept in the search platform 4 according to this exemplary model may then be considered as the mean accuracy ("global accuracy"):
  • The "popularity" $p_i$ of the pre-computed search result i: this is the average access frequency to this pre-computed search result by the end users.
  • The accuracy of the whole sum of pre-computed search results in the database 2 as seen by the end users may also be defined in that each accuracy value is weighted by the popularity of the respective pre-computed search result.
  • The proportion of accurate accesses to the pre-computed search results as opposed to the expected proportion of accurate pre-computed search results is
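The two accuracy measures introduced above are not reproduced as formulas in this text. Under the exponential model $a_i e^{-\lambda_i t_i}$, one plausible reading is the following, where the number of results $N$ and the total popularity $p_{tot}$ are notational assumptions:

```latex
\text{GlobalAccuracy} = \frac{1}{N} \sum_{i=1}^{N} a_i \, e^{-\lambda_i t_i}
\qquad
\text{UserAccuracy} = \sum_{i=1}^{N} \frac{p_i}{p_{tot}} \, a_i \, e^{-\lambda_i t_i},
\quad \text{with } p_{tot} = \sum_{i=1}^{N} p_i
```

The first expression is the plain mean of the per-result accuracies; the second weights each accuracy by the share of accesses the result receives.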
  • The approach determines the set of pre-computed search results C to re-compute which:
  • This approach could be further refined by considering the computation costs (i.e. computing resources) required to re-compute a set of pre-computed search requests. If the computation cost for re-computing pre-computed search request i is denoted as $c_i$, the highest gain-cost ratio is sought in order to increase the "user accuracy" most efficiently, i.e. best increase with least utilization of computation resources. This gain-cost ratio can be defined to be
  • The process for determining the pre-computed search results to be re-computed then includes the following two activities:
  • The re-computation controller 2 sorts the pre-computed search requests by this gain-cost ratio.
  • The re-computation controller 2 selects the top pre-computed search requests from this sorted list until their cumulated computation costs reach the amount of computation resources R available at the computation platform 3, e.g. for a certain period of time forming a re-computation cycle.
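The two activities above (sort by gain-cost ratio, then take the top entries until the budget R is reached) can be sketched as follows. The gain used here, the popularity-weighted probability that a result is outdated, is an assumption, since the exact formula is not reproduced in this text. The class and function names as well as the sample data are purely illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    key: str
    age: float         # t_i: hours since the last computation
    rate: float        # lambda_i: validity rate
    popularity: float  # p_i: access frequency
    cost: float        # c_i: computation resources needed


def gain_cost_ratio(result):
    # Assumed gain: popularity-weighted probability that the result is outdated.
    gain = result.popularity * (1.0 - math.exp(-result.rate * result.age))
    return gain / result.cost


def select_for_cycle(results, budget):
    """Take the top results by gain-cost ratio until the budget R is reached."""
    selected, used = [], 0.0
    for result in sorted(results, key=gain_cost_ratio, reverse=True):
        if used + result.cost > budget:
            break
        selected.append(result)
        used += result.cost
    return selected


results = [
    Result("volatile", age=2.0, rate=0.5, popularity=1.0, cost=1.0),
    Result("stable", age=2.0, rate=0.01, popularity=1.0, cost=1.0),
    Result("popular", age=2.0, rate=0.1, popularity=10.0, cost=1.0),
]
selected_keys = [r.key for r in select_for_cycle(results, budget=2.0)]
```

With a budget of two cost units, the popular and the volatile result are re-computed while the stable one waits for a later cycle.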
  • The behavior of the pre-computed search results' accuracy over time when employing this re-computation approach is indicated by Figures 3a, 3b and 3c. All three Figures, as well as the further Figures 4a and 4b, show graphs obtained by simulation on artificial pre-computed search results using a computation platform 3 equipped with an adequate amount of computation resources.
  • The term "global accuracy" refers to the average accuracy of all the simulated pre-computed search results.
  • The number of re-computation cycles, which last for example 20 minutes each, is plotted on the X axis.
  • The percentage of valid pre-computed search results is plotted on the Y axis.
  • Function 20 shows the development of the pre-computed search results' validity when employing a naive re-computation strategy, namely always re-computing those pre-computed search results which have not been re-computed for the longest period of time (the so-called "oldest" pre-computed search results).
  • the re-computation strategy is e.g. based on the formula . It is readily apparent ci
  • The gain-cost ratio approach does not significantly improve the global accuracy of the pre-computed search results.
  • The pre-computed search results' accuracy still stays above function 20, although insignificantly.
  • Function 21 even falls below function 20 and remains only slightly above 70% accuracy throughout the following re-computation cycles.
  • The gain-cost ratio approach actually leads to a decreased validity of the pre-computed search results, compared to the naive re-computation strategy of always re-computing the "oldest" pre-computed search results.
  • Figure 3b shows the effects of the gain-cost-ratio-oriented strategy in relation to the naive re-computation strategy if the gain-cost-ratio-oriented strategy also considers the "popularity" of the pre-computed search results, i.e. the more popular pre-computed search results are re-computed more often than the less popular pre-computed search results (which is, as described above, reflected by the term "user accuracy", implying that the "user experience" is improved, for which the pre-computed search results being requested more often than others are more important).
  • the gain-cost-ratio-oriented strategy of Figure 3b is characterized e.g. by the formula p t .
  • The gain-cost-ratio-oriented re-computation strategy generally achieves a better accuracy of the pre-computed search results than the naive re-computation (being shown by graph 22).
  • This improvement is caused by the fact that the naive re-computation strategy does, by definition, not take into account the popularity of the pre-computed search results. This results in the "up and down" of graph 22 because re-computing unpopular search results leads to a decrease of the "user accuracy".
  • The comparison between the naive strategy and the gain-cost-ratio-oriented strategy considering popularity is biased to some extent.
  • Figure 3c shows a cumulative accuracy distribution of all pre-computed search results stored in search platform 4 at the end of the simulation.
  • The X axis indicates the percentage of pre-computed search results.
  • The Y axis again indicates the percentage of pre-computed search results being valid.
  • Line 24 indicates the naive "re-compute oldest first" strategy.
  • Graph 25 shows the strategy based on the gain-cost ratio (i.e. a value of 40% on the X axis and 0.7 on the Y axis indicates that 40% of the pre-computed search results have an accuracy of less than 70%).
  • Figures 3a, 3b and 3c convey the insight that the gain-cost-ratio-oriented re-computation strategy is not optimal. To the contrary, it generally results in a decreased average accuracy of pre-computed search results, compared with the naive strategy to re-compute the "oldest" pre-computed search results, when considering the long-term development. Hence, selecting pre-computed search results for re-computation by employing the gain-cost ratio apparently constitutes a short-term optimization only. The inventors have recognized that this strategy spends too much computing resources on the very volatile pre-computed search results, i.e. pre-computed search results which become invalid more often than others.
  • The gain-cost-ratio-oriented strategy focuses on the left-hand side of Figure 3c and re-computes the 18% or 20% of the pre-computed search results with the least accuracy more often than the other 80%, with the effect that the 18% or 20% remain at a relatively moderate accuracy level, while the other 80% are virtually neglected.
  • The naive algorithm leaves the 18% or 20% of the pre-computed search results with the least accuracy in a "bad state" (i.e. with very low accuracy down to 0% to 45%), but achieves a better average result for the other 80% (Figure 3c) and in the long term (Figure 3a).
  • The re-computation controller 2 assigns a re-computation indicator to any of the pre-computed search results stored in the database 4.
  • This re-computation indicator indicates the priority for re-computation.
  • The re-computation indicator is formed in a particular way, by generally taking into account the following factors:
  • an access frequency measure which indicates a request frequency from the database 4, i.e. the "popularity" as introduced above; a re-computation frequency;
  • The re-computation indicator of pre-computed search result i is based at least on the two factors of the probability that the pre-computed search result i is still valid and on the re-computation frequency of the pre-computed search result i.
  • The re-computation indicator for a particular pre-computed search result i is generated by weighting the probability that the pre-computed search result i is still valid ("expected accuracy") with the access frequency measure of the pre-computed search result i indicating the frequency of the pre-computed search result i being requested from the database 4 ("popularity") and by multiplying the re-computation frequency of the search result i with the measure for the computation resources needed to re-compute pre-computed search result i.
  • The first product is divided by the second in order to form the re-computation indicator.
  • The re-computation controller 2 selects those pre-computed search results for re-computation which have a re-computation indicator indicating the highest need for re-computation.
  • After having identified the pre-computed search results being most critical to re-compute, the re-computation controller 2 issues a re-computation order to the computation platform 3 to re-compute them within the next re-computation cycle.
  • The computation platform 3 executes this order in a batch-oriented manner and forwards the re-computed search results to the search platform 4.
  • The computation platform also returns the result of the re-computation back to the re-computation controller 2 at the same time. This enables the re-computation controller 2 to continuously assess the re-computation indicator of the pre-computed search results as they are currently stored in the database 4.
  • If a particular pre-computed search result i is re-computed more often than another pre-computed search result j, for example twice as often, it can be considered that re-computation of i is twice as expensive as the re-computation of j (assuming that a single re-computation of i and a single re-computation of j consume the same amount of computation resources of the computation platform 3, which is not necessarily the case as explained further below).
  • The expense of re-computing pre-computed search result i in terms of the relative number of re-computations can thus be defined as $\text{ComputationExpense}_i = c_i \times f_i$, where $f_i$ denotes a refresh frequency of pre-computed search result i.
  • Instead of the pre-computed search results with the highest immediate gain (i.e. the gain-cost ratio as presented above), the pre-computed search results with the highest gain-expense ratio are selected for re-computation, i.e. those with the highest ratio of gain to computation expense $c_i \times f_i$.
  • The re-computation strategy using this re-computation criterion is briefly referred to as the "re-computation frequency oriented" strategy.
  • $c_i / t_i$ can be used as an estimation of the re-computation expense (taking $1 / t_i$ as an estimate of the refresh frequency $f_i$).
  • The re-computation strategy being directed to a long term increase of the pre-computed search results' accuracy then selects pre-computed search results with highest re-computation indicator being defined
  • Figure 4a shows the user accuracy achieved by the re-computation frequency oriented strategy (indicated by graph 32) in comparison with the results yielded by the naive strategy of always re-computing the oldest pre-computed search results (indicated by graph 30) and the gain-cost-ratio-oriented strategy including consideration of the "popularity" as explained
  • The re-computation indicator is additionally based on the initial accuracy value $a_i$ indicating the expected accuracy of the pre-computed search result i at the time of its re-computation, as it has been introduced further above.
  • The re-computation indicator is, for example, defined by $\frac{p_i}{p_{tot}} \cdot \frac{a_i \left(1 - e^{-\lambda_i t_i}\right) \times t_i}{c_i}$.
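A minimal sketch of the re-computation frequency oriented indicator follows. It assumes that the gain is the popularity-weighted probability of being outdated, scaled by the initial accuracy, and that the unknown refresh frequency $f_i$ is approximated by $1 / t_i$. Both choices and all names are assumptions made for illustration, not formulas confirmed by this text.

```python
import math


def recomputation_indicator(popularity, rate, age, cost, initial_accuracy=1.0):
    """Gain-expense ratio: gain / (c_i * f_i), with f_i approximated by 1 / t_i."""
    if age <= 0.0:
        return 0.0  # a result that was just re-computed has no need for re-computation
    gain = popularity * initial_accuracy * (1.0 - math.exp(-rate * age))
    expense = cost / age  # c_i * f_i under the assumption f_i ~ 1 / t_i
    return gain / expense


# Illustrative comparison: a volatile result refreshed one hour ago
# versus a stable result that has not been refreshed for twenty hours.
volatile = recomputation_indicator(popularity=1.0, rate=0.5, age=1.0, cost=1.0)
stable = recomputation_indicator(popularity=1.0, rate=0.05, age=20.0, cost=1.0)
```

Because the expense term grows with the refresh frequency, a frequently re-computed volatile result is ranked below a long-neglected stable one, which is the long-term behavior the strategy aims for.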
  • The re-computation indicator may be given in a more generic way, with a probabilistic model giving the probability of pre-computed search result i to be invalid as $P_{invalid}(i)$, irrespective of how $P_{invalid}(i)$ is calculated or estimated.
  • The re-computation indicator is e.g. defined as
  • Some embodiments may be directed to optimize the global accuracy without taking into account the end user perspective, i.e. the "popularity" of the pre-computed search results.
  • The re-computation indicator is given by
  • An initial accuracy factor is further taken into account
  • Other embodiments may neglect the re-computation resources required to re-compute a pre-computed search result. In particular, this applies to environments in which the re-computation of any pre-computed search result requires the same amount of computation resources.
  • Some embodiments feature a further refined re-computation indicator directed to a re-computation strategy for long-term optimization of the user accuracy implementing the re-computation frequency oriented strategy as deduced above.
  • This refined re-computation indicator corresponds to the following expression:
  • a set of re-computation frequencies is defined if iedd °f a pre-computed search result i.
  • Corresponding periods between re-computation of pre-computed search result i are defined
  • the average user accuracy is then defined as
  • the re-computation strategy optimizing the user accuracy indicator can then be employed as follows:
  • the pre-computed search results with the highest ⁇ ( ⁇ 1 ⁇ 2) as defined here are to be recomputed first in order to have the ⁇ (tf) as equal as possible is an increasing function).
  • the refined re-computation indicator is
  • The re-computation frequency oriented strategy has so far been discussed by assuming that re-computation of any pre-computed search result by the computation platform 3 requires substantially the same amount of computation resources irrespective of whether the pre-computed search results are computed separately, together with adjacent search results (e.g. same origin, destination and adjacent dates) or any other non-adjacent search results.
  • this assumption cannot be made in general because, for example, certain pre-computed search results and/or their corresponding underlying calculation data are interrelated to each other. Re-computing such interrelated pre-computed search results together (i.e. within the same re-computation cycle) could include synergetic effects and may thus be more efficient than re-computing them separately.
  • the database 4 keeps travel-related pre-computed search results and makes them available to end users.
  • the following example is not supposed to limit the issue of interrelated pre-computed search results such a travel data application. Rather, similar or analog conditions allowing a synergetic and therefore more efficient re-computation of interrelated pre-computed search results are present in database systems independent from the content of pre-computed data sets.
  • any process of re-computing pre-computed search results will aim at a mutualization of re- computation sub-tasks that have to be executed commonly for any pre-computed search result of a set of pre-computed search results.
  • re-computing pre-computed search results together that have such re-computation task in common is generally favorable over recomputing pre-computed search requests together which do not share similar re-computation sub-tasks.
  • the pre-computed search requests are round-trip flight data records, each specifying a travel origin and destination and a departure and arrival date (or, alternatively to the arrival date, a stay duration relating to the departure date).
  • the database 4 contains pre-computed round-trip travel recommendations for any origin-destination pair and any departure-arrival-date pair to be covered.
  • Table 1 indicates a small excerpt from the pre-computed travel recommendations kept in database 4, the excerpt being travel recommendations for the city pair Nice-Boston ("NCE-BOS") and for departure dates from 1st July to 5th July with a maximum stay duration of five days, the abbreviation "pc-fr x" standing for "pre-computed travel recommendation number x".
  • NCE-BOS Dep JUL 1. Dep JUL 2. Dep JUL 3. Dep JUL 4. Dep JUL 5.
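  • Because only the header row of Table 1 survives here, the following minimal sketch shows one numbering of pc-fr 0 to pc-fr 24 that agrees with the surrounding text. The layout (stays of one to five days, numbered departure date by departure date), the calendar year and all identifiers are assumptions made for illustration; the original table may be arranged differently.

```python
from datetime import date, timedelta

# Assumptions, not taken from the source: the year is arbitrary, stays last
# 1 to 5 days, and results are numbered departure date by departure date.
YEAR = 2014
DEPARTURES = [date(YEAR, 7, day) for day in range(1, 6)]  # 1st to 5th July


def build_grid(departures, max_stay_days=5):
    """Return {x: (departure, return_date)} for 'pc-fr x' under the assumed layout."""
    grid = {}
    for dep_pos, dep in enumerate(departures):
        for stay in range(1, max_stay_days + 1):
            index = dep_pos * max_stay_days + (stay - 1)
            grid[index] = (dep, dep + timedelta(days=stay))
    return grid


GRID = build_grid(DEPARTURES)

# Results that share the return date of 7th July under this layout.
shared_return = sorted(
    x for x, (_, ret) in GRID.items() if ret == date(YEAR, 7, 7)
)
```

  • Under these assumptions the grid holds 25 entries, and the entries returning on 7th July are pc-fr 9, 13, 17 and 21, which matches the set the text later names for that return date.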
  • fares that can be applied to these flights on these dates.
  • a typical fare is a rule which yields a price for the whole journey.
  • Fares have restrictions on the departure dates, on the return dates, on the flights to be applied on, and many others. Fares can be combined together, discounted in some specific cases and so on.
  • sub-task 1 does not need to be re-done for every pre-computed search result pc-fr 0 to pc-fr 24.
  • the sub-tasks 2a, 3a and 4a are, on the other hand, specific to one departure date. They can therefore be re-used for all pre-computed travel recommendations relating to one and the same departure date. Table 2 indicates this for the pre-computed travel
  • NCE-BOS Dep JUL 1. Dep JUL 2. Dep JUL 3. Dep JUL 4. Dep JUL 5.
  • sub-tasks 2b, 3b and 4b are specific to one return date and, thus, are commonly performed for pre-computed travel recommendations relating to one and the same return date. This is illustrated by Table 3 for the pre-computed travel recommendations pc-fr 9, pc-fr 13, pc-fr 17 and pc-fr 21, all of which refer to the return date of 7th July:
  • NCE-BOS Dep JUL 1. Dep JUL 2. Dep JUL 3. Dep JUL 4. Dep JUL 5.
  • only a part of sub-task 4, namely retrieving those fares which are not valid for the whole outbound part of the travel and for the whole return part of the travel, but are specific to sub-sets or particular travel recommendations, has to be performed separately for each pre-computed travel recommendation. The other sub-tasks can be performed in common for all pre-computed travel recommendations relating to the same origin-destination city pair (true for sub-task 1), or at least for pre-computed travel recommendations relating to the same departure date (sub-tasks 2a, 3a and 4a) or to the same return date (sub-tasks 2b, 3b and 4b).
  • the more pre-computed travel recommendations relate to one origin-destination city pair, and the more pre-computed travel recommendations relate to one departure date and return date, respectively, the more computation resources can be saved by mutualizing these sub-tasks across the respective pre-computed flight requests.
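  • The saving from mutualization can be illustrated with a rough count of sub-task executions. The sketch below assumes, purely for illustration, that every sub-task execution costs one unit and that the recommendation-specific remainder of sub-task 4 counts as one execution per recommendation; the function name and the example grid (five departure dates, stays of one to five days, arbitrary year) are hypothetical.

```python
from datetime import date, timedelta


def count_subtask_runs(recommendations):
    """Count sub-task executions with and without mutualization.

    recommendations: list of (city_pair, departure, return_date) tuples.
    Returns (separate, mutualized), each execution counted as one unit.
    """
    n = len(recommendations)
    city_pairs = {pair for pair, _, _ in recommendations}
    departures = {(pair, dep) for pair, dep, _ in recommendations}
    returns = {(pair, ret) for pair, _, ret in recommendations}

    # Separately: each recommendation runs sub-tasks 1, 2a, 3a, 4a, 2b, 3b, 4b
    # plus its own recommendation-specific remainder of sub-task 4.
    separate = n * 8
    # Together: sub-task 1 once per city pair, 2a/3a/4a once per departure
    # date, 2b/3b/4b once per return date, the remainder once per recommendation.
    mutualized = len(city_pairs) + 3 * len(departures) + 3 * len(returns) + n
    return separate, mutualized


# Illustrative NCE-BOS grid (assumed layout, arbitrary year).
recs = [
    ("NCE-BOS", date(2014, 7, d), date(2014, 7, d) + timedelta(days=stay))
    for d in range(1, 6)
    for stay in range(1, 6)
]
separate, mutualized = count_subtask_runs(recs)
```

  • For this assumed grid of 25 recommendations the count is 200 unit executions when everything is re-computed separately against 68 when the shared sub-tasks are run once per city pair, departure date and return date. Real sub-tasks differ in cost, so the ratio is indicative only.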
  • Figure 5 shows six graphs of exemplary pre-computed travel recommendation sets, each set belonging to one origin-destination city pair.
  • in decreasing order of the number of pre-computed travel recommendations associated with a city pair, graph 36 relates to the city pair New York-Buffalo, graph 37 to New York-Minsk, graph 38 to New York-Hilo on Hawaii, graph 39 to New York-Bilbao, graph 40 to New York-Male and, finally, graph 41 to New York-Mauritius.
  • the X axis of the diagram of Figure 5 denotes the number of pre-computed travel recommendations, while the Y axis plots a measure of re-computation resources needed to perform a re-computation of pre-computed travel recommendations, namely CPU time.
  • Figure 5 shows that re-computation of some pre-computed travel recommendation sets requires more computation resources than others. For example, recomputing pre-computed travel recommendations from set 41 including pre-computed travel recommendations for flights between New York and Mauritius is generally more costly than re-computing pre-computed travel recommendations from set 36 including pre-computed travel recommendations for flights between New York and Buffalo.
  • the computation resources needed to re-compute a pre-computed search result i generally depend on whether or not other pre-computed search results related to the pre-computed search result i are re-computed during the same computation cycle.
  • the computation resources to re-compute the pre-computed search results are not static, but vary with the selection of the set of pre-computed search results to be re-computed during the computation cycle.
  • computation resources varying with the number of interrelated pre-computed search results being re-computed together are taken into account by the re-computation strategy employed by some embodiments as follows:
  • the computation resources needed to re-compute the pre-computed search results to be re-computed are dynamically estimated by the re-computation controller 2 while selecting the pre-computed search results to be re-computed during the next computation cycle. This estimation depends on which other pre-computed search results related to the pre-computed search result i are selected for re-computation during the next re-computation cycle.
  • this is achieved by an iteratively refined estimation of the computation resources needed to re-compute the pre-computed search results to be re-computed while determining the subset of the pre-computed search results to be actually re-computed.
  • This iterative estimation of the varying computation resources includes the following activities:
  • a) For any pre-computed search result i, the computation resources c_i needed to re-compute pre-computed search result i are initialized with a first approximated value. This value assumes that the re-computation of pre-computed search result i is independent of the computation of other pre-computed search results selected for re-computation during the next re-computation cycle.
  • b) a portion of the pre-computed search results is then selected for re-computation. This selection is, for example, done in accordance with the re-computation indicator as explained above.
  • the selected portion does not yet exhaust 100% of the available computation resources of the computation platform 3, but only consumes a part of the computation resources available for the next re-computation cycle.
  • the selected portion only requires a given percentage of the computation platform's 3 computation resources available for re-computation within the next re-computation cycle. In the embodiments, specific percentages are used as the given percentage, such as 1%, 2%, 5%, 10%, 20%, 25%, 30%, 40%, 50% or 60% or higher values below 100%.
  • the selection of this portion of pre-computed search results to be recomputed in the next re-computation cycle is based on the current values for the computation resources needed to re-compute the portion of pre-computed search results, i.e. in the very first selection iteration still on the basis of the values of the initialization activity a), i.e. without taking into account any interrelations or dependencies between the pre-computed search results to be re-computed.
  • c) the re-computation controller 2 re-assesses the computation resources c_i needed to re-compute pre-computed search result i by taking into account which pre-computed search results related to the pre-computed search result i have been selected for re-computation in activity b).
  • this re-assessment provides refined values of c_i and, in total, a refined value of the percentage of the computation platform's 3 computation resources available for re-computation within the next re-computation cycle that is necessary to re-compute the pre-computed search results selected so far for re-computation in the next computation cycle.
  • the refined value of Σ c_i of the already selected pre-computed search results is generally less than the (theoretic) value of Σ c_i of the already selected pre-computed search results obtained by neglecting their interrelation and assuming that each already selected pre-computed search result is re-computed without re-computing any interrelated pre-computed search results.
  • the re-assessment of the pre-computed search results not (yet) being selected for re-computation is sensible because they are all candidates for a selection in the next iteration(s).
  • the computation resources needed for re-computing these not-yet-selected pre-computed search results together with interrelated pre-computed search result(s) are generally lower (and are therefore generally to be decreased by this activity c)) than the computation resources required if no interrelated pre-computed search result was selected for re-computation.
  • d) the re-computation controller 2 refers back to activity b) if less than 100% of the computation platform's overall computation resources available for re-computation within the next re-computation cycle is exhausted. Generally, this approach is independent of the specific manner in which the re-computation indicator is calculated and of the kind of re-computation strategy employed.
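  • The iterative estimation described in activities a) to d) can be sketched as follows. This is a minimal illustration under a simplified, assumed cost model: each result has an own cost, and each group of interrelated results has a shared cost that is paid once per cycle as soon as any member of the group is selected. The function and parameter names, the grouping and the greedy fitting rule are hypothetical and are not taken from the source.

```python
def select_iteratively(indicator, own_cost, group, shared_cost, budget, portion=0.25):
    """Select results for the next cycle while refining cost estimates.

    indicator:   {result: re-computation indicator}, higher means more urgent
    own_cost:    {result: cost specific to this result}
    group:       {result: key of its group of interrelated results}
    shared_cost: {group key: cost paid once per selected group}
    budget:      resources available in the next re-computation cycle
    portion:     fraction of the budget that one selection step may consume
    """
    # a) initial estimate: each result is assumed to be re-computed alone.
    cost = {r: own_cost[r] + shared_cost[group[r]] for r in indicator}
    selected, selected_set, paid_groups, used = [], set(), set(), 0

    while used < budget:
        limit = min(portion * budget, budget - used)
        remaining = sorted(
            (r for r in indicator if r not in selected_set),
            key=indicator.get,
            reverse=True,
        )
        # b) take the most urgent results that fit into this portion.
        picked, spent = [], 0
        for r in remaining:
            if spent + cost[r] <= limit:
                picked.append(r)
                spent += cost[r]
        if not picked:
            break
        selected.extend(picked)
        selected_set.update(picked)

        # c) re-assess: members of an already selected group no longer pay
        # the shared cost, and the total so far is refined accordingly.
        paid_groups.update(group[r] for r in picked)
        for r in indicator:
            if r not in selected_set and group[r] in paid_groups:
                cost[r] = own_cost[r]
        used = sum(own_cost[r] for r in selected) + sum(
            shared_cost[g] for g in paid_groups
        )
        # d) loop back to b) while budget remains.
    return selected, used


indicator = {"A1": 0.9, "B1": 0.8, "A2": 0.7, "A3": 0.6}
own_cost = {"A1": 2, "B1": 2, "A2": 2, "A3": 2}
group = {"A1": "A", "A2": "A", "A3": "A", "B1": "B"}
shared_cost = {"A": 10, "B": 10}
selected, used = select_iteratively(
    indicator, own_cost, group, shared_cost, budget=30, portion=0.5
)
```

  • In this toy input the stand-alone estimates add up to 48 units, which would not fit into a budget of 30. With the refined estimates all four results are selected and the refined total is 28 units, because the shared cost of group A is paid only once. The per-step estimate is conservative, so the refined total never exceeds the budget.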
  • Figure 6 shows a travel-related example of a database environment 1 depicting additional details in comparison with Figure 1.
  • the re-computation controller 2 maintains a
  • the re-computation controller 2 controls the re-computation of the pre-computed search results by employing the re-computation indicator as described in detail above.
  • The re-computation of pre-computed search results is requested by computation orders which the re-computation controller 2 transmits to the computation platform 3.
  • the computation platform re-computes the respective pre-computed search results.
  • the re-computation performed by the computation platform 3 may be based on underlying data such as fares kept in a fares database 7, transportation schedules kept in schedule database 8 and transportation availability data kept in availability database 9.
  • the re-computation platform 3 sends the re-computed search results to search platform 4 and returns them to re-computation controller 2.
  • the re-computation controller 2 may be integrated with the computation platform 3 and/or the search platform 4.
  • the pre-computed search results updated in this manner are requested by a search application 5 from the search platform 6 e.g. by using web service interfaces.
  • Referring to Figure 7, some embodiments employ a modular structure of the re-computation controller 2 to achieve the methods described above. Some of these parts are already described in the unpublished International application PCT/EP2013/002390, to which reference is made for a more detailed explanation of the re-computation controller's structure. Some more details regarding the re-computation controller's internal logic and the relations between its components are elaborated here. As shown by Figure 7, the re-computation controller 2 exemplarily includes the following components:
  • Internal Data representation component 10: This component provides tools to build, store, update and access large matrices representing the pre-computed search results stored in the database 4.
  • the main function of Internal Data representation component 10 is to provide a "mirror" of the pre-computed search results stored in the database 4 serving as the basis for analyzing the pre-computed search results in order to decide which of them are to be re-computed in the next re-computation cycle.
  • the Internal Data representation component 10 does not hold a one-to-one copy of the pre-computed search results as stored in the database 4, but an appropriate representation. This representation does not have to include every detail of the pre-computed search results as stored in the database 4 but, on the other hand, includes additional control data associated with the pre-computed search results, such as the times of their last re-computation and, in particular, the re-computation indicator.
  • Input manager 11: This component inputs data from heterogeneous sources such as a validity rate database or data source, a popularity database or data source, an initial accuracy database or data source, a costs database or data source, and/or sources indicating real-time events potentially influencing the validity of the pre-computed search results. These data are, e.g., used to generate and update the re-computation indicators associated with the pre-computed search results as explained in detail above.
  • the input manager 11 converts the incoming data into the appropriate data formats and updates the corresponding matrices representing the pre-computed search results as stored by the Internal Data representation component 10.
  • Analyzer 12: This component computes intermediate data matrices implied by the probabilistic model (accuracy, criticality) on the basis of the matrices stored by the Internal Data representation component 10.
  • Events manager 13: This component aggregates information on real-time events and amends the validity predictions given by the probabilistic model accordingly.
  • Optimizer 14: This component runs the re-computation strategy, i.e. the re-computation frequency oriented re-computation and the iterative selection of pre-computed search results taking into account varying computation costs of interrelated pre-computed search results as described in detail above. After having determined the pre-computed search results to be re-computed, the optimizer 14 generates re-computation orders and issues them to the computation platform 3. Furthermore, it updates the re-computation time of these pre-computed search results stored in the Internal Data representation component 10. The latter two modules, the events manager 13 and the optimizer 14, are grouped under the name "consolidator" in PCT/EP2013/002390.
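  • The "mirror" kept by the Internal Data representation component 10 and its use by the optimizer 14 can be pictured with a small data structure. The sketch below is only an illustration: the class, its field names and the helper functions are hypothetical, and the source describes matrices rather than per-result records.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MirrorEntry:
    """Control data kept for one pre-computed search result (hypothetical)."""
    key: str                        # e.g. city pair plus travel dates
    last_recomputed: datetime       # time of the last re-computation
    recomputation_indicator: float  # higher means more urgent to re-compute


def most_urgent(entries, n):
    """Return the n entries with the highest re-computation indicator."""
    ranked = sorted(entries, key=lambda e: e.recomputation_indicator, reverse=True)
    return ranked[:n]


def mark_recomputed(entries, when):
    """Update the stored re-computation time once orders have been issued."""
    for entry in entries:
        entry.last_recomputed = when


mirror = [
    MirrorEntry("NCE-BOS/0", datetime(2014, 2, 1), 0.2),
    MirrorEntry("NCE-BOS/1", datetime(2014, 1, 15), 0.9),
    MirrorEntry("NCE-BOS/2", datetime(2014, 1, 20), 0.5),
]
to_order = most_urgent(mirror, 2)
mark_recomputed(to_order, datetime(2014, 2, 19))
```

  • Keeping only control data per result, rather than the full search result, mirrors the statement that the representation need not be a one-to-one copy of the database 4.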
  • Fig. 8 is a diagrammatic representation of a computer system which provides the functionality of the re-computation controller 2 as shown by Figures 2, 6 and 7.
  • the re-computation controller 2 includes a processor 101, a main memory 102 and a network interface device 103, which communicate with each other via a bus 104.
  • it may further include a static memory 105 and a disk-drive unit 106.
  • a video display 107, an alpha-numeric input device 108 and a cursor control device 109 may form a distribution list navigator user interface.
  • the network interface device 103 connects the re-computation controller 2 to the computation platform 3, the sources of statistical data needed to fill up the predictive model such as statistics servers, a volatility database or data source and an initial accuracy database or data source, the sources of real-time events, the Internet and/or any other network.
  • a machine-readable medium on which the software 110 resides may also be a non-volatile data carrier 111 (e.g. a non-removable magnetic hard disk or an optical or magnetic removable disk) which is part of disk drive unit 106.
  • the software 110 may further be transmitted or received as a propagated signal 112 via the Internet through the network interface device 103.
  • the present re-computation strategy provides a means to automatically generate re-computation decisions which are directed to improve the validity of pre-computed search results. It determines which pre-computed search results are to be re-computed and also controls the timing of the re-computation by taking into account the available computation resources at the computation platform.
  • the accuracy/validity of the pre-computed search results is estimated on the basis of the probabilistic model, which models the up-to-dateness and out-of-dateness, respectively, over time, and takes into account a re-computation frequency of the pre-computed search results.
  • Pre-computed search results which are re-computed more often than others are considered to be more "expensive" to keep up-to-date.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to pre-computed search results that are re-computed to provide long-term accuracy. A re-computation controller assigns a re-computation indicator to all of the pre-computed search results. The re-computation indicator associated with a pre-computed search result is based on at least a probability that the pre-computed search result is still valid, and on the re-computation frequency of the search result. Within a given time interval, a computation platform re-computes those pre-computed search results having a re-computation indicator indicating the highest need for re-computation. The number of pre-computed search results re-computed by the computation platform is limited by the computation resources of the computation platform available for the re-computation within the given time interval.
PCT/EP2015/000287 2014-02-19 2015-02-10 Validite a long terme de resultats de recherche precalcules WO2015124275A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US14/183,911 2014-02-19
EP14290040.6A EP2911070B1 (fr) 2014-02-19 2014-02-19 Validité à long terme de résultats de demande précalculés
EP14290040.6 2014-02-19
US14/183,911 US9582536B2 (en) 2014-02-19 2014-02-19 Long-term validity of pre-computed request results

Publications (1)

Publication Number Publication Date
WO2015124275A1 true WO2015124275A1 (fr) 2015-08-27

Family

ID=52464322

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/EP2015/000287 WO2015124275A1 (fr) 2014-02-19 2015-02-10 Validite a long terme de resultats de recherche precalcules
PCT/EP2015/000286 WO2015124274A1 (fr) 2014-02-19 2015-02-10 Recalcul de résultats de recherche précalculés

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/EP2015/000286 WO2015124274A1 (fr) 2014-02-19 2015-02-10 Recalcul de résultats de recherche précalculés

Country Status (1)

Country Link
WO (2) WO2015124275A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020165304A1 (fr) 2019-02-14 2020-08-20 Amadeus S.A.S. Traitement de requêtes de base de données complexes

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11636112B2 (en) 2018-04-03 2023-04-25 Amadeus S.A.S. Updating cache data
CA3038199A1 (fr) * 2018-04-03 2019-10-03 Amadeus S.A.S. Mise a jour de donnees de cache

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999022315A1 (fr) * 1997-10-28 1999-05-06 Cacheflow, Inc. Rafraichissement d'antememoire active adaptative
US20050234971A1 (en) * 2004-04-14 2005-10-20 Oracle International Corporation Using estimated cost to refresh a set of materialized views (MVS)
US20090204753A1 (en) * 2008-02-08 2009-08-13 Yahoo! Inc. System for refreshing cache results

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CARNEY D ET AL: "Scalable application-aware data freshening", PROCEEDINGS 19TH. INTERNATIONAL CONFERENCE ON DATA ENGINEERING. (ICDE'2003)., 5 March 2003 (2003-03-05), BANGALORE, INDIA, pages 481 - 492, XP010678762, ISBN: 978-0-7803-7665-6, DOI: 10.1109/ICDE.2003.1260815 *
JUNGHOO CHO ET AL: "Synchronizing a database to improve freshness", PROCEEDING SIGMOD '00 PROCEEDINGS OF THE 2000 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, vol. 29, no. 2, 1 June 2000 (2000-06-01), Dallas, TX USA, pages 117 - 128, XP002449369 *
LEHNER W ET AL: "FAST refresh using mass query optimization", PROCEEDINGS 17TH. INTERNATIONAL CONFERENCE ON DATA ENGINEERING. (ICDE'2001)., 2 April 2001 (2001-04-02), HEIDELBERG, GERMANY, pages 391 - 398, XP010538085, ISBN: 978-0-7695-1001-9, DOI: 10.1109/ICDE.2001.914852 *
NGUYEN HOANG VU ET AL: "On Scheduling Data Loading and View Maintenance in Soft Real-time Data Warehouses", 15TH INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA COMAD 2009,, 12 December 2009 (2009-12-12), Mysore, India, XP055115405 *
SUNDARESAN R ET AL: "Slacker coherence protocol for pull-based monitoring of on-line data sources", CLUSTER COMPUTING AND THE GRID, 2003. PROCEEDINGS. CCGRID 2003. 3RD IE EE/ACM INTERNATIONAL SYMPOSIUM ON, 12 May 2003 (2003-05-12), pages 250 - 257, XP010639759, ISBN: 978-0-7695-1919-7 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020165304A1 (fr) 2019-02-14 2020-08-20 Amadeus S.A.S. Traitement de requêtes de base de données complexes
FR3092920A1 (fr) 2019-02-14 2020-08-21 Amadeus Traitement d’interrogations de base de données complexes
US11748348B2 (en) 2019-02-14 2023-09-05 Amadeus S.A.S. Processing complex database querys

Also Published As

Publication number Publication date
WO2015124274A1 (fr) 2015-08-27

Similar Documents

Publication Publication Date Title
EP2911070B1 (fr) Validité à long terme de résultats de demande précalculés
EP2885725B1 (fr) Mise à jour des résultats d'interrogation de base de données antémémorisées
EP3128441B1 (fr) Gestions de demandes de données
US9235620B2 (en) Updating cached database query results
US20060149713A1 (en) System, method, and computer program product for improving accuracy of cache-based searches
US20160171008A1 (en) Updating cached database query results
US9582536B2 (en) Long-term validity of pre-computed request results
US10956955B2 (en) Managing pre-computed search results
US10657449B2 (en) System and method for load distribution in a network
WO2015124275A1 (fr) Validite a long terme de resultats de recherche precalcules
EP3627349B1 (fr) Recalcul des resultats de recherche precalcules
KR20150060747A (ko) 네트워크에서 부하를 분배하는 시스템 및 방법
EP2698729B1 (fr) Mise à jour des résultats d'interrogation de base de données antémémorisées
AU2021220143A1 (en) Optimization of delivery associate incentives
CN107004026B (zh) 管理预先计算的搜索结果
US10489413B2 (en) Handling data requests
US8688488B2 (en) Method and apparatus for the prediction of order turnaround time in an information verification system
EP3016000A1 (fr) Gestion de résultats de recherche précalculés
JP2019213289A (ja) 電力調整制御装置および方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15705909

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15705909

Country of ref document: EP

Kind code of ref document: A1