FR3099598A1 - Distributed machine learning for cached data validity - Google Patents
Distributed machine learning for cached data validity
- Publication number
- FR3099598A1 (application number FR1908757A)
- Authority
- FR
- France
- Prior art keywords
- client
- data
- cached data
- probabilistic model
- piece
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9032—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
Abstract
Database queries are processed by a first client storing cached data. The first client and a second client host a probabilistic model yielding validity values associated with the cached data, indicating a probability that the stored cached data coincides with the corresponding original data. The first client randomly selects queries at the time the respective query is received from one of the plurality of clients. For each of the randomly selected queries, a first piece of the cached data stored at the first client and matching the randomly selected query is retrieved, and a first piece of the original data matching the randomly selected query is retrieved from the at least one original data source. The probabilistic model is adapted using a machine learning algorithm based on the retrieved first piece of the cached data and the retrieved first piece of the original data, and is made available to the second client. Abstract figure: Fig. 2
Description
The present invention relates to processing queries in a distributed database system. More specifically, it relates to improving the validity of cached data by way of machine learning.
Increasing the validity of data cached in distributed computing environments has already prompted a number of approaches and solutions.
US 2010/0106915 A1 describes a poll-based client/server notification system in a distributed cache for tracking changes to cache items. Local caches on the clients utilize the notification system to keep local objects in synchronization with a backend cache service, and dynamically adjust the scope of required notifications based on the number and distribution of keys in the local cache. The server maintains the changes and returns them to clients, which perform the appropriate filtering. Notifications are associated with a session and/or an application. An artificial intelligence system is employed to learn about new notification events.
US 2006/0271557 A1 describes caching database query results, whereby a caching system stores one or more queries and receives results from a database in response to those queries. In addition, information maintained by the database indicating transactions executed at the database is monitored for possible changes. At least a portion of the information stored by the database is assessed to determine whether portions of the database have been modified. A notification is provided to the caching system if the assessment determines that portions of the database have been modified.
US 2016/0125029 A1 describes Enterprise Resource Planning (ERP) reporting using a cache server to cache previous query results. Query latency is further reduced by routing queries and responses to queries through the cache server rather than via direct communication between a querying device and a server hosting the database.
US 8,117,153 B2 describes managing a distributed database cache. The database cache is distributed over at least two data processing systems (sub-store). For configurations that use a remote backing database, different synchronization strategies between a sub-store and the remote backing store may be configured. The backing store could be a backend database or a remote sub-store (for example, in client/server mode where the client is a cache to an in-memory database). These strategies may be applied on a per-request basis. The synchronization strategy can be set for a transaction, a particular request, or on a specific table or set of tables.
WO 01/76192 A2 describes a distributed edge network architecture in which a data center serves as a primary repository for content uploaded by content providers. From the data center, the content is replicated at all, or a selected group of, geographically dispersed "intermediate" point of presence (POP) sites. A plurality of edge POP sites communicate with the intermediate POP sites and serve as network caches, storing content as it is requested by end users. In one embodiment, the content distribution manager (CDM) implements a policy to manage cache space on all edge file servers using file access data stored in the central database. Files requested relatively infrequently, and/or files which have not been requested for a relatively long period of time compared with other files, may be marked "to be deleted" from the edge POP.
According to a first aspect, a method of processing queries in a distributed database system is provided. The distributed database system comprises at least one original data source storing an amount of original data, and a plurality of clients comprising at least a first client and a second client. The first client and the second client store cached data which corresponds to at least a part of the amount of original data. The first client and the second client host a probabilistic model yielding validity values associated with the cached data, indicating a probability that the cached data stored at the client coincides with the corresponding original data. The method comprises, at the first client, randomly selecting queries from among a plurality of queries handled by the distributed database system, at the time the respective query is received from one of the plurality of clients. For each of the queries randomly selected, a first piece of the cached data stored at the first client and matching the randomly selected query is retrieved, and a first piece of the original data matching the randomly selected query is retrieved from the at least one original data source. For queries not randomly selected, a second piece of the cached data stored at the first client and matching the query is retrieved, and the validity value of the second piece of the cached data stored at the first client is evaluated. If the validity value is below a given threshold, a second piece of the original data matching the query is retrieved from the at least one original data source, and the second piece of the cached data stored at the first client is updated with the second piece of the original data. The probabilistic model of the first client is adapted based on the retrieved first piece of the cached data and the retrieved first piece of the original data using a machine learning algorithm, and is made available to the second client.
In some embodiments, for queries randomly selected, an additional first piece of cached data stored by at least one client of the plurality of clients and matching the randomly selected query is retrieved.
In some embodiments, the probabilistic model is further adapted based on the additional first piece of cached data.
In some embodiments, the method further comprises, at the second client, receiving at least a part of the adapted probabilistic model of the first client from the first client and adapting the probabilistic model of the second client based on the retrieved adapted probabilistic model of the first client.
In some embodiments, making available the adapted probabilistic model to the second client comprises making available to the second client an updated validity value for the first piece of cached data associated with a weight value, wherein the weight value indicates a level of correlation of the first piece of cached data stored at the first client with another piece of cached data stored at the second client, and wherein adapting the probabilistic model of the second client based on the retrieved adapted probabilistic model of the first client comprises adapting the probabilistic model of the second client based on the updated validity value for the first piece of cached data and the associated weight value.
In some embodiments, the method further comprises, at the first client, retrieving at least a part of the adapted probabilistic model of the second client from the second client, wherein adapting the probabilistic model of the first client is further based on the retrieved adapted probabilistic model of the second client.
According to another aspect, a computing machine is provided, the computing machine acting as a first client for handling data in a distributed computing environment comprising a plurality of clients comprising at least a first client and a second client and at least one original data source storing an amount of original data, the computing machine being arranged to execute the method of any one of the aforementioned aspects.
According to still another aspect, a computer program product is provided, comprising program code instructions stored on a computer-readable medium to execute the method steps according to any one of the aforementioned aspects when said program is executed on a computer.
The present mechanisms will be described with reference to accompanying figures. Similar reference numbers generally indicate identical or functionally similar elements.
The subject disclosure generally pertains to handling queries in a database system. The term "query" includes all types of database requests including e.g. read requests to retrieve data and write requests to insert, change or delete data.
A search platform may maintain pre-collected, pre-processed and/or pre-computed search results in a database, also referred to as cached data hereinafter. The search platform receives queries such as search queries, processes each of the search queries received from a client and performs corresponding searches within the pre-collected or pre-computed search results. The search platform may send, for each search query, a search request to the database, also referred to as cache source hereinafter. The search request includes at least part of the one or more search criteria or parameters of the query. The cache source executes the search based on the search request, i.e. searches among the cached data for data matching the search criteria or parameters of the search request, and sends the found cached data as pre-computed or pre-collected search results back to the search platform. The pre-computed or pre-collected search results (i.e. cached data) have parameter values matching the search criteria or parameters of the search request. Any matching pre-computed or pre-collected search result of the cache source may be returned by the search platform to the client from which the search query was received.
In case matching pre-computed or pre-collected search results are not found in the cache source, the search platform may make the search criteria of the search query less restrictive and perform a revised search by sending revised search requests with the less restrictive search criteria to the cache source. The search platform may also perform the revised search by applying only a subset of the given search criteria rather than all of them. The particulars of which and how search criteria are made less restrictive, and/or which search criteria or parameters are less important in search queries, may be set by given rules or determined using machine learning techniques, the latter resulting in e.g. probabilistic models based on historical search criteria and/or pre-computed or pre-collected search results.
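By way of illustration, one possible relaxation loop is sketched below. This is a minimal sketch under assumptions: the `cache_source.search` interface, the `importance` scores and the retry limit are hypothetical, and the actual relaxation rule is left to configured rules or a learned model.

```python
def relax_criteria(criteria: dict, importance: dict) -> dict:
    # Drop the least important search criterion; which criteria are less
    # important may be set by rules or learned from historical queries.
    least = min(criteria, key=lambda k: importance.get(k, 0.0))
    return {k: v for k, v in criteria.items() if k != least}

def search_with_fallback(cache_source, criteria: dict, importance: dict,
                         max_retries: int = 2):
    # Try the original search first; on an empty result, retry with
    # progressively less restrictive criteria.
    for _ in range(max_retries + 1):
        results = cache_source.search(criteria)
        if results:
            return results
        if len(criteria) <= 1:
            break
        criteria = relax_criteria(criteria, importance)
    return []
```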
Machine Learning, as a branch of Artificial Intelligence, possesses the ability to flexibly track and assess changing environments. Hence, Machine Learning systems are, for example, employed to establish and maintain a probabilistic model approximating the validity of cached data, in order to keep the cached data valid while employing computation resources efficiently. An example is given by European Patent application number 19 166 538. Various algorithms and techniques are applied in machine learning systems, such as supervised, semi-supervised and unsupervised learning, reinforcement learning, feature learning, sparse dictionary learning, anomaly detection, etc.
Machine learning systems often need to process large amounts of training data until their underlying model converges and can effectively be used in a productive system. In some fields, such as telecommunications, public security, health insurance and retail, even "Big Data" in the range of terabytes to zettabytes may be required to extract e.g. meaningful user models and user patterns. Therefore, state-of-the-art machine learning systems require high computation power, increased storage capacity, and increased cache capacity at the computing system which hosts the machine learning system. In addition, a significant amount of network transmission can occur when the machine learning system is hosted e.g. on a single machine, such as a central server being part of a distributed computing environment.
A further aspect is that a single, centrally operated machine learning system constitutes a single point of failure. If no alternative machine learning system is available in case of failure, the functionality of the distributed computing environment may be severely affected.
In order to reduce the amount of data to be transmitted, as well as the risk of losing the machine learning functionality altogether in the case of a failure, decentralized data gathering applied in a distributed machine learning system is taught herein, making use of the data replication capability of a peer-to-peer system.
The present disclosure generally relates to storing and making available cached data in a distributed manner, i.e. at a plurality of network nodes of a peer-to-peer system. Generally, the plurality of network nodes, also referred to as clients hereinafter, store copies or pre-computed, pre-processed or otherwise prepared data (i.e. the cached data as introduced above) corresponding to original data. The original data is held at an original data source 3 (which might be a distributed system in its own right). Data stored in a peer-to-peer system can be distributed over the clients of the peer-to-peer system in several ways. In one case, each client holds cached data which is disjunct with respect to the cached data held by the other clients of the peer-to-peer network; no client in the network shares the same data with another client. As an example, a number of data records being identifiable and retrievable by a number of associated key-values X1, X2, …, XN may be stored by three clients, referred to as first client, second client and third client. The number of data records forms at least a part of the cached data which corresponds to the original data of the original data source. The first client may store the key-values X1, X2, …, Xn, the second client the key-values Xn+1, …, Xm, and the third client the key-values Xm+1, …, XN. Hence, in such a storage model, no data record of the cached data is stored at more than one cache client.
In another case, all clients of the network hold the same data, e.g. all of the cached data corresponding to the original data. With reference to the aforementioned example, the first, second and third client all store the same data records identified by the key-values X1, X2, …, XN.
It should be noted that, in the aforementioned example, not every one of the first, second and third client necessarily holds an up-to-date version of a cached data record, as the underlying original data may change from time to time at the original data source 3; cached data held by one client may therefore be outdated while another client holds an up-to-date version of the same cached data record.
In practice, mixed data storage models might be employed where the cached data distributed over the various clients partially overlaps. In the aforementioned example of the three clients forming part of a distributed computing environment, the first client may e.g. store the cached data records with the key-values X1, X2, …, Xn, the second client Xn-2, Xn-1, Xn, Xn+1, …, Xm, and the third client the key-values Xm-2, Xm-1, Xm, Xm+1, …, XN.
In addition, cached data stored across various clients may have a level of correlation to each other. The highest level of correlation is identity. For example, the cached data record with the key-value X2 stored at the first client is correlated to the cached data record with the same key-value X2 stored at the second client (irrespective of whether or not both versions of the cached data record X2 are up-to-date, i.e. in line with the original data underlying X2). Hence, if the original data corresponding to the cached data X2 is changed, both cached data records X2 have the identical probability of being outdated by this change. At a lower level of correlation, cached data records may be correlated due to the known principle of locality, e.g. because they (or the underlying original data) are "neighbours" in the database of the original data source 3. For example, given that the cached data record with key-value X2 is closely correlated with the neighbouring cached data record X3, it is likely that an invalidation of X2 (due to a change of the underlying original data) also renders X3 invalid. Moreover, there may also be cached data records which have only a very low correlation or no correlation at all. For such uncorrelated cached data records, a high probability that a first cached data record is outdated does not have any implication as to whether or not a second data record which is uncorrelated to the first cached data record is outdated as well.
The present disclosure refers to a distributed machine learning system applied on a peer-to-peer system according to which each of a plurality of clients holds not only a number of pieces of cached data, but also a respective probabilistic model modelling the validity likelihood of the cached data stored at that client. The machine learning entity at a client may be adapted to the storage model of the cached data across the clients, i.e. either a disjunct storage model, a "data stored at multiple clients" model, or a partially overlapping storage model. More specifically, a client of the peer-to-peer network holding a particular fraction of the distributed data may have information on which of the other clients of the network hold the same data, parts of that particular fraction of the data, and/or cached data which is correlated to its own cached data. The machine learning entities at the clients exchange information about adaptations of their local probabilistic models, optionally taking into account the correlation of their cached data.
FIG. 1 schematically illustrates a distributed computing system as utilized herein, comprising at least a first client 1, a second client 2, an original data source 3 and a number of communication interfaces 4. Original data source 3 may, in some embodiments, be composed of several individual original data sources.
First client 1, second client 2 and original data source 3 may each be constituted of several hardware machines, depending on performance requirements. First client 1, second client 2 and original data source 3 are embodied e.g. as stationary or mobile hardware machines comprising computing machines 100 as illustrated in FIG. 7, and/or as specialized systems such as embedded systems arranged for a particular technical purpose, and/or as software components running on a general or specialized computing hardware machine (such as a web server and web clients).
First client 1, second client 2 and original data source 3 are interconnected by the communication interfaces 4. Each of the interfaces 4 utilizes a wireline or wireless Local Area Network (LAN), a wireline or wireless Metropolitan Area Network (MAN), a wireline or wireless Wide Area Network (WAN) such as the Internet, or a combination of the aforementioned network technologies, and is implemented by any suitable communication and network protocols.
A message sequence chart for a mechanism according to some embodiments is presented in FIG. 2. A distributed database system comprises at least one original data source 3 which stores an amount of original data. Furthermore, the distributed database system comprises a plurality of clients comprising at least a first client 1 and a second client 2, wherein the first client 1 and the second client 2 store cached data which may overlap as described in the aforementioned cases and which corresponds to at least a part of the amount of original data stored in the data source. The first client 1 and the second client 2 each host a probabilistic model yielding validity values associated with the cached data, indicating a probability that the cached data stored at the client coincides with the corresponding original data.
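To make the following walkthrough concrete, a minimal sketch of such a per-client model is given below. The interface and the simplistic update rule are assumptions for illustration only; a concrete data structure is not prescribed, and the actual learning step is deferred to a machine learning algorithm such as that of European Patent application 19 166 538.

```python
import math
import time

class ValidityModel:
    # Maps a piece of cached data (identified here by a key such as a
    # table name) to a validity rate λ and a last-refresh timestamp.
    def __init__(self):
        self.rates = {}      # key -> validity rate λ (per hour)
        self.refreshed = {}  # key -> time of last (re-)collection

    def validity(self, key: str) -> float:
        # Validity probability decays exponentially with the age of the
        # cached piece (see [Math. 1] below).
        age_hours = (time.time() - self.refreshed.get(key, time.time())) / 3600.0
        return math.exp(-self.rates.get(key, 0.0) * age_hours)

    def adapt(self, key: str, cached, original) -> None:
        # Learning step: nudge λ up when polling found the cached piece
        # outdated, down when it was still valid. This update rule is an
        # assumption; the actual adaptation uses a machine learning
        # algorithm as referenced above.
        rate = self.rates.get(key, 0.1)
        self.rates[key] = rate * 1.1 if cached != original else rate * 0.9
```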
According to some embodiments, in an activity 10, the first client 1 randomly selects queries, or subsamples of queries, from among a plurality of queries handled by the distributed database system, at the time the respective queries are received from one of the plurality of clients. For each of the queries randomly selected, the first client 1 retrieves in an activity 11 a first piece of the cached data stored at the first client and matching the randomly selected query, and retrieves in an activity 12 a first piece of the original data matching the randomly selected query from the at least one original data source. To cite an example, first client 1 may retrieve, as the first piece of cached data, the SQL-processable tables "table_A1_cached" and "table_A2_cached". Furthermore, first client 1 may retrieve, as the first piece of the original data from the data source 3, the SQL-processable tables "table_A1_original" and "table_A2_original".
First client 1 additionally retrieves, in an activity 13, for queries not randomly selected, a second piece of the cached data stored at the first client 1 and matching the respective query. Continuing the example of SQL-processable tables, the second piece of cached data may be the table "table_B1_cached". First client 1 evaluates in an activity 14 the validity value of the second cached piece, table "table_B1_cached".
In some embodiments, the validity value is given by
[Math. 1] P(t) = e^(-λ·t)
and is evaluated as a validity probability value by first client 1, wherein t denotes the current time or the estimated time of receipt of the data, e.g. of table "table_B1_cached". A validity (or invalidity) rate λ may be employed to provide an estimate of the probability of requested data staying valid after a given time. This is also referred to as the probability of requested data (e.g. table "table_B1_cached") being valid or, in other words, not being outdated. Two exemplary functions of this probable accuracy decreasing over time are depicted in FIG. 6. Function F represents requested data which potentially remains accurate longer (or, more correctly, stays at a higher probability of being valid over time) than other requested data associated with function G. For example, the requested data represented by function F has a 70% probability of still being valid 35 hours after its last generation, while the other requested data, characterized by function G, is valid with a probability of only about 50% at 35 hours after its latest generation. First client 1 compares the validity probability value of the second piece of cached data, e.g. table "table_B1_cached", with a given threshold value. If the validity probability value is below the given threshold, first client 1 retrieves in an activity 15 a second piece of the original data matching the query from the at least one original data source, wherein the second piece of the original data may be table "table_B1_original". First client 1 updates in an activity 16 the second piece of the cached data stored at the first client with the second piece of the original data, which in the presently described example means replacing table "table_B1_cached" with table "table_B1_original".
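The following sketch ties activities 10 to 16 together, reusing the ValidityModel sketch above. It is illustrative only: the sampling rate, the threshold and the cache/origin interfaces are assumptions, not prescribed values.

```python
import random
import time

SAMPLE_RATE = 0.01        # fraction of queries randomly selected (assumption)
VALIDITY_THRESHOLD = 0.7  # polling threshold (assumption)

def handle_query(query_key: str, cache, origin, model: "ValidityModel"):
    if random.random() < SAMPLE_RATE:
        # Activities 11/12: retrieve both the cached piece and the
        # original piece, answer from the original and learn from the
        # comparison (activity 17).
        cached, original = cache.get(query_key), origin.get(query_key)
        model.adapt(query_key, cached, original)
        return original
    # Activities 13/14: evaluate the validity value of the cached piece.
    if model.validity(query_key) < VALIDITY_THRESHOLD:
        # Activities 15/16: poll the original data source and update the
        # cached piece (e.g. replace "table_B1_cached" with
        # "table_B1_original").
        original = origin.get(query_key)
        cache.put(query_key, original)
        model.refreshed[query_key] = time.time()  # piece is fresh again
        return original
    return cache.get(query_key)
```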
First client 1 adapts in an activity 17 the probabilistic model of the first client based on the retrieved first piece of the cached data (e.g. table "table_A1_cached") and the retrieved first piece of the original data (e.g. table "table_A1_original") using a machine learning algorithm. In particular, the retrieved cached data and the retrieved original data are compared to determine whether retrieving the original data would have been appropriate or not. That is, if the comparison shows that the retrieved cached data coincides with the retrieved original data, retrieving the original data would have been inexpedient as the cached data was still valid; otherwise, it would have been sensible as the cached data was invalid. This result, indicating the efficiency of retrieving the original data from the original data source (also referred to as polling), is stored together with the retrieved cached data and the retrieved original data, and used to adapt the probabilistic model from which the validity rate associated with the cached data is derived. The adaptation is made using a machine learning algorithm as described e.g. by European Patent application number 19 166 538. Accordingly, the probabilistic model is improved and the decision of whether polling (i.e. retrieving the original data from the database and updating the cached data) is necessary and thus efficient is refined, thereby increasing the precision of the probabilistic model. This allows the number of polling operations to be reduced by improving the polling decision compared to e.g. EP 2 908 255 A1.
As an example, the validity rate λ for the first piece of cached data may be adapted by first client 1. The in-/validity rate λ is a measure of how long the piece of cached data (e.g. table "table_A1_cached") remains valid or how fast the cached data becomes invalid due to changes of the underlying original data (e.g. table "table_A1_original"). This validity rate of a given piece of cached data is, for example, statistically derived from the occurrence and the outcomes of past (re-)computations or (re-)collections and comparisons of the requested data with its previous state or values. For example, it has been determined by an earlier adaptation of the probabilistic model that particular requested data has an invalidity rate λ of 10% per hour, meaning that the probability of the cached data being valid decreases by 10% every hour. At the time of its (re-)collection or (re-)computation, table "table_A1_cached" is generally 100% valid. After one hour, table "table_A1_cached" is valid with a probability of 90%. After two hours the validity of table "table_A1_cached" is 81% (= 90% decreased by another 10%). After three hours, the probable validity of table "table_A1_cached" is at 72.9%, and so on.
The adaptation of the probabilistic model carried out by first client 1 may result in a new invalidity rate λ of 20% per hour for the first piece of cached data, e.g. due to a determination that the first piece of cached data has recently become outdated on a more regular basis. This means that after one hour, table "table_A1_cached" is valid with a probability of 80%. After two hours the validity of table "table_A1_cached" is 64% (= 80% decreased by another 20%). After three hours, the probable validity of table "table_A1_cached" is at 51.2%, and so on.
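Both series of figures can be reproduced by discrete per-hour compounding, as the following short sketch shows (the continuous counterpart of this decay is the exponential form of [Math. 1] above):

```python
def hourly_validity(invalidity_rate: float, hours: int) -> float:
    # Discrete per-hour compounding as used in the figures above.
    return (1.0 - invalidity_rate) ** hours

for rate in (0.10, 0.20):
    print(rate, [round(hourly_validity(rate, h), 3) for h in range(4)])
# 0.1 [1.0, 0.9, 0.81, 0.729]
# 0.2 [1.0, 0.8, 0.64, 0.512]
```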
After adaptation of the probabilistic model, in an activity 18, first client 1 makes the adapted probabilistic model available to the second client 2. Various technical manners of making it available are envisaged. For example, in some embodiments, the first client 1 stores the data specifying the probabilistic model adaptation at a given location for retrieval by the second client 2 (pull). Additionally or alternatively, the first client 1 may send the data specifying the probabilistic model adaptation to the second client 2 without any retrieval request (push).
In some embodiments, making available the adapted probabilistic model of the first client 1 to the second client 2 takes into account the level of correlation between the cached data of the first client 1 for which the probabilistic model was adapted and the cached data held at the second client 2. Generally, those adaptations of the probabilistic model of the first client 1 which refer to cached data of the first client 1 having at least some significant correlation (e.g. correlation meeting at least a given threshold) with cached data stored at the second client 2 are made available. Adaptations of the probabilistic model of the first client 1 which refer to cached data of the first client 1 having no correlation or no significant correlation (e.g. below the given threshold) may not be made available to the second client 2, because such parts of the probabilistic model of the first client 1 are, at least currently, not useful for the second client 2.
More specifically, if a data storage model is applied according to which all clients hold the same cached data, i.e. the cached data stored by the first client 1 and by the second client 2 relate to the same underlying original data and are thus fully correlated, the activity of making the adapted model available may be executed by transmitting any updated validity rates λ for cached data records of the first client 1 to the second client 2. With continued reference to the example above, the first client 1 may transmit the updated invalidity rate λ of 20% per hour to the second client 2 over the communication interface 4, or send an indication that an updated invalidity rate for the first piece of cached data is now available at first client 1. At second client 2, the adapted validity rate λ can then be applied to the probabilistic model of the second client 2 for the first piece of cached data stored at the second client 2. Hence, the second client 2 may adapt its own probabilistic model by replacing the current value of the invalidity rate for the first piece of cached data with the updated value of 20%.
The same applies to data storage models according to which the cached data held by the clients partially overlaps. Here, the updated value of the validity rate λ of the first piece of cached data, as adapted by first client 1, is made available to the second client 2 if the second client 2 stores the first piece of cached data as well and/or stores other pieces of cached data which are correlated to the first piece of cached data. Similar considerations apply when validity rates are assigned to subsets of multiple pieces of cached data. For example, first client 1 holds cached data subsets A1, B1 and C1, each including multiple pieces of cached data correlated to each other. A probabilistic model has been established for each of the cached data subsets, with the corresponding validity rates λA1 for cached data subset A1, λB1 for cached data subset B1 and λC1 for cached data subset C1. Second client 2 holds data subset A2 which overlaps fully with data subset A1 of first client 1. Furthermore, second client 2 holds data subset D2 which overlaps partially with data subset C1 of first client 1. In addition, second client 2 holds data subset Y2 which does not overlap with any data subset held by first client 1. Data subset B1 held by first client 1 has no overlap with any subset of second client 2 either.
First client 1 adapts the probabilistic model for the data subsets A1, B1 and C1, whereby adapted validity rates are obtained, namely λ'A1 for data subset A1, λ'B1 for data subset B1 and λ'C1 for data subset C1. First client 1 and second client 2 may be aware of the different degrees of overlap between the various subsets they hold, and of the corresponding different degrees of correlation between the subsets of cached data. To address these different degrees of correlation, in some embodiments, first client 1 or second client 2 may associate weight values with the adapted validity rates. For fully overlapping data (high correlation), as realized between data subset A1 and data subset A2, a weight value of '1' may be associated with λ'A1. For data subsets with no overlap at all (low correlation if pieces of data in the subsets are local to each other, or no correlation), such as data subset B1, a weight value of '0' may be assigned to λ'B1. For partially overlapping data subsets (medium or medium-to-high correlation), such as data subset C1, a weight value between zero and one, such as '0.8', may be assigned to λ'C1. The weight value may be higher the closer the corresponding cached data held at first client 1 is to that of the second client 2 (closer in terms of the above-mentioned principle of locality).
By utilizing the weight values, first client 1 makes, in some embodiments, in activity 18 those adapted validity rates available for which a weight value greater than zero has been assigned. In some embodiments, first client 1 makes the adapted validity rates available together with the associated weight values, so that the second client 2 is able to adapt its own probabilistic model based on the adapted validity rates of the first client 1 and the associated weight values. In the specific example above, first client 1 makes available only the validity rates λ'A1 and λ'C1, optionally as the tuples λ'A1:1 and λ'C1:0.8. In some embodiments, first client 1 sends the adapted model parameters λ'A1 and λ'C1, possibly accompanied by the associated weight values, to second client 2 in a message comprising an indication of which subsets of cached data these model parameters belong to. In some embodiments, first client 1 sends an indication to second client 2 that the adapted model parameters λ'A1 and λ'C1 are now available at first client 1.
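A possible selection of what to share, following the example above, could look as follows (a sketch only; the numeric rate values are placeholders, not values from the example):

```python
def shareable_updates(adapted_rates: dict, weights: dict) -> dict:
    # Keep only subsets whose correlation weight towards the peer is
    # greater than zero, paired with that weight.
    return {subset: (rate, weights[subset])
            for subset, rate in adapted_rates.items()
            if weights.get(subset, 0.0) > 0.0}

updates = shareable_updates({"A1": 0.20, "B1": 0.15, "C1": 0.08},
                            {"A1": 1.0, "B1": 0.0, "C1": 0.8})
# -> {"A1": (0.20, 1.0), "C1": (0.08, 0.8)}, i.e. the tuples λ'A1:1 and λ'C1:0.8
```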
In some embodiments, in a process of decentralized parameter gathering, a variety of validity rates may be calculated by different clients according to the method described above and shown in FIG. 2, wherein each individual validity rate λi may refer to cached data (e.g. individual sets of cached data, shares of correlated cached data) of a different nature and field, such as image processing, traffic flows, etc. At a central entity acting as a parameter server, such as a specific client or a massive computation unit, these validity rates λi may be collected and further processed, e.g. to assess the learning progress of individual clients, to monitor failure-free operation of clients, etc.
The exchange of model parameters such as validity rates λ in a distributed machine-learning (e.g. peer-to-peer) system relieves the central entity that traditionally maintains the probabilistic model tracking invalidity probabilities of cached data (as described e.g. by the above-mentioned European Patent application number 19 166 538 and the other prior art documents mentioned above) from network and computation load, and thus eliminates a technical bottleneck in such systems. Since the machine learning capability carried out by first client 1 can be re-used by other clients, the probabilistic model can be adapted constantly, rendering the machine learning process more effective and increasing the overall resiliency of the network.
In some embodiments, shown in FIG. 3, first client 1 retrieves, for randomly selected queries, an additional first piece of cached data stored locally by at least one other client of the plurality of clients and matching the randomly selected query (activity 11a). Alternatively or additionally, first client 1 may acquire in activity 11a additional data cached at a further data source. On the one hand, these additional pieces of cached data may be used to prepare and return the response to the query to the requesting client. On the other hand, these additional pieces of cached data can be utilized to adapt the probabilistic model of the first client 1 more universally and, thus, to accelerate the machine learning process at the first client 1.
To this end, the first client 1 may also retrieve, for each additional piece of cached data, an additional piece of the original data matching the randomly selected query from the at least one original data source. As the one or more additional pieces of data match the search criteria of the query, at least a certain level of correlation can be assumed between the first piece of data and the one or more additional pieces of data. Hence, a determination of the validity of the one or more additional pieces of data (e.g. by comparison of the additional cached pieces of data with the corresponding original pieces of data, as described above for the first piece of data) may be used to adapt the probabilistic model of the first client more effectively, using a broader basis of learning data. Weight values indicating a level of correlation between the additional pieces of cached data and the first piece of cached data, as well as current validity values of the additional pieces of cached data (e.g. given by formula Math. 1 as mentioned above), may be retrieved from the other client or other data source and be utilized to adapt the probabilistic model of the first client. Optionally, the first client 1 may include the retrieved one or more additional pieces of cached data (possibly updated by the retrieved pieces of original data) in the cached data stored at the first client 1.
As an example, client 1 may retrieve the table "table_A2_cached" from second client 2 and use it together with the table "table_A1_cached" for the adaptation of the probabilistic model. Since, in a distributed machine learning environment deployed in a peer-to-peer network, every client can simultaneously carry out the functions of first client 1, the network traffic that occurs when e.g. the tables are transmitted to the clients acting as first clients is no longer directed at just one node, but spread across multiple sources. Therefore, the network traffic may be more evenly distributed over the distributed computing environment.
Now referring to FIG. 4, in some embodiments, after adaptation of the probabilistic model by first client 1, second client 2 retrieves in an activity 19 at least a part of the adapted probabilistic model from the first client 1 and further adapts in an activity 20 its own probabilistic model based on the retrieved adapted probabilistic model of the first client 1. Again taking the aforementioned example of the partially overlapping data subsets C1 and D2, held by first client 1 and second client 2 respectively, second client 2 receives the validity rate λ'C1 already adapted by first client 1 based on data subset C1, e.g. on the table "table_A1_cached", and calculates a further adapted validity rate λ'D2 which may be based on data subset D2, e.g. the table "table_A2_cached".
As mentioned above, in some embodiments, the second client 2 receives an updated validity rate for the first piece of cached data associated with a weight value. The weight value indicates a level of correlation of the first piece of cached data stored at the first client 1 with another piece of cached data stored at the second client 2. In such embodiments, adapting the probabilistic model of the second client 2 based on the retrieved adapted probabilistic model of the first client 1 comprises adapting the probabilistic model of the second client 2 based on the updated validity value for the first piece of cached data and the associated weight value. More specifically, the second client may apply the updated validity rate received from the first client 1 to a validity rate of its own probabilistic model indicating a validity probability of the other piece of cached data of the second client 2, the other piece of cached data of the second client 2 being correlated to the first piece of cached data of the first client 1 to the degree indicated by the associated weight value.
If the weight value indicates a full correlation (e.g. weight value 1), the second client may include the updated validity rate received from the first client 1 in its own probabilistic model, namely in the validity rate of the other cached data of the second client 2, without discount, i.e. in a similar manner as native updates obtained by analogous execution of the mechanism of FIG. 2 at the second client 2. For example, in case of a full correlation, the second client 2 may replace the current validity rate of the other cached data with the updated validity rate received from the first client 1. The second client 2 may also update the validity rate of the other cached data by including the received updated validity rate of the first piece of cached data of the first client 1 in other manners, for example by way of exponential smoothing.
If the weight value indicates a lower degree of correlation (e.g. weight value 0.8 indicating a correlation of 80%), the second client 2 may integrate the updated validity rate received from the first client 1 in a discounted manner. For example, the second client may calculate an updated weighted average by using or adding the updated validity rate received from the first client to the series of validity rates of an exponential smoothing in a discounted manner, e.g. by multiplying the updated validity rate received from the first client 1 with the associated weight value 0.8.
In addition, the second client 2 may also adapt validity rates of further pieces of cached data stored at the second client 2 which are correlated to the other piece of cached data to which the received weight value refers. In this regard, the second client 2 may derive further weight values depending on the level of correlation between the other piece of cached data and the further pieces of cached data. For example, a further piece of cached data may be correlated with the other piece of cached data to a degree of 90%, resulting in a weight value of 0.8 x 0.9 = 0.72. Hence, the updated validity rate received from the first client 1 can be applied to update the validity rate of this further piece of cached data of the second client using a weight value of 0.72, reflecting the correlation between the first piece of cached data of the first client 1 and the other piece of cached data of the second client 2 on the one hand, and between the other piece of cached data and the further piece of cached data of the second client 2 on the other hand.
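One way to realize such a discounted update is a weighted blend of the local rate towards the received rate. This concrete rule is an assumption for illustration; the description only requires that the received rate be applied in a manner discounted by the weight:

```python
def apply_peer_update(own_rate: float, peer_rate: float, weight: float) -> float:
    # Weight 1.0 (full correlation) replaces the local rate outright;
    # lower weights pull the local rate only part of the way towards
    # the peer's value.
    return own_rate + weight * (peer_rate - own_rate)

# Directly correlated piece (weight 0.8):
apply_peer_update(own_rate=0.10, peer_rate=0.20, weight=0.8)   # -> 0.18
# Further piece, chained correlation 0.8 x 0.9 = 0.72:
apply_peer_update(own_rate=0.10, peer_rate=0.20, weight=0.72)  # -> 0.172
```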
With reference to FIG. 5, in some embodiments, before adapting its own probabilistic model, first client 1 retrieves in an activity 17a from second client 2 at least a part of the adapted probabilistic model of the second client 2. Taking again the aforementioned example of the partially overlapping data subsets C1 and D2, first client 1 may retrieve a validity rate λ''D2 from second client 2, whereby λ''D2 may have been adapted based on table "table_A2_cached", and subsequently adapts in activity 17 its own probabilistic model based on the retrieved adapted probabilistic model (λ''D2) of the second client 2 and, in addition, on data subset C1, e.g. table "table_A1_cached". First client 1 would therefore obtain an adapted validity rate λ''C1. In an activity 20, first client 1 makes its adapted probabilistic model, such as the adapted validity rate λ''C1, available to still other clients.
In the subject disclosure, any machine learning algorithm which can be used in connection with probabilistic models may be employed. For example, the machine learning algorithm, given a plurality of selected search queries converted into a set of structured features, builds a probabilistic model to predict a target (e.g. a first indicator value stored with the search queries). The term "feature" here refers to any information that can be deduced from the search queries. The conversion of the plurality of selected search queries may comprise computing the values of all features, using the search criteria and/or reference data (e.g. tables showing default output values for a given electronic circuit, etc.). For example, from all search criteria of the plurality of search queries and the reference data, the set of features to be used by the machine learning algorithm may be selected. The set of features may therefore contain a mixture of search criteria and features requiring computations, but a feature may also correspond to one or more search criteria (e.g. all input parameters for a given electronic circuit to be simulated by computer-based electronic circuit simulations, the timestamp of the search query, etc.). The set of features is structured as the machine learning algorithm requires, e.g. as input of a structured data file (e.g. a comma-separated data file) with one row per search query and one column per feature. The order of the columns follows a given pattern in order for the probabilistic model to be correctly used once deployed.
The machine learning algorithm may also iteratively build a decision tree. Starting from the plurality of the selected search queries, the machine learning algorithm tests features and conditions on these features to split the plurality into two child sets. One of these tests may be "is the input parameter for the given electronic circuit equal to a voltage having a value of 10" (i.e., in general, checking whether the feature equals a given value), or "is the input parameter for the given electronic circuit equal to a voltage having a value greater than 20" (i.e., in general, checking whether the feature is greater/smaller than a given value). From all the features and conditions tested, the machine learning algorithm keeps only the feature and condition which best separate the plurality into a first child set containing search queries with their target equal to one value, and a second child set containing the search queries with their target not equal to that value. In other words, the decision tree is built by the machine learning algorithm such that a feature is taken as a node with two outgoing paths, one for search queries having the feature equal to the given value (the first child set) and another path for search queries having the feature not equal to the given value (the second child set). That is, the search queries are divided into two child sets based on the feature. The machine learning algorithm progressively builds the decision tree by then also splitting the child sets into smaller sets with the same feature and condition selection logic. The machine learning algorithm stops once the decision tree reaches a given size or complexity provided as a parameter thereof.
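A greedy split selection of this kind might be sketched as follows. The purity measure used to score a split is an assumption, as the separation criterion is not fixed above, and only the equality condition is shown (greater/smaller-than conditions would be tested analogously):

```python
from typing import List, Tuple

def purity(rows: List[dict], target: str) -> float:
    # Fraction of rows sharing the majority target value.
    values = [row[target] for row in rows]
    return max(values.count(v) for v in set(values)) / len(values)

def best_split(rows: List[dict], target: str,
               features: List[str]) -> Tuple[str, object, float]:
    # Test "feature == value" conditions and keep the one that best
    # separates rows whose target equals one value from the rest.
    best_feature, best_value, best_score = None, None, -1.0
    for feature in features:
        for value in {row[feature] for row in rows}:
            left = [r for r in rows if r[feature] == value]
            right = [r for r in rows if r[feature] != value]
            if not left or not right:
                continue
            # Weighted purity of the two child sets.
            score = (purity(left, target) * len(left) +
                     purity(right, target) * len(right)) / len(rows)
            if score > best_score:
                best_feature, best_value, best_score = feature, value, score
    return best_feature, best_value, best_score
```

The tree is then grown by applying best_split recursively to each child set until a given size or complexity is reached.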
FIG. 7 is a diagrammatic representation of the internal components of a computing machine of first client 1, second client 2 and/or original data source 3. The computing machine 100 includes a set of instructions that, when executed by the computing machine 100, cause it to perform any of the methodologies discussed herein. The computing machine 100 includes at least one processor 101, a main memory 106 and a network interface device 103, which communicate with each other via a bus 104. Optionally, the computing machine 100 may further include a static memory 105 and a disk-drive unit. A video display, an alpha-numeric input device and a cursor control device may be provided as examples of user interface 102. The network interface device 103 connects the computing machine 100 to the other components of the distributed computing system, such as first client 1, second client 2, original data source 3 or further components.
Computing machine 100 also hosts the cache 107. Within the present embodiments, the cache 107 may be composed of hardware and software components that store the received database tables so that future requests for the database tables can be served faster than without caching. There can be hardware-based caches such as CPU caches, GPU caches, digital signal processors and translation lookaside buffers, as well as software-based caches such as page caches and web caches (Hypertext Transfer Protocol, HTTP, caches), etc.
A set of computer-executable instructions (i.e., computer program code) embodying any one, or all, of the methodologies described herein resides completely, or at least partially, in or on a machine-readable medium, e.g. the main memory 106. Main memory 106 hosts computer program code for functional entities such as database request processing 108, which includes the functionality to receive and process database requests, and data processing functionality 109. The instructions may further be transmitted or received as a propagated signal via the Internet through the network interface device 103 or via the user interface 102. Communication within the computing machine is performed via bus 104. Basic operation of the computing machine 100 is controlled by an operating system which is located in the main memory 106, the at least one processor 101 and/or the static memory 105.
In general, the routines executed to implement the embodiments, whether implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions, or even a subset thereof, may be referred to herein as "computer program code" or simply "program code". Program code typically comprises computer-readable instructions that are resident at various times in various memory and storage devices in a computer and that, when read and executed by one or more processors in a computer, cause that computer to perform the operations necessary to execute operations and/or elements embodying the various aspects of the embodiments of the invention. Computer-readable program instructions for carrying out operations of the embodiments of the invention may be, for example, assembly language or either source code or object code written in any combination of one or more programming languages.
Claims (8)
- A method of processing queries in a distributed database system, the distributed database system comprising:
at least one original data source storing an amount of original data, and
a plurality of clients comprising at least a first client and a second client, wherein the first client and the second client store cached data which corresponds to at least a part of the amount of original data;
wherein the first client and the second client host a probabilistic model yielding validity values associated with the cached data, indicating a probability that the cached data stored at the client coincides with the corresponding original data;
the method comprising, at the first client:
randomly selecting queries from among a plurality of queries handled by the distributed database system, at the time the respective query is received from one of the plurality of clients;
for each of the queries randomly selected,
retrieving a first piece of the cached data stored at the first client and matching the randomly selected query, and
retrieving a first piece of the original data matching the randomly selected query from the at least one original data source;
for queries not being randomly selected,
retrieving a second piece of the cached data stored at the first client and matching the query;
evaluating the validity value of the second piece of the cached data stored at the first client;
if the validity value is below a given threshold,
retrieving a second piece of the original data matching the query from the at least one original data source, and
updating the second piece of the cached data stored at the first client by the second piece of the original data; and
adapting the probabilistic model of the first client based on the retrieved first piece of the cached data and the retrieved first piece of the original data using a machine learning algorithm,
making available the adapted probabilistic model to the second client.
2. The method according to claim 1, further comprising, for queries randomly selected, retrieving an additional first piece of cached data stored by at least one client of the plurality of clients and matching the randomly selected query.
3. The method according to claim 2, wherein adapting the probabilistic model is further based on the additional first piece of cached data.
4. The method according to any one of claims 1 to 3, further comprising, at the second client:
receiving at least a part of the adapted probabilistic model of the first client from the first client;
adapting the probabilistic model of the second client based on the retrieved adapted probabilistic model of the first client.
5. The method according to claim 4, wherein making available the adapted probabilistic model to the second client comprises making available to the second client an updated validity value for the first piece of cached data together with an associated weight value, wherein the weight value indicates a level of correlation of the first piece of cached data stored at the first client with another piece of cached data stored at the second client, and wherein adapting the probabilistic model of the second client based on the retrieved adapted probabilistic model of the first client comprises adapting the probabilistic model of the second client based on the updated validity value for the first piece of cached data and the associated weight value.
6. The method according to any one of claims 1 to 5, further comprising, at the first client:
retrieving at least a part of the adapted probabilistic model of the second client from the second client;
wherein adapting the probabilistic model of the first client is further based on the retrieved adapted probabilistic model of the second client.
7. A computing machine acting as a first client for handling data in a distributed computing environment comprising a plurality of clients, including at least a first client and a second client, and at least one original data source storing an amount of original data, the computing machine being arranged to execute the method of any one of claims 1 to 6.
8. A computer program product comprising program code instructions stored on a computer-readable medium to execute the method steps according to any one of claims 1 to 6 when said program is executed on a computer.
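Purely as a non-authoritative illustration, the per-query flow recited in claim 1 could be sketched as follows in Python; the sample rate, the threshold value and the collaborator interfaces (`cache`, `source`, `model`, `peers`) are assumptions of the sketch, not the claimed implementation:

```python
import random

SAMPLE_RATE = 0.01  # fraction of queries randomly selected for polling (assumed)
THRESHOLD = 0.7     # validity threshold of claim 1 (value assumed)

def handle_query(query, cache, source, model, peers):
    """Hypothetical per-query flow at the first client, following claim 1.
    cache, source, model and peers are assumed duck-typed collaborators."""
    if random.random() < SAMPLE_RATE:
        # Randomly selected query: fetch both copies and learn from them.
        cached = cache.get(query)      # first piece of cached data
        original = source.get(query)   # first piece of original data
        model.adapt(cached, original)  # machine-learning model adaptation
        for peer in peers:             # make the adapted model available
            peer.receive_model(model)
        return original
    # Query not selected: trust the cache while predicted validity is high.
    cached = cache.get(query)          # second piece of cached data
    if model.validity(query, cached) < THRESHOLD:
        cached = source.get(query)     # second piece of original data
        cache.put(query, cached)       # update the stale cached entry
    return cached
```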
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR1908757A FR3099598B1 (fr) | 2019-07-31 | 2019-07-31 | Apprentissage automatique distribué pour la validité des données antémémorisées |
US16/934,653 US11449782B2 (en) | 2019-07-31 | 2020-07-21 | Distributed machine learning for cached data validity |
ES20188834T ES2905613T3 (es) | 2019-07-31 | 2020-07-31 | Aprendizaje automático distribuido para la validez de datos en la memoria caché |
EP20188834.4A EP3771998B1 (fr) | 2019-07-31 | 2020-07-31 | Apprentissage automatique distribué pour la validité des données antémémorisées |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR1908757 | 2019-07-31 | ||
FR1908757A FR3099598B1 (fr) | 2019-07-31 | 2019-07-31 | Apprentissage automatique distribué pour la validité des données antémémorisées |
Publications (2)
Publication Number | Publication Date |
---|---|
FR3099598A1 (fr) | 2021-02-05 |
FR3099598B1 (fr) | 2021-08-27 |
Family ID: 68654707
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
FR1908757A Active FR3099598B1 (fr) | 2019-07-31 | 2019-07-31 | Apprentissage automatique distribué pour la validité des données antémé- morisées |
Country Status (4)
Country | Link |
---|---|
US (1) | US11449782B2 (fr) |
EP (1) | EP3771998B1 (fr) |
ES (1) | ES2905613T3 (fr) |
FR (1) | FR3099598B1 (fr) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11561978B2 (en) * | 2021-06-29 | 2023-01-24 | Commvault Systems, Inc. | Intelligent cache management for mounted snapshots based on a behavior model |
EP4246342A1 (fr) | 2022-03-18 | 2023-09-20 | Amadeus S.A.S. | Adaptation de mise à jour de cache |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001076192A2 (fr) | 2000-03-30 | 2001-10-11 | Intel Corporation | Architecture repartie de reseau edge |
US6725333B1 (en) * | 1999-04-22 | 2004-04-20 | International Business Machines Corporation | System and method for managing cachable entities |
US20040249798A1 (en) * | 2003-06-06 | 2004-12-09 | Demarcken Carl G. | Query caching for travel planning systems |
US20060271557A1 (en) | 2005-05-25 | 2006-11-30 | Terracotta, Inc. | Database Caching and Invalidation Based on Detected Database Updates |
US20100106915A1 (en) | 2008-10-26 | 2010-04-29 | Microsoft Corporation | Poll based cache event notifications in a distributed cache |
US8117153B2 (en) | 2006-03-28 | 2012-02-14 | Oracle America, Inc. | Systems and methods for a distributed cache |
US20140052750A1 (en) * | 2012-08-14 | 2014-02-20 | Amadeus S.A.S. | Updating cached database query results |
EP2908255A1 (fr) | 2014-02-13 | 2015-08-19 | Amadeus S.A.S. | Augmenter la validité de résultat de recherche |
US20160125029A1 (en) | 2014-10-31 | 2016-05-05 | InsightSoftware.com International | Intelligent caching for enterprise resource planning reporting |
US20160171008A1 (en) * | 2012-08-14 | 2016-06-16 | Amadeus S.A.S. | Updating cached database query results |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU3638401A (en) | 1999-11-01 | 2001-05-14 | Ita Software, Inc. | Method and apparatus for providing availability of airline seats |
US20070288890A1 (en) | 2006-05-17 | 2007-12-13 | Ipreo Holdings, Inc. | System, method and apparatus to allow for a design, administration, and presentation of computer software applications |
CN104348852B (zh) | 2013-07-26 | 2019-06-14 | 南京中兴新软件有限责任公司 | 一种实现电信能力群发的方法、装置及系统 |
US9251478B2 (en) * | 2013-07-29 | 2016-02-02 | Amadeus S.A.S. | Processing information queries in a distributed information processing environment |
GB2524075A (en) | 2014-03-14 | 2015-09-16 | Ibm | Advanced result cache refill |
US9800662B2 (en) * | 2014-07-16 | 2017-10-24 | TUPL, Inc. | Generic network trace with distributed parallel processing and smart caching |
EP3128441B1 (fr) | 2015-08-03 | 2018-10-10 | Amadeus S.A.S. | Gestions de demandes de données |
US10901993B2 (en) * | 2018-04-03 | 2021-01-26 | Amadeus S.A.S. | Performing cache update adaptation |
US11636112B2 (en) * | 2018-04-03 | 2023-04-25 | Amadeus S.A.S. | Updating cache data |
CA3038018C (fr) | 2018-04-03 | 2024-02-27 | Amadeus S.A.S. | Execution d'adaptation de mise a jour de cache |
- 2019-07-31: FR application FR1908757A, patent FR3099598B1 (status: active)
- 2020-07-21: US application US16/934,653, patent US11449782B2 (status: active)
- 2020-07-31: EP application EP20188834.4A, patent EP3771998B1 (status: active)
- 2020-07-31: ES application ES20188834T, patent ES2905613T3 (status: active)
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6725333B1 (en) * | 1999-04-22 | 2004-04-20 | International Business Machines Corporation | System and method for managing cachable entities |
WO2001076192A2 (fr) | 2000-03-30 | 2001-10-11 | Intel Corporation | Architecture repartie de reseau edge |
US20040249798A1 (en) * | 2003-06-06 | 2004-12-09 | Demarcken Carl G. | Query caching for travel planning systems |
US20060271557A1 (en) | 2005-05-25 | 2006-11-30 | Terracotta, Inc. | Database Caching and Invalidation Based on Detected Database Updates |
US8117153B2 (en) | 2006-03-28 | 2012-02-14 | Oracle America, Inc. | Systems and methods for a distributed cache |
US20100106915A1 (en) | 2008-10-26 | 2010-04-29 | Microsoft Corporation | Poll based cache event notifications in a distributed cache |
US20140052750A1 (en) * | 2012-08-14 | 2014-02-20 | Amadeus S.A.S. | Updating cached database query results |
US20160171008A1 (en) * | 2012-08-14 | 2016-06-16 | Amadeus S.A.S. | Updating cached database query results |
EP2908255A1 (fr) | 2014-02-13 | 2015-08-19 | Amadeus S.A.S. | Augmenter la validité de résultat de recherche |
US20160125029A1 (en) | 2014-10-31 | 2016-05-05 | InsightSoftware.com International | Intelligent caching for enterprise resource planning reporting |
Also Published As
Publication number | Publication date |
---|---|
US11449782B2 (en) | 2022-09-20 |
ES2905613T3 (es) | 2022-04-11 |
EP3771998B1 (fr) | 2021-12-01 |
FR3099598B1 (fr) | 2021-08-27 |
US20210034995A1 (en) | 2021-02-04 |
EP3771998A1 (fr) | 2021-02-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11580109B2 (en) | Method and apparatus for stress management in a searchable data service | |
US11604781B2 (en) | System and method for clustering distributed hash table entries | |
US10528537B2 (en) | System and method for fetching the latest versions of stored data objects | |
US20190303382A1 (en) | Distributed database systems and methods with pluggable storage engines | |
US11323514B2 (en) | Data tiering for edge computers, hubs and central systems | |
US6904433B2 (en) | Method for using query templates in directory caches | |
US7801912B2 (en) | Method and apparatus for a searchable data service | |
EP3771998B1 (fr) | Apprentissage automatique distribué pour la validité des données antémémorisées | |
US10936590B2 (en) | Bloom filter series | |
CN112561197A (zh) | 一种带有主动防御影响范围的电力数据预取与缓存方法 | |
US9317432B2 (en) | Methods and systems for consistently replicating data | |
EP3935520A1 (fr) | Traitement de données distribué | |
Barbosa et al. | Looking at both the present and the past to efficiently update replicas of web content | |
US11966393B2 (en) | Adaptive data prefetch | |
EA027808B1 (ru) | Система управления базой данных | |
Wu | Optimizing Consensus Protocols with Machine Learning Models: A cache-based approach | |
CN117472918B (zh) | 数据处理方法、系统、电子设备及存储介质 | |
CN118170721A (zh) | 基于Spark的分布式日志检索系统以及方法 | |
CN116680276A (zh) | 数据标签存储管理方法、装置、设备及存储介质 | |
Yasin et al. | Cooperative Web Proxy Caching for Media Objects Based on Peer-to-Peer Systems | |
Yue et al. | SFMapReduce: An Optimized MapReduce Framework for Small Files |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PLFP | Fee payment | Year of fee payment: 2 |
| PLSC | Publication of the preliminary search report | Effective date: 20210205 |
| PLFP | Fee payment | Year of fee payment: 3 |
| PLFP | Fee payment | Year of fee payment: 4 |
| PLFP | Fee payment | Year of fee payment: 5 |
| PLFP | Fee payment | Year of fee payment: 6 |