WO2014048540A1 - Method and system of storing and retrieving data - Google Patents

Method and system of storing and retrieving data

Info

Publication number
WO2014048540A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
cache
database
software application
database systems
Application number
PCT/EP2013/002655
Other languages
English (en)
French (fr)
Inventor
Jean-Charles Redoutey
Joel Singer
Florent Balard
Florian Prud'Homme
Romain Bouteloup
Colin Pitrat
Original Assignee
Amadeus S.A.S.
Priority claimed from EP12368027.4A (external priority; patent EP2713284B1)
Priority claimed from US13/628,517 (external priority; patent US9037801B2)
Application filed by Amadeus S.A.S.
Priority to AU2013324689A (patent AU2013324689B2)
Priority to SG11201501650WA (patent SG11201501650WA)
Priority to IN1332DEN2015 (patent IN2015DN01332A)
Priority to JP2015533468A (patent JP6511394B2)
Priority to KR1020157007498A (patent KR101690288B1)
Priority to CA2882498A (patent CA2882498C)
Priority to CN201380050168.7A (patent CN104662539B)
Publication of WO2014048540A1


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275 Synchronous replication

Definitions

  • the present invention relates generally to data management systems of the type used by large providers of goods and services to keep track of their overall product offering and level of availability, and more particularly to a system that allows a high volume of inquiries issued by remote users of the data storage to be answered with no or very little delay, without impacting the completion of the transactions that constantly update its content as the data storage is administered.
  • DBMS: database management system
  • databases are used to keep track, in real time, of the actual seat capacity and the current state of reservations, along with the configuration of the fleet of flights operated by a given airline.
  • an airline's inventory usually contains all flights with their available seats and is generally divided into service classes (e.g., First, Business or Economy class) and many booking classes, for which different prices and booking conditions apply.
  • service classes: e.g., First, Business or Economy class
  • Inventory control steers how many seats are available in the different booking classes for instance by opening and closing individual booking classes for sale.
  • the price of each sold seat is determined in the Fare Quote System.
  • inventory control has an interface to an airline's Revenue Management System to support continuous optimization of the offered booking classes in response to changes in demand.
  • Users access an airline's inventory through an availability application with a display and graphical user interface. The availability display contains all offered flights for a particular city pair with their available seats in the different booking classes.
  • Airline inventory databases are usually managed by airlines. They can also be set up by companies that provide travel services to many actors of the travel industry, including airlines, traditional travel agencies and all sorts of other online travel service providers. One such company is AMADEUS, a European travel service provider with headquarters in Madrid, Spain. Some inventories are run directly by airlines and are interfaced with a global distribution system (GDS) or a central reservation system (CRS).
  • GDS: global distribution system
  • CRS: central reservation system
  • the cache may be an application cache, located at the application tier, which essentially reuses pieces of data previously fetched from the database by the application. This immediately raises the issue of the quality of the data then delivered in response to further user interrogations, since the database contents may have been updated in the meantime. This is truly challenging for applications where databases are constantly updated and high data quality is required. This is, for instance, the case for applications related to an airline's inventory, where the freshness of the data directly affects the ability to sell seats and the price offered to customers.
  • this type of application cache requires sophisticated mechanisms between the database and the cache that allow the invalidation and/or replacement of previously fetched pieces of data when they are updated in the database, thus keeping the application cache and database contents consistent.
  • the cache is inserted in the path between the database and the application so that it is always queried first by the application. If the queried data is not present in the cache, it is fetched from the database and brought into the cache before being delivered to the application. All these solutions require the cache and the database to be tightly coupled and aware of each other. As a consequence, they are not easily scalable when a service provider must deploy more computer resources to cope with an increase in traffic and serve more customers while maintaining system performance.
  • this invention provides a method of storing data in a data storage system and retrieving data from the data storage system, comprising a software application, one or more database systems and a plurality of cache nodes, the software application being configured to receive user requests requiring at least one reading of data or one writing of data, the software application being further configured to send read queries and write queries to the data storage system for processing the user requests, the method being characterized in that the software application interfaces independently with the one or more database systems and the plurality of cache nodes and in that the method comprises the following steps performed by the software application with at least one data processor: upon reception of a user request requiring at least a reading of data, the software application sends a read query solely to the plurality of cache nodes.
  • if the software application receives the queried data (i.e., the data that is retrieved) from at least one cache node in response to the read query, it uses the queried data to process the user request.
  • queried data: the data that is retrieved
  • if the software application receives a miss from all cache nodes in response to the read query, meaning that the data has not been found in any cache node, it queries the one or more database systems; if the queried data is present in the database system, then upon retrieving the queried data from the one or more database systems, the software application uses the queried data to process the user request and sends to at least one cache node the queried data and an instruction to add it to that cache node.
  • upon reception of a user request requiring at least a writing of data, the software application sends an instruction for writing to the one or more database systems and also sends an instruction for concurrently writing to the plurality of cache nodes. The plurality of cache nodes is thereby populated at each missed read query, i.e., at each read query for which the queried data is not found in any cache node, and at each write query of the data storage system.
  • Each data is thus stored identically in at least one cache node of the plurality of cache nodes and in the one or more database systems, ensuring thereby that the database systems and the plurality of cache nodes are always fully synchronized.
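The read and write paths described above can be sketched as follows. This is a minimal single-process illustration under stated assumptions, not the patented implementation: plain Python dictionaries stand in for the database system and the cache nodes, the names `CacheNode`, `Application`, `get`, `put` and `node_for` are hypothetical, the node-selection rule is arbitrary, and the database and cache writes are issued one after the other rather than concurrently.

```python
from typing import Dict, List, Optional


class CacheNode:
    """Hypothetical in-memory cache node with a basic key/value interface."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.store.get(key)  # None signals a cache miss

    def put(self, key: str, value: str) -> None:
        self.store[key] = value  # replace in place; no invalidation step


class Application:
    """Drives the database and the cache nodes independently; neither knows the other."""

    def __init__(self, database: Dict[str, str], nodes: List[CacheNode]) -> None:
        self.database = database  # stand-in for the database system
        self.nodes = nodes
        self.db_reads = 0  # counts read queries that reach the database

    def node_for(self, key: str) -> CacheNode:
        # Hypothetical deterministic placement rule, computed on the application side.
        return self.nodes[sum(key.encode()) % len(self.nodes)]

    def read(self, key: str) -> Optional[str]:
        value = self.node_for(key).get(key)  # the read query goes to the cache only
        if value is not None:
            return value  # hit: use the cached data
        self.db_reads += 1
        value = self.database.get(key)  # miss: query the database
        if value is not None:
            self.node_for(key).put(key, value)  # add the fetched data to the cache
        return value

    def write(self, key: str, value: str) -> None:
        self.database[key] = value  # write the database...
        self.node_for(key).put(key, value)  # ...and the cache during the same request
```

For example, with `app = Application({"XX123": "9 seats"}, [CacheNode(), CacheNode()])` (a hypothetical flight key), the first `app.read("XX123")` reaches the database; every later read, including one made after `app.write("XX123", "8 seats")`, is served from the cache.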
  • the invention keeps the database completely independent from the cache formed by the plurality of cache nodes. This contrasts with known solutions in which a replication component integrated in the database performs the update of the cache; there, the database and the cache are not fully independent, which limits the scalability of the entire storage system and requires a specific database.
  • the computerized data system, equipped with a database and a cache that are completely independent and unaware of each other, can thus scale without architectural limit, simply by adding computer and storage capacity when necessary to cope with an increase in traffic.
  • the invention can be implemented with standard databases and DBMS.
  • the invention also reduces maintenance costs.
  • increasing the storage resources does not require any operation on the database.
  • the software application is in charge of updating the data in the database and of populating the caches, either by reflecting a write to the database or by adding queried data that is present in the database but not yet present in the cache.
  • end users can be provided with high-quality data, i.e., the most up-to-date data.
  • caches are populated rapidly, which increases throughput as soon as a new cache node is added to the system.
  • the invention allows users to be provided with precise, customer-tailored replies.
  • a write query comprises at least one of: addition, update and deletion of data in the database systems.
  • the method according to the invention may comprise any one of the following optional features and steps:
  • the data models of the cache and the database may be identical, but they need not be strictly identical. The only requirement is that they be consistent, so that exactly the same addressing keys can be derived for accessing cache and database records. The keys must also allow database records to be locked for write-operation consistency.
  • data records are either stored identically in database and in cache, when present, or in a way which guarantees consistency of the addressing of the same data records in cache and in database.
  • the cache data model can be adapted relative to the database model to speed up data retrieval, so that the access time of the cache is improved while addressing is kept fully consistent between the two entities.
  • the data model of the cache nodes is the same as the data model of the one or more databases.
  • Each data of each cache node is stored identically in the database system.
  • Each data of the database system is stored identically in each cache node.
  • the instruction to write the one or more database systems is sent by the software application to the one or more database systems.
  • the instruction for concurrently writing the plurality of cache nodes is sent by the software application to the plurality of cache nodes.
  • One single software application accesses the database system and the cache nodes.
  • the data storage system comprises one single database system.
  • the cache comprises cache nodes, each comprising non-persistent data storage means.
  • the software application receives a positive acknowledgement on completion of a successful addition of the queried data to the at least one cache node.
  • a cache node or a cache is different from a cache buffer.
  • the cache buffer temporarily stores the data during the writing. No data is retrieved from the cache buffer in response to a user request.
  • the cache buffer is dedicated to the processing of the writes. If the commit fails, the software application sends an instruction to the at least one cache node to delete said new data that has previously been set.
  • the at least one cache node that contains said new data deletes it from its content. If a plurality of cache nodes contain said new data, then all the cache nodes of said plurality delete it.
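A possible shape for the commit-failure handling described above is sketched below, assuming the new data is set in the cache nodes before the database commit is attempted. `CommitError`, `write_with_rollback` and the `commit` callable are hypothetical stand-ins, and plain dictionaries play the role of the cache nodes.

```python
from typing import Callable, Dict, List


class CommitError(Exception):
    """Hypothetical error raised when the database commit fails."""


def write_with_rollback(
    key: str,
    value: str,
    cache_nodes: List[Dict[str, str]],
    commit: Callable[[str, str], None],
) -> bool:
    """Set the new value in the cache nodes, then commit it to the database.

    If the commit fails, the new value is deleted from every cache node that
    received it, so the cache never serves data the database did not accept.
    """
    for node in cache_nodes:
        node[key] = value  # each cache node receives the new data
    try:
        commit(key, value)  # database commit (stand-in callable)
    except CommitError:
        for node in cache_nodes:
            node.pop(key, None)  # instruction to delete the new data
        return False
    return True
```

Deleting the entry, rather than restoring an earlier value, is sufficient in this sketch: the next read misses and repopulates the cache from the database, which still holds the last committed value.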
  • the software application decides to which cache node or which cache nodes among the plurality of cache nodes the instruction to add data or the instruction for updating or deleting data is sent.
  • the decision takes into account a load balancing.
  • a miss is returned to the software application instead of the queried data.
  • the software application sends to at least one cache node a data of absence, which is added to the at least one cache node for the corresponding queried data, the data of absence becoming immediately available for all subsequent queries.
  • Missing data: data requested by end users that is ultimately not found in the database is stored in the cache as "missing data", so that a subsequent interrogation of the cache immediately returns the information that the requested data is present neither in the cache nor in the database. This prevents further interrogations of the database from slowing down the database system.
  • the cache node stores a specific value associated to the data, said specific value indicating that the data is not present in the database.
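The data of absence can be modelled as a sentinel value stored under the queried key. The sketch below is illustrative only: the `ABSENT` marker, the `NegativeCachingReader` class and the single dictionary used as the cache are assumptions, since the text does not specify how the specific value is encoded.

```python
from typing import Dict, Optional, Union


class _Absent:
    """Marker type for the 'data of absence' stored in the cache."""

    def __repr__(self) -> str:
        return "ABSENT"


ABSENT = _Absent()  # hypothetical sentinel; the encoding is not given in the source


class NegativeCachingReader:
    def __init__(self, database: Dict[str, str]) -> None:
        self.database = database
        self.cache: Dict[str, Union[str, _Absent]] = {}
        self.db_queries = 0  # counts queries that reach the database

    def read(self, key: str) -> Optional[str]:
        cached = self.cache.get(key)
        if cached is ABSENT:
            return None  # recorded as absent from the database: skip the database
        if isinstance(cached, str):
            return cached  # ordinary cache hit
        self.db_queries += 1
        value = self.database.get(key)
        # Cache either the real value or the absence marker.
        self.cache[key] = value if value is not None else ABSENT
        return value
```

Because every write also goes to the cache, a later write of the same key simply replaces the `ABSENT` marker with real data.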
  • the software application interfaces independently the one or more database systems on a first dedicated interface, and the plurality of cache nodes on a second dedicated interface.
  • the data model is chosen in such a way that it is directly mappable between the database and the cache.
  • Each set of data is grouped by functional entity and indexed by a key which makes the set of data immediately accessible as a whole thanks to this key both in the database system and in the cache nodes.
  • the software application is a software application of a travel provider's inventory.
  • the software application, the database system and the cache nodes are comprised in an inventory of a travel provider.
  • the travel provider is an airline.
  • the user request received at the software application is sent by at least one of: a travel agency, an online travel agency, an online customer.
  • the data models of the cache nodes and the database are consistent, so that exactly the same addressing keys can be derived for accessing cache node and database data.
  • the data are either stored identically in the database and in at least one cache node, when present, or in a way which guarantees consistency of the addressing of the same data in cache and in database.
  • this invention provides a computer-program product or a non-transitory computer-readable medium that contains software program instructions, where execution of the software program instructions by at least one data processor results in performance of operations that comprise execution of the above method.
  • the exemplary embodiments also encompass a method of storing data in a data storage system and retrieving data from the data storage system, comprising a software application, one or more database systems and a plurality of cache nodes, the software application being configured to receive user requests requiring at least one reading of data or one writing of data, the software application being further configured to send read queries and write queries to the data storage system for processing the user requests, the method being characterized in that the software application interfaces independently the one or more database systems and the plurality of cache nodes and in that the method comprises the following steps performed by the software application with at least one data processor:
  • upon reception of a user request requiring at least a reading of data, the software application sends a read query solely to the plurality of cache nodes; if the software application receives the queried data (i.e., the data that is retrieved) from at least one cache node, it uses the queried data to process the user request;
  • if the software application receives a miss from all cache nodes, it queries the one or more database systems; if the queried data is present in the database system, then upon retrieving it from the one or more database systems, the software application uses the queried data to process the user request and sends to at least one cache node the queried data and an instruction to add it to that cache node; if the queried data is not found in the database, the software application adds to the cache an information indicating that the data does not exist;
  • each data is stored identically in at least one cache node of the plurality of cache nodes and in the one or more database systems or in a way which guarantees consistency of the addressing of the same data in cache and in database.
  • upon reception of a user request requiring at least a writing of data, the software application sends an instruction for writing to the one or more database systems and also sends an instruction for concurrently writing to the plurality of cache nodes, thereby populating the plurality of cache nodes at each missed read query and at each write query of the data storage system.
  • this invention provides a method of storing data in a data storage system of an airline's Inventory and retrieving data from the data storage system, comprising a software application, one or more database systems and a plurality of cache nodes, the software application being configured to receive user requests requiring at least one of: a reading of data to know an availability regarding at least one flight and a writing of data to modify an availability regarding at least one flight; the software application being further configured to send read queries and write queries to the data storage system for processing the user requests, the method being characterized in that the software application interfaces independently the one or more database systems and the plurality of cache nodes and in that the method comprises the following steps performed by the software application with at least one data processor:
  • if the software application receives the queried data (i.e., the data that is retrieved) from at least one cache node, it uses the queried data to process the user request;
  • if the software application receives a miss from all the cache nodes, it queries the one or more database systems; if the queried data is present in the database system, then upon retrieving it from the one or more database systems, the software application uses the queried data to process the user request and sends to at least one cache node the queried data and an instruction to add it to that cache node;
  • this invention provides a data storage system comprising one or more database systems, at least one cache node, at least one data processor and a software application, where execution of the software application by the at least one data processor results in performance of operations that comprise execution of any one of the above methods and wherein the one or more database systems and the at least one cache node are configured to be independently driven by the software application.
  • the number of cache nodes and the processing power of the computerized means for running the software application are adapted to meet the aggregated peak traffic generated by all end-users of the software application.
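This sizing rule reduces to simple arithmetic: deploy enough cache nodes to absorb the aggregated peak read traffic and, since the storage tier must eventually hold the whole database contents, enough memory across the nodes to store it. The sketch below takes the larger of the two requirements; the function name and all throughput and memory figures are hypothetical inputs, not values from the source.

```python
import math


def cache_nodes_needed(
    peak_reads_per_s: float,
    reads_per_node_per_s: float,
    db_size_gb: float,
    memory_per_node_gb: float,
) -> int:
    """Smallest node count meeting both the peak-traffic and the storage requirement."""
    for_traffic = math.ceil(peak_reads_per_s / reads_per_node_per_s)
    for_storage = math.ceil(db_size_gb / memory_per_node_gb)
    return max(for_traffic, for_storage)
```

With hypothetical figures, `cache_nodes_needed(200_000, 30_000, 500, 64)` returns 8: traffic alone would need 7 nodes, but 500 GB of data needs 8 nodes of 64 GB each.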
  • the data storage system according to the invention may comprise any one of the following optional features and steps:
  • the number and storage resources of the cache nodes are sized to hold the entire contents of the database system.
  • Some data of the database system are stored in more than one cache node.
  • the query hit ratio of the at least one cache node eventually reaches 100% once the entire contents of the database system have been transferred into the at least one cache node by the software application.
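The convergence of the hit ratio can be illustrated with a small simulation, assuming a fixed data set that fits entirely in the cache, uniformly random read queries and no evictions. The function name and parameters are hypothetical; each miss copies the data from the database into the cache exactly once.

```python
import random
from typing import Dict, List


def hit_ratio_per_round(
    database: Dict[str, str], rounds: int, requests_per_round: int, seed: int = 0
) -> List[float]:
    """Replay random read queries against an initially empty cache; report hit ratios."""
    rng = random.Random(seed)
    keys = list(database)
    cache: Dict[str, str] = {}
    ratios: List[float] = []
    for _ in range(rounds):
        hits = 0
        for _ in range(requests_per_round):
            key = rng.choice(keys)
            if key in cache:
                hits += 1
            else:
                cache[key] = database[key]  # miss: fetch once from the database, then cache
        ratios.append(hits / requests_per_round)
    return ratios
```

With 100 keys and 500 queries per round, the first round necessarily contains misses (at least its very first query), while later rounds approach and then reach a 100% hit ratio once every key has been fetched once.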
  • this invention provides an Inventory of a travel provider comprising the data storage system of the present invention.
  • FIGURE 1 depicts a data storage system according to the invention.
  • FIGURE 2 illustrates the process by which the application eventually obtains data that is requested by an end user and not yet present in the cache.
  • FIGURE 5 gives further details on the timing of the data writing performed simultaneously by the application in database and in cache.
  • FIGURE 6 illustrates the case where requested data is neither present in cache nor in database.
  • FIGURE 7 illustrates the case where a writing of the database and cache is a delete.
  • Figure 1 describes a data storage system 100 according to the invention in which a software application 10 interfaces independently with, on the one hand, a database system 20 and, on the other hand, a cache system (also referred to as the cache) comprising one or more cache nodes 30.
  • the database cache system of the invention described hereafter is specific mainly because the whole database content may eventually be transferred into a set of cache nodes that operate as a front-end processing layer, shielding the database systems 20 from all the read traffic that would otherwise reach them and thus dramatically improving the performance of the data storage system 100. A sufficient number of cache nodes is then deployed to support the whole traffic and to hold, together, the whole database content. Hence, once the system has been up and running for a significant period of time, all data entities contained in the back-end database have eventually been transferred into the set of cache nodes, so that there is no longer any cache miss, since all read queries are then handled by the cache nodes. Database writes are systematically performed in the cache and in the database, so that cache and database contents are always consistent. Even though the data storage system described hereafter is thus more a high-speed front-end storage and processing system for a database used as a data repository, the term cache is nonetheless used in the following description of the invention.
  • the data storage system 100 follows the traditional three-tier architecture often used by data processing systems.
  • the middle tier 120 is the software application 10 tier, from which the proprietary software application 10 of the service provider is run. In the GDS example used previously, this is typically the inventory application of an airline, which keeps track of all reservations and seat bookings across the airline's fleet of flights.
  • the client tier 130 consists of all remotely located users 40 of the application 10.
  • the end users are typically travel agents in traditional travel agencies. They are also individuals who use any of the many available travel websites or online travel agencies, from which they can issue travel requests and possibly book air trips online.
  • the lower tier is the storage tier 110 that comprises the database system 20.
  • the invention does not make any assumption about the database system used by the service provider. It is most often based on a standard, commercially available database management system (DBMS), but it can also be a proprietary database system. Whichever database system the service provider uses, it is implemented with sufficient hardware and software resources to hold and process all of the service provider's data.
  • DBMS: database management system
  • All hardware resources needed to implement the data storage system 100 are shown as individual computer-like machines, globally referred to by reference numeral 101. Persistent, non-volatile storage is assumed to be available from each individual computer and also as a separate data disk 102 when necessary, for example to permanently hold the database contents.
  • the term 'user request' or 'request' designates a demand coming from a user 40 and that reaches an application 10.
  • the user can be a person such as a traveler or a travel agent or can be a computerized system that sends requests.
  • the term 'data query' or 'query' designates a demand sent by the application 10 to a cache node 30 and/or to the database system 20.
  • a query can be a read query or a write query.
  • the application 10 receives user requests and sends data queries, these queries being either read queries or write queries.
  • database 20 is the ultimate data repository of the service provider.
  • the database 20 then preferably adheres to the ACID set of properties (Atomicity, Consistency, Isolation and Durability), which guarantee that database transactions are processed reliably.
  • ACID: Atomicity, Consistency, Isolation and Durability
  • cache is functionally located at storage tier like the database.
  • Interface 14 and the one or more cache nodes 30 are assumed to be able to handle all the traffic of the data storage system 100, whatever the targeted throughput, simply by deploying enough hardware and software resources at the software application 10 tier 120 and, for the cache nodes, at the storage tier 110 to meet the expected throughput.
  • more data is processed simply by adding more computing and storage resources to the existing ones.
  • This approach provides a system scalability that is not limited by architectural considerations, only by the number of computer platforms that must be deployed to achieve the targeted throughput, i.e., by their cost, power dissipation and floor space.
  • the data storage system 100 is based on a global key/value data model where contents are consistent in cache and in database so that a same key can be used to retrieve both.
  • the data model is thus chosen in such a way that it is directly mappable between the database and the cache.
  • each set of data is grouped by functional entity and indexed by a common unique key. This makes each set immediately accessible as a whole from the unique key, both in the database and in the cache, although the contents may differ somewhat.
  • a leg is a part of a flight.
  • a flight can go from Nice (NCE) to New York (NYC) with a stop at Paris (CDG). It has two legs: NCE-CDG and CDG-NYC. (Note that it contains three O&Ds, i.e., origin and destination pairs: NCE-CDG, NCE-NYC and CDG-NYC.)
  • the schedule information is stored in a relational database.
  • the "mother” table has a Flight-Date primary key.
  • One of the "child” tables has a Leg-Date primary key.
  • Some writings (updates for instance) are done at flight level, others at leg level.
  • Locking at flight level is used in both cases. It prevents any modification of the flight and of all its legs. The lock cannot be set at leg-date level, because an update of the flight would then update all legs and could lead to concurrent updates.
  • the data models of the database and the caches, if not strictly identical, must be consistent, so that the same indexing keys can be derived for accessing cache and database records while allowing database records to be locked.
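The flight/leg example can be made concrete by deriving every key from the same business identifiers, so that the cache and the database address a record identically. In this sketch the key formats (`flight/date` and `flight/date/ORIG-DEST`) and all function names are hypothetical; the point is only that a leg-level record maps back to its flight's Flight-Date lock key, and that a route with one stop yields two legs and three O&Ds.

```python
from itertools import combinations
from typing import List, Tuple

Leg = Tuple[str, str]


def legs(route: List[str]) -> List[Leg]:
    """Consecutive airport pairs: the segments flown without an intermediate stop."""
    return list(zip(route, route[1:]))


def origin_destinations(route: List[str]) -> List[Leg]:
    """Every origin/destination pair along the route, kept in route order."""
    return list(combinations(route, 2))


def flight_date_key(flight_number: str, date: str) -> str:
    """Flight-Date key: primary key of the 'mother' record and the lock key."""
    return f"{flight_number}/{date}"


def leg_date_key(flight_number: str, date: str, leg: Leg) -> str:
    """Leg-Date key, built on the Flight-Date key so every leg maps to its flight."""
    return f"{flight_date_key(flight_number, date)}/{leg[0]}-{leg[1]}"


def lock_key_for(record_key: str) -> str:
    """Locking is always at flight level: drop any leg suffix from a record key."""
    return "/".join(record_key.split("/")[:2])
```

For the route `["NCE", "CDG", "NYC"]`, `legs` returns the two legs and `origin_destinations` the three O&Ds listed above, and `lock_key_for(leg_date_key("XX123", "2024-05-01", ("CDG", "NYC")))` gives the same key as `flight_date_key("XX123", "2024-05-01")` (flight number and date are hypothetical).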
  • the architecture shown in Figure 1 works with a cache organized as a single-layer, client-side distributed cache, which supports the whole throughput and also significantly simplifies the management of cache data consistency.
  • Having a client-side distributed cache means that the distribution of data among the various cache nodes 30 composing the cache is known and computed on the client side, at the software application 10 tier.
  • all cache nodes 30 are thus fully independent and scalability of the system is indeed potentially unlimited.
  • actually getting more processing power by adding new cache nodes 30 in the storage tier is only achievable if a balanced distribution of data within the nodes is also maintained.
  • data are distributed based on their key properties. For instance, flight oriented data are distributed on the basis of their flight number.
  • the data storage system 100 of the invention does not require any type of synchronization mechanism between cache and database.
  • the cache is used by the software application 10 in an explicit way, i.e., it is up to the software application 10 tier to use either of the two data sources, database or cache, or both during the same user request, e.g., when the database and the cache must be written.
  • the direct consequence of this approach is that the database is kept totally unaware of the existence of a cache and is not affected at all by the presence or absence of a cache in the data structure of the invention.
  • the cache is totally decoupled from the database. Both structures can therefore evolve fully independently if necessary. It is worth noting that writes to the cache do not use an invalidation policy: every write results in the immediate replacement of the data in the cache. When the whole database content is eventually mapped into the cache and distributed over all available cache nodes 30, the hit ratio reaches 100%, even under a very high level of concurrent writes.
  • Cache data can always be considered valid, and no extra process is needed to check it. Indeed, every cache miss triggers the addition of the missing value into the cache from the database. This is done once and for all, which ensures the lowest possible load on the database: it is queried only once per data entity to retrieve. This occurs mostly when the cache becomes operational, e.g., after a power-on of the system, following the addition of a cache node 30, a failure of a cache node 30, a maintenance operation, etc. The invention assumes there is enough room in the distributed cache nodes 30 to receive the whole database content.
  • the absence in the database of data requested by an end user is also recorded in the cache. If a piece of data requested by an end user can be found neither in the cache nor in the database, an absence of data is recorded in the cache, so that the next time the cache is queried, no fetch of the corresponding piece of data is attempted from the database, further limiting the database load.
  • the architecture described in Figure 1 is extensible to any type of data that can be key-value oriented. Also, it is applicable to any process that can be key-value oriented. It is in particular applicable to any of the processes devised to check flight availability.
  • the cache part of the system is pretty simple and composed of one or more standalone computers offering a basic remote key/value protocol.
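  • Three basic operations on the cache are defined that let the software application 10 update the cache, populate it from the database, and retrieve data from it. They are:
  • Get (key): return from the cache the value associated with the key.
  • Set (key, value): store the value associated with the key in the cache, unconditionally replacing any existing value.
  • Add (key, value): store the value associated with the key in the cache only if the key is not already present; otherwise the operation fails with a negative acknowledgement (KO).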
  • The invention does not make any assumption on the way these operations are actually implemented by the software application 10, provided the expected level of performance can be reached.
  • Bulk operations are also defined, which make it possible to send and process several basic operations together.
  • The main part of the system is on the software application 10 tier, which controls data distribution over all cache nodes 30. Key/value data are spread among the nodes composing the cache. To spread the data as evenly as possible over all nodes, a property of the key is extracted and the corresponding cache node 30 is computed by the formula:
  • node_number = key_property_as_a_number MODULO number_of_nodes
  • Flight-oriented data exploit the property that consecutive flight numbers are usually assigned to flights having the same properties. In this case the flight number is directly used as the basis for the distribution, so that flights with similar properties end up on different cache nodes.
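The three operations and the modulo-based distribution can be sketched in Python as follows. This is a minimal in-process illustration under stated assumptions, not the implementation of the invention: the node classes, the `MISS` sentinel and the `(flight_number, date)` key layout are hypothetical, and a real deployment would reach each node through a remote key/value protocol.

```python
from typing import Dict, Hashable, List, Tuple

MISS = object()  # hypothetical sentinel returned on a cache miss


class CacheNode:
    """One standalone key/value node offering Get, Set and Add."""

    def __init__(self) -> None:
        self._store: Dict[Hashable, object] = {}

    def get(self, key: Hashable) -> object:
        # Get(key): the cached value, or MISS if the key is absent.
        return self._store.get(key, MISS)

    def set(self, key: Hashable, value: object) -> bool:
        # Set(key, value): unconditional replacement, no invalidation policy.
        self._store[key] = value
        return True

    def add(self, key: Hashable, value: object) -> bool:
        # Add(key, value): store only if absent; True stands for OK, False for KO.
        if key in self._store:
            return False
        self._store[key] = value
        return True


FlightKey = Tuple[int, str]  # hypothetical key layout: (flight_number, date)


class DistributedCache:
    """Routes each key with node_number = flight_number MODULO number_of_nodes."""

    def __init__(self, number_of_nodes: int) -> None:
        if number_of_nodes <= 0:
            raise ValueError("number_of_nodes must be positive")
        self.nodes: List[CacheNode] = [CacheNode() for _ in range(number_of_nodes)]

    def node_number(self, key: FlightKey) -> int:
        flight_number, _date = key  # the flight number is the numeric key property
        return flight_number % len(self.nodes)

    def get(self, key: FlightKey) -> object:
        return self.nodes[self.node_number(key)].get(key)

    def set(self, key: FlightKey, value: object) -> bool:
        return self.nodes[self.node_number(key)].set(key, value)

    def add(self, key: FlightKey, value: object) -> bool:
        return self.nodes[self.node_number(key)].add(key, value)


cache = DistributedCache(4)
# Consecutive flight numbers land on consecutive nodes.
print([cache.node_number((n, "2012-09-27")) for n in range(1000, 1008)])
# [0, 1, 2, 3, 0, 1, 2, 3]
```

Because consecutive flight numbers cycle through the nodes, flights sharing similar properties (and probably similar request rates) are not concentrated on a single node.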
  • Figures 2 and 3 show how the cache is populated and kept coherent with the database contents under the sole control of the software application 10.
  • Figure 2 describes the process by which the software application 10 eventually obtains a piece of data that is requested by an end user and not yet present in the cache. This situation mostly prevails while the cache is being populated, e.g., after a power-on of the system, or because a node has been inserted or removed and a rebalancing of the cache node 30 contents is in progress.
  • The cache is first read through a "Get" operation 210.
  • This is, for example, to answer one of the numerous requests issued by end users of the database to find out whether seats are available on a particular flight on a certain date, in a certain class, etc.
  • If the corresponding data is not present in the cache, typically because it has not yet been brought into the cache by a previous read, the cache returns a "Miss" 220 to the software application 10. Otherwise, the information is simply returned from the cache to the software application 10, which ends the "Get" operation.
  • The software application 10 can thus fulfill the end user's request.
  • Additional data are typically other data that may need to be retrieved to fulfill the user request. For instance, some data can be retrieved from one cache node, while other data that are also needed to fulfill the same user request must be retrieved from other cache nodes and/or read from the database systems 20.
  • Upon receiving the information that the queried data is not present in the cache, the software application 10 interrogates the database with a "Read" operation 230. The missing information is then returned 240 to the software application 10. The data is read from the database through the database dedicated interface 12 previously described, by issuing, from the software application 10, the corresponding queries to the database management system (DBMS) used by the data storage system 100 of the invention.
  • Upon receiving from the database the data missing in the cache, the software application 10 performs an "Add" operation 250 to store the data in the cache. From then on, the data is present 270 in the cache as long as the cache stays operational and is not reconfigured. At completion of this operation, a positive acknowledgement (OK) 260 is returned to the software application 10.
  • For any given piece of data stored identically or consistently in the database and in the cache nodes 30, this process occurs only once while the cache is up and running: the first time the data is requested by the software application 10 and is not yet present in the cache. After that, the corresponding data may be updated if the database contents need to change, for example because airline seats have been sold. In this case, as described hereafter, the software application 10 updates both the cache and the database, so that it is never necessary to re-execute the process of Figure 2.
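A minimal sketch of the Figure 2 read path follows, assuming in-memory dictionaries in place of both the cache and the DBMS. The `Cache`, `Database` and `fetch` names are hypothetical; the comments map each step to the reference numerals used above, and the read counter shows that the database is hit only once per data entity.

```python
from typing import Dict, Hashable

MISS = object()  # hypothetical sentinel for a cache miss


class Cache:
    """Minimal cache offering the Get and Add operations used in Figure 2."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, object] = {}

    def get(self, key: Hashable) -> object:
        return self._data.get(key, MISS)

    def add(self, key: Hashable, value: object) -> bool:
        if key in self._data:
            return False  # KO: a concurrent writer already stored a value
        self._data[key] = value
        return True  # OK


class Database:
    """Stand-in for the DBMS that counts reads to show the load reduction."""

    def __init__(self, rows: Dict[Hashable, object]) -> None:
        self.rows = dict(rows)
        self.reads = 0

    def read(self, key: Hashable) -> object:
        self.reads += 1
        return self.rows[key]


def fetch(key: Hashable, cache: Cache, db: Database) -> object:
    value = cache.get(key)      # "Get" 210
    if value is not MISS:
        return value            # hit: the database is not touched
    value = db.read(key)        # "Miss" 220, then "Read" 230 / 240
    cache.add(key, value)       # "Add" 250; a KO means a concurrent update (Figure 4)
    return value


db = Database({("AF123", "2012-09-27"): 42})
cache = Cache()
fetch(("AF123", "2012-09-27"), cache, db)
fetch(("AF123", "2012-09-27"), cache, db)
print(db.reads)  # 1
```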
  • Figure 3 describes the process by which the software application 10 updates the database and the cache concurrently.
  • To keep the database and cache contents coherent, the software application 10 always updates both. The cache is updated with the "Set" operation 310 previously described, while an "Update" 305 of the database is performed using the query language of the DBMS in use. The update becomes effective once the application has committed 320 the operation to the database.
  • The Set is not performed when the update is issued to the database, but when the commit is performed.
  • The application keeps the data to be set in memory until the commit is done. There may be many steps between the Update 305 and the Set 310; however, the Set 310 and the Commit 320 are intended to be performed in a row.
  • The cache of the invention is populated by both read and write operations, since the process of Figure 3 does not require any particular condition to be fulfilled before writing into the cache. Compared with systems where only reads populate the cache, this significantly speeds up the population of the cache nodes 30 after a power-on. This is possible simply because, as already stated, the data entities stored by the database and by the cache are both kept up to date. This is not the case in other cache solutions, where database and cache contents may differ significantly, generally in an attempt to keep cache storage requirements to a minimum, or where the cache data entities delivered to the software application 10 are built from disjoint pieces of data extracted from various parts of the database.
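The Figure 3 write path can be sketched with Python's built-in `sqlite3` module standing in for the DBMS; the `availability` table, the `sell_seats` function and the in-memory `Cache` class are hypothetical. The new value is buffered in the application and passed to Set only after the commit succeeds, so a rolled-back transaction never reaches the cache. The sketch does not attempt to reproduce the row locking described for Figure 5.

```python
import sqlite3
from typing import Dict


class Cache:
    """Minimal cache offering the unconditional Set operation."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}

    def set(self, key: str, value: int) -> None:
        self.data[key] = value


def sell_seats(conn: sqlite3.Connection, cache: Cache, flight: str, sold: int) -> int:
    pending: Dict[str, int] = {}  # values to Set once the transaction is committed
    try:
        (seats,) = conn.execute(
            "SELECT seats FROM availability WHERE flight = ?", (flight,)
        ).fetchone()
        conn.execute(  # "Update" 305
            "UPDATE availability SET seats = ? WHERE flight = ?", (seats - sold, flight)
        )
        pending[flight] = seats - sold
        # ... possibly many more steps within the same transaction ...
        conn.commit()  # "Commit" 320
    except Exception:
        conn.rollback()  # a failed transaction never reaches the cache
        raise
    for key, value in pending.items():
        cache.set(key, value)  # "Set" 310, right after the commit
    return pending[flight]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE availability (flight TEXT PRIMARY KEY, seats INTEGER)")
conn.execute("INSERT INTO availability VALUES ('AF123', 10)")
conn.commit()
cache = Cache()
print(sell_seats(conn, cache, "AF123", 2), cache.data)  # 8 {'AF123': 8}
```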
  • Figure 4 describes the process of Figure 2 in the particular case where a concurrent write (an update, for instance) to the database is requested by the software application 10, thus interfering with its execution.
  • In this case, the cache contents must not be updated by the subsequent "Add" 250 that results from fetching 230 the missing data from the database, since the database has been updated in the meantime.
  • The "Add" 252 is then actually aborted.
  • A negative acknowledgement (KO) 262 is returned, which lets the software application 10 know that the cache has not actually been updated by the "Add" operation.
  • The invention uses the Add command so that data can be sent to the cache without having to lock the data in the database. Indeed, if the data is still not in the cache when the application tries to add it, it is effectively added. If it has been updated in the meantime by an update process, the Add fails, but this is expected: the update process held the lock on the database and therefore had primacy for updating this key, so it is normal that its value is the one that stays in the cache.
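The Figure 4 interleaving can be replayed deterministically, assuming dictionary-backed stand-ins for the cache and the database and writing the concurrent steps one after the other. The names are hypothetical; the point illustrated is that Add refuses to overwrite the value stored by the concurrent updater.

```python
from typing import Dict, Tuple


class Cache:
    def __init__(self) -> None:
        self.data: Dict[str, int] = {}

    def set(self, key: str, value: int) -> bool:
        self.data[key] = value  # unconditional replacement
        return True

    def add(self, key: str, value: int) -> bool:
        if key in self.data:
            return False  # KO 262: the key was written in the meantime
        self.data[key] = value
        return True  # OK 260


def interleaved_read_and_update() -> Tuple[bool, int]:
    db = {"AF123": 10}
    cache = Cache()

    # Reader: its Get missed, so it reads the database ("Read" 230).
    value_read = db["AF123"]

    # Concurrent writer: updates and commits the database, then Sets the cache.
    db["AF123"] = 8
    cache.set("AF123", db["AF123"])

    # Reader resumes: its Add 252 is refused, so the fresher value stays.
    add_ok = cache.add("AF123", value_read)
    return add_ok, cache.data["AF123"]


print(interleaved_read_and_update())  # (False, 8)
```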
  • Figure 5 gives further details on the timing of the data updates performed simultaneously by the software application 10 in the database and in the cache.
  • The software application 10 begins the update transaction by issuing the corresponding queries 510 to the database to retrieve the currently stored values. At the same time, to prevent concurrent updates from another software application 10, the database management system (DBMS) locks the currently stored values. The data are then processed by the software application 10 within the software application tier. When the data are ready to be updated 530 by the DBMS, a buffer cache 540 in the software application 10 is also updated; it holds the new data to be forwarded to and stored in the cache.
  • Figure 6 describes the case where the queried data is present neither in the cache nor in the database. This covers cases where end users request pieces of information that are not held in the database.
  • The absence of the corresponding data is then also recorded in the cache. The next time the cache is interrogated by the software application 10, the information that the queried data is not present in the database is delivered directly by the cache itself, further reducing the database load.
  • The process is similar to the one described in Figure 2. After a "Get" operation 210 issued to the cache has returned a "Miss" 220, reading 230 the corresponding data in the database also returns a database "Miss" 640 to the software application 10. The absence of data is then added 650 into the cache. As with data, the absence of data becomes available immediately 270 in the cache, which also returns an acknowledgement 260 to the software application 10.
  • Each piece of data is associated with a header to form a record, and the header indicates whether the content is missing in the database system 20.
  • The cache node stores a specific value associated with the data, this specific value indicating that the data is not present in the database. Thus, reading only the value of the record is enough to know whether it is worth querying the database system.
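The negative caching of Figure 6 can be sketched with the header-based record described above. This assumes a dictionary as the cache and a database stand-in that returns `None` for a missing key, so the sketch also assumes the database never stores `None` as a real value. `Record`, `Database` and `fetch` are hypothetical names; `dict.setdefault` plays the role of Add because it never overwrites an existing entry.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Optional


@dataclass(frozen=True)
class Record:
    missing: bool          # header: True when the data is absent from the database
    value: object = None   # payload, meaningful only when missing is False


class Database:
    """Stand-in for the DBMS that counts reads to show the load reduction."""

    def __init__(self, rows: Dict[Hashable, object]) -> None:
        self.rows = dict(rows)
        self.reads = 0

    def read(self, key: Hashable) -> Optional[object]:
        self.reads += 1
        return self.rows.get(key)  # None stands for a database "Miss" 640


def fetch(key: Hashable, cache: Dict[Hashable, Record], db: Database) -> Optional[object]:
    record = cache.get(key)                     # "Get" 210
    if record is None:                          # cache "Miss" 220
        value = db.read(key)                    # "Read" 230
        record = Record(missing=value is None, value=value)
        record = cache.setdefault(key, record)  # "Add" 250 / 650: never overwrites
    return None if record.missing else record.value


db = Database({"AF123": 42})
cache: Dict[Hashable, Record] = {}
print(fetch("XX999", cache, db), fetch("XX999", cache, db), db.reads)  # None None 1
```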
  • Figure 7 illustrates the case, already mentioned for Figure 3, where the specific update operation from the application is a delete 705 of data from the database. The operation is performed as explained for Figure 3, except that the deleted data is not actually removed from the cache but replaced by the indication of an "absence of data".
  • When the delete is committed 320 to the database by the application, the corresponding information is stored in the cache with a specific "Set" operation 310.
  • The "absence of data" becomes available immediately 330.
  • When the cache is later interrogated, it can directly provide the information that the requested data is no longer present in either the cache or the database.
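A minimal sketch of the Figure 7 delete follows, again with `sqlite3` standing in for the DBMS and a dictionary for the cache; `ABSENT`, `delete_flight` and `cached_status` are hypothetical names. After the commit, the cache entry is replaced by the absence marker instead of being removed, so a later lookup can tell an unknown key (a miss) from a key known to be absent.

```python
import sqlite3
from typing import Dict

ABSENT = object()  # hypothetical "absence of data" marker stored in the cache


def delete_flight(conn: sqlite3.Connection, cache: Dict[str, object], flight: str) -> None:
    try:
        conn.execute("DELETE FROM availability WHERE flight = ?", (flight,))  # delete 705
        conn.commit()  # commit 320
    except Exception:
        conn.rollback()
        raise
    cache[flight] = ABSENT  # "Set" 310: replace the entry rather than removing it


def cached_status(cache: Dict[str, object], flight: str) -> str:
    if flight not in cache:
        return "miss"    # unknown to the cache: the database has to be read
    if cache[flight] is ABSENT:
        return "absent"  # known to be in neither the cache nor the database
    return "hit"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE availability (flight TEXT PRIMARY KEY, seats INTEGER)")
conn.execute("INSERT INTO availability VALUES ('AF123', 10)")
conn.commit()
cache: Dict[str, object] = {"AF123": 10}
delete_flight(conn, cache, "AF123")
print(cached_status(cache, "AF123"), cached_status(cache, "AF456"))  # absent miss
```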
  • The invention does not assume any particular way of computing keys from the data entities that are stored and retrieved identically from the database and the cache. Most of the time, depending on the type of data handled by a particular application, some hashing function is used, and the node address is then simply derived from the hashed key modulo the number of nodes. Hence, if the number of nodes changes, the computation may yield a different result, and a given data entity may have to be looked for in a different node of the new configuration.
  • The problem comes from the fact that a configuration update is not atomic and must be performed transparently while the database system is fully operational. Not all cache clients are made aware of the new configuration at the same time, which means that some data writes would be done on the basis of the new configuration while others would still use the old one. The result would be an inconsistent set of data between the cache and the database.
  • Dual feeding consists of maintaining one extra configuration in addition to the one normally used for the cache, hence the name "dual feed".
  • The extra configuration is not used by default but can be activated for the duration of the configuration change. When it is activated, all write operations are sent both to the standard configuration and to the dual-feed configuration.
  • A time to live (TTL) is a property associated with each item in the cache. As the name suggests, it corresponds to the period of time during which the item is valid. Once it expires, the item can no longer be retrieved from the cache, resulting in a cache miss as if the data were missing. The TTL can be set by configuration: one value for the standard configuration and one for the dual-feed configuration. When no time to live is set, the item never expires.
  • As the activation of the dual-feed configuration is not atomic either, it must first be activated with a short time to live. Once the dual-feed configuration is fully activated, the time to live can be removed. Only once that time to live has expired can the standard configuration and the dual-feed configuration be swapped. Once the configuration change is over, the dual feed can be deactivated. During the steps where the configuration is being propagated (activation/deactivation of the dual feed), some "invalid" data may be written, but only in places where they are not read.
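The dual-feed procedure above can be sketched as follows. This single-process illustration rests on several assumptions: keys are strings hashed with `zlib.crc32` (standing in for "some hashing function"), time is passed explicitly as `now` instead of being read from a clock, reads use only the standard configuration, and the standard configuration uses no TTL. All class and method names are hypothetical.

```python
import zlib
from typing import Dict, List, Optional, Tuple


class Node:
    """Cache node whose items may carry an expiry time derived from a TTL."""

    def __init__(self) -> None:
        self.items: Dict[str, Tuple[object, Optional[float]]] = {}

    def set(self, key: str, value: object, ttl: Optional[float], now: float) -> None:
        expiry = None if ttl is None else now + ttl  # no TTL: the item never expires
        self.items[key] = (value, expiry)

    def get(self, key: str, now: float) -> Optional[object]:
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if expiry is not None and now >= expiry:
            del self.items[key]  # an expired item behaves like a cache miss
            return None
        return value


class Configuration:
    """A set of nodes addressed by hash(key) MODULO number_of_nodes."""

    def __init__(self, number_of_nodes: int) -> None:
        self.nodes: List[Node] = [Node() for _ in range(number_of_nodes)]

    def node(self, key: str) -> Node:
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]


class DualFeedClient:
    def __init__(self, standard: Configuration) -> None:
        self.standard = standard
        self.dual: Optional[Configuration] = None
        self.dual_ttl: Optional[float] = None

    def activate_dual_feed(self, new_config: Configuration, ttl: float) -> None:
        # Activation is not atomic across clients, so start with a short TTL.
        self.dual, self.dual_ttl = new_config, ttl

    def remove_dual_ttl(self) -> None:
        # Once the dual feed is fully activated, its items may stop expiring.
        self.dual_ttl = None

    def set(self, key: str, value: object, now: float) -> None:
        # Writes go to the standard configuration and, when active, the dual feed.
        self.standard.node(key).set(key, value, None, now)
        if self.dual is not None:
            self.dual.node(key).set(key, value, self.dual_ttl, now)

    def get(self, key: str, now: float) -> Optional[object]:
        # Reads only ever use the standard configuration.
        return self.standard.node(key).get(key, now)

    def swap(self) -> None:
        # Only safe once the short TTL has expired.
        if self.dual is None:
            raise RuntimeError("dual feed is not active")
        self.standard, self.dual = self.dual, self.standard

    def deactivate_dual_feed(self) -> None:
        self.dual, self.dual_ttl = None, None


client = DualFeedClient(Configuration(3))
client.activate_dual_feed(Configuration(4), ttl=5.0)
client.set("AF123", 10, now=0.0)  # dual-feed copy expires at t=5
client.remove_dual_ttl()
client.set("AF456", 7, now=1.0)   # dual-feed copy never expires
client.swap()                     # at t=10, after the short TTL has expired
print(client.get("AF123", now=10.0), client.get("AF456", now=10.0))  # None 7
client.deactivate_dual_feed()
```

In this run, the entry written while the short TTL was still active has expired by the time the configurations are swapped, so it is read as a miss (and would be reloaded from the database) rather than served from a copy written while not every client was feeding the new configuration.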
  • The proposed architecture offers such scalability that the whole system may later no longer be in a position to work properly without the cache.
  • All maintenance operations are meant to be done online, impacting at most one node (or the equivalent proportion of the traffic) at a time (e.g., cache nodes upgraded or replaced one by one, global cache changes performed using the dual-feed mechanism), to lower the impact on the database.
  • While a node is impacted, the system will preferably use the database to retrieve the data that should have been hosted by this node.
  • The present invention keeps data consistent between the cache and the database thanks to a mechanism which, strictly speaking, is not ACID compliant but is highly scalable, has no impact on the database, allows a 100% hit ratio and, above all, fully meets data quality needs.
  • The invention makes it possible to cache highly dynamic data, i.e., typically up to several tens of writes per second per unit of data, while still benefiting from the offloading effect of the cache.
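Assuming a cache node is impacted (for instance, down for maintenance), the fallback to the database can be sketched as below. `NodeUnavailable`, `Node` and `fetch` are hypothetical names, and a plain dictionary stands in for the database. Only keys hosted by the impacted node bypass the cache, which is why impacting at most one node at a time limits the extra load on the database.

```python
from typing import Dict, List, Optional


class NodeUnavailable(Exception):
    """Hypothetical error raised when a cache node cannot be reached."""


class Node:
    def __init__(self) -> None:
        self.data: Dict[int, str] = {}
        self.up = True  # set to False to simulate maintenance or a failure

    def get(self, key: int) -> Optional[str]:
        if not self.up:
            raise NodeUnavailable(key)
        return self.data.get(key)


def fetch(key: int, nodes: List[Node], db: Dict[int, str]) -> Optional[str]:
    node = nodes[key % len(nodes)]
    try:
        value = node.get(key)
    except NodeUnavailable:
        return db.get(key)  # impacted node: read its data from the database instead
    if value is None:
        value = db.get(key)  # ordinary miss: read the database, then Add to the node
        if value is not None:
            node.data.setdefault(key, value)
    return value


nodes = [Node(), Node()]
db = {1: "AF123", 2: "AF456"}
nodes[1].up = False
print(fetch(1, nodes, db), fetch(2, nodes, db))  # AF123 AF456
```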

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
PCT/EP2013/002655 2012-09-27 2013-09-04 Method and system of storing and retrieving data WO2014048540A1 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
AU2013324689A AU2013324689B2 (en) 2012-09-27 2013-09-04 Method and system of storing and retrieving data
SG11201501650WA SG11201501650WA (en) 2012-09-27 2013-09-04 Method and system of storing and retrieving data
IN1332DEN2015 IN2015DN01332A 2012-09-27 2013-09-04
JP2015533468A JP6511394B2 (ja) 2012-09-27 2013-09-04 Method and system for storing and retrieving data
KR1020157007498A KR101690288B1 (ko) 2012-09-27 2013-09-04 Method and system for storing and retrieving data
CA2882498A CA2882498C (en) 2012-09-27 2013-09-04 Method and system of storing and retrieving data
CN201380050168.7A CN104662539B (zh) 2012-09-27 2013-09-04 Method and system for storing and retrieving data

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP12368027.4 2012-09-27
EP12368027.4A EP2713284B1 (en) 2012-09-27 2012-09-27 Method and system of storing and retrieving data
US13/628,517 2012-09-27
US13/628,517 US9037801B2 (en) 2012-09-27 2012-09-27 Method and system of storing and retrieving data

Publications (1)

Publication Number Publication Date
WO2014048540A1 true WO2014048540A1 (en) 2014-04-03

Family

ID=49150900

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2013/002655 WO2014048540A1 (en) 2012-09-27 2013-09-04 Method and system of storing and retrieving data

Country Status (8)

Country Link
JP (1) JP6511394B2
KR (1) KR101690288B1
CN (1) CN104662539B
AU (1) AU2013324689B2
CA (1) CA2882498C
IN (1) IN2015DN01332A
SG (1) SG11201501650WA
WO (1) WO2014048540A1

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106846802B (zh) * 2017-02-09 2021-01-05 陕西公路交通科技开发咨询公司 Highway data processing method and apparatus
KR102415155B1 (ko) 2018-05-11 2022-06-29 삼성에스디에스 주식회사 Data retrieval apparatus and method
FR3081238A1 (fr) * 2018-05-17 2019-11-22 Amadeus S.A.S. Database caching
FR3092920B1 (fr) * 2019-02-14 2022-04-01 Amadeus Processing of complex database queries
SG10202008564PA (en) * 2020-09-03 2021-12-30 Grabtaxi Holdings Pte Ltd Data Base System and Method for Maintaining a Data Base
CN116521969B (zh) * 2023-02-28 2023-12-29 华为云计算技术有限公司 Data retrieval method, server, system and related device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6609126B1 (en) * 2000-11-15 2003-08-19 Appfluent Technology, Inc. System and method for routing database requests to a database and a cache
US20100180208A1 (en) * 2009-01-15 2010-07-15 Kasten Christopher J Server side data cache system

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08147201A (ja) * 1994-11-18 1996-06-07 Nippon Telegr & Teleph Corp <Ntt> Traffic data caching method
US6256710B1 (en) * 1995-04-28 2001-07-03 Apple Computer, Inc. Cache management during cache inhibited transactions for increasing cache efficiency
US6067550A (en) * 1997-03-10 2000-05-23 Microsoft Corporation Database computer system with application recovery and dependency handling write cache
US7434000B1 (en) * 2004-06-30 2008-10-07 Sun Microsystems, Inc. Handling duplicate cache misses in a multithreaded/multi-core processor
US7415487B2 (en) * 2004-12-17 2008-08-19 Amazon Technologies, Inc. Apparatus and method for data warehousing
US8417680B2 (en) * 2005-12-02 2013-04-09 International Business Machines Corporation System for improving access efficiency in database and method thereof
US20090204583A1 (en) * 2006-06-08 2009-08-13 International Business Machines Corporation Method for providing access to data stored in a database to an application
US7711657B1 (en) * 2006-06-26 2010-05-04 Hewlett-Packard Development Company, L.P. Resource-reservation pricing structures based on expected ability to deliver
US8095618B2 (en) * 2007-03-30 2012-01-10 Microsoft Corporation In-memory caching of shared customizable multi-tenant data
JP5163171B2 (ja) * 2008-02-15 2013-03-13 日本電気株式会社 Cache system and server
CN102103523A (zh) * 2009-12-22 2011-06-22 国际商业机器公司 Method and apparatus for lock allocation control

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
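CN111125138A (zh) * 2019-12-26 2020-05-08 深圳前海环融联易信息科技服务有限公司 Method and apparatus for querying data by polling, computer device and storage medium
CN111125138B (zh) * 2019-12-26 2023-08-25 深圳前海环融联易信息科技服务有限公司 Method and apparatus for querying data by polling, computer device and storage medium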

Also Published As

Publication number Publication date
AU2013324689B2 (en) 2016-07-07
JP6511394B2 (ja) 2019-05-15
CA2882498C (en) 2020-11-17
AU2013324689A1 (en) 2015-04-09
CA2882498A1 (en) 2014-04-03
KR101690288B1 (ko) 2016-12-28
KR20150075407A (ko) 2015-07-03
CN104662539A (zh) 2015-05-27
JP2015535995A (ja) 2015-12-17
IN2015DN01332A 2015-07-03
CN104662539B (zh) 2018-02-23
SG11201501650WA (en) 2015-04-29

Legal Events

Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 13759667; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2882498; Country of ref document: CA
ENP Entry into the national phase. Ref document number: 2015533468; Country of ref document: JP; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 20157007498; Country of ref document: KR; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2013324689; Country of ref document: AU; Date of ref document: 20130904; Kind code of ref document: A
122 EP: PCT application non-entry in European phase. Ref document number: 13759667; Country of ref document: EP; Kind code of ref document: A1