WO2013064815A1 - Method and database system for manipulating data - Google Patents

Method and database system for manipulating data

Publication number
WO2013064815A1
Authority
WO
WIPO (PCT)
Prior art keywords
database management
management system
data
database
performance metric
Prior art date
Application number
PCT/GB2012/052697
Other languages
French (fr)
Inventor
Pete CHEYNE
Michael Joseph GRUNDER
Matthew David Wells
Original Assignee
Performance Horizon Group
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Performance Horizon Group
Publication of WO2013064815A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • the invention relates to a method and a database system for manipulating data. More particularly, the invention relates to a method and database system for manipulating data wherein the database system comprises a plurality of database management systems.
  • a relational database management system such as MySQL
  • MySQL stores data in a table.
  • the data in a relational database management system is stored on a hard disc within a computer system. This provides advantages in the robustness of the database management system and persistence of the data stored therein.
  • the term "persistence" as used herein encompasses the ability of the data to be retrieved from a storage medium after a loss of electrical power.
  • access to a database provided on a hard disc may have high latency.
  • a further example of a database is a non-SQL database management system such as the key-value data store database management system, Redis.
  • Redis typically stores data in RAM, which has advantages of low latency when accessing the database.
  • the robustness of the data is low compared to a relational database management system and the data may also lack persistence.
  • a method for manipulating data in a database system comprising a plurality of database management systems, the method comprising: determining a performance metric for at least one of the plurality of database management systems; selecting, in dependence on the performance metric, at least one database management system of the plurality of database management systems; and controlling the at least one database management system to manipulate the data.
  • database system encompasses a system comprising at least one data storage element and at least one control element, wherein the control element controls at least the writing of data to the data storage element, the reading of data from the data storage element and the deletion of data from the data storage element.
  • the data storage elements are typically comprised within a database server.
  • database management system encompasses a software package comprising computer programs to control access to a database.
  • the database management system is typically run on a database server.
  • the performance of each may be assessed.
  • the database management system that performs the best may therefore be selected for use when manipulating the data, which provides improved efficiency of the database system.
  • the performance metric may be the time taken to manipulate previous data using the at least one database management system.
  • the time taken to manipulate previous data may be an indication of the likely time that will be taken to manipulate the present data and may therefore be useful information when selecting the at least one database management system to optimise the speed with which the data is manipulated.
  • selecting the at least one database management system may comprise selecting the database management system that has taken the least time to manipulate previous data.
  • selecting at least one database management system may comprise selecting a database management system if the performance metric is 500 milliseconds or less.
  • selecting at least one database management system may comprise selecting a database management system if the performance metric is 250 milliseconds or less.
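  • The threshold-based and least-time selection rules above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the dictionary of stored metrics and the sample timings are all assumptions introduced for the example.

```python
from typing import Dict, List


def select_within_threshold(metrics_ms: Dict[str, float],
                            threshold_ms: float = 500.0) -> List[str]:
    """Return every DBMS whose stored metric is at or below the threshold."""
    return [name for name, elapsed in metrics_ms.items() if elapsed <= threshold_ms]


def select_fastest(metrics_ms: Dict[str, float]) -> str:
    """Return the DBMS that took the least time on previous data."""
    return min(metrics_ms, key=metrics_ms.get)


# Hypothetical stored timings (milliseconds) for three database management systems.
metrics = {"mysql": 620.0, "redis": 12.0, "mongodb": 300.0}

eligible_500 = select_within_threshold(metrics, 500.0)  # redis and mongodb qualify
eligible_250 = select_within_threshold(metrics, 250.0)  # only redis qualifies
fastest = select_fastest(metrics)
```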
  • the performance metric may comprise the time taken to write previous data to the database system using the at least one database management system.
  • controlling the at least one database management system may comprise writing the data to the database system using the at least one database management system.
  • the performance metric may comprise the time taken to read previous data from the database system using the at least one database management system.
  • controlling the at least one database management system may comprise reading the data from the database system using the at least one database management system.
  • the performance metric may comprise the time taken by the at least one database management system to delete previous data from the database system.
  • the method may further comprise updating the performance metric for the selected at least one of the plurality of database management systems based on the performance of the selected at least one database management system when controlled to manipulate the data.
  • updating the performance metric may comprise determining the average performance metric over a set time period.
  • the set time period may be from 30 seconds to 90 seconds.
  • the set time period may be 60 seconds.
  • the set time period may be from 4 minutes to 6 minutes.
  • the set time period may be 5 minutes.
  • the set time period may be from 25 minutes to 35 minutes.
  • the set time period may be 30 minutes.
  • the set time period may be from 110 minutes to 130 minutes.
  • the set time period may be 120 minutes.
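  • Averaging the performance metric over a set time period could look like the sketch below. The class name, the explicit timestamps passed by the caller and the choice to return None when no samples remain are assumptions made for illustration; the source only states that an average is taken over the period.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class TimeWindowAverage:
    """Average of operation durations recorded within the last window_seconds."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        self._samples: Deque[Tuple[float, float]] = deque()  # (timestamp, duration_ms)

    def record(self, timestamp: float, duration_ms: float) -> None:
        self._samples.append((timestamp, duration_ms))

    def average(self, now: float) -> Optional[float]:
        # Discard samples that fall outside the set time period.
        while self._samples and now - self._samples[0][0] > self.window_seconds:
            self._samples.popleft()
        if not self._samples:
            return None
        return sum(duration for _, duration in self._samples) / len(self._samples)


window = TimeWindowAverage(window_seconds=60.0)
window.record(timestamp=0.0, duration_ms=100.0)
window.record(timestamp=30.0, duration_ms=200.0)
window.record(timestamp=70.0, duration_ms=300.0)
current_average = window.average(now=80.0)  # the sample at t=0 is outside the window
```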
  • updating the performance metric may comprise determining the average performance metric over a set number of data manipulation operations.
  • the plurality of database management systems may comprise a plurality of database management system types.
  • Including a plurality of types of database management systems, e.g. SQL and non-SQL database management systems, enables the advantages of each of the plurality of types to be utilised according to the characteristics of the data to be manipulated.
  • the plurality of database management systems may comprise a plurality of the same database management system type.
  • the plurality of database management systems may comprise a relational database management system.
  • relational database management system may comprise a MySQL database management system.
  • the plurality of database management systems may comprise a non-SQL database management system.
  • non-SQL encompasses the body of database management systems that are not relational database management systems and, more specifically, are not based on an SQL database management system.
  • the non-SQL database management system may comprise a key-value data store database management system.
  • the key-value data store database management system may comprise a Redis database management system.
  • the non-SQL database management system may comprise a document oriented database management system.
  • the document oriented database management system may comprise a MongoDB database management system.
  • the selected at least one database management system may comprise more than one database management system.
  • the selected at least one database management system may comprise all of the plurality of database management systems.
  • the data may be manipulated by all of the plurality of database management systems. This may be done sequentially or non-sequentially. This therefore produces a database that is replicated by all of the more than one database management systems within the database system.
  • Each data manipulation type e.g. write, read or delete, may be undertaken first by the best performing database management system for that type of data manipulation.
  • the remaining database management systems may be scheduled to undertake the data manipulation later. This optimises the execution of each type of data manipulation such that it is carried out by the highest performing database management system while ensuring that the correct data is replicated across all database management systems.
  • the selected at least one database management system may comprise a first database management system and at least one further database management system, the first database management system being selected based on the determined performance metric, and wherein the first and at least one further database management systems are each controlled to manipulate the data.
  • the first database management system and the at least one further database management system may be controlled sequentially.
  • the term "sequentially" encompasses manipulating the data using the first and at least one further database management systems in consecutive database system instructions. In this way, the versions of the database of each of the first and at least one further database management systems may be maintained to be consistent with each other.
  • faults with one or more of the database management systems may be identified quickly. That is, a fault with the further database management system may be identified soon after the manipulation of the data by the first database management system.
  • the first database management system and the at least one further database management system may be controlled non-sequentially.
  • non-sequentially encompasses manipulating the data using the first database management system and scheduling the manipulation of the data using the at least one further database management system at a later time.
  • Non-sequential manipulation of the data allows processor resources to be utilised more efficiently during time critical operations.
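  • The difference between sequential and non-sequential control can be illustrated with the sketch below. In-memory dictionaries stand in for the database management systems, and the class name, the pending queue and the run_pending method are hypothetical; the source does not specify how deferred manipulations are scheduled.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class ReplicatingWriter:
    """Writes to a first store, then replicates to the rest now or later."""

    def __init__(self, stores: Dict[str, Dict[str, str]]) -> None:
        self.stores = stores
        self.pending: Deque[Tuple[str, str, str]] = deque()  # (store, key, value)

    def write(self, first: str, key: str, value: str, sequential: bool) -> None:
        # The first (best performing) store is always written immediately.
        self.stores[first][key] = value
        for name in self.stores:
            if name == first:
                continue
            if sequential:
                # Sequential: replicate in the consecutive instructions.
                self.stores[name][key] = value
            else:
                # Non-sequential: schedule the replication for a later time.
                self.pending.append((name, key, value))

    def run_pending(self) -> None:
        while self.pending:
            name, key, value = self.pending.popleft()
            self.stores[name][key] = value


stores: Dict[str, Dict[str, str]] = {"mysql": {}, "redis": {}, "mongodb": {}}
writer = ReplicatingWriter(stores)
writer.write("redis", "session:1", "active", sequential=False)
deferred_count = len(writer.pending)  # two replications are waiting
writer.run_pending()                  # e.g. once time critical work has finished
```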
  • the method further comprises receiving computer program instructions to manipulate data, and controlling the at least one database management system may comprise compiling and issuing commands to the selected at least one database management system based on the received computer program instructions.
  • the method provides for a developer of a database system to manipulate data in computer program instructions of a single computer programming language.
  • the method provides for the compilation and issuance of the computer program instructions into commands specific to the at least one selected database management system.
  • the developer may therefore access and utilise any of the database management systems using only a single computer programming language.
  • the method may further comprise manipulating the data using the at least one database management system.
  • a computer program product comprising computer readable code configured to carry out the method described above.
  • a database system for manipulating data comprising: a plurality of database management systems; a processor in electrical communication with each of the plurality of database management systems and configured to receive instructions for manipulating data in the database system, wherein the processor is further configured to determine a performance metric for at least one of the plurality of database management systems; select, in dependence on the performance metric, at least one database management system of the plurality of database management systems; and control the at least one database management system to manipulate the data.
  • the plurality of database management systems may comprise a relational database management system.
  • relational database management system may comprise a MySQL database management system.
  • the plurality of database management systems may comprise a non-SQL database management system.
  • the non-SQL database management system may comprise a key-value data store database management system.
  • the key-value data store database management system may comprise a Redis database management system.
  • the non-SQL database management system may comprise a document oriented database management system.
  • the document oriented database management system may comprise a MongoDB database management system.
  • the processor is in electrical communication with each of the plurality of database management systems via a network.
  • a method for manipulating data in a database system comprising a plurality of database management systems, the method comprising: determining at least one characteristic of data to be manipulated; selecting, in dependence on the at least one characteristic, at least one database management system of the plurality of database management systems; and controlling the at least one database management system to manipulate the data.
  • each database management system of the plurality of database management systems may provide different advantages to the storage of data having different characteristics. Therefore, by determining the data characteristics it is possible to make best use of the database management systems.
  • determining the at least one characteristic may comprise determining the required persistence of the data to be manipulated.
  • the at least one characteristic may be one of persistent data and non-persistent data.
  • selecting at least one database management system may comprise selecting a relational database management system if the at least one characteristic is persistent data.
  • the term “persistent data” encompasses data that is required to persist if electrical power to the database system is lost. Such data may be data that is of high importance and/or is to be used as a backup for data stored in a different database management system providing less persistence of data.
  • the term “non-persistent data” encompasses data that is not required to persist after a loss of electrical power to the database system. Non-persistent data need not necessarily be required to be lost after such an electrical power loss. Non-persistent data may therefore include a "don't care" scenario in which the fate of the data following an electrical power loss is not of great importance.
  • By selecting the relational database management system for storing persistent data, the advantage of a disc based storage medium may be utilised. That is, a disc based medium is non-volatile and so the data stored on the disc will persist following a loss of electrical power.
  • determining the at least one characteristic may comprise determining the size of the data to be manipulated.
  • selecting at least one database management system may comprise selecting the key-value data store database management system if the size of the data is less than a low data size threshold value.
  • the low data size threshold value may be configurable dependent on the type of data to be manipulated.
  • selecting at least one database management system may comprise selecting the document oriented database management system if the size of the data is greater than a high data size threshold value.
  • the high data size threshold value may be configurable dependent on the type of data to be manipulated.
  • the document oriented database management system may manipulate some data in RAM and some data on disc. Therefore, the document oriented database management system may provide advantages regarding fast access with the ability to manipulate larger amounts of data.
  • the method may further comprise selecting a first database management system in dependence on the at least one characteristic, and selecting a second database management system to be a relational database management system.
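  • One way to combine the persistence and size rules above is sketched here. The threshold values, the type labels and the fallback to the relational system for mid-sized data are assumptions made for this example; the source leaves the thresholds configurable and does not say what happens between them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Thresholds:
    low_bytes: int = 1_024        # hypothetical low data size threshold
    high_bytes: int = 1_000_000   # hypothetical high data size threshold


def select_by_characteristic(size_bytes: int, persistent: bool,
                             thresholds: Thresholds = Thresholds()) -> List[str]:
    """Return the DBMS types chosen for data with the given characteristics."""
    selected: List[str] = []
    if persistent:
        # Persistent data goes to the disc based relational system.
        selected.append("relational")
    if size_bytes < thresholds.low_bytes:
        selected.append("key-value")
    elif size_bytes > thresholds.high_bytes:
        selected.append("document")
    if not selected:
        # Assumption: mid-sized, non-persistent data falls back to relational.
        selected.append("relational")
    return selected


small_cache_entry = select_by_characteristic(100, persistent=False)
large_record = select_by_characteristic(5_000_000, persistent=True)
mid_sized = select_by_characteristic(50_000, persistent=False)
```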
  • a computer program product comprising computer readable code configured to execute the method described above.
  • a database system for manipulating data comprising: a plurality of database management systems; a processor in electrical communication with each of the plurality of database management systems and configured to receive instructions for manipulating data in the database system, wherein the processor is further configured to determine at least one characteristic of data to be manipulated; select, in dependence on the at least one characteristic, at least one database management system of the plurality of database management systems; and control the at least one database management system to manipulate the data.
  • Different database management systems may make use of more than one data storage medium.
  • one database management system may specialise in, and therefore be optimised for use with, a particular data storage medium.
  • a relational database such as MySQL may specialise in the use of a disc based data storage medium.
  • a key-value data store database management system may specialise in the use of a memory (e.g. RAM) data storage medium.
  • Figure 1 is a block schematic diagram of a database system.
  • Figure 2 is a flow diagram showing a method of manipulating data in a database system according to a first embodiment.
  • Figure 3 is a flow diagram showing a method of manipulating data in a database system according to a second embodiment.
  • Figure 4 is a flow diagram showing a method of manipulating data in a database system according to a third embodiment.
  • the method and system provide technical benefits in the speed of manipulation of data and the increased integrity of data manipulated within the database system.
  • the database system 100 comprises a computer 102, a computer network 104 and a plurality of database servers 106a-106c.
  • the computer 102 comprises computer program instructions 108 produced by a database developer to control the database system and a computer processor 110 configured to implement a method for manipulating data within the database system 100.
  • the computer program instructions 108 may not, in reality, be comprised within the computer 102 but may have been produced remotely and at a different time to be executed on the processor 110 of the computer 102.
  • the processor 110 is therefore configured to receive computer program instructions 108 for manipulating data within the database system 100.
  • Each of the plurality of database servers 106a-106c comprises a database management system 112a-112c and a data storage medium 114a-114c.
  • the database system 100 therefore comprises a plurality of database management systems 112a-112c that manage access to databases stored on the data storage media 114a-114c.
  • the database management systems 112a-112c are software packages configured to run on processors within the database servers 106a-106c.
  • the database management systems 112a-112c control the creation, maintenance and use of a database stored on the data storage media 114a-114c.
  • the database management systems 112a-112c may provide functions for controlling data access, enforcing data integrity and recovering data after failures.
  • the processor 110 is in electrical communication with the plurality of database management systems 112a-112c.
  • the electrical communication is provided via the computer network 104. It will be understood that the electrical communication may be provided by other means such as local electrical communication connections wherein each of the elements of the database system 100 are located in close proximity, e.g. in the same room or building.
  • the computer 102 may be located remotely from the plurality of database servers 106a-106c and communication between the processor 110 and the plurality of database management systems 112a-112c is provided by the network 104.
  • the network may, for example, be the Internet or any LAN or WLAN according to the requirements of the database system.
  • the database servers 106a-106c may be located at a database farm maintained at a location remote to the computer 102.
  • the processor 110 may be configured to undertake the methods as described below.
  • the database management system 112a on the database server 106a may be a relational database management system such as MySQL.
  • the data storage medium 114a may be a disc drive.
  • the database management system 112b on the database server 106b may be a key-value data store database management system such as Redis.
  • the data storage medium 114b may be RAM and/or Flash memory.
  • the database management system 112c on the database server 106c may be a document oriented database management system such as MongoDB.
  • the data storage medium 114c may be a combination of RAM and/or Flash memory and a disc drive.
  • the database system 100 is not limited to the database management systems disclosed above.
  • the database system 100 may comprise any other database management system 112a-112c as required by the database developer.
  • Referring to Figure 2, there is shown a flow diagram of a method for manipulating data in a database system 100 comprising a plurality of database management systems 112a-112c according to a first embodiment.
  • the plurality of database management systems 112a-112c may comprise a plurality of different types of database management system as specified above.
  • the plurality of database management systems may comprise SQL and non-SQL database management systems, e.g. a relational database management system, a key-value data store database management system and a document oriented database management system.
  • instructions 108 are received for the manipulation of data within the database system 100.
  • the instructions may be received by the processor 110 in the form of computer program instructions 108 to manipulate the data.
  • the computer program instructions 108 may be in a specific computer programming language.
  • the programming language may comprise functions specific to that language for the manipulation of data.
  • the computer program instructions 108 may comprise any instruction types for manipulating data within a database system 100 including but not limited to write data instructions, read data instructions and delete data instructions.
  • the computer program instructions 108 may also comprise instructions relating to the administration of a database system 100 such as instructions to set up a database system 100 and also to recover a database following database failure.
  • At step 202 at least one performance metric of at least one of the plurality of database management systems 112a-112c is determined.
  • a database management system may be selected based on which of them is performing best. Under initial conditions, i.e. the first time data is manipulated in the database system 100, the method may manipulate the data according to the computer instructions 108 using each of the plurality of database management systems 112a-112c. This may be undertaken sequentially or non-sequentially. After the initial manipulation of data, the performance metrics for each of the plurality of database management systems 112a-112c are determined and stored. These performance metrics may then be used to select the at least one database management system for future manipulation of data. The performance metric may be the time taken for a database management system 112a-112c to manipulate previous data.
  • the processor 110 may store the time taken for each of the plurality of database management systems 112a-112c to manipulate previous data. Based on the determined time, selecting the at least one database management system 112a-112c may comprise selecting the database management system that has taken the least time to manipulate previous data.
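  • The initial-conditions behaviour described above, in which every database management system performs the first manipulation and the measured times are stored for later selection, might be sketched as follows. The callables are stand-ins for real database operations, and the names slow_store and fast_store are hypothetical.

```python
import time
from typing import Callable, Dict


def time_operation(operation: Callable[[], None]) -> float:
    """Run one data manipulation and return the elapsed time in milliseconds."""
    start = time.perf_counter()
    operation()
    return (time.perf_counter() - start) * 1000.0


def bootstrap_metrics(operations: Dict[str, Callable[[], None]]) -> Dict[str, float]:
    """First manipulation: run it on every DBMS and store each timing."""
    return {name: time_operation(op) for name, op in operations.items()}


def select_fastest(metrics_ms: Dict[str, float]) -> str:
    """Pick the DBMS that took the least time on previous data."""
    return min(metrics_ms, key=metrics_ms.get)


# Stand-in operations; a real system would issue commands to each DBMS here.
operations = {
    "slow_store": lambda: time.sleep(0.05),
    "fast_store": lambda: None,
}
stored_metrics = bootstrap_metrics(operations)
selected = select_fastest(stored_metrics)
```

  • Because the elapsed time is measured from the processor's side, network delay between the processor and a database server is included in the stored metric.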
  • Each database management system of the plurality of database management systems 112a-112c may therefore have associated with it one or more performance metrics.
  • Each of the one or more performance metrics may correspond to a specific type of data manipulation.
  • each database management system may have a performance metric for writing data to the database system 100, for reading data from the database system 100, and for deleting data from the database system 100. Therefore, the received computer instruction 108 may be processed to determine the type of data manipulation required and the correct set of determined performance metrics may be used to select at least one database management system to manipulate the data.
  • the fastest performing database management system may be used for each type of data manipulation.
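  • Keeping one metric per database management system and per manipulation type, and choosing the set that matches the received instruction, can be sketched as below. The text format of the instruction (a leading verb such as "write") and the sample timings are assumptions for illustration only.

```python
from typing import Dict, Tuple

# (dbms name, manipulation type) -> stored time in milliseconds (hypothetical values)
Metrics = Dict[Tuple[str, str], float]

SUPPORTED_TYPES = {"write", "read", "delete"}


def manipulation_type(instruction: str) -> str:
    """Determine the type of data manipulation from a hypothetical instruction."""
    verb = instruction.strip().split()[0].lower()
    if verb not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported manipulation type: {verb}")
    return verb


def select_for_instruction(instruction: str, metrics: Metrics) -> str:
    """Select the fastest DBMS for the manipulation type of this instruction."""
    kind = manipulation_type(instruction)
    candidates = {name: t for (name, op), t in metrics.items() if op == kind}
    return min(candidates, key=candidates.get)


metrics: Metrics = {
    ("mysql", "write"): 40.0, ("mysql", "read"): 8.0, ("mysql", "delete"): 35.0,
    ("redis", "write"): 2.0, ("redis", "read"): 9.0, ("redis", "delete"): 3.0,
}
write_target = select_for_instruction("write user:1 alice", metrics)
read_target = select_for_instruction("read user:1", metrics)
```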
  • problems with the electrical communication of the processor 110 to the plurality of database management systems 112a-112c may also be incorporated into the performance metric. That is, if the network over which electrical communication is established is not performing well due to high network traffic volumes or problems with the network, then the time taken for one or more database management systems to undertake data manipulation may be increased. The increased time (and therefore reduced performance) results in an alternative database management system being selected to manipulate future data. The method is thereby able to overcome problems associated with the use of a particular network or network router.
  • the method of Figure 2 may further comprise the step of updating the performance metric for the selected at least one of the plurality of database management systems 112a-112c based on the performance of the selected at least one database management system when manipulating the data. That is, in embodiments in which the performance metric relates to the time taken to manipulate data, then the time taken to manipulate the present data is used to update the performance metric of the selected database management system.
  • the updated performance metrics may be stored and used to select a database management system for future data manipulation.
  • updating the performance metric may comprise determining an average performance metric over a set time period.
  • the set time period may, for example be in the range from 30 seconds to 90 seconds. In specific embodiments, the set period of time may be 60 seconds.
  • the set time period may be in the range from 4 minutes to 6 minutes. In specific embodiments, the set time period may be 5 minutes. The set time period may, for example be in the range from 25 minutes to 35 minutes. In specific embodiments, the set period of time may be 30 minutes. The set time period may, for example be in the range from 110 minutes to 130 minutes. In specific embodiments, the set period of time may be 120 minutes.
  • updating the performance metric may comprise determining an average performance metric over a set number of data manipulation operations. That is, the performance metric for a database management system for writing data to the database system may be updated by determining the average time taken by the database management system to write data to the database system over a number of previous data write operations. The same algorithm may be used for read operations, delete operations or any other type of data manipulation operation.
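  • Averaging over a set number of previous operations can be sketched with a bounded queue, as below. The class name and the window of three operations are assumptions; the source does not fix the number of operations.

```python
from collections import deque
from typing import Deque, Optional


class OperationCountAverage:
    """Average duration of the most recent max_operations manipulations."""

    def __init__(self, max_operations: int = 3) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._durations: Deque[float] = deque(maxlen=max_operations)

    def record(self, duration_ms: float) -> None:
        self._durations.append(duration_ms)

    def average(self) -> Optional[float]:
        if not self._durations:
            return None
        return sum(self._durations) / len(self._durations)


write_metric = OperationCountAverage(max_operations=3)
for duration in (100.0, 200.0, 300.0, 400.0):
    write_metric.record(duration)
latest_average = write_metric.average()  # averages the last three writes only
```

  • One such object per database management system and per manipulation type (write, read, delete) would give the per-type metrics described above.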
  • the performance metric may be specific to each instruction. Therefore, if data manipulation comprises writing data to a database system 100, the database management system 112a-112c having the best performance for writing data may be selected. In embodiments, therefore, the performance metric may be the time taken for a particular database management system to perform a write instruction.
  • the database management system 112a-112c having the best performance for reading data may be selected.
  • the performance metric may be the time taken for a particular database management system to perform a read instruction.
  • the database management system 112a-112c having the best performance for deleting data may be selected.
  • the performance metric may be the time taken for a particular database management system to perform a delete instruction.
  • This allows the database system 100 to be optimised for each type of instruction for the manipulation of data, as the best performing database management system 112a-112c is selected for each specific data manipulation instruction.
  • This is suited to embodiments of the invention in which data is manipulated by more than one selected database management system as the data manipulation is replicated on all database servers 106a-106c, either sequentially or non-sequentially.
  • the processor 110 is free to select any of the plurality of database management systems 112a-112c for the next data manipulation operation.
  • At step 204 at least one database management system 112a-112c is selected in dependence on the determined performance metric.
  • each of the database management systems 112a-112c may provide different advantages and disadvantages regarding the execution of different types of data manipulation.
  • the method may utilise each of the different database management systems based on their current performance to provide for a more efficient database system 100.
  • the at least one database management system 112a-112c is controlled to manipulate the data.
  • the database management system 112a-112c may be controlled by the processor 110.
  • controlling the at least one database management system 112a-112c may comprise compiling, by the processor 110, commands to the selected at least one database management system 112a-112c to manipulate the data in accordance with the computer program instructions 108 received by the processor 110 and sending those commands to the selected at least one database management system 112a-112c.
  • the selected at least one database management system 112a-112c may then manipulate the data accordingly.
  • manipulating the data may comprise compiling the commands for the selected at least one database management system 112a-112c based on the computer program instructions 108 received in the computer programming language. That is, the processor 110 may be configured to compile commands for each of the plurality of database management systems 112a-112c in response to receiving computer program instructions 108 in one or more specific computer programming languages.
  • the method may further comprise providing, by the processor 1 10, a library of functions in a single computer programming language for manipulation of data.
  • the database developer may select a function for use when manipulating data.
  • the method may further comprise receiving, by the processor 110, computer program instructions 108 in the form of one or more of the provided functions for manipulating the data, and mapping, by the processor 110, the selected function to a command for the selected at least one database management system 112a-112c.
  • the method and system may provide a means for controlling multiple database management systems 112a-112c using a single computer programming language.
  • the data is manipulated using the selected at least one database management system 112a-112c.
  • selecting a database management system 112a-112c may comprise selecting more than one database management system 112a-112c. This allows the data to be manipulated using more than one database management system 112a-112c and therefore allows the method and system disclosed herein to utilise the advantages of each database management system 112a-112c.
  • the selected at least one database management system comprises all of the plurality of database management systems 1 12a-1 12c.
  • a first database management system is selected based on the determined performance metric and at least one further database management system is also selected.
  • the at least one further database management system may comprise all of the remaining database management systems.
  • the data may be manipulated sequentially by the first and at least one further database management systems 112a-112c. That is, the manipulation of the data using the selected and further database management systems may be by consecutive instructions of the processor 110. Manipulation by subsequent further database management systems may also be by consecutive instructions of the processor 110.
  • the method comprises controlling the first database management system to manipulate the data, and subsequently controlling the at least one further database management system to manipulate the data non-sequentially. That is, the data may be manipulated by the first database management system as soon as possible and manipulation of the data by the at least one further database management system is scheduled for some time later. This allows for efficient use of processor 110 resources as the second and subsequent manipulations may be scheduled for a time when the processor 110 is not in use or is not under a high load. In embodiments where more than one of the plurality of database management systems is selected, a suitable version control algorithm may also be implemented to ensure that data is replicated across all database servers 106a-106c in the database system 100.
  • FIG. 3 shows a flow diagram of a method for manipulating data in a database system 100 comprising a plurality of database management systems 112a-112c.
  • the method according to Figure 3 comprises the steps of the first embodiment except that the step of determining a performance metric is replaced by the step 302 of determining a characteristic of the data, and the step 304 of selecting at least one database management system 112a-112c is in dependence on the determined characteristic.
  • the instructions 108 may identify to the processor 110 the type of data to be manipulated in the database system 100, e.g. by writing, reading or deleting.
  • the instructions 108 may comprise information that may be used by the processor 110 to determine a characteristic of the data to be manipulated. For example, if the data to be manipulated is a specific type of variable, such as an integer or string, then the size of that variable, i.e. the number of bits in memory storage, may be known. Determining a characteristic of the data to be manipulated may therefore comprise determining a characteristic based on information comprised within the instructions 108.
  • the processor 110 may be configured to assign specific characteristics to specific data types.
  • a database may comprise students and their marks achieved in examinations.
  • a data entity may therefore be defined as "student" and may comprise information relating to each student.
  • a further entity may be defined as "marks" and may comprise the marks achieved by each student.
  • the "student" entity may comprise a large amount of data about a student such as name, address, email address, telephone number and candidate number. Therefore, the processor 110 may be configured to determine that the size of the "student" entity is large.
  • the "marks" entity may comprise only one or more integer values relating to marks achieved by students. The processor 110 may therefore be configured to determine that the size of a "marks" entity is small.
  • Determining a characteristic of the data to be manipulated may comprise determining the required persistence of the data. If a particular data type has a high importance then it may be determined to be "persistent data". That is, the data may be required to persist following an electrical power outage. If the data to be manipulated is not of high importance then it may be determined to be "non-persistent" data. That is, the fate of the data following an electrical power outage may not be of great importance.
  • information relating to a student may be stored as mentioned above.
  • the information may include student name, address, email address and telephone number.
  • the processor 110 may be configured to determine that student name and address are to be assigned as persistent data as these must be retrievable following an electrical power outage.
  • the email address and telephone number of a student may be assigned to be non-persistent data as they are merely additional modes of communication.
  • Determining a characteristic of the data to be manipulated may comprise determining the size of data to be manipulated.
  • the size of the data to be manipulated may be determined as discussed above. In other embodiments the size of the data to be manipulated may be determined by analysis of the data itself. That is, the amount of memory storage occupied by the data may be used to determine the size of the data. It will be understood that the processor may also be configured to determine characteristics of data to be manipulated other than size and persistence. In embodiments of the method wherein the database system 100 comprises a relational database management system 112a, such as MySQL, the method may select the relational database management system 112a if the determined characteristic of the data to be manipulated is that it is persistent data.
  • Relational database management systems 112a typically store and manipulate data using a disc drive. Data stored on disc drives typically persists following an electrical power outage. However, access to disc drives is typically slow when compared to other types of data storage media.
  • the method may select the key-value data store database management system 112b if the determined characteristic of the data to be manipulated is that its size is below a low data size threshold value.
  • the low data size threshold value may be configurable dependent on the type of data to be manipulated.
  • the method may select the document oriented database management system 112c if the determined characteristic of the data to be manipulated is that its size is above a high data size threshold value.
  • the high data size threshold value may be configurable dependent on the type of data to be manipulated.
  • a first database management system 112a-112c may be selected based on the determined performance metric as defined above. At least one further database management system may also be selected, e.g. the relational database management system 112a.
  • the processor 110 may determine that the data to be manipulated is below the low data size threshold and may therefore select the key-value data store database management system 112b as the first selected database management system. The at least one further database management system is then selected to be the relational database management system 112a.
  • This embodiment provides the advantages of fast access of the key-value data store database management system 112b and also the advantage of persistence offered by the relational database management system.
  • the first database management system may be a document oriented database management system 112c such as MongoDB, and the at least one further database management system may be a relational database management system 112a.
  • FIG. 4 shows a flow diagram of a method for manipulating data in a database system 100 comprising a plurality of database management systems 112a-112c according to a third embodiment.
  • the methods of the first embodiment and the second embodiment may be combined. That is, the method according to the third embodiment may comprise a step 402 of determining a characteristic of the data to be manipulated, and a step 404 of determining a performance metric for at least one of the plurality of database management systems 112a-112c. Further, the method of the third embodiment comprises the step 406 of selecting at least one of the plurality of database management systems 112a-112c in dependence on the determined characteristic and the determined performance metric.
  • the database system 100 comprises means to execute the methods described above.
  • a computer program product may be configured to store computer program code to execute any of the methods described herein.
  • the computer program product may for example comprise a computer hard drive, a floppy disc, CD, DVD, flash memory or other data media.
  • the computer program product may alternatively or additionally comprise programmable logic, FPGAs, ASICs and/or firmware.
  • an apparatus such as a computer, computing device or database system may comprise a microprocessor configured to carry out the method described herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for manipulating data in a database system comprising a plurality of database management systems. The method comprising: determining a performance metric for at least one of the plurality of database management systems; selecting, in dependence on the performance metric, at least one database management system of the plurality of database management systems; and controlling the at least one database management system to manipulate the data.

Description

Method and Database System for Manipulating Data
Technical Field
The invention relates to a method and a database system for manipulating data. More particularly, the invention relates to a method and database system for manipulating data wherein the database system comprises a plurality of database management systems.
Background
Various database management systems are available to database system developers. Different database management systems handle data in different ways and therefore present the database system developer with advantages and disadvantages regarding the efficiency and integrity of their database systems.
For example, a relational database management system, such as MySQL, stores data in a table. Commonly the data in a relational database management system is stored on a hard disc within a computer system. This provides advantages in the robustness of the database management system and persistence of the data stored therein. The term "persistence" as used herein encompasses the ability of the data to be retrieved from a storage medium after a loss of electrical power. However, access to a database provided on a hard disc may have high latency.
A further example of a database is a non-SQL database management system such as the key-value data store database management system, Redis. Redis typically stores data in RAM, which has advantages of low latency when accessing the database. However, the robustness of the data is low compared to a relational database management system and the data may also lack persistence.
The developer of a database system is required to choose the type of database management system they wish to use for their database given the requirements of the data and the database system. However, these requirements may be conflicting. Further, there may not exist a database that provides all the requirements that the developer needs.
Summary
According to the invention in a first aspect there is provided a method for manipulating data in a database system comprising a plurality of database management systems, the method comprising: determining a performance metric for at least one of the plurality of database management systems; selecting, in dependence on the performance metric, at least one database management system of the plurality of database management systems; and controlling the at least one database management system to manipulate the data.
As used herein the term "database system" encompasses a system comprising at least one data storage element and at least one control element, wherein the control element controls at least the writing of data to the data storage element, the reading of data from the data storage element and the deletion of data from the data storage element. The data storage elements are typically comprised within a database server.
As used herein the term "database management system" encompasses a software package comprising computer programs to control access to a database. The database management system is typically run on a database server.
By determining a performance metric for the database management systems the performance of each may be assessed. The database management system that performs the best may therefore be selected for use when manipulating the data, which provides improved efficiency of the database system. Optionally, the performance metric may be the time taken to manipulate previous data using the at least one database management system.
The time taken to manipulate previous data may be an indication of the likely time that will be taken to manipulate the present data and may therefore be useful information when selecting the at least one database management system to optimise the speed with which the data is manipulated.
Optionally, selecting the at least one database management system may comprise selecting the database management system that has taken the least time to manipulate previous data.
Optionally, selecting at least one database management system may comprise selecting a database management system if the performance metric is 500 milliseconds or less.
Optionally, selecting at least one database management system may comprise selecting a database management system if the performance metric is 250 milliseconds or less. Optionally, the performance metric may comprise the time taken to write previous data to the database system using the at least one database management system. Optionally, controlling the at least one database management system may comprise writing the data to the database system using the at least one database management system.
Optionally, the performance metric may comprise the time taken to read previous data from the database system using the at least one database management system.
Optionally, controlling the at least one database management system may comprise reading the data from the database system using the at least one database management system.
Optionally, the performance metric may comprise the time taken by the at least one database management system to delete previous data from the database system.
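The selection described above can be illustrated with a minimal Python sketch. It assumes a hypothetical record of the time, in milliseconds, that each database management system took for its previous write, read and delete operations; the names `previous_ms` and `select_dbms` and the example timings are illustrative assumptions and are not taken from the disclosure.

```python
from typing import Dict, Optional

# Hypothetical record: time in milliseconds taken by each database management
# system for its previous write, read and delete operations.
PreviousTimes = Dict[str, Dict[str, float]]


def select_dbms(previous_ms: PreviousTimes, operation: str,
                threshold_ms: Optional[float] = None) -> Optional[str]:
    """Return the system that took the least time for `operation`.

    If `threshold_ms` is given, a system is only eligible when its metric is
    at or below that threshold (for example 500 or 250 milliseconds).
    Returns None when no system is eligible.
    """
    candidates = {
        name: times[operation]
        for name, times in previous_ms.items()
        if operation in times
        and (threshold_ms is None or times[operation] <= threshold_ms)
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Illustrative values only; they are not measurements.
previous_ms = {
    "relational": {"write": 420.0, "read": 310.0, "delete": 380.0},
    "key_value": {"write": 12.0, "read": 8.0, "delete": 15.0},
    "document": {"write": 95.0, "read": 120.0, "delete": 90.0},
}
```

Calling `select_dbms(previous_ms, "write")` picks the system with the smallest recorded write time, while passing `threshold_ms` restricts the choice to systems at or below the stated limit.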
Optionally, the method may further comprise updating the performance metric for the selected at least one of the plurality of database management systems based on the performance of the selected at least one database management system when controlled to manipulate the data.
Optionally, updating the performance metric may comprise determining the average performance metric over a set time period. Optionally, the set time period may be from 30 seconds to 90 seconds. In exemplary embodiments, the set time period may be 60 seconds. Optionally, the set time period may be from 4 minutes to 6 minutes. In exemplary embodiments, the set time period may be 5 minutes. Optionally, the set time period may be from 25 minutes to 35 minutes. In exemplary embodiments, the set time period may be 30 minutes. Optionally, the set time period may be from 110 minutes to 130 minutes. In exemplary embodiments, the set time period may be 120 minutes. Optionally, updating the performance metric may comprise determining the average performance metric over a set number of data manipulation operations.
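One way to maintain such an average is sketched below in Python. The class `RollingMetric` and its parameters are hypothetical names introduced for illustration; the sketch assumes that each manipulation reports its duration in milliseconds and that samples older than the set time period are simply discarded.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple


class RollingMetric:
    """Average duration over a set time period and/or a set number of operations."""

    def __init__(self, window_seconds: float = 60.0,
                 max_samples: Optional[int] = None) -> None:
        self.window_seconds = window_seconds
        # Each sample is (timestamp in seconds, duration in milliseconds).
        # With max_samples set, the deque keeps only the most recent samples.
        self.samples: Deque[Tuple[float, float]] = deque(maxlen=max_samples)

    def record(self, duration_ms: float, now: Optional[float] = None) -> None:
        """Store the duration of one data manipulation operation."""
        now = time.monotonic() if now is None else now
        self.samples.append((now, duration_ms))

    def average(self, now: Optional[float] = None) -> Optional[float]:
        """Return the mean duration of the samples still inside the window."""
        now = time.monotonic() if now is None else now
        # Drop samples that are older than the set time period.
        while self.samples and now - self.samples[0][0] > self.window_seconds:
            self.samples.popleft()
        if not self.samples:
            return None
        return sum(duration for _, duration in self.samples) / len(self.samples)
```

A 60 second window corresponds to `RollingMetric(window_seconds=60.0)`; averaging over a set number of operations can be approximated with `RollingMetric(window_seconds=float("inf"), max_samples=n)`.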
Optionally, the plurality of database management systems may comprise a plurality of database management system types.
Including a plurality of types of database management systems (e.g. SQL and non-SQL database management systems) enables the advantages of each of the plurality of types to be utilised according to the characteristics of the data to be manipulated.
Optionally, the plurality of database management systems may comprise a plurality of the same database management system type. Optionally, the plurality of database management systems may comprise a relational database management system.
Optionally, the relational database management system may comprise a MySQL database management system. Optionally, the plurality of database management systems may comprise a non-SQL database management system.
As used herein, the term "non-SQL" encompasses the body of database management systems that are not relational database management systems and, more specifically, are not based on an SQL database management system.
Optionally, the non-SQL database management system may comprise a key-value data store database management system.
Optionally, the key-value data store database management system may comprise a Redis database management system. Optionally, the non-SQL database management system may comprise a document oriented database management system.
Optionally, the document oriented database management system may comprise a MongoDB database management system.
Optionally, the selected at least one database management system may comprise more than one database management system.
In certain embodiments it may be advantageous to manipulate the data using more than one database management system. For example, if the data to be manipulated is required to be persistent data but also requires fast access, then the data may be manipulated using, for example, the key-value data store database management system and also manipulated using the relational database management system. Therefore, the data may be manipulated quickly using the key-value data store and the persistence may be provided by also manipulating the data using the relational database management system.
Optionally, the selected at least one database management system may comprise all of the plurality of database management systems.
The data may be manipulated by all of the plurality of database management systems. This may be done sequentially or non-sequentially. This therefore produces a database that is replicated by all of the more than one database management systems within the database system. Each data manipulation type, e.g. write, read or delete, may be undertaken first by the best performing database management system for that type of data manipulation. The remaining database management systems may be scheduled to undertake the data manipulation later. This optimises the execution of each type of data manipulation such that it is carried out by the highest performing database management system while ensuring that the correct data is replicated across all database management systems. Optionally, the selected at least one database management system may comprise a first database management system and at least one further database management system, the first database management system being selected based on the determined performance metric, and wherein the first and at least one further database management systems are each controlled to manipulate the data.
Optionally, the first database management system and the at least one further database management system may be controlled sequentially. As used herein, the term "sequentially" encompasses manipulating the data using the first and at least one further database management systems in consecutive database system instructions. In this way, the versions of the database of each of the first and at least one further database management systems may be maintained to be consistent with each other.
If the data is manipulated sequentially using the first and at least one further database management systems then faults with one or more of the database management systems may be identified quickly. That is, a fault with the further database management system may be identified soon after the manipulation of the data by the first database management system.
Optionally, the first database management system and the at least one further database management system may be controlled non-sequentially.
As used herein, the term "non-sequentially" encompasses manipulating the data using the first database management system and scheduling the manipulation of the data using the at least one further database management system at a later time.
Non-sequential manipulation of the data allows processor resources to be utilised more efficiently during time critical operations.
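The difference between sequential and non-sequential control can be sketched as follows. This is a simplified Python illustration in which in-memory dictionaries stand in for the databases and a queue stands in for the scheduler; the names `stores`, `deferred`, `write_sequential`, `write_non_sequential` and `run_deferred` are assumptions made for the example only.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

# In-memory stand-ins for the databases managed by each system (hypothetical).
stores: Dict[str, Dict[str, str]] = {
    "key_value": {},
    "relational": {},
    "document": {},
}

# Manipulations scheduled for a later time (non-sequential control).
deferred: Deque[Callable[[], None]] = deque()


def write(store_name: str, key: str, value: str) -> None:
    """Write one value using the named database management system."""
    stores[store_name][key] = value


def write_sequential(first: str, further: List[str], key: str, value: str) -> None:
    """Manipulate the data with the first and further systems back to back."""
    write(first, key, value)
    for name in further:
        write(name, key, value)


def write_non_sequential(first: str, further: List[str], key: str, value: str) -> None:
    """Manipulate with the first system now and schedule the others for later."""
    write(first, key, value)
    for name in further:
        # Bind `name` as a default argument so each task keeps its own target.
        deferred.append(lambda n=name: write(n, key, value))


def run_deferred() -> None:
    """Run scheduled manipulations, e.g. when the processor is lightly loaded."""
    while deferred:
        deferred.popleft()()
```

In the non-sequential case the further systems hold stale data until `run_deferred()` is called, which is why the description pairs this mode with scheduling for a later time.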
Optionally, the method further comprises receiving computer program instructions to manipulate data, and controlling the at least one database management system may comprise compiling and issuing commands to the selected at least one database management system based on the received computer program instructions.
In this way, the method provides for a developer of a database system to manipulate data in computer program instructions of a single computer programming language. The method provides for the compilation and issuance of the computer program instructions into commands specific to the at least one selected database management system. The developer may therefore access and utilise any of the database management systems using only a single computer programming language.
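As a rough illustration of compiling a single-language instruction into system-specific commands, the Python sketch below maps one hypothetical `compile_write` function onto three command shapes: a parameterised SQL `INSERT` statement, a Redis-style `SET` command and a MongoDB-shell-style `insertOne` call. The function name, the key layout `entity:key` and the returned tuple format are assumptions for the example; the commands are only built as text and arguments and are not sent to any server.

```python
import json
from typing import Any, Dict, List, Tuple


def compile_write(dbms: str, entity: str, key: str,
                  fields: Dict[str, str]) -> Tuple[str, List[Any]]:
    """Compile one write instruction into a (command, arguments) pair.

    `dbms` is one of "relational", "key_value" or "document".
    """
    if dbms == "relational":
        # Parameterised SQL: the values travel separately from the statement.
        columns = ["id", *fields]
        placeholders = ", ".join(["%s"] * len(columns))
        sql = f"INSERT INTO {entity} ({', '.join(columns)}) VALUES ({placeholders})"
        return sql, [key, *fields.values()]
    if dbms == "key_value":
        # Key-value store: the whole record is serialised under a single key.
        return "SET", [f"{entity}:{key}", json.dumps(fields, sort_keys=True)]
    if dbms == "document":
        # Document store: the record becomes one document in a collection.
        return f"db.{entity}.insertOne", [{"_id": key, **fields}]
    raise ValueError(f"unknown database management system: {dbms}")
```

The developer writes only `compile_write(...)` (or a higher-level library function built on it) and the choice of target system stays with the selection logic.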
Optionally, the method may further comprise manipulating the data using the at least one database management system. According to the invention in a second aspect there is provided a computer program product comprising computer readable code configured to carry out the method described above.
According to the invention in a third aspect there is provided a database system for manipulating data comprising: a plurality of database management systems; a processor in electrical communication with each of the plurality of database management systems and configured to receive instructions for manipulating data in the database system, wherein the processor is further configured to determine a performance metric for at least one of the plurality of database management systems; select, in dependence on the performance metric, at least one database management system of the plurality of database management systems; and control the at least one database management system to manipulate the data.
Optionally, the plurality of database management systems may comprise a relational database management system.
Optionally, the relational database management system may comprise a MySQL database management system.
Optionally, the plurality of database management systems may comprise a non-SQL database management system. Optionally, the non-SQL database management system may comprise a key-value data store database management system.
Optionally, the key-value data store database management system may comprise a Redis database management system.
Optionally, the non-SQL database management system may comprise a document oriented database management system.
Optionally, the document oriented database management system may comprise a MongoDB database management system.
Optionally, the processor is in electrical communication with each of the plurality of database management systems via a network. Also disclosed herein is a method for manipulating data in a database system comprising a plurality of database management systems, the method comprising: determining at least one characteristic of data to be manipulated; selecting, in dependence on the at least one characteristic, at least one database management system of the plurality of database management systems; and controlling the at least one database management system to manipulate the data.
By determining a characteristic of the data to be manipulated within the database system, the segregation of different data types, each having its own characteristics, is permitted. The characteristics of the data may then be used to select an appropriate database management system to manipulate the data. Each database management system of the plurality of database management systems may provide different advantages to the storage of data having different characteristics. Therefore, by determining the data characteristics it is possible to make best use of the database management systems.
Optionally, determining the at least one characteristic may comprise determining the required persistence of the data to be manipulated.
Optionally, the at least one characteristic may be one of persistent data and non-persistent data. Optionally, selecting at least one database management system may comprise selecting a relational database management system if the at least one characteristic is persistent data.
As used herein, the term "persistent data" encompasses data that is required to persist if electrical power to the database system is lost. Such data may be data that is of high importance and/or is to be used as a backup for data stored in a different database management system providing less persistence of data. As used herein, the term "non-persistent data" encompasses data that is not required to persist after a loss of electrical power to the database system. Non-persistent data need not necessarily be required to be lost after such an electrical power loss. Non-persistent data may therefore include a "don't care" scenario in which the fate of the data following an electrical power loss is not of great importance.
By selecting the relational database management system for storing persistent data the advantage of a disc based storage medium may be utilised. That is, a disc based medium is non-volatile and so the data stored on the disc will persist following a loss of electrical power.
Optionally, determining the at least one characteristic may comprise determining the size of the data to be manipulated.
Optionally, selecting at least one database management system may comprise selecting the key-value data store database management system if the size of the data is less than a low data size threshold value. Optionally, the low data size threshold value may be configurable dependent on the type of data to be manipulated.
By manipulating data, the size of which is below the low data size threshold value, using the key-value data store database management system the advantages of that system may be utilised. That is, the key-value data store database management system provides fast access as it may manipulate data in RAM. However, the amount of RAM available may be limited and so it may only be practical to use the key-value data store database management system when the size of the data to be manipulated falls below the low data size threshold value. Optionally, selecting at least one database management system may comprise selecting the document oriented database management system if the size of the data is greater than a high data size threshold value.
Optionally, the high data size threshold value may be configurable dependent on the type of data to be manipulated.
By manipulating data, the size of which is above the high data size threshold value, using the document oriented database management system, the advantages of that system may be utilised. The document oriented database management system may manipulate some data in RAM and some data on disc. Therefore, the document oriented database management system may provide advantages regarding fast access with the ability to manipulate larger amounts of data.
Optionally, the method may further comprise selecting a first database management system in dependence on the at least one characteristic, and selecting a second database management system to be a relational database management system.
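The characteristic-based selection described above can be sketched in Python as follows. The threshold values, the names `Characteristics` and `select_by_characteristic`, the precedence of persistence over size and the fallback to the relational system for mid-sized data are all assumptions made for this example; the disclosure leaves the thresholds configurable and does not fix an ordering.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds in bytes; in practice these would be configurable
# dependent on the type of data to be manipulated.
LOW_SIZE_THRESHOLD = 1_024
HIGH_SIZE_THRESHOLD = 1_000_000


@dataclass
class Characteristics:
    persistent: bool  # must the data persist after a loss of electrical power?
    size_bytes: int   # size of the data to be manipulated


def select_by_characteristic(c: Characteristics,
                             also_relational: bool = True,
                             low: int = LOW_SIZE_THRESHOLD,
                             high: int = HIGH_SIZE_THRESHOLD) -> List[str]:
    """Return the selected systems, first selection first."""
    if c.persistent:
        first = "relational"   # disc based, so the data persists
    elif c.size_bytes < low:
        first = "key_value"    # small data, fast access in RAM
    elif c.size_bytes > high:
        first = "document"     # large data, RAM and disc
    else:
        first = "relational"   # assumption: default for mid-sized data
    selected = [first]
    # Optionally add the relational system as the second selection.
    if also_relational and first != "relational":
        selected.append("relational")
    return selected
```

For instance, a small non-persistent "marks" value would be handled first by the key-value data store and then, if `also_relational` is left enabled, by the relational system as well.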
Also disclosed herein is a computer program product comprising computer readable code configured to execute the method described above. Also disclosed herein is a database system for manipulating data comprising: a plurality of database management systems; a processor in electrical communication with each of the plurality of database management systems and configured to receive instructions for manipulating data in the database system, wherein the processor is further configured to determine at least one characteristic of data to be manipulated; select, in dependence on the at least one characteristic, at least one database management system of the plurality of database management systems; and control the at least one database management system to manipulate the data.
Different database management systems may make use of more than one data storage medium. However, it will be understood that one database management system may specialise in, and therefore be optimised for use with, a particular data storage medium. For example, a relational database such as MySQL may specialise in the use of a disc based data storage medium. Further, a key value data store database management system may specialise in the use of a memory (e.g. RAM) data storage medium.
Brief Description of the Figures
Exemplary embodiments of the invention will now be described with reference to the accompanying drawings, in which:
Figure 1 is a block schematic diagram of a database system;
Figure 2 is a flow diagram showing a method of manipulating data in a database system according to a first embodiment;
Figure 3 is a flow diagram showing a method of manipulating data in a database system according to a second embodiment; and
Figure 4 is a flow diagram showing a method of manipulating data in a database system according to a third embodiment.
Specific Description
Generally, disclosed herein is a method and system for efficiently manipulating data in a database system. The method and system provide technical benefits in the speed of manipulation of data and the increased integrity of data manipulated within the database system.
Referring to Figure 1 there is shown a database system 100. The database system 100 comprises a computer 102, a computer network 104 and a plurality of database servers 106a-106c. The computer 102 comprises computer program instructions 108 produced by a database developer to control the database system and a computer processor 110 configured to implement a method for manipulating data within the database system 100. It will be understood that the computer program instructions 108 may not, in reality, be comprised within the computer 102 but may have been produced remotely and at a different time to be executed on the processor 110 of the computer 102.
The processor 110 is therefore configured to receive computer program instructions 108 for manipulating data within the database system 100.
Each of the plurality of database servers 106a-106c comprises a database management system 112a-112c and a data storage medium 114a-114c. The database system 100 therefore comprises a plurality of database management systems 112a-112c that manage access to databases stored on the data storage media 114a-114c. The database management systems 112a-112c are software packages configured to run on processors within the database servers 106a-106c. The database management systems 112a-112c control the creation, maintenance and use of a database stored on the data storage media 114a-114c. The database management systems 112a-112c may provide functions for controlling data access, enforcing data integrity and recovering data after failures.
The processor 110 is in electrical communication with the plurality of database management systems 112a-112c. The electrical communication is provided via the computer network 104. It will be understood that the electrical communication may be provided by other means such as local electrical communication connections wherein each of the elements of the database system 100 is located in close proximity, e.g. in the same room or building.
In embodiments of the database system 100 the computer 102 may be located remotely from the plurality of database servers 106a-106c and communication between the processor 110 and the plurality of database management systems 112a-112c is provided by the network 104. The network may, for example, be the Internet or any LAN or WLAN according to the requirements of the database system. In this way the database servers 106a-106c may be located at a database farm maintained at a location remote to the computer 102.
The processor 110 may be configured to undertake the methods as described below.
In embodiments the database management system 112a on the database server 106a may be a relational database management system such as MySQL. In such embodiments, the data storage medium 114a may be a disc drive.
The database management system 112b on the database server 106b may be a key-value data store database management system such as Redis. In such embodiments the data storage medium 114b may be RAM and/or Flash memory.
The database management system 112c on the database server 106c may be a document oriented database management system such as MongoDB. In such embodiments the data storage medium 114c may be a combination of RAM and/or Flash memory and a disc drive.
It will be understood that the database system 100 is not limited to the database management systems disclosed above. The database system 100 may comprise any other database management system 112a-112c as required by the database developer.
Referring to Figure 2 there is shown a flow diagram of a method for manipulating data in a database system 100 comprising a plurality of database management systems 112a-112c according to a first embodiment.
The plurality of database management systems 112a-112c may comprise a plurality of different types of database management system as specified above. For example, the plurality of database management systems may comprise SQL and non-SQL database management systems, e.g. a relational database management system, a key-value data store database management system and a document oriented database management system.
At step 200 instructions 108 are received for the manipulation of data within the database system 100. The instructions may be received by the processor 110 in the form of computer program instructions 108 to manipulate the data. The computer program instructions 108 may be in a specific computer programming language. The programming language may comprise functions specific to that language for the manipulation of data. The computer program instructions 108 may comprise any instruction types for manipulating data within a database system 100 including but not limited to write data instructions, read data instructions and delete data instructions.
The computer program instructions 108 may also comprise instructions relating to the administration of a database system 100 such as instructions to set up a database system 100 and also to recover a database following database failure.
At step 202 at least one performance metric of at least one of the plurality of database management systems 112a-112c is determined.
By determining a performance metric for the database management systems 112a-112c, a database management system may be selected based on which of them is performing best. Under initial conditions, i.e. the first time data is manipulated in the database system 100, the method may manipulate the data according to the computer instructions 108 using each of the plurality of database management systems 112a-112c. This may be undertaken sequentially or non-sequentially. After the initial manipulation of data, the performance metrics for each of the plurality of database management systems 112a-112c are determined and stored. These performance metrics may then be used to select the at least one database management system for future manipulation of data.
The performance metric may be the time taken for a database management system 112a-112c to manipulate previous data. That is, the processor 110 may store the time taken for each of the plurality of database management systems 112a-112c to manipulate previous data. Based on the determined time, selecting the at least one database management system 112a-112c may comprise selecting the database management system that has taken the least time to manipulate previous data.
Each database management system of the plurality of database management systems 112a-112c may therefore have associated with it one or more performance metrics. Each of the one or more performance metrics may correspond to a specific type of data manipulation. For example, each database management system may have a performance metric for writing data to the database system 100, for reading data from the database system 100, and for deleting data from the database system 100. Therefore, the received computer instruction 108 may be processed to determine the type of data manipulation required and the correct set of determined performance metrics may be used to select at least one database management system to manipulate the data.
By using a performance metric associated with the time taken to manipulate previous data, the fastest performing database management system may be used for each type of data manipulation. Additionally, problems with the electrical communication of the processor 110 to the plurality of database management systems 112a-112c may also be incorporated into the performance metric. That is, if the network over which electrical communication is established is not performing well due to high network traffic volumes or problems with the network, then the time taken for one or more database management systems to undertake data manipulation may be increased. The increased time (and therefore reduced performance) results in an alternative database management system being selected to manipulate future data. The method is thereby able to overcome problems associated with the use of a particular network or network router.
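The selection by least time taken, per type of data manipulation, may be illustrated by the following minimal sketch. It is not taken from the specification: the names `select_fastest` and `metrics`, the labels "relational", "key_value" and "document", and all timing values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

# Hypothetical store of the last recorded time (in seconds) for each
# database management system and each type of data manipulation.
Metrics = Dict[str, Dict[str, float]]


def select_fastest(metrics: Metrics, operation: str) -> Optional[str]:
    """Return the name of the system with the least recorded time for `operation`."""
    candidates = {
        name: times[operation]
        for name, times in metrics.items()
        if operation in times
    }
    if not candidates:
        # Initial conditions: no metric has been stored yet.
        return None
    return min(candidates, key=candidates.get)


# Illustrative values only; they are not measurements.
metrics: Metrics = {
    "relational": {"write": 0.040, "read": 0.012, "delete": 0.030},
    "key_value": {"write": 0.002, "read": 0.001, "delete": 0.002},
    "document": {"write": 0.008, "read": 0.015, "delete": 0.009},
}
```

Because the recorded time includes any network delay, a system reached over a congested network would accumulate a larger value and cease to be selected, without the selection logic needing to know the cause.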
Other performance metrics are within the scope of the invention.
The method of Figure 2 may further comprise the step of updating the performance metric for the selected at least one of the plurality of database management systems 112a-112c based on the performance of the selected at least one database management system when manipulating the data. That is, in embodiments in which the performance metric relates to the time taken to manipulate data, then the time taken to manipulate the present data is used to update the performance metric of the selected database management system. The updated performance metrics may be stored and used to select a database management system for future data manipulation.
In embodiments, updating the performance metric may comprise determining an average performance metric over a set time period. The set time period may, for example, be in the range from 30 seconds to 90 seconds. In specific embodiments, the set time period may be 60 seconds. The set time period may be in the range from 4 minutes to 6 minutes. In specific embodiments, the set time period may be 5 minutes. The set time period may, for example, be in the range from 25 minutes to 35 minutes. In specific embodiments, the set time period may be 30 minutes. The set time period may, for example, be in the range from 110 minutes to 130 minutes. In specific embodiments, the set time period may be 120 minutes.
In alternative embodiments, updating the performance metric may comprise determining an average performance metric over a set number of data manipulation operations. That is, the performance metric for a database management system for writing data to the database system may be updated by determining the average time taken by the database management system to write data to the database system over a number of previous data write operations. The same algorithm may be used for read operations, delete operations or any other type of data manipulation operation.
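The two averaging alternatives described above (over a set time period, or over a set number of operations) may be sketched as follows. This is an illustrative sketch under stated assumptions rather than the implementation of the specification: the class name `RollingMetric` and its parameters are hypothetical, and timestamps are passed in explicitly so that the behaviour is deterministic.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class RollingMetric:
    """Average time taken, bounded by a count of operations and/or a time window."""

    def __init__(self, max_operations: Optional[int] = None,
                 window_seconds: Optional[float] = None) -> None:
        self.max_operations = max_operations
        self.window_seconds = window_seconds
        # Each sample is (timestamp in seconds, time taken in seconds).
        self._samples: Deque[Tuple[float, float]] = deque()

    def record(self, timestamp: float, duration: float) -> None:
        """Store a new sample and discard samples outside the configured bounds."""
        self._samples.append((timestamp, duration))
        if self.max_operations is not None:
            while len(self._samples) > self.max_operations:
                self._samples.popleft()
        if self.window_seconds is not None:
            cutoff = timestamp - self.window_seconds
            while self._samples and self._samples[0][0] < cutoff:
                self._samples.popleft()

    def average(self) -> Optional[float]:
        """Return the mean time taken, or None if nothing has been recorded."""
        if not self._samples:
            return None
        return sum(d for _, d in self._samples) / len(self._samples)
```

One such object could be kept per database management system and per type of data manipulation, for example one for writes and one for reads, with `window_seconds=60` corresponding to the 60 second period mentioned above.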
For instructions relating to the manipulation of data within the database system 100, the performance metric may be specific to each instruction. Therefore, if data manipulation comprises writing data to a database system 100, the database management system 112a-112c having the best performance for writing data may be selected. In embodiments, therefore, the performance metric may be the time taken for a particular database management system to perform a write instruction.
If manipulating comprises reading data from a database system 100, the database management system 112a-112c having the best performance for reading data may be selected. In embodiments, therefore, the performance metric may be the time taken for a particular database management system to perform a read instruction.
If manipulating comprises deleting data from a database system 100, the database management system 112a-112c having the best performance for deleting data may be selected. In embodiments, therefore, the performance metric may be the time taken for a particular database management system to perform a delete instruction.
This allows the database system 100 to be optimised for each type of instruction for the manipulation of data as the best performing database management system 112a-112c is selected for each specific data manipulation instruction. This is suited to embodiments of the invention in which data is manipulated by more than one selected database management system as the data manipulation is replicated on all database servers 106a-106c, either sequentially or non-sequentially. In such embodiments, once all replication has been undertaken, the same data is stored on each of the database servers 106a-106c of the database system 100 and therefore the processor 110 is free to select any of the plurality of database management systems 112a-112c for the next data manipulation operation.
At step 204 at least one database management system 112a-112c is selected in dependence on the determined performance metric. As discussed herein, each of the database management systems 112a-112c may provide different advantages and disadvantages regarding the execution of different types of data manipulation. By selecting a database management system 112a-112c based on the performance metric the method may utilise each of the different database management systems based on their current performance to provide for a more efficient database system 100.
At step 206 the at least one database management system 112a-112c is controlled to manipulate the data.
The database management system 112a-112c may be controlled by the processor 110. In this regard controlling the at least one database management system 112a-112c may comprise compiling, by the processor 110, commands to the selected at least one database management system 112a-112c to manipulate the data in accordance with the computer program instructions 108 received by the processor 110 and sending those commands to the selected at least one database management system 112a-112c. The selected at least one database management system 112a-112c may then manipulate the data accordingly. In embodiments of the method wherein the computer program instructions 108 are received by the processor 110 in a particular computer programming language, manipulating the data may comprise compiling the commands for the selected at least one database management system 112a-112c based on the computer program instructions 108 received in the computer programming language. That is, the processor 110 may be configured to compile commands for each of the plurality of database management systems 112a-112c in response to receiving computer program instructions 108 in one or more specific computer programming languages.
Therefore, the method may further comprise providing, by the processor 110, a library of functions in a single computer programming language for manipulation of data. The database developer may select a function for use when manipulating data. In such embodiments, the method may further comprise receiving, by the processor 110, computer program instructions 108 in the form of one or more of the provided functions for manipulating the data, and mapping, by the processor 110, the selected function to a command for the selected at least one database management system 112a-112c.
In this way, the method and system may provide a means for controlling multiple database management systems 112a-112c using a single computer programming language.
At step 208 the data is manipulated using the selected at least one database management system 112a-112c.
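The mapping of a single library function to a command for each database management system, as described above, may be sketched as follows. The sketch is an assumption made for illustration only: the function `compile_command`, the table and collection name `kv`, and its columns `k` and `v` are hypothetical. The commands are built as plain text to show the mapping; a practical implementation would pass parameters through each system's client library rather than interpolate them into strings.

```python
from typing import Dict

# Hypothetical command templates for three types of system. The table and
# collection name "kv" and the field names are illustrative assumptions.
COMMAND_TEMPLATES: Dict[str, Dict[str, str]] = {
    "relational": {
        "write": "INSERT INTO kv (k, v) VALUES ('{key}', '{value}')",
        "read": "SELECT v FROM kv WHERE k = '{key}'",
        "delete": "DELETE FROM kv WHERE k = '{key}'",
    },
    "key_value": {
        "write": "SET {key} {value}",
        "read": "GET {key}",
        "delete": "DEL {key}",
    },
    "document": {
        # Doubled braces are literal braces in str.format templates.
        "write": "db.kv.insertOne({{_id: '{key}', v: '{value}'}})",
        "read": "db.kv.findOne({{_id: '{key}'}})",
        "delete": "db.kv.deleteOne({{_id: '{key}'}})",
    },
}


def compile_command(dbms: str, function: str, key: str, value: str = "") -> str:
    """Map one library function ("write", "read", "delete") to a command for `dbms`."""
    return COMMAND_TEMPLATES[dbms][function].format(key=key, value=value)
```

With such a table the database developer calls one function name regardless of which system is selected, and only the template lookup depends on the selection.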
In embodiments of the method of Figure 2, selecting a database management system 112a-112c may comprise selecting more than one database management system 112a-112c. This allows the data to be manipulated using more than one database management system 112a-112c and therefore allows the method and system disclosed herein to utilise the advantages of each database management system 112a-112c. In a particular embodiment of the method and system, the selected at least one database management system comprises all of the plurality of database management systems 112a-112c.
In such embodiments, a first database management system is selected based on the determined performance metric and at least one further database management system is also selected. The at least one further database management system may comprise all of the remaining database management systems. In embodiments in which more than one database management system 112a-112c is selected, the data may be manipulated sequentially by the first and at least one further database management systems 112a-112c. That is, the manipulation of the data using the selected and further database management systems may be by consecutive instructions of the processor 110. Manipulation by subsequent further database management systems may also be by consecutive instructions of the processor 110.
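The sequential case described above, in which the first selected system is controlled first and the remaining systems follow by consecutive instructions, may be sketched as follows. The function name `manipulate_sequentially` and the use of plain callables to stand in for the database management systems are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def manipulate_sequentially(first: str,
                            handlers: Dict[str, Callable[[str], None]],
                            command: str) -> List[str]:
    """Apply `command` to the first selected system, then to every remaining one."""
    # The first system is the one chosen on the performance metric; the
    # further systems are all remaining ones, in their registration order.
    order = [first] + [name for name in handlers if name != first]
    for name in order:
        handlers[name](command)
    return order
```

The returned list records the order in which the systems were controlled, which could also serve as input to whatever version control algorithm is used to confirm that the manipulation reached every system.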
In other embodiments, the method comprises controlling the first database management system to manipulate the data, and subsequently controlling the at least one further database management system to manipulate the data non-sequentially. That is, the data may be manipulated by the first database management system as soon as possible and manipulation of the data by the at least one further database management system is scheduled for some time later. This allows for efficient use of processor 110 resources as the second and subsequent manipulations may be scheduled for a time when the processor 110 is not in use or is not under a high load. In embodiments where more than one of the plurality of database management systems is selected, a suitable version control algorithm may also be implemented to ensure that data is replicated across all database servers 106a-106c in the database system 100.
Referring to Figure 3 there is shown a flow diagram of a method for manipulating data in a database system 100 comprising a plurality of database management systems 112a-112c according to a second embodiment. The method according to Figure 3 comprises the steps of the first embodiment except that the step of determining a performance metric is replaced by the step 302 of determining a characteristic of the data, and the step 304 of selecting at least one database management system 112a-112c is in dependence on the determined characteristic.
The remaining steps of the method of the second embodiment are as described above in relation to the first embodiment and are therefore not repeated.
The embodiments of the method disclosed in Figure 2 apply equally to the method of Figure 3 and are also therefore not repeated here.
The instructions 108 may identify to the processor 110 the type of data to be manipulated in the database system 100, e.g. by writing, reading or deleting. The instructions 108 may comprise information that may be used by the processor 110 to determine a characteristic of the data to be manipulated. For example, if the data to be manipulated is a specific type of variable, such as an integer or string, then the size of that variable, i.e. the number of bits in memory storage, may be known. Determining a characteristic of the data to be manipulated may therefore comprise determining a characteristic based on information comprised within the instructions 108.
Further, the processor 110 may be configured to assign specific characteristics to specific data types. For example, a database may comprise students and their marks achieved in examinations. A data entity may therefore be defined as "student" and may comprise information relating to each student. A further entity may be defined as "marks" and may comprise the marks achieved by each student. The "student" entity may comprise a large amount of data about a student such as name, address, email address, telephone number and candidate number. Therefore, the processor may be configured to determine that the size of the "student" entity is large. Further, the "marks" entity may comprise only one or more integer values relating to marks achieved by students. The processor 110 may therefore be configured to determine that the size of a "marks" entity is small.
Determining a characteristic of the data to be manipulated may comprise determining the required persistence of the data. If a particular data type has a high importance then it may be determined to be "persistent data". That is, the data may be required to persist following an electrical power outage. If the data to be manipulated is not of high importance then it may be determined to be "non-persistent" data. That is, the fate of the data following an electrical power outage may not be of great importance.
Following the student database example above, information relating to a student may be stored as mentioned above. The information may include student name, address, email address and telephone number. The processor 110 may be configured to determine that student name and address are to be assigned as persistent data as these must be retrievable following an electrical power outage. However, the email address and telephone number of a student may be assigned to be non-persistent data as they are merely additional modes of communication.
Determining a characteristic of the data to be manipulated may comprise determining the size of data to be manipulated. The size of the data to be manipulated may be determined as discussed above. In other embodiments the size of the data to be manipulated may be determined by analysis of the data itself. That is, the amount of memory storage occupied by the data may be used to determine the size of the data. It will be understood that the processor may also be configured to determine characteristics of data to be manipulated other than size and persistence.
In embodiments of the method wherein the database system 100 comprises a relational database management system 112a, such as MySQL, the method may select the relational database management system 112a if the determined characteristic of the data to be manipulated is that it is persistent data.
Relational database management systems 112a typically store and manipulate data using a disc drive. Data stored on disc drives typically persists following an electrical power outage. However, access to disc drives is typically slow when compared to other types of data storage media.
In embodiments of the method wherein the database system 100 comprises a non-SQL database management system, e.g. a key-value data store database management system 112b, such as Redis, the method may select the key-value data store database management system 112b if the determined characteristic of the data to be manipulated is that its size is below a low data size threshold value. The low data size threshold value may be configurable dependent on the type of data to be manipulated.
In embodiments of the method wherein the non-SQL database management system comprises a document oriented database management system 112c, such as MongoDB, the method may select the document oriented database management system 112c if the determined characteristic of the data to be manipulated is that its size is above a high data size threshold value. The high data size threshold value may be configurable dependent on the type of data to be manipulated.
In embodiments of the method in which more than one database management system 112a-112c is selected, a first database management system 112a-112c may be selected based on the determined performance metric as defined above. At least one further database management system may also be selected, e.g. the relational database management system 112a.
The processor 110 may determine that the data to be manipulated is below the low data size threshold and may therefore select the key-value data store database management system 112b as the first selected database management system. The at least one further database management system is then selected to be the relational database management system 112a. This embodiment provides the advantages of fast access of the key-value data store database management system 112b and also the advantage of persistence offered by the relational database management system.
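The selection rules of the second embodiment (persistent data to the relational system, small data to the key-value data store, large data to the document oriented system) may be sketched as follows. The function name, the threshold values and the order in which the rules are tested are assumptions made for illustration; the specification states only that the thresholds are configurable and does not say which rule takes precedence or what happens between the two thresholds.

```python
from typing import Optional


def select_by_characteristic(size_bytes: int,
                             persistent: bool,
                             low_threshold: int = 1024,
                             high_threshold: int = 1_048_576) -> Optional[str]:
    """Select a type of database management system from characteristics of the data."""
    # Assumption: persistence is tested before size.
    if persistent:
        return "relational"   # disc based storage survives a power outage
    if size_bytes < low_threshold:
        return "key_value"    # small data, held in memory for fast access
    if size_bytes > high_threshold:
        return "document"     # large data
    # Assumption: between the thresholds no rule applies, so the caller
    # must fall back to another criterion such as a performance metric.
    return None
```

In the student example, a "marks" entity holding a few integers would fall below the low threshold, whereas a student name and address marked as persistent would be directed to the relational system whatever their size.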
In another embodiment, the first database management system may be a document oriented database management system 112c such as MongoDB, and the at least one further database management system may be a relational database management system 112a.
Referring to Figure 4 there is shown a flow diagram of a method for manipulating data in a database system 100 comprising a plurality of database management systems 112a-112c according to a third embodiment. In the third embodiment, the methods of the first embodiment and the second embodiment may be combined. That is, the method according to the third embodiment may comprise a step 402 of determining a characteristic of the data to be manipulated, and a step 404 of determining a performance metric for at least one of the plurality of database management systems 112a-112c. Further, the method of the third embodiment comprises the step 406 of selecting at least one of the plurality of database management systems 112a-112c in dependence on the determined characteristic and the determined performance metric.
The remaining steps of the method of Figure 4 are described above and are therefore not repeated here.
Further, the embodiments of the methods of Figures 2 and 3 apply equally to the method of Figure 4 and are also therefore not repeated here.
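One possible way of combining the determined characteristic with the determined performance metric is sketched below. The specification does not prescribe how the two are combined, so the rule used here is an assumption: the system preferred on the basis of the data characteristic is used while its recorded time is at or below a limit, and otherwise the fastest system is used. The function name `select_combined` and the default limit of 0.5 seconds are likewise illustrative.

```python
from typing import Dict, Optional


def select_combined(preferred: Optional[str],
                    times: Dict[str, float],
                    max_seconds: float = 0.5) -> Optional[str]:
    """Combine a characteristic-based preference with recorded times (seconds)."""
    # Keep the preferred system while it is performing acceptably.
    if preferred in times and times[preferred] <= max_seconds:
        return preferred
    # With no recorded times there is nothing to compare against.
    if not times:
        return preferred
    # Otherwise fall back to the system with the least recorded time.
    return min(times, key=times.get)
```

Under this rule a key-value data store preferred for small data would be bypassed only when its recorded time degrades beyond the limit, for example because of network problems.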
The database system 100 comprises means to execute the methods described above. In embodiments a computer program product may be configured to store computer program code to execute any of the methods described herein. The computer program product may for example comprise a computer hard drive, a floppy disc, CD, DVD, flash memory or other data media. The computer program product may alternatively or additionally comprise programmable logic, FPGAs, ASICs and/or firmware.
In other embodiments an apparatus such as a computer, computing device or database system may comprise a microprocessor configured to carry out the method described herein. The skilled person will envisage further embodiments within the scope of the appended claims.

Claims

1. A method for manipulating data in a database system comprising a plurality of database management systems, the method comprising: determining a performance metric for at least one of the plurality of database management systems; selecting, in dependence on the performance metric, at least one database management system of the plurality of database management systems; and controlling the at least one database management system to manipulate the data.
2. A method according to claim 1 wherein the performance metric is the time taken to manipulate previous data using the at least one database management system.
3. A method according to claim 2 wherein selecting the at least one database management system comprises selecting the database management system that has taken the least time to manipulate previous data.
4. A method according to claim 2 wherein selecting the at least one database management system comprises selecting the at least one database management system if the determined performance metric is 500 milliseconds or less.
5. A method according to claim 2 wherein selecting the at least one database management system comprises selecting the at least one database management system if the performance metric is 250 milliseconds or less.
6. A method according to any of claims 2 to 5 wherein the performance metric comprises the time taken to write previous data to the database system using the at least one database management system.
7. A method according to any preceding claim wherein controlling the at least one database management system comprises writing the data to the database system using the at least one database management system.
8. A method according to any of claims 2 to 7 wherein the performance metric comprises the time taken to read previous data from the database system using the at least one database management system.
9. A method according to any preceding claim wherein controlling the at least one database management system comprises reading the data from the database system using the at least one database management system.
10. A method according to any of claims 2 to 9 wherein the performance metric comprises the time taken by the at least one database management system to delete previous data from the database system.
11. A method according to any preceding claim further comprising updating the performance metric for the selected at least one of the plurality of database management systems based on the performance of the selected at least one database management system when controlled to manipulate the data.
12. A method according to claim 11 wherein updating the performance metric comprises determining the average performance metric over a set time period.
13. A method according to claim 11 wherein updating the performance metric comprises determining the average performance metric over a set number of data manipulation operations.
14. A method according to any preceding claim wherein the plurality of database management systems comprises a plurality of database management system types.
15. A method according to any preceding claim wherein the plurality of database management systems comprises a plurality of the same database management system type.
16. A method according to any preceding claim wherein the plurality of database management systems comprises a relational database management system.
17. A method according to claim 16 wherein the relational database management system is a MySQL database management system.
18. A method according to any preceding claim wherein the plurality of database management systems comprises a non-SQL database management system.
19. A method according to claim 18 wherein the non-SQL database management system comprises a key-value data store database management system.
20. A method according to claim 19 wherein the key-value data store database management system comprises a Redis database management system.
21. A method according to claims 18 or 19 wherein the non-SQL database management system comprises a document oriented database management system.
22. A method according to claim 21 wherein the document oriented database management system comprises a MongoDB database management system.
23. A method according to any preceding claim wherein the selected at least one database management system comprises more than one database management system.
24. A method according to claim 23 wherein the selected at least one database management system comprises all of the plurality of database management systems.
25. A method according to claim 23 or claim 24 wherein the selected at least one database management system comprises a first database management system and at least one further database management system, the first database management system being selected based on the determined performance metric, and wherein the first and at least one further database management systems are each controlled to manipulate the data.
26. A method according to claim 25 wherein the first database management system and the at least one further database management system are controlled sequentially.
27. A method according to claim 25 wherein the first database management system and the at least one further database management system are controlled non-sequentially.
28. A method according to any preceding claim further comprising receiving computer program instructions to manipulate data, and wherein controlling the at least one database management system comprises compiling and issuing commands to the selected at least one database management system based on the received computer program instructions.
29. A method according to any preceding claim further comprising manipulating the data using the at least one database management system.
30. A computer program product comprising computer readable code configured to carry out the method according to any preceding claim.
31. A database system for manipulating data comprising: a plurality of database management systems; a processor in electrical communication with each of the plurality of database management systems and configured to receive instructions for manipulating data in the database system, wherein the processor is further configured to: determine a performance metric for at least one of the plurality of database management systems; select, in dependence on the performance metric, at least one database management system of the plurality of database management systems; and control the at least one database management system to manipulate the data.
32. A database system according to claim 31 wherein the plurality of database management systems comprises a relational database management system.
33. A database system according to claim 32 wherein the relational database management system comprises a MySQL database management system.
34. A database system according to any of claims 31 to 33 wherein the plurality of database management systems comprises a non-SQL database management system.
35. A database system according to claim 34 wherein the non-SQL database management system comprises a key-value data store database management system.
36. A database system according to claim 35 wherein the key-value data store database management system comprises a Redis database management system.
37. A database system according to claim 34 wherein the non-SQL database management system comprises a document oriented database management system.
38. A database system according to claim 37 wherein the document oriented database management system comprises a MongoDB database management system.
39. A database system according to any of claims 31 to 38 wherein the processor is in electrical communication with each of the plurality of database management systems via a network.
40. A database management system as described herein with reference to the accompanying drawings.
41. A method for manipulating data in a database system as described herein with reference to the accompanying drawings.
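By way of illustration only, the determine, select and control steps recited in claim 31 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation described in the specification: the performance metric is assumed to be a latency figure in milliseconds (lower is better), the names `Backend`, `select_backend` and `manipulate` are hypothetical, and in-memory dictionaries stand in for real database management systems such as MySQL, Redis or MongoDB.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Backend:
    """Hypothetical stand-in for one database management system."""
    name: str
    latency_ms: float  # assumed performance metric; lower is better
    store: Dict[str, str] = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        # A real system would compile and issue commands to the DBMS here.
        self.store[key] = value


def select_backend(backends: List[Backend]) -> Backend:
    """Select one backend in dependence on the (assumed) performance metric."""
    if not backends:
        raise ValueError("at least one backend is required")
    return min(backends, key=lambda b: b.latency_ms)


def manipulate(backends: List[Backend], key: str, value: str) -> str:
    """Read each backend's metric, select a backend, and control it to store data."""
    chosen = select_backend(backends)
    chosen.write(key, value)
    return chosen.name
```

In this sketch a call such as `manipulate(backends, "k", "v")` writes to whichever backend currently reports the lowest latency and returns its name. A different metric, or the selection of several backends at once, could be substituted without changing the overall shape.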
PCT/GB2012/052697 2011-11-04 2012-10-30 Method and database system for manipulating data WO2013064815A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1119062.6 2011-11-04
GB1119062.6A GB2496173A (en) 2011-11-04 2011-11-04 Database system for manipulating data

Publications (1)

Publication Number Publication Date
WO2013064815A1 (en) 2013-05-10

Family

ID=45421273

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2012/052697 WO2013064815A1 (en) 2011-11-04 2012-10-30 Method and database system for manipulating data

Country Status (2)

Country Link
GB (1) GB2496173A (en)
WO (1) WO2013064815A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462117A (en) * 2013-09-18 2015-03-25 北京齐尔布莱特科技有限公司 Method and device for operating mongodb
CN104660442A (en) * 2013-11-25 2015-05-27 中国移动通信集团福建有限公司 Service provisioning method and system based on MongoDB
WO2016187771A1 (en) * 2015-05-25 2016-12-01 武克易 Cloud television data acquisition method and system based on user behaviour analysis
WO2017090799A1 (en) * 2015-11-27 2017-06-01 전자부품연구원 Method and system for selectively configuring db according to data type
CN106980621A (en) * 2016-01-18 2017-07-25 北京京东尚科信息技术有限公司 Method and apparatus for event archiving and querying based on MongoDB
CN107479829A (en) * 2017-08-03 2017-12-15 杭州铭师堂教育科技发展有限公司 Redis cluster mass data rapid cleaning system and method based on message queue
CN109039803A (en) * 2018-07-10 2018-12-18 武汉斗鱼网络科技有限公司 Method, system and computer equipment for processing callback notification messages
CN109241072A (en) * 2018-08-31 2019-01-18 携程计算机技术(上海)有限公司 Buffering updating method and system based on Canal
CN116150280A (en) * 2023-04-04 2023-05-23 之江实验室 Mimicry redis database synchronization method, system, equipment and storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103886079B (en) * 2014-03-26 2018-03-30 北京京东尚科信息技术有限公司 Data processing method and system
CN104794244B (en) * 2015-05-13 2018-02-16 南京大学 Method and apparatus for implementing graph conversion based on MongoDB
CN107609086B (en) * 2017-09-07 2021-03-16 同程网络科技股份有限公司 APP pushing method and engine system thereof
US20240168970A1 (en) * 2022-11-18 2024-05-23 Rockwell Collins, Inc. Distributed database for segregation of concerns

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0456491A2 (en) * 1990-05-10 1991-11-13 Kabushiki Kaisha Toshiba A distributed database management system
US20070050328A1 (en) * 2005-08-29 2007-03-01 International Business Machines Corporation Query routing of federated information systems for fast response time, load balance, availability, and reliability
WO2008140937A2 (en) * 2007-05-08 2008-11-20 Paraccel, Inc. Query handling in databases with replicated data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060064400A1 (en) * 2004-09-21 2006-03-23 Oracle International Corporation, A California Corporation Methods, systems and software for identifying and managing database work
US7809690B2 (en) * 2004-07-13 2010-10-05 Oracle International Corporation Performance metric-based selection of one or more database server instances to perform database recovery

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462117A (en) * 2013-09-18 2015-03-25 北京齐尔布莱特科技有限公司 Method and device for operating mongodb
CN104660442A (en) * 2013-11-25 2015-05-27 中国移动通信集团福建有限公司 Service provisioning method and system based on MongoDB
WO2016187771A1 (en) * 2015-05-25 2016-12-01 武克易 Cloud television data acquisition method and system based on user behaviour analysis
WO2017090799A1 (en) * 2015-11-27 2017-06-01 전자부품연구원 Method and system for selectively configuring db according to data type
KR101785166B1 (en) * 2015-11-27 2017-10-12 전자부품연구원 Selective DB Configuration Method in accordance with Data Type and System applying the same
CN106980621A (en) * 2016-01-18 2017-07-25 北京京东尚科信息技术有限公司 Method and apparatus for event archiving and querying based on MongoDB
CN107479829A (en) * 2017-08-03 2017-12-15 杭州铭师堂教育科技发展有限公司 Redis cluster mass data rapid cleaning system and method based on message queue
CN107479829B (en) * 2017-08-03 2020-04-17 杭州铭师堂教育科技发展有限公司 Redis cluster mass data rapid cleaning system and method based on message queue
CN109039803A (en) * 2018-07-10 2018-12-18 武汉斗鱼网络科技有限公司 Method, system and computer equipment for processing callback notification messages
CN109241072A (en) * 2018-08-31 2019-01-18 携程计算机技术(上海)有限公司 Buffering updating method and system based on Canal
CN116150280A (en) * 2023-04-04 2023-05-23 之江实验室 Mimicry redis database synchronization method, system, equipment and storage medium

Also Published As

Publication number Publication date
GB201119062D0 (en) 2011-12-21
GB2496173A (en) 2013-05-08

Similar Documents

Publication Publication Date Title
WO2013064815A1 (en) Method and database system for manipulating data
US20210397766A1 (en) System and methods for multi-language abstract model creation for digital environment simulations
US8904377B2 (en) Reconfiguration of computer system to allow application installation
EP3404899A1 (en) Adaptive computation and faster computer operation
US11294958B2 (en) Managing a distributed knowledge graph
US9092474B2 (en) Incremental conversion of database objects during upgrade of an original system
US20220004683A1 (en) System and method for creating domain specific languages for digital environment simulations
CN110737924B (en) Data protection method and equipment
US8825653B1 (en) Characterizing and modeling virtual synthetic backup workloads
US20140081901A1 (en) Sharing modeling data between plug-in applications
US9547456B2 (en) Method and apparatus for efficient data copying and data migration
US20220019451A1 (en) System and methods for creation and use of meta-models in simulated environments
WO2019113508A1 (en) A system and methods for multi-language abstract model creation for digital environment simulations
US20170371641A1 (en) Multi-tenant upgrading
CN108446398A (en) A kind of generation method and device of database
US20160028769A1 (en) Policy evaluation trees
JP2021508389A (en) Job management in a data processing system
TWI493368B (en) Automatic generation of a query lineage
TW200527294A (en) Versioning support in object-oriented programming languages and tools
US20220237500A1 (en) Test case execution sequences
US8825603B2 (en) Ordering volumes and tracks for data transfer based on usage characteristics
JP6950437B2 (en) Information processing system, information processing device and program
US9128823B1 (en) Synthetic data generation for backups of block-based storage
US9678983B1 (en) Systems and methods for automatically passing hints to a file system
WO2018225747A1 (en) Distribution system, data management device, data management method, and computer-readable recording medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12788241

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12788241

Country of ref document: EP

Kind code of ref document: A1