WO2004031989A1

WO2004031989A1 - A database provider system

Info

Publication number: WO2004031989A1
Application number: PCT/IE2003/000024
Authority: WO
Inventors: Ronan Stafford; Anthony Stafford; Richard Nolan
Original assignee: Bweb Partnership
Priority date: 2002-09-18
Filing date: 2003-02-14
Publication date: 2004-04-15
Also published as: GB0504817D0; GB2409075B; AU2003299162A1; IE20030105A1; IES20030106A2; GB2409075A

Abstract

A database provider (1) has a provider (31) for real time access by multiple user systems (2) to a database server (5), and a setup function (32) for generating and managing navigation user interfaces. The navigation interfaces are maintained by the setup function (32) reading a database meta data buffer (53) in real time to determine which paths are commonly used. The meta data is cached in a dedicated buffer (53) while values are cached in a separate buffer (54). A connection manager (56) and a request receiver (57) effectively multiplex multiple simultaneous user requests onto a single server connection. A background function (60) dynamically monitors probability of paths being re-used by users and caches path meta data on this basis.

Description

"A database provider system"

INTRODUCTION

Field of the Invention

The invention relates to interfacing of user systems with database systems.

Prior Art Discussion

At present, there are powerful end user systems and also powerful database systems available for large-scale data processing for business. For example, the Comshare product Decision™ is very effective for delivery of data to the user's desktop/browser, and Comshare's View Manager™ allows users to effectively set up views on the database system. At the database end SAP's Business Information Warehouse™ (BW™) is a powerful database system. This system includes server and client providers, however, these do not interface with some end user systems or provide information for different types of user. Regarding the latter point, user groups such as executives, analysts, end users, and IT engineers require very different types of data.

SUMMARY OF THE INVENTION

According to the invention, there is provided a database provider system comprising:

a provider for interfacing with a plurality of user systems to receive data requests, and for interfacing with a database server for satisfying the requests, a navigation setup function for interfacing with the provider, with user systems and with a database server to generate user navigation interfaces for user systems, and

means for converting commands received from user systems for compatibility with the database server.

In one embodiment, the setup function comprises a monitoring function for monitoring database server meta data for use in generating navigation interfaces.

In another embodiment, the provider is operable to cache meta data in a buffer and the monitoring function is operable to monitor the meta data buffer to determine the meta data.

In a further embodiment, the meta data buffer is solely for meta data.

In one embodiment, the setup function comprises a monitoring function for monitoring values retrieved from the server for use in generating navigation interfaces.

In another embodiment, the provider is operable to cache retrieved values and meta data in separate buffers.

In a further embodiment, the provider is operable to cache the meta data and the values as objects which inherit properties and methods from data access requests.

In one embodiment, a value object is linked with a user system view.

In another embodiment, the objects are independent, and the provider comprises an interface mechanism for inter-object requests . In a further embodiment, the provider comprises a connection manager for managing a single connection with a database server and a request receiver for receiving multiple user data access requests and for multiplexing them onto said server connection.

In one embodiment, the request receiver signals the connection manager when a request has been received and updates a user system flag to indicate access request status.

In another embodiment, there is a flag for each request.

In a further embodiment, the provider comprises a background function for writing data path meta data to a buffer for use without need to subsequently retrieve the same meta data.

In one embodiment, the background function caches a path's meta data only if it predicts that the path is likely to be re-used by a user system.

In another embodiment, the background function maintains a paths table of paths being used in real time, and uses this table to determine probability of path re-use.

In a further embodiment, the background function maintains a count of uses of each path in the path table and caches a path if the count reaches a value at which the probability of re-use is sufficiently high.

In one embodiment, the background function searches the last n paths in the table to determine if it already is in the table, increments the counter if it is, and creates a new record for the path if it is not. In another embodiment, the paths table comprise a single record for each path, the record having the path meta data and the count value.

In a further embodiment, the provider is operable to automatically log all transactions.

In one embodiment, the setup function is operable to manage elements in a views database of a user system.

In another embodiment, the setup function is operable to set up linkages between database queries and views.

In a further embodiment, the provider operates only in a read-only mode to prevent database server record locking.

In one embodiment, the provider converts database element names to ensure uniqueness of names.

In another embodiment, the provider adds and deletes prefixes.

In a further embodiment, the provider automatically generates error messages if a request is received for data in a non-existing database dimension.

In one embodiment, the provider optimises processor performance by deleting redundant commands generated at a user system.

In another embodiment, the provider automatically converts symbols such as types of parentheses for compatibility. In a further embodiment, the provider automatically deletes keywords which are not recognised by a database server.

DETAILED DESCRIPTION OF THE INVENTION

Brief Description of the Drawings

The invention will be more clearly understood from the following description of some embodiments thereof, given by way of example only with reference to the accompanying drawings in which: -

Fig. 1 is a diagram illustrating a hardware architecture for user-database interaction including a database provider of the invention;

Fig. 2 is a diagram illustrating software components of the database provider; and

Fig. 3 is a diagram showing caching of data;

Fig. 4 is a diagram showing operation of a connection manager and a request receiver of the database provider;

Fig. 5 is a diagram showing predictive data access; and

Figs. 6 and 7 are plots illustrating operation of the provider for predictive data access.

Description of the Embodiments Referring to Fig. 1 a user-to-database hardware architecture comprises a database provider 1 of the invention communicating with third party known components. The latter comprise end user systems 2(a), 2(b), and 2(c) distributed over a wide area network/intranet/Internet, and a database server 5 having a database 6. The database 6 has a size in the region of 70-80 GBytes. In this embodiment the database server 5 is a SAP BW™ server and the end user systems run the Comshare applications Decision™ and View Manager™. However, this is only one example and the database provider 1 of the invention may interface with different end user and/ or database server systems instead.

Referring to Fig. 2 the software architecture of the database provider 1 comprises a provider 31 and a navigation setup function ("wizard") 32. The wizard 32 comprises a views management function 33, a GUI function 34, a configuration file function 35, and an ODBO consumer function 36.

The provider 31 interfaces on the user side with the Decision™ application and on the database side with a SAP provider client 25 in turn linked with a SAP provider server 26. The wizard interfaces on the user side with the View Manager™ application and on the database side with the SAP provider client 25.

Structure of Database 6

By way of background, the database 6 contains multidimensional database information. It also handles data control, data storage, security, and database management functions. When SAP BW™ is running, a server agent process acts as a co-ordinator for all user data requests. This agent directs requests for SAP BW™ database records. Databases in SAP BW™ are logically grouped into main cubes ("InfoCubes™"). Each main cube is a collection of specific cubes ("Query Cubes™") logically grouped together. For example, a sales analysis system may consist of a 'Sales' main cube which comprises two specific cubes, Orders and Payments. Each main cube can be characterised as containing one of two types of data, namely key figures or data characteristics such as product, customer type, fiscal year, period or region, which is always loaded into physical memory on the server at startup. These objects form an OLAP (online analytical processing) repository for the SAP BW™ database. Large database systems can have main cubes exceeding 10Gb in size. The database has an extended star schema, which builds on the basic schema by storing master data about attributes, hierarchies and text in separate tables that are shared between main cubes. This reduces data redundancies because master data is stored only once, and then used by various main cubes. As another benefit to the extended star schema, changes in the main cube are automatically applied to the corresponding aggregates. Also known as realignment of aggregates, this important function keeps main cubes and aggregates in synchronisation, despite frequent changes to the business processes. When an underlying main cube changes, the associated aggregate is modified automatically to reflect the new data. The database system generates suggestions for creating optimal aggregates and the system administrator can then decide whether to create those aggregates or not.

At the user end, the Decision™ applications perform delivery of data to browsers on the PCs 2 in real time. Also, the View Manager™ applications allow users to set up the views beforehand.

Structure of Database Provider 1

In the database provider 1, the provider 31 performs real time data access and the wizard 32 allows setup of the overall structure and format for data transfer from the database.

The database provider 1 of the invention ("BWEB") allows the user PCs 2 to interface with SAP BW™ with both full compatibility and in a more effective manner with excellent performance. BWEB also allows compatibility and enhanced performance for interfacing with other database systems including Microsoft SQL Server, Microsoft Analysis Services, Oracle RDBMS, and Hyperion Essbase.

In more detail, the provider 31 "hooks into" the SAP BW™ provider client 25 in a read-only manner without any record locking requirements. Every time the Decision™ application requires fresh data it calls the provider 31. Views in Decision™ communicate with the source systems by means of channels defined during configuration.

The provider 31 is registered in Windows™ as the database provider. It contains a set of Component Object Model (COM) objects such as CBCommand, CBRowset or CBSession. The Component Object Model software architecture comprises binary software components that form the foundation for higher-level software services. Such services include compound documents, custom controls, inter-application scripting, data transfer, and other software interactions.

The provider 31 translates requests and sends them to the external provider client 25. Data is transferred using the MDX (Multi Dimensional Expression) language. SAP BW™ interprets the MDX standard in a different way from the norm and the provider 31 transforms the MDX as the norm into a means by which SAP BW™ can interpret.

Transformations

There are a number of transformations carried out by the provider 31 as set out below.

An aspect of the server 5 is that unique names are unique only within elements. This means that a business dimension called "Measures" could exist and within another business dimension , for example Product, an element within that dimension could be called "Measures". For example, if a company sells products a,b and c products a,b and c would belong to the 'product' hierarchy. Essentially products a,b, and c are members of the 'product' hierarchy. As in the above example the dimension for the product hierarchy would be called 'product' with the same name. The user system 2 treats each unique name as unique in the global sense (it means there can not be any other element with the same name).

To address this problem, when the provider 31 receives the rowset from server 5 it adds the artificial prefix _HXH_ - in order that each member name is made unique. Also, when receiving a hierarchy request from the user system 2 the artificial prefix _HXH_ is added to the request so that each member is made unique. The provider 31 removes the prefix when passing the information back to the user system 2.

Also, the server 5 contains a "key figures" dimension which is flat, having only one dimension. However, the user system 2 may issue a request to retrieve CHILDREN or DESCENDANTS of a dimension, including "key figures". To prevent such queries for the "key figures" dimension from becoming redundant, the provider 31 converts these constructs to the MEMBERS construct. Without this there would be a critical errors when such a request to Key figures would appear.

The provider 31 also parses the MDX to translate some key words and to remove some key words to ensure compatibility.

The provider 31 also optimises the commands between the user and the database. For example, if an MDX statement is generated containing a WHERE clause, the provider 31 removes the ALL element which would otherwise be transferred also. This does not change the result set, and a query is performed more efficiently. Another aspect of the provider 31 is that it automatically removes user system keywords which are not recognised by server 5. Examples are ADD CALCULATED MEMBERS and CELL PROPERTIES. The provider 31 also automatically inserts or deletes "{" and "(" parentheses for compatibility with the server 5 element requests and function calls.

Thus server 5 receives fully compatible requests, and in response generates packages of data and transfers them to the provider 31. The provider 31 automatically analyses and re-shapes the data for onward transfer to the user's system 2. Within the provider 31 a CBRowset COM object performs the onward transfer to the user system 2.

At the user end, the presentation of the data is based on views, which is a visualisation mechanism for a particular request or slice of data from a server 5 specific cube. This visualisation can be either numerical or graphical. There are always two main dimensions, which are displayed on rows and on columns called on-grid dimensions. All other dimensions are known as off-grid dimensions. These off-grid dimensions can be used for manipulating the view or giving extra conditions, which are used later in a WHERE statement in an MDX query.

Caching in the Database Provider 1 (Fig. 3)

The provider 31 caches two kinds of data: - data that describes the warehouse's structure, and - the actual values that are stored in the warehouse, that are actually results of queries.

Meta-data is read when a particular specific cube is accessed, and it is then stored in a provider buffer so that every next time a request from a client concerning the specific cube comes, the meta-data is already accessible without requesting it again. In such case the provider 31 only retrieves the actual data (values) from the server 5, not having to query it about their structure in order to construct its data-retrieval query, as the structure is already known. Therefore the client's request is served in a shorter time.

The data itself, contained in the cubes, which is in this document is referred to as "values", is also cached by the provider 31, for use if two requests coming from a client (or clients) are the same. In such situation the request is responded only using provider's cached data without accessing the server 5.

Thus, the provider 31 separates the meta-data from the values and caches them in two different bufers, which allows it to speed up carrying out clients' requests even when they concern different values, but the same cubes (meta-data). Also, meta-data that is cached separately makes it possible to respond to requests concerning the warehouse structure itself without reading from the server 5.

Fig. 3 graphically illustrates the mechanism for data caching in the provider 31. The cubes 50 represent specific cubes, which are subsets of the large main cube 51. The paper sheets 52 represent values. In this drawing the meta data buffer is indicated by the numeral 53 and the values buffer by the numeral 54.

The meta data and the values are cached in the form of COM+ objects, which inherit properties and methods from the rowset request. These objects have additional properties specific for particular schemas such as cube schema, and dimension schema. Each cached data item is linked with a view. This cache resides in memory until the session is restarted or alternatively the cache can be unloaded by restarting the session or by using an active server page called console. asp. Once the numeric data is retrieved this is also cached and stored in memory. The size of the data cache determines how long the data will be held in RAM. If the data requested by the view is held in RAM then this data will be retrieved from memory but if this data is not stored in memory a query will be issued to the server 5. The original data will be cleared from the cache if the remaining available cache memory is too small to store the new query.

For multithreading within a set of COM+ objects specific functions are implemented in a "state-less" environment. This means that more than one instance of such an object can be used at the same time. Each COM+ object is in fact independent and any request to another COM+ object is performed via an interface mechanism - it groups specific functions, which can be used from outside.

The provider 31 may issue independent requests in parallel, for example when pivoting the view (swapping axes) or requesting extra information.

For data retrieval, when the view is opened for the first time all metadata about on- grid dimensions and key information about off-grid dimensions is retrieved optimally using multi-threading to get the column information (MDX) and rowset request in parallel in order to create the query to get the physical data. For the next time this data is already cached so the first step (MDX and Rowset query) does not have to be processed. Similarly with the data, if the data exists in the cache the data can be retrieved from it, and if not a query must be issued.

Another aspect of the separate buffers 53 and 54 is that the other main component of the database provider, the wizard 32, performs direct access to meta data in the buffer 53. Within the wizard 32, this is performed by the views management function 33. This is for use in building templates for user navigation screens both automatically and interactively with inputs from engineers. In addition, the wizard 32 "walks" less experienced users through the process of setting up navigation screens. In both scenarios the wizard 32 uses real data access scenarios which are occurring at the same time for guidance in establishing the most effective navigation structure. Thus, user interfacing is not based on what engineers believe will be the optimum navigation paths but instead on "real-life" scenarios gleaned from the meta data buffer 53. The function 33 also accesses the values buffer 54 for the same purpose.

Connection Manager/Requests Receiver (Fig. 4)

Referring to Fig. 4 the provider 31 comprises a connection manager 56 and a client requests receiver 57. The connection manager 56 opens and keeps a connection to the server 5 in order to use it to send requests to it. As the provider 31 works in a multitasking environment, the requests may come from different threads (applications, clients) working simultaneously. The server 5 only allows a single simultaneous connection, and thus is maintained by the connection manager 56. The requests receiver 57 interfaces with the end user systems 2 in a manner which allows them to operate as if the server 5 were multi-threading. The effect is that the user systems 2 do not need to concern themselves with the workload of the server 5.

In order to collect and execute the requests coming from client threads the request receiver 57 listens to the requests coming and inserts them into a list (queues them). When the requests have been received, it signals the connection manager 56 that there are new requests and executes them one by one (using its connection to the server 5, signalling ("waking up") the awaiting client every time its particular request has been carried out. The clients' requests are constructed so that after posting a request to the provider 31, the request's "executed" flag is set to "no", and the client waits for the flag to be set to "yes" by the provider 31.

This technique effectively creates an additional "manager thread" in the multitasking environment, which holds its own connection to the server 5, and makes it available for other (client) threads by collecting their requests and executing them through its own connection. Therefore the server 5 "sees" all the requests as coming from a single client, while the clients "see" the data warehouse as accessible by many clients at the same time.

Predictive Data Access (Figs. 5(a) and 5(b)

Clients' queries for the server 5 often form structures which could be called paths. For example, assuming that there is a hierarchy defined in a specific cube, which looks as follows:

Continents

B Countries

B ^■ Cities

^; - Companies it is quite possible, that a client would navigate through the hierarchy from top to bottom. This process opens a level after a level, listing its members, that is - listing the continents, then countries, and so on (it is a natural way of drilling down into a hierarchy tree). The database provider 31 operates on the assumption that such paths can be characteristic for some clients, and repeatable, and it uses that assumption to speed up those clients' access to meta-data.

The provider 31 comprises a background buffering function 60 and a paths table 61, as shown in Fig. 5(a). The arrow A represents retrieval of meta data for the buffer 53. An arrow Bl represents a faster retrieval by the function 60 using a path stored in the table 61. The function 60 routes the meta data to the buffer 53 earlier than on the conventional path A because it predicts what meta data will be required on the basis of the paths stored in the table 61. The arrow C represents transfer of the meta data on to the user system 2.

The function 60 keeps a record of all paths navigated by users, and incrementally updates a counter every time a particular path is used. The count value indicates the probability of the path being required next time by the associated user system 2. When the probability reaches a particular level, this triggers the function 60 to save the path to the meta data buffer 53 for predictive data access.

In more detail, the provider 31 keeps a table of querying paths with an assigned number of times every path has been walked. The table's structure is indicated by the numeral 65 in Fig. 5(b). The path fields hold pointers to structures defining querying paths that are collected when requests from clients come. Such a structure is a list of subsequent requests concerning the same hierarchy tree as shown in Fig. 5(b). Therefore, when a request comes from a client, the provider 31 may check whether it belongs to any of the paths already held in the querying paths table 61, by comparing the last n requests with the paths stored in the table. In case a path is found in the table, the path's "has been walked" counter is increased. That provides the provider 31 with the information of how often the paths held in the table have been walked.

If a path formed by the last n requests (n is the number of last requests concerning the same hierarchy tree) is not found in the table, the provider 31 creates a new row in the table, which holds the new path, and sets the counter to 1.

Whenever a request comes and a path containing it is found in the table 61, the driver 60 computes the probability that the client 2 will keep walking the path, using the number of times the path has been walked before. The formula to compute the probability is: ^wpath

P(w)=l-e ^lpath where: P(w) - the probability of the user to keep walking the path

Wp_at - number of times the path has been walked lpath - the path's length (number of requests it consists of) e - natural logarithm base P(path) is always between 0 and 1, so it can be treated as probability. It depends exponentially on w, as shown on the graph of Fig. 6.

As we can see, p comes close to 1 when w is around 6. To make the probability dependent on the path's length (we want it to be lower if the path is longer - indeed, it is less probable that let's say 10 next requests will belong to the given path, than only 2 requests). Therefore the exponent is divided by the path's length. The larger the path's length, the slower the curve comes closer to 1. The graph of Fig. 7 shows the curves for different values of 1:

If the probability (P(w)) is "large enough", which was empirically set to more than 0.7, the provider 31 starts a background procedure of executing the next requests (the ones following the request that has just come) in the path.

Therefore, if the client's next request is indeed the one predicted there is a large possibility that the meta-data required to fulfill it will already be buffered by the function 60.

An advantage of this feature is grouping clients' requests into paths and tracing them walking those paths. This gives a possibility to reliably predict clients' next queries and provides a mechanism to ensure the possibly requested meta-data is already cached. It will be appreciated that if there is a number of clients that walk the same path from time to time, their performance may be increased in a noticeable way.

Access Security

Another aspect of the provider 31 is that is performs access security for access to elements of the server 5 specific cube at the level of individual cells of information. Furthermore, the provider 31 logs all transactions and queries between a user system 2 and the server. Regarding the wizard function 32, this allows the user to perform administrative functions including:

- management of all elements in a navigation database (Applications, Categories, Topics, Views);

- creating linkages between server 5 queries and views;

- running MDX queries over the server 5 environment; and

- reading meta data parameters of server 5 elements (main and specific cubes, dimensions, levels, members).

As described above, access by the wizard 32 to the meta data buffer 53 allows dynamic monitoring of real meta data and paths for more meaningful development of navigation structures.

It will be appreciated that the invention provides for comprehensive real time compatibility, and/or improved efficiency, and/or greater user management and administrative control for user-database server communication.

The invention is not limited to the embodiments described but may be varied in construction and detail.

Claims

1. A database provider system (1) comprising:

a provider (31) for interfacing with a plurality of user systems (2) to receive data requests, and for interfacing with a database server (5) for satisfying the requests,

a navigation setup function (32) for interfacing with the provider, with user systems (2), and with a database server (5) to generate user navigation interfaces for user systems (2), and

2. A system as claimed in claim 1, wherein the setup function (32) comprises a monitoring function (33) for monitoring database server meta data (53) for use in generating navigation interfaces.

3. A system as claimed in claim 2, wherein the provider (31) is operable to cache meta data in a buffer (53) and the monitoring function (33) is operable to monitor the meta data buffer (53) to determine the meta data.

4. A system as claimed in claim 3, wherein the meta data buffer (53) is solely for meta data.

5. A system as claimed in any preceding claim, wherein the setup function (32) comprises a monitoring function (33) for monitoring values retrieved from the server (5) for use in generating navigation interfaces.

6. A system as claimed in any preceding claim, wherein the provider (31) is operable to cache retrieved values and meta data in separate buffers (53, 54).

7. A system as claimed in claim 6, wherein the provider (31) is operable to cache i the meta data and the values as objects which inherit properties and methods from data access requests.

8. A system as claimed in claim 7, wherein a value object is linked with a user system view.

9. A system as claimed in claims 7 or 8, wherein the objects are independent, and the provider (31) comprises an interface mechanism for inter-object requests.

10. A system as claimed in any preceding claim, wherein the provider (31) comprises a connection manager (46) for managing a single connection with a database server (5) and a request receiver (57) for receiving multiple user data access requests and for multiplexing them onto said server connection.

11. A system as claimed in claim 10, wherein the request receiver (57) signals the connection manager (56) when a request has been received and updates a user system flag to indicate access request status.

12. A system as claimed in claim 11, wherein there is a flag for each request.

13. A system as claimed in any preceding claim, wherein the provider (31) comprises a background function (60) for writing data path meta data to a buffer for use without need to subsequently retrieve the same meta data.

14. A system as claimed in claim 13, wherein the background function (60) caches a path's meta data only if it predicts that the path is likely to be re-used by a user system.

15. A system as claimed in claim 14, wherein the background function (60) maintains a paths table (61) of paths being used in real time, and uses this table to determine probability of path re-use.

16. A system as claimed in claim 15, wherein the background function (60) maintains a count of uses of each path in the path table (61) and caches a path if the count reaches a value at which the probability of re-use is sufficiently high.

17. A system as claimed in claim 16, wherein the background function (61) searches the last n paths in the table (61) to determine if it already is in the table, increments the counter if it is, and creates a new record for the path if it is not.

18. A system as claimed in any of claims 15 to 17, wherein the paths table (61) comprise a single record for each path, the record having the path meta data and the count value.

19. A system as claimed in any preceding claim, wherein the provider (31) is operable to automatically log all transactions.

20. A system as claimed in any preceding claim, wherein the setup function (32) is operable to manage elements in a views database of a user system (2).

21. A system as claimed in claim 20, wherein the setup function (32) is operable to set up linkages between database queries and views.

22. A database provider as claimed in any preceding claim, wherein the provider operates only in a read-only mode to prevent database server record locking.

23. A database provider as claimed in any preceding claim, wherein the provider converts database element names to ensure uniqueness of names.

24. A database provider as claimed in claim 23, wherein the provider adds and deletes prefixes.

25. A database provider as claimed in any preceding claim, wherein the provider automatically generates error messages if a request is received for data in a non-existing database dimension.

26. A database provider as claimed in any preceding claim, wherein the provider optimises processor performance by deleting redundant commands generated at a user system.

27. A database provider as claimed in any preceding claim, wherein the provider (31) automatically converts symbols such as types of parentheses for compatibility.

28. A database provider as claimed in any preceding claim, wherein the provider (31) automatically deletes keywords which are not recognised by a database server (5).

29. A database provider substantially as described with reference to the drawings.

0. A computer program product comprising software code for completing a database provider as claimed in any preceding claim when executing on a digital computer.