GB2461818A

GB2461818A - Method of identifying applications in a database system

Info

Publication number: GB2461818A
Application number: GB0914797A
Authority: GB
Inventors: Holger Karn; Torsten Steinbach
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2008-10-06
Filing date: 2009-08-25
Publication date: 2010-01-20
Also published as: GB0914797D0

Abstract

A method identifying a client application (A1) in a database system (18), the method comprising the steps of:collecting historical data on database transactions (DT) in the database system (18) (step 210);identifying identical database transactions (DT) by assigning unique transaction identifiers (TI) to them (step 212);analyzing patterns of occurrence of said transaction identifiers (TI) (step 214);mapping said patterns of transaction identifiers (TI) to client applications (A1) (step 216). In a preferred embodiment of the invention, identical database transactions are identified by reducing incoming sequences (40) of SQL statements (50) to their basic structures (60) (step 114) and defining transaction identifiers (TI) based on the basic structures (60) of the sequences (40) (step 116).

Description

DESCRIPTION

METHOD AND SYSTEM FOR IDENTIFYING

APPLICATIONS IN A DATABASE SYSTEM

FIELD OF THE INVENTION

The invention relates generally to techniques for managing database systems, especially for evaluating and enhancing database system performance. Specifically, the invention provides a method for identifying applications executing in a database system.

BACKGROUND OF THE INVENTION

In a typical database system, a host of different database applications will execute millions of transactions on the database system over the course of a single day. A database administrator is responsible for a smooth and unobstructed operation of the database system. In order to accomplish this task, this database administrator has to continuously monitor and tune the database system and to identify performance prob]ems on the database system.

Database administrators typically focus monitoring and analysis activities on the most critical applications, neglecting less critical applications. Standard applications such as Websphere, SAP, Data Stage, Cognos etc. comprise as set counters which is used to transmit application information as application name, end user ID, end user workstation name etc. to the database system. Thus, when executing a standard application, the database system receives information which enables the database system to characterize the application, the user etc. unequivocally.

However, many clients of database systems use "homegrown" or vendor-built database applications which may not contain the set of counters used by standard applications to enable distinct identification of the application by the database system. These client-specific applications may either not make use of counters at all, or they may use default values such as, for example, "jawaw.exe" as an application name for all applications written in Java. As a consequence, the database administrator has to expend considerable effort to distinguish these applications. By manually analyzing these applications, the database administrator may, for example, discover that all applications identified with the generic identifier "jawaw.exe" and running an a specific client system are instances of an application called "preserver". This process of manually analyzing and subsuming client specific applications accessing a data base system can be very cumbersome and time-consuming, especially if there are hundreds of them or if there is no information available linking the various applications to individual clients.

Thus, it is desirable to detect database applications accessing a database with as little manual effort as possible.

SUMMARY OF THE INVENTION

It is an objective of the invention to provide a method for automatically identifying applications in a database system.

Moreover, it is an objective of the invention to generate a unique identification key for database transactions type, the key being the same for different instances of this transaction.

These objectives are achieved by the features of the independent claims. The other claims and the specification disclose advantageous embodiments of the invention.

According to a first aspect of the invention, a method of identifying an application in a database system is provided. The method comprises the steps of (1) collecting historical data on database transactions in the database system, (2) identifying identical database transactions by assigning unique transaction identifiers to them, (3) analyzing patterns of occurrence of said transaction identifiers and (4) mapping said patterns of transaction identifiers to client applications. Thus, patterns of transaction identifiers are used as fingerprints for applications. In this way, applications that do not make use of the counters available in standard database applications can still be automatically identified.

According to a second aspect of the invention, a system for identifying an application in a database system is provided. The system comprises (1) a Sample Collection System for collecting historical data on database transactions in the database system, (2) a Transaction Analysis System for identifying identical database transactions by assigning unique transaction identifiers to them, (3) a Frequency Analysis System for analyzing patterns of occurrence of said transaction identifiers and (4) an Application Identification System for mapping said patterns of transaction identifiers to client applications.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention together with the above-mentioned and other objects and advantages may best be understood from the following detailed description of the embodiments, but not restricted to the embodiments, wherein is shown in: FIG. 1 a schematic flow diagram of a business transaction; FIG. 2 a schematic diagram of a system 1 to be used for carrying out business transactions involving database transactions; FIG. 3 a flow diagram of a preferred embodiment of a method for analyzing sequences of SQL statements executing in a database system; FIG. 4 a schematic diagram depicting details of the reduction step 114 in the method of FIG. 3;

FIG. 5a a diagram of a sequence of SQL statements

representing an instance of a database transaction; FIG. 5b a diagram of the basic structure of the sequence of FIG. 5a; FIG. 5c a diagram showing the result of a hash algorithm performed on the basic structure of FIG. 5c; FIG. 5d the transaction identifier of the sequence of FIG. 5a; FIG. 6a-6c schematic representations of SQL commands; FIG. 7a-7d diagrams of sequences of SQL statements representing instances of a database transaction involving the SQL commands of FIG. 6a-6c; FIG. 8a-8d diagrams of basic structures derived from the of SQL

statement sequences of FIG. 7a-7d;

FIG. 9a a diagram of the sequence of SQL statements of FIG. 4a failing during exection; FIG. 9b a diagram of the basic structure of the sequence of FIG. 9a; FIG. 9c a diagram showing the result of a hash algorithm performed on the basic structure of FIG. 9c; FIG. 9d the transaction identifier of the sequence of FIG. 9a after reconstruction; FIG. 10 a flow diagram of a preferred embodiment of a method for detecting database applications; FIG. ha a histogram showing numbers of database transaction instances executed by client 20 during time interval At 1; FIG. hib a histogram showing numbers of database transaction instances executed by client 20' during time intervals Atl and At2; FIG. 11c a histogram showing numbers of database transaction instances executed by client 20 during time interval At2; FIG. 12 a computer system implementation of the invention.

In the drawings, like elements are referred to with equal reference numerals. The drawings are merely schematic representations, not intended to portray specific parameters of the invention. Moreover, the drawings are intended to depict only typical embodiments of the invention and therefore should not be considered as limiting the scope of the invention.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

Business transactions involve communication carried out between separate entities or objects, often involving the exchange of items of value such as information, goods, services and money.

Many business transactions involve electronic data processing.

As an example, consider the (schematic) electronic banking process 1 shown in FIG. 1 in which a user uses a bank machine to withdraw money from his banking account. This electronic banking application 1 begins by an authentication step 2 (e.g. demanding the user to type in a password) If authentication 2 is successful, the user is able to access a web page linked to his bank account. As the user selects a "withdraw cash" option on this web page and types in a desired amount (step 3), an SQL statement 4 is made to a bank's database containing information on the current balance of this account as well as credit information. Based on this information, the user's request is evaluated (step 5) and either approved or refused. If the request is approved, another SQL statement 6 is made to the bank's database, in which the account balance is reduced by the amount that is being withdrawn, the cash is delivered to the user (step 7) and a receipt is issued. In this example, steps 4 and 6 involve database transactions DT1, DT2 in which the bank's database is accessed. From a technical point of view, the electronic banking process 1 of FIG. 1 transaction of is a database application Al involving one or more database

statements (statements 4 and 6)

FIG. 2 shows a schematic of a system 10 to be used for carrying out database applications 8 such as the online banking process 1 described above. User 11 will typically use a banking machine 12 to access the bank's computing system 14 or access the bank's computing system via a network connection. Computing system 14 encompasses a number of workstations 15 executing the bank's applications. When carrying out a database application 8 such as the electronic banking application 1 of FIG. 1, the bank's computing system 14 uses a network connection 16 such as Internet or Intranet to access a bank database 17 in which the pertinent data are stored. The bank database 17 resides in a database system 18 comprising a number of database instances 19 which work together as one logical database system (a so-called partitioned database) . The bank's computer system 14 acts as a client 20 of the database system. Besides the bank's database 17, database system 18 hosts a number of other databases 17' containing data of other clients 20' A database administrator 30 accesses the database system 18 and supervises the performance of the database system 18 subject to the various database applications DT, DT' of clients 20, 20' accessing this database system 18 at any given time. This is a formidable task, considering that a typical large database server may execute more than 10 000 database transactions per second. Since the number of transactions in the database system 18 is very large, it is not practicable to analyze database transactions on a case-by-case basis; rather, attention should be focused on those (few) database transactions which are executed hundreds or thousands of times during that day and thus have the highest impact on the performance of the database system 18.

From the technical point of view, database transactions DT issued to the database 18 via network 16 are expressed as consecutive sequences of database operations (SQL statements) each of which is executed as an atomic operation. Thus, database transactions DT, DT' issued by the clients 20, 20' present themselves to the database administrator 30 as sequences of SQL statements. Client information can be obtained by providing an interface wrapper around client applications Al which intercepts the database access via client interfaces, i.e. on the client side. In this way, information on the client 20 (such as client IP address, process ID of the client operating system process etc.) can be tracked in conjunction with the database transactions DT1, DT2 submitted by this client 20.

Each sequence of SQL statements executing in the database system 18 implements an instantiation of a given database transaction DT and depends on the specific data of this instantiation.

Online transaction processing (OLTP) database applications typically provide a fixed set of SQL statements from which the database transactions DT are constructed. Present day database systems 18 provide means for identifying and classifying individual SQL statements, so that identical SQL statements within a database DT can be automatically recognized as being identical. Each statement is thereby uniquely identified by its statement text; a different statement text will identify a different SQL statement. Moreover, present day database systems 18 provide a so-called SQL Activity Trace which contains a chronological listing of all SQL statements processed by the database system 18. This SQL Activity Trace forms the basis of an analysis method 100 for automatically identifying different instances of database transactions DT as originating from one and the same database transaction DT. This method 100 can thus be used to analyze sequences of SQL statements executing in database system 18 in such a way as to classify them as belonging to a given database transaction. A schematic flow diagram of method 100 is shown in FIG. 3.

The analysis method 100 begins by using the SQL Activity Trace to extract sequences of SQL statements 50 executing in the database system 18 (step 110) . Each individual sequence 40 corresponds to an individual database transaction instance DT and ends with a terminal statement 51 ("Commit" or "Rollback") indicating a successful execution ("Commit") of the transaction or a failure of the transaction ("Rollback") FIG. 5a shows a sequence 40 of SQL statements 50 corresponding to a specific instance of a given database transaction DT executing successfully.

Having unequivocally identified sequences 40 of SQL statements (step 112 of method 100), each of these sequences 40 is reduced to a so-called "basic structure" in step 114. In this reduction step 114, a sequence 40 of SQL statements 50 is stripped to its essential features and analyzed for typical patterns. Minor differences in the SQL statement sequences 40 are ignored, so that sequences differing on]y slight]y (e.g. by the frequency of execution of a given statement type) will be identified as belonging to the same database transaction DT. As a result of step 114, a basic structure 60 of sequence 40 is generated as an SQL statement list. In a preferred embodiment, the reduction step 114 will transform a given sequence 40 of SQL statements 50 into the sequence's basic structure 60 by -registering the first occurrence of each type of SQL

statement within the sequence 40 and

-forming the basic structure 60 as a sequence of the types of SQL statements in their order of first occurrence, omitting all subsequent occurrences of these types of SQL statements.

A flow diagram of this step 114 is shown in FIG. 4.

As an example, consider the sequence 40 of FIG. 5a containing SQL statements SQL1 to SQL5. The basic structure 60 of this sequence 40 is obtained by sequentially analyzing the SQL statements of sequence 40 (steps 156 and 158 of FIG. 4) and copying the first occurrence of each SQL statement in sequence into the basic structure 60 (step 160 of FIG. 4), omitting all repetitive occurrences of these SQL statements 50. Since the first position 501 of sequence 40 marks the first occurrence of SQL1, this statement is adopted into the first position 601 of the basic structure 60. By the same token, the second position 502 of sequence 40 marks the first occurrence of SQL2, so that this statement is adopted into the basic structure's 60 subsequent second position 602. The third position 503 of sequence 40 is occupied by an SQL1 statement which already occurred in position 501 of sequence 40 and thus -since it constitutes a repetition -is not adopted into the basic structure 60. The fourth statement 504 of sequence 40 marks the first occurrence of statement SQL5, so that this statement is adopted into the next available position 603 of basic structure 60. Following this scheme, statements SQL4 and SQL3 contained in the fifth and the eighth positions 505, 508 of sequence 40, will be adopted into the fourth and fifth positions 604 and 605 of basic structure 60, whereas all remaining statements in sequence correspond to subsequent occurrences of statements SQL1, SQL2 already registered in basic structure 60 and thus will be omitted. The resulting basic structure 60 of sequence 40 thus generated is shown in FIG. Sb.

As additional illustrative examples of the reduction step 114, FIG. 8a -8c show some basic structures 60a -60c generated by the reduction step 114 from sequences 40a -40c (FIG. 7a -7c) which are specific instances of database transactions DTa -DTc shown schematically in FIG. 6a -6c. As will be recognized from these examples, the reduction step 114 is not only performed on SQL statements SO within sequences 40 but also on all individual -10 -operations within SQL statements, thereby reducing individual

SQL statements 50 to their basic structure.

FIG. 6a displays a database transaction DTa comprising a SELECT command. Database applications DI typically use a cursor to retrieve the results of an SQL query from the database server 18. The query is prepared for execution, the cursor is opened and result rows are fetched via multiple executions of the FETCH operation, depending on the amount of returned data. After the last FETCH operation, the cursor is closed.

Depending on the set of data to be returned, different instances of this SELECT database transaction DTa will contain different numbers of FETCH operations; FIG. 7a shows a sequence 40a corresponding to a specific instance containing three consecutive FETCH operations 0P5 corresponding to a data set containing three data rows. During the reduction step 114, all multiple occurrences of FETCH operations 0P5 are omitted, so that sequence 40a of FIG. 7a is reduced to the "basic" structure 60a of FIG. 8a.

Different instances of the SELECT command DTa may display different numbers of coiisecutive FETCH operations 0P5 but will otherwise be identical. Reduction step 114 will transform all instances of database transaction DTa into same the basic structure 60a of FIG. 8a, irrespective of the number of FETCH operations 0P5 they contain. Thus, all instances of database transaction DTa will be reduced to basic structure 60a characteristic of this database transaction DTa. The basic structure 60a of the SELECT command may be represented by the sequence of PREPARE / EXECUTE / OPEN / FETCH / CLOSE operations and may be handled as a composite SQL statement QRY1.

FIG. 6b shows a database transaction DTb which will fetch result rows of a query and then execute several subsequent SQL -11 -statements to process this data. The number of SQL statements needed to process each result row is invariant, but the number of executions may vary for different instances of the transaction DTb depending on the number rows that need to be processed in the specific instance. FIG. 6b shows a data application in which two subsequent SQL statements STMT-A, STMT-B are executed for each row returned by the outer FETCH operation 0P5. FIG. 7b shows a schematic of a sequence 40b of SQL statements corresponding to an instantiation with a data set containing two result rows. As the reduction step 114 is applied to sequence 40b, all multiple occurrences of statements are omitted while preserving the order of first occurrence of each type of SQL statement, so that sequence 40b of FIG. 7a is reduced to the basic structure 60b of FIG. 8b.

Different instances of the database transaction DTb may display different numbers of consecutive repetitions of the statement sequence STMT-A, STMT-B, but will otherwise be identical.

Reduction step 114 will transform all instances of database transaction DTb into the basic structure 60b of FIG. 8b, irrespective of the number of loops (0P5 -SQL2 -SQL3) they contain. Thus, all instances of database transaction DTb will be reduced to basic structure 60b characteristic of this database transaction DTb. The basic structure 60b of this database transaction DTb may be represented by the sequence QRY2 / SQL2 / SQL3.

As a variant of the example of FIG. 6b, FIG. 6c shows a database transaction DTc which will execute different sets of SQL statements when processing the returned data, the various branches depending on the contents of the returned data.

Database application DTc contains three branches corresponding to three conditional clauses depending on parameters A, B and C. Different branches A, B and C have different code paths STMTA, -12 -STMTB and STMTC and will therefore execute different (sets of)

SQL statements SQL2, SQL3 and SQL4.

FIG. 7c and 7d show schematics of two sequences 40c and 40d of SQL statements corresponding to two different instances of database transactions DTc in which -due to the data set encountered -branches A, B and C were traversed in different number in different order. As a result, basic structures 60c, 60d generated from these sequences 40c, 40d in reduction step 114 differ insofar that they contain elements of the set of statements {SQL2, SQL3, SQL4} and in different number and order, depending on the specific data the database transaction DTc operated on. From a data server perspective, the performance behavior may be totally different for the different branches A, B and C. Thus, different instances of database transactions DTc containing conditional clauses will not be mapped onto a single basic structure, but rather a number of different basic structures 60c, 60d.

Reduction step 114 may be complemented with a step 115 which adds/complements parameter markers in SQL statements. Parameter markers are used by database applications whenever the values in a WHERE c]ause (specifying a set of conditions which will be true for every output row) are different between executions of the same statement. The actual values of the parameters are passed on as additional parameters to the database server and only used during execution. As an example, the statement SELECT * FROM DB.TBL WHERE COL1=? retrieves all rows of table DB.TBL for which the value of COL1 corresponds to the actual value of the parameter marker. This construction enables reusing previously created control structures (called "access path"), since the statement text always looks the same to the database server, independent of the -13 -actual value of the parameter marker. Well written applications will use of parameter markers wherever possible, since this and reduces the time required to execute the SQL statement. Some applications accessing database system 18 may, however, contain sub-optimal coding which does not use parameter markers but creates SQL statements containing the actual values as part of the statement text. In the example cited above, the

corresponding SQL statements would read:

SELECT * FROM DB.TEL WHERE COL1=1 SELECT * FROM DB.TEL WHERE COL1=2 Each of these statements would create a different SQL statement, leading to a host of different basic structures for the corresponding database transaction. In order to map these onto a single basic structure, all tokens in the statement text are examined to find out whether they could be replaced by a parameter marker. If a token is found which could be replaced by a parameter marker, it will be replaced by a question mark, thereby transforming the associated set of statements into statements with parameter markers. The reduction method 114 will thus identify these statements as identical and map the corresponding sequences onto one and the same basic structure.

Once the sequence 40 has been reduced to its basic structure 60 using the reduction step 114, it is assigned an unambiguous transaction identifier corresponding to this basic structure 60 in step 116 of method 100. Since sequences 40 originating from a given database transaction DT are mapped onto a common basic structure 60, they will receive the same transaction identifier, indicating that these sequences correspond to specific instances of one and the same database transaction characterized by the transaction identifier. The transaction identifier can then be used to monitor and analyze the performance of the various -14 -database transactions executing in the database system in a very efficient way.

Advantageously, the pattern of the basic structure 60 itself is used as a transaction identifier. In a preferred embodiment, the transaction identifier of sequence 40 is generated by performing a hash algorithm on each SQL statement 50 of the basic structure of sequence 40; this yields an (short) integer value for each SQL statement 50 in the basic structure 60. The sequence of integer values thus generated is interpreted as an integer array which is used as a transaction identifier for sequence 40. This method will thus generate a (typically very long) identifier which can be directly used for transaction identification. By using a hash algorithm on each SQL statement 50 of the basic structure 60, the original SQL statement string is transformed an into integer value representing that SQL statement 50. This enables a very fast and efficient way of generating, handling and storing transaction identifiers. Notably, it greatly facilitates the analysis of database transactions, since comparisons of integer values execute much faster than multiple string compares for each SQL statement text contained in the transaction. This strongly reduces the time required assign identifiers to specific sequences 40 and thus constitutes a great advantage in a database environment in which millions of transactions are processed every day.

Various hash algorithms are known that can be applied to each SQL statement of the basic sequence in order to generate an integer transaction identifier of that sequence. Note that, in principle, hash algorithms are problematic in the sense that they will never generate unique keys; thus, in principle, collisions may occur. However, good hash algorithms have a probability of collisions which is less than 1%. Since the transaction identifiers are generated from a sequence of such good hash values, the likelihood of different database -15 -transactions having the same transaction identifier will shrink with the number of SQL statements in the transaction.

FIG. 5c and 5d show an example of an identifier assignment step using a hash algorithm as applied to the basic structure 60 of sequence 40 of FIG. 5a. FIG. 5c shows the result of the hash algorithm performed on each SQL statement 50 of basic structure 60, resulting in a sequence of integer values 56. A transaction identifier TI is generated by appending these integers 56 to each other sequentially, thus forming an integer array (FIG.

5d) . As shown in FIG. 5d, the transaction identifier TI of sequence 40 thus generated may be supplemented with a header section 71 for storing additional information related to sequence 40, e.g. information pertaining to the client who issued the sequence 40, performance indicators related to the sequence execution, start and stop timestamps of the sequence execution etc. In this way, transaction identifiers TI can be related to the clients 20 who submitted the various database transactions, thus enabling an analysis of the performance of database transactions as to specific clients.

The transaction identifiers TI of all database transactions DI executing on the database are transmitted to a monitoring application 28. The monitoring application 28 will support the database administrator 30 in grouping and aggregating the information pertaining to the various types of transactions and identify those types of transactions which require further analysis. If, for example, the demand on database resources is found to be exceedingly high during a certain period of time, database administrator 30 can investigate the pattern of database transactions DT executing during this period of time and extract the ones that caused the problem. Thus, overly costly database transactions can be detected, and programming attention can be focused on improving these specific database transactions in order to make them faster and/or less costly.

-16 -Up to now, it was assumed that all instances of database transactions submitted to database system 18 execute successfully. However, a given instance of a database transaction DI may not execute as required due to an SQL statement failure within the transaction. As an example, FIG. 9a shows a sequence 40' representing an instance of the database transaction DI failing while executing the seventh SQL statement 507; the failure is accompanied by a final "Rollback" statement.

As a consequence of this failure, all subsequent SQL statements within this transaction will be bypassed and the transaction will terminate immediately. In a case like this, the basic structure 60' calculated for the sequence 40' will likely differ from the basic structure 60 of successfully executing instances due to missing terms. In a case like this, as the transaction identifier TI' is computed from basic structure 60' (see FIG. 9c and 9d), an error flag 72 is generated and stored in the header 71 of the transaction identifier TI', the numeral 1' indicating that the sequence 40 contained a failure; also, information on specific SQL statement causing the failure can be stored in the header section 71 of transaction identifier TI' . The monitoring application 28 can evaluate sequences 40' aborted prematurely and can try to identify the database transaction corresponding to this failed sequence 40' by matching the first (successfully executed) statements of this sequence 40' to the leading statements of other (previously monitored) sequences. If a match is found, the transaction identifier TI' of the failed sequence 40' is overwritten by the transaction identifier of the previously detected, successfully terminated sequence. In the case of failed sequence 40' of FIG. 9a, the leading six statements correspond to the leading six statements of the sequence 40 of FIG. 5a, so that it is assumed that the failed sequence 40' -had it executed successfully -would have exhibited the statement sequence 40 of FIG. 5a. Thus, the transaction identifier TI' is replaced by transaction identifier TI of FIG. 5d, while the header section 71 still contains the -17 -flag 1' indicating that this instance 40' of database transaction DT failed during execution. This information can be used for analyzing how often database transactions of a given type fail and which SQL statement within the database transaction caused the failure.

The method 100 outlined above is capable of assigning various sequences 40 of SQL statements 50 executing in database system 18 to a given database transaction DT by means of a transaction identifier TI. This unequivocal relation between these sequences and transaction identifier TI can be used as a basis for a method 200 geared at application mining in database system 18.

Method 200 relies on the fact that any given application will always execute a given set of database transactions DT and have a given relationship between the number of executions of these transactions DT. Thus, if the type and number of executions of transactions in database system 18 can be monitored, the type of applications executed may be inferred.

Method 200, a schematic diagram of which is depicted in FIG. 10, begins with monitoring database transaction DT in database system 18 for a given period of time At (step 210) . In order to establish a relation between a specific client's 20 applications as a function of the database transactions executing on the database system, all sequences of SQL statements 40 executing during this period of time At and originating from all clients are assigned transaction identifiers TI, e.g. by using the method outlined above (step 212) . The frequency of occurrence of the various transaction identifiers TI is stored in a monitoring repository 29 for further analysis. Monitoring repository 29 will thus contain a set of transaction identifiers TI, the frequency of their occurrence within time At as well as information pertaining to the specific client 20, 20' database transactions of a given transaction identifier originated from.

Analysis step 214 will analyze this frequency pattern with -18 -respect to the list of database clients 20, 20' in order to detect which of them have executed a given set of transactions and have a given relationship between the execution frequency of these transactions.

FIG. ha and iib show histograms 251, 252 of occurrences of various transaction identifiers TI-i to TI-x of transactions submitted during a given time interval Ati by given clients 20 (FIG. ha) and 20' (FIG. iib) respectively. By comparing the pattern of database transactions issued by clients 20 and 20' during time interval Atl (histograms 251 and 252), a resemblance of these histograms 251, 252 is detected, since the ratio of transactions of type TI-i to transactions of type TI-3 is approximately the same (1:1) in each histogram. It is therefore inferred that the same application APPL-A is installed in both clients 20, 20' Now assume that a banking application Al executes two database transactions DT1 and DT2 (see FIG. 1), DT1 representing a database transaction related to checking the current account balance and credit rating and DT2 representing a database transaction related to withdrawing cash from an account. As client 20 accesses the database system 18 in the course of performing application Al, a connection is made from client workstation 15 to database system 18 via network 16, and information such as client's 20 identification code, workstation ID etc. is transmitted to the database system 18. Typically, a cash withdrawal from an account will require a certain cash record on that account, so that banking application Al will first execute transaction DT1 (i.e. verifying the current account balance and credit rating stored in database 18) followed by transaction DT2 (i.e. reducing the current account balance stored in database system 18 to account for the amount of cash withdrawn) . Therefore, the banking application Al will typically imply one execution of DT1 and DT2 each. From -19 -analyzing histograms 251, 252, database administrator 30 may infer that the application APPL-A installed in both servers 20, 20' corresponds in fact be banking application Al and may give APPL-A a more meaningful name, such as "Withdraw-Money".

By using patterns such as APPL-A as "fingerprints", Method 200 will assist the database administrator in recognizing and analyzing the various applications submitted to the database system 18 by the clients: FIG. llc shows a histogram of the number of database transaction 11-1 to TI-x submitted by client during a different time interval At2; during this time interval At2, number of database transaction submitted by client 20' corresponds to the histogram shown in FIG. llb. By using method 200, it is discovered that during this time period At2, client 20 still issues database transactions 11-1 and 11-3 at approximately equal ratio, indicating that APPL-A is still installed; it is also found that at least one other application APPL-B is installed besides APPL-A, this application APPL-B comprising database transactions 11-2 and 1I-X.

Mapping step 216 of method 200 will thus detect specific client applications APPL-A, ARPL-B from comparing the pattern of occurrences of transactions between different clients 20, 20' Note that the overlap will never amount to 100% since the transactions are independent database operations and the application logic might decide to process them in a different order or frequency. Thus, applications are interpreted as identical if more than a given percentage X (e.g. 80%) of the transactions belong to the same set and if their execution frequency overlaps more than a given percentage Y (e.g. 80%) Thus, sequences of transaction identifiers can be used as fingerprints for applications. The longer the period of time during which the transactions are monitored, the more precise will this kind of detection be. One can thus analyze the -20 -transactions executed by each database client 20, 20' and deduce the application being performed from the transaction pattern.

The database administrator 30 may repeat the method 200 of autodetecting applications at certain time intervals in order to automatically detect newly deployed applications in the database system.

Referring now to FIG. 12, a computer system 300 implementation of the preferred embodiment of the present invention is shown.

Specifically, the present invention can be implemented as a computer system 300 and/or program product for identifying database transactions and relating these to database applications executing on a database system 18. This allows a user 30, specifically a database administrator, to analyze individual database transactions and applications with respect to their impact on database system performance.

As depicted, computer system 300 generally comprises memory 312, input/output (I/O) interfaces 314, a central processing unit (CPU) 316, external devices/resources 318, bus 320 and data base 338. Memory 312 may comprise any known type of data storage and/or transmission media, including magnetic media, optical media, random access memory (RAM), read-only memory (ROM), a data cache, a data object etc. Moreover, memory 312 may reside at a single physical location, comprising one or more types of data storage, or can be distributed across a plurality of physical systems in various forms. CPU 316 may likewise comprise a single processing unit, or be distributed across one or more processing units in one or more locations, e.g. on a client and server. I/O interfaces 314 may comprise any system for exchanging information from an external source. External devices 318 may comprise any known type of external device, including keyboard, mouse, voice recognition system, printer, monitor, facsimile etc. Bus 320 provides a communication link between -21 -each of the components in the computer system 300 and likewise may comprise any known type of transmission link, including electrical, optical, wireless etc. In addition, although not shown, additional components such as cache memory, communication systems, system software etc. may be incorporated into computer system 300.

Database 338 provides storage for information necessary to carry out the present invention. Such information could include e.g. contents of monitoring repository 28, client lists of database system 18, client IP addresses, etc. Database 338 may include one or more storage devices, such as a magnetic disk drive or an optical disk drive. In another embodiment, database 338 includes data distributed across, for example, a local area network (LAN), wide are network (WAN) or a storage area network (SAN) Database 338 may also be configured in such a way that one of ordinary skill in the art may interpret it to include one or more storage devices. Moreover, it should be understood that database 338 could alternatively exist within computer system 300.

Stored in memory 312 is logic system 326. As depicted, logic system 326 generally includes Database Transaction Analysis System 330 and Database Application Detection System 340 Database Transaction Analysis System 330 comprises SQL Identification System 332, Sequence Identification System 334, Structure Extraction System 336 and Transaction Identifier Generating System 338. Database Application Detection System 340 comprises Sample Collection System 342, Transaction Analysis System 344, Frequency Analysis System 346 and Application Identification System 348. The systems shown herein carry out the functions described above.

SQL Identification System 332 will use standard database features to detect identical SQL statements. Sequence -22 -Identification System 334 will identify instances of database transactions DT as sequences 40 of consecutive SQL statements 50. Structure Extraction System 336 will reduce these sequences to their basic structures 60. Transaction Identifier Generating System 338 will define transaction identifiers TI based on the basic structures 60 of the sequences 40. -Sample Collection System 342 will accumulate instances of database transactions (DT) for a given period of time At. Transaction Analysis System 344 will identifying identical database transactions DT by assigning unique transaction identifiers TI to them; note that Transaction Analysis System 344 may resort directly to the results attained by Database Transaction Analysis System 330. Frequency Analysis System 346 will analyze the transaction identifiers TI with respect to their temporal patterns of occurrence within time period At. Application Identification System 348 will for mapping said patterns to client applications Al, thus using the fingerprint of database transactions as a means of identifying client applications Al.

The invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc. Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer readable medium providing program code for use by or in connection with a computer or any instruction execution system.

For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by on in connection with the instruction execution system, apparatus, or device.

-23 -The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read-only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.

While the foregoing has been with reference to particular embodiments of the invention, it will be appreciated by those skilled in the art that changes in these embodiments may be made without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims.