GB2516357A - Methods and apparatus for monitoring conditions prevailing in a distributed system

Methods and apparatus for monitoring conditions prevailing in a distributed system

Info

Publication number
GB2516357A
Authority
GB
United Kingdom
Prior art keywords
message
measurement data
measurement
logging
client application
Prior art date
Legal status
Granted
Application number
GB1409563.2A
Other versions
GB2516357B (en)
GB201409563D0 (en)
Inventor
Mark Patrick Henry Eastman
Current Assignee
ADVANCED BUSINESS SOFTWARE AND SOLUTIONS Ltd
Original Assignee
ADVANCED BUSINESS SOFTWARE AND SOLUTIONS Ltd
Priority date
Filing date
Publication date
Application filed by ADVANCED BUSINESS SOFTWARE AND SOLUTIONS Ltd
Publication of GB201409563D0
Publication of GB2516357A
Application granted
Publication of GB2516357B
Expired - Fee Related


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3476Data logging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3495Performance evaluation by tracing or monitoring for systems

Abstract

Client application container 22-1 comprises application 24-1 accessed by client device 14-1. The container is on a different physical or virtual machine from statistics server 16, to which data are sent for persistence. Logging of performance measurements is started by application 24-1 calling local performance logger 30-1, which generates records comprising measurement results and metadata. Local logger 30-1 determines whether to store a result remotely on server 16 and passes results and metadata to asynchronous FIFO queue 33-1 via remote logger 32-1. Messages are then sent to primary logging environment 16 when sufficient resources are available, without having a significant impact on execution of application 24-1 (e.g. when the application is not particularly busy). Messages in queue 33-1 have associated time-to-live parameters that remove them from the queue when the parameters expire. Time-to-live parameters may be common to all messages or configurable per message. Queue 33-1 uses processing threads of lower priority than those of application 24-1.

Description

Methods and apparatus for monitoring conditions prevailing in a distributed system

The present invention relates to apparatus and associated methods for monitoring conditions prevailing in a distributed system. The invention has particular, although not exclusive, relevance to apparatus and associated methods for monitoring conditions prevailing in a client-server based computer system.
When implementing distributed systems, such as client-server based systems that implement a distributed application structure in which one or more servers provide one or more client machines with access to resources and/or services, there is often a need to provide mechanisms by which system performance and the like can be measured, monitored, analysed, viewed and recorded. Such mechanisms may be required, for example, to provide an early, or even advance, indication of potential technical issues arising in the system such as latency in one or more distributed applications exceeding an acceptable level, a client machine or application monopolising resources to the detriment of performance elsewhere in the system, communication bottlenecks arising, a significant reduction in the ability of an end user to navigate an application efficiently and effectively, outright system failure, or the like. Thus, such mechanisms can beneficially allow appropriate corrective or preventative action to be taken promptly. Such mechanisms may also be required to allow the provider of a particular service or range of services, via the distributed system, to monitor system performance levels or the like against predetermined criteria such as acceptable latency levels, acceptable resource provision levels, acceptable resource usage levels, acceptable application navigation speeds or the like. These criteria may, for example, represent levels of performance agreed, in advance, with an end user and/or may represent levels of performance dictated by operational constraints such as communication bandwidths, resource availability, or the like.
However, the very act of measuring and monitoring system performance, and recording measurement data in a common location for analysis purposes, can add to the overall work of the application and therefore decrease the overall performance of the system because measuring and monitoring performance, and communicating the results for storing in a common location, requires system resources. In some cases this negative impact on performance can cause the results of a particular performance measurement to appear worse than they otherwise would and even to fail to meet a predetermined performance criterion that, in the absence of monitoring, would have been met.
Moreover, measuring and monitoring system performance can be particularly difficult in a distributed system in which a range of different distributed applications may be provided each of which may need performance information to be captured in a different way and/or each of which may be implemented using a different software platform/framework.
Accordingly, preferred embodiments of the present invention aim to provide methods and apparatus which overcome or at least alleviate one or more of the above issues.
In one aspect of the invention there is provided apparatus for monitoring conditions prevailing in a distributed system in which at least one client application is provided for access by a client device, the apparatus comprising: a client application environment in which the at least one client application and a measurement logging entity are provided; wherein the measurement logging entity comprises: an interface via which the measurement logging entity can receive, from each client application, measurement data representing a respective measure of performance for that client application; means for determining that said measurement data should be logged remotely from the local environment; and means for queuing, in a message queue, a message comprising said measurement data for sending to a primary logging environment for logging in a measurement database when said determining means determines that said measurement data should be logged remotely from the local environment, wherein said message comprising said measurement data in said queue has associated therewith a time-to-live parameter setting a time period for which said message is to be retained in said queue; wherein said means for queuing is configured: (a) to send said message comprising said measurement data, from said message queue, to said primary logging environment at a time when sufficient resources are available to send said message without having a significant impact on execution of said at least one client application and said time-to-live parameter associated with that message has not expired; and (b) to remove, from said message queue, said message comprising said measurement data, without sending said message comprising said measurement data, when said time-to-live parameter associated with that message expires before that message would otherwise be sent.
The time-to-live parameter may be a message queue specific time-to-live parameter that may be common to all messages queued in said message queue or may be a message specific time-to-live parameter that may be respectively configurable for each message comprising said measurement data.
Where the time-to-live parameter may be a message specific time-to-live parameter the message queue may have associated therewith a message queue specific time-to-live parameter that may be common to all messages queued in said message queue and wherein said means for queuing may be configured: to remove, from said message queue, each message comprising said measurement data for which a message specific time-to-live parameter has not been configured, without sending that message comprising said measurement data for which a message specific time-to-live parameter has not been configured, when said message queue specific time-to-live parameter associated with that message expires before that message has been sent; and to remove, from said message queue, each message comprising said measurement data and for which a message specific time-to-live parameter has been configured, without sending that message comprising said measurement data for which a message specific time-to-live parameter has been configured, when said message specific time-to-live parameter configured for that message expires before that message has been sent regardless of a time period set by the message queue specific time-to-live parameter.
A message specific time-to-live parameter may be adapted to be configured, on a message by message basis, by the at least one client application.
The message queue may have associated therewith a transmission order parameter which may be set: to a value that indicates that messages in said message queue should be sent in a first-in-first-out (FIFO) order; or to a value that indicates that messages in said message queue should be sent in a last-in-first-out (LIFO) order; and wherein said means for queuing may be configured to send said messages in a FIFO order or LIFO order depending on the value of said parameter. The transmission order parameter may be adapted to be configured and reconfigured by the at least one client application.
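Purely by way of a non-limiting illustration (and not as a definition of the claimed queuing means), the time-to-live and transmission-order behaviour described above could be sketched in Java roughly as follows; all class, field and method names are hypothetical, and it is assumed, consistent with the description, that a message-specific time-to-live takes precedence over the queue-wide one.

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Deque;
    import java.util.concurrent.ConcurrentLinkedDeque;

    // Hypothetical sketch of a time-to-live aware message queue for measurement data.
    class StatisticsMessageQueue {
        // A queued message carries the measurement data plus an optional per-message TTL.
        record QueuedMessage(String measurementData, Instant enqueuedAt, Duration messageTtl) {}

        private final Deque<QueuedMessage> queue = new ConcurrentLinkedDeque<>();
        private final Duration queueTtl;   // queue-specific TTL, common to all messages
        private final boolean lifoOrder;   // transmission order parameter: false = FIFO, true = LIFO

        StatisticsMessageQueue(Duration queueTtl, boolean lifoOrder) {
            this.queueTtl = queueTtl;
            this.lifoOrder = lifoOrder;
        }

        void enqueue(String measurementData, Duration messageTtl) {
            queue.addLast(new QueuedMessage(measurementData, Instant.now(), messageTtl));
        }

        // Returns the next message to hand to the sender, silently removing any whose TTL has expired.
        QueuedMessage nextToSend() {
            while (true) {
                QueuedMessage m = lifoOrder ? queue.pollLast() : queue.pollFirst();
                if (m == null) {
                    return null;   // nothing left to send
                }
                // A message-specific TTL, where configured, overrides the queue-specific TTL.
                Duration ttl = (m.messageTtl() != null) ? m.messageTtl() : queueTtl;
                if (Instant.now().isBefore(m.enqueuedAt().plus(ttl))) {
                    return m;      // still within its time-to-live: send it
                }
                // expired: removed from the queue without being sent
            }
        }
    }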
The measurement logging entity may be adapted to receive via said interface, respective measurement data for a plurality of related operations or groups of operations; wherein respective measurement data for each of the plurality of related operations or groups of operations may be provided in association with a shared key that may be common to said related operations or groups of operations.
The means for queuing may be operable to queue, in a single message in said message queue, the respective measurement data for each of the plurality of related operations or groups of operations that share said shared key.
The measurement logging entity may be adapted to receive via said interface, respective measurement data for at least one of a plurality of related operations or groups of operations performed across the distributed system; wherein the respective measurement data for the at least one of the plurality of related operations or groups of operations may be provided in association with a shared key that may be common to said related operations or groups of operations.
The means for queuing may be operable to queue, in said message queue, at least one message comprising said received measurement data for at least one of a plurality of related operations or groups of operations performed across the distributed system, wherein the at least one message comprising the received measurement data for at least one of a plurality of related operations or groups of operations performed across the distributed system may further comprise a shared key.
The measurement logging entity may be operable to provide for each message comprising said measurement data (e.g. in the message or in a separate message), associated context information which may comprise at least one context parameter representing a condition prevailing in the distributed system at a time that the measurement data was acquired.
The apparatus may further comprise means for requesting a prediction of a measure of performance based on current conditions prevailing in the distributed system and for receiving, responsive to said request, a prediction of a measure of performance based on the previously provided measurement data and the associated context information for the time that the measurement data was acquired.
The measurement logging entity may comprise a first ('local') measurement logging part and a second ('remote') measurement logging part wherein: the first measurement logging part may comprise means for receiving measurement data via said interface, said means for determining that said measurement data should be logged remotely from the local environment, means for generating said message comprising said measurement data, and/or means for sending the generated message to the second measurement logging part; and the second measurement logging part may comprise means for receiving said generated message from said first measurement logging part, and/or said means for queuing said message.
The determining means may be configured for determining whether said measurement data should be logged remotely from the local environment or logged within the local environment.
The apparatus may comprise means for logging said measurement data locally when it is determined that said measurement data should be logged within the local environment.
The measurement logging entity may be configured to receive, from a client application, an indication that said measurement data should be logged locally and said determining means may be configured for determining that said measurement data should be logged, for that client application, within the local environment responsive to receipt of said indication that said measurement data should be logged locally.
The measurement logging entity may be configured to receive, from a client application, an indication that remote logging of said measurement data should be suspended and said determining means may be configured for determining that said measurement data, for that client application, should be logged within the local environment responsive to receipt of said indication that remote logging of said measurement data should be suspended.
The measurement logging entity may be configured to receive, from a client application, an indication that logging of said measurement data should cease and/or to disable logging of measurement data for that client application responsive to receipt of said indication that logging of said measurement data should cease.
The queuing means may be configured to send said message comprising said measurement data to said primary logging environment via an interface that may be independent of a software platform or framework used to provide said client application.
The interface that may be independent of a software platform or framework may be a uniform application programming interface (API).
The API may be a representational state transfer (REST) service (RS) API.
The at least one client application may comprise a plurality of client applications and the measurement logging entity may be configured to receive respective measurement data from each said client application.
The apparatus may further comprise a plurality of further client application environments, each further client application environment comprising a respective measurement logging entity.
The apparatus may comprise the primary logging environment. The primary logging environment may comprise means for receiving said message comprising measurement data from said queuing means and/or means for logging said measurement data accordingly.
The receiving means of said primary logging environment may be configured to receive a message comprising measurement data from the respective measurement logging entity of each of a plurality of client application environments.
The receiving means of said primary logging environment may be configured to receive a plurality of messages comprising measurement data for a plurality of related operations or groups of operations performed across the distributed system, wherein each of said plurality of messages may comprise a shared key that may be common to said related operations or groups of operations, and wherein said means for logging said measurement data may be operable to log said measurement data provided in said plurality of messages in association with said shared key.
The receiving means of said primary logging environment may be configured to receive with each message comprising said measurement data, associated context information comprising at least one context parameter representing a condition prevailing in the distributed system at a time the measurement data was acquired.
The apparatus may further comprise means for receiving a request for a prediction of a measure of performance based on current conditions prevailing in the distributed system and for determining, responsive to said request, a prediction of a measure of performance based on previously logged measurement data and the associated context information for the time that the measurement data was acquired.
The primary logging environment may further comprise a viewer entity for generating a visual display of stored measurement data.
The viewer entity may be configured to provide an alert when said measurement data indicates that a predetermined criterion has been, or is about to be, met.
The means for queuing may be configured to operate a processing thread having a lower priority than a processing thread that the at least one client application uses whereby said message comprising said measurement data may be sent to said primary logging environment at a time when sufficient resources are available to send said message without having a significant impact on execution of said at least one client application.
The means for queuing may comprise a scheduler that uses a background processing thread to process each message added to said message queue wherein said processing thread may have a lower scheduling priority than that of a general execution thread used by the at least one application whereby said message comprising said measurement data may be sent to said primary logging environment at a time when sufficient resources are available to send said message without having a significant impact on execution of said at least one client application.
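As a purely illustrative sketch of such a lower-priority background processing thread (the names and the use of java.util.concurrent are assumptions, not features of the description):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Hypothetical scheduler that drains the message queue on a daemon thread whose
    // scheduling priority is lower than the application's general execution threads.
    class BackgroundSender {
        private final BlockingQueue<String> messages = new LinkedBlockingQueue<>();

        void start() {
            Thread sender = new Thread(() -> {
                try {
                    while (true) {
                        String message = messages.take();   // blocks until a message is queued
                        postToStatisticsServer(message);    // assumed transport (e.g. an HTTP POST)
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "statistics-sender");
            sender.setDaemon(true);
            sender.setPriority(Thread.MIN_PRIORITY);        // below the application's normal thread priority
            sender.start();
        }

        void submit(String message) {
            messages.offer(message);                        // non-blocking hand-off from the application
        }

        private void postToStatisticsServer(String message) {
            // placeholder: delivery to the primary logging environment would go here
        }
    }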
The means for queuing may be configured to operate said message queue as a first in first out (FIFO) message queue.
The apparatus may be configured for use in a mobile execution environment.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting health and/or social care services.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting community health and/or social care services.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting mobile health and/or social care services.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting health and/or social care services in a care recipient's home.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting health and/or social care services in a residential care home.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting health and/or social care services of an urgent and unplanned nature.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting mental health and/or social care services.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting palliative, hospice or end of life health and/or social care services.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting health and/or social care services, for those with learning disabilities, in a school and/or care home.
The apparatus may be configured for monitoring conditions prevailing in a distributed system to track response times of external integrations.
The apparatus may be configured for monitoring conditions prevailing in a distributed system to track internal response times of subroutines and/or data retrieval.
The apparatus may be configured for monitoring conditions prevailing in a distributed system to track user decision making speed.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting employer services.
The apparatus may be configured for monitoring conditions prevailing in a distributed system by means of a centralised statistics depository for performance measurements across a range of said employer services.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for providing at least one of financial and accounting employer services, human resources employer services, payroll employer services, procurement employer services, document management employer services, supply chain management employer services, business analytics employer services, and business intelligence employer services.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting employer services in the public service sector.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting employer services in the private sector.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting employer services in the not-for-profit or voluntary sector.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting managed services.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting managed services comprising cloud computing services.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting managed services comprising data centre services.
The apparatus may be configured for monitoring conditions prevailing in a distributed system for supporting electronic learning services.
According to one aspect of the present invention there is provided an application configured to operate as the client application of the apparatus, the application comprising: means for configuring said client application to perform a measurement of performance for the client application; means for performing a measurement of performance configured by said configuring means; and means for passing at least one result of the measurement of performance performed by said measurement performing means, as at least part of said measurement data, to said measurement logging entity for logging in association with said time-to-live parameter.
The configuring means may be operable to configure at least one start point and at least one end point for said measurement of performance.
The at least one result may comprise an elapsed time beginning at said at least one start point and ending at said at least one end point.
The results passing means may be configured for passing said result with associated metadata relating to said measurement, as at least part of said measurement data, to said measurement logging entity for logging.
The metadata may comprise at least one of: information identifying the client application to which the measurement data relates; information identifying an operation or group of operations for which the measurement was performed; information identifying a time at which the measurement was performed; and/or information indicating an approximate magnitude of the operation or group of operations to which the measurement relates.
The metadata may comprise information for identifying a data type of said measurement data (e.g. string, numeric, date and/or time related).
According to one aspect of the present invention there is provided a method for monitoring conditions prevailing in a distributed system in which at least one client application is provided for access by a client device, the method comprising: a logging entity: receiving via an interface, from a client application, measurement data representing a respective measure of performance for that client application; determining that said measurement data should be logged remotely from the local environment; and queuing, in a message queue, a message comprising said measurement data for sending to a primary logging environment for logging in a measurement database when said determining step determines that said measurement data should be logged remotely from the local environment, wherein said message comprising said measurement data in said queue has associated therewith a time-to-live parameter setting a time period for which said message is to be retained in said queue; and sending said message comprising said measurement data, from said queue, to said primary logging environment at a time when sufficient resources are available to send said message without having a significant impact on execution of said at least one client application, and said time-to-live parameter associated with that message has not expired; or removing, from said message queue, said message comprising said measurement data, without sending said message comprising said measurement data, when said time-to-live parameter associated with that message expires before that message would otherwise be sent.
According to one aspect of the present invention there is provided a method performed by a client application configured to operate as part of the apparatus, the method comprising: performing a measurement of performance in accordance with a measurement configuration; and passing at least one result of the measurement of performance, as at least part of said measurement data, to said measurement logging entity for logging in association with said time-to-live parameter.
According to one aspect of the present invention there is provided a computer program product comprising computer implementable instructions which, when executed on a computer processing apparatus, cause said computer processing apparatus to become configured as an apparatus as referred to earlier or as an application referred to earlier.
According to one aspect of the present invention there is provided a computer program product comprising computer implementable instructions which, when executed on a computer processing apparatus, cause said computer processing apparatus to perform a method referred to earlier.
Aspects of the invention extend to computer program products such as computer readable storage media having instructions stored thereon which are operable to program a programmable processor to carry out a method as described in the aspects and possibilities set out above or recited in the claims and/or to program a suitably adapted computer to provide the apparatus recited in any of the claims.
Each feature disclosed in this specification (which term includes the claims) and/or shown in the drawings may be incorporated in the invention independently of (or in combination with) any other disclosed and/or illustrated features. In particular, but without limitation, the features of any of the claims dependent from a particular independent claim may be introduced into that independent claim in any combination or individually.
Embodiments of the invention will now be described by way of example only with reference to the attached figures in which:
Figure 1 schematically illustrates a distributed system;
Figure 2(a) shows a simplified flow chart illustrating typical steps performed by the client application to record a result of a performance measurement;
Figure 2(b) shows a simplified flow chart illustrating typical steps performed by the client application to record a result of performance measurements forming part of a long running or complex event;
Figure 3 shows a simplified sequence diagram illustrating typical steps performed by various entities to transfer a result of a performance measurement; and
Figure 4 shows a simplified sequence diagram illustrating typical steps performed by a statistics runner to queue and transfer a result of a performance measurement to a statistics server.
Overview

Figure 1 schematically illustrates a distributed system 10 comprising a server environment 12 and a number of client entities 14-1, 14-2, 14-3.
The server environment 12 comprises a statistics server entity 16, a database server entity 18, a viewer entity 20, a number of distinct 'client application containers' 22-1, 22-2, 22-3, and a browser entity 23. Each client application container 22 comprises a respective execution environment in which an associated client application 24-1, 24-2, 24-3 may run, on behalf of a client entity 14. In the present example, each client application container 22 is implemented on a different physical machine to the statistics server entity 16. It will be appreciated, however, that each client application container 22 may be implemented on a virtual machine in a common location with the statistics server entity 16 (e.g. on a common physical machine).
Each client application 24, in this example, runs on a different respective software platform/framework. In this example, one client application 24-1 runs on a Java software platform, one client application 24-2 runs on a .Net software platform, and the other client application 24-3 runs on a different software platform. The other software platform may, for example, comprise an embedded database system where the execution logic is contained within stored procedures.
Each of the client applications 24 is also provided with a statistics service 26-1, 26-2, 26-3 which can be used by the client application 24 to record performance related measurement data. Each statistics service 26 comprises a respective local performance logger 30-1, 30-2, 30-3, for logging performance statistics locally and for isolating the calling client application 24 from the specifics of working with the remote logging process, and a remote performance logger 32-1, 32-2, 32-3, for logging performance statistics remotely via the statistics server entity 16. Accordingly, each statistics service 26 effectively provides a gateway to the statistics server entity 16 for any application deployed in the same container 22.
To aid each client application 24 to capture performance statistics with a minimum impact on wider system performance, a dedicated application programming interface (API) is provided between the client application 24 and the statistics service 26. This allows the client application 24 to log statistics 'virtually' without direct involvement in the actual delivery of the statistics to the statistics server entity 16, and hence without any undue overhead in the process of persisting the measurement data. In the case of the application 24-1 running on a Java platform, for example, the API may be provided by means of a small Java class and, as those skilled in the art will appreciate, the respective API for each other software platform may be provided in an appropriate manner for that software platform.
Beneficially, in order to measure the performance statistics, each client application 24 is configurable (and reconfigurable) with a number of 'checkpoints' for triggering the start and the stop of corresponding performance measurements. A client application 24 may, for example, have a 'start' checkpoint and an 'end' checkpoint configured at the respective start and end of a particular operation or set of operations for which performance statistics are required. In this example, therefore, a timer can be initiated at the start checkpoint and terminated at the end checkpoint, thereby giving a measure of the time period taken for the operation or set of operations to complete.
The resulting time period represents measurement data which can then be logged appropriately. Similarly, a plurality of 'start' and 'end' checkpoint pairs can be configured for starting and stopping an individual timer in order to produce a measured cumulative time period representing a sum of the time periods between each respective pair of start and stop checkpoints. Similarly, a plurality of 'start' and 'end' checkpoint pairs can be configured, each pair configured for starting and stopping a different respective timer, to produce a corresponding measured time period.
The ability to configure (and reconfigure) the start and end checkpoints may be provided using any suitable means, for example by means of a class library that could be added to an application so that the application can start and stop the relevant checkpoints that need to be measured.
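Such a class library is not specified in detail; one plausible shape for it, given purely as an assumption, is a small helper class that pairs named 'start' and 'end' checkpoints and accumulates the elapsed time between each pair:

    import java.util.HashMap;
    import java.util.Map;

    // Assumed checkpoint helper: each named checkpoint can be started and ended any
    // number of times, and the elapsed time between the pairs accumulates.
    class Checkpoints {
        private final Map<String, Long> startedAt = new HashMap<>();
        private final Map<String, Long> cumulativeNanos = new HashMap<>();

        void start(String name) {
            startedAt.put(name, System.nanoTime());
        }

        void end(String name) {
            Long started = startedAt.remove(name);
            if (started != null) {
                cumulativeNanos.merge(name, System.nanoTime() - started, Long::sum);
            }
        }

        // Cumulative elapsed time, in milliseconds, over all start/end pairs for this checkpoint.
        long elapsedMillis(String name) {
            return cumulativeNanos.getOrDefault(name, 0L) / 1_000_000;
        }
    }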
Measurement data is typically logged with associated metadata providing additional information about the measurement. The metadata typically comprises, for example: information identifying the client application to which the measurement data relates; information identifying the operation or group of operations for which the measurement was performed; information identifying the time at which the measurement was performed; information indicating an approximate magnitude of the operation or group of operations to which the measurement relates (e.g. against which to correlate a measured time period); and/or other such data.
In the case of the information indicating an approximate magnitude of the operation or group of operations this may, for example, be representative of the amount of data/information that requires processing in a particular operation or group of operations (e.g. the number of rows of information that need processing), the number of separate operations in a set of one or more operations and/or the like. By way of illustration, if a measured time period is used to provide an indication of the length of time it is taking to process a particular data set of variable length, then metadata indicating an approximate magnitude of the operation can be used to provide an indication of the length of the data set thereby allowing this to be taken into account during analysis of a number of measured time periods for processing data sets of different lengths.
The metadata may also be 'self-describing' of the data to which it relates, the metadata comprising, for example, a definition of the type of data to which it relates to allow the storage of any type of statistical data, be it numeric, string, date or time related.
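For illustration only, a measurement record carrying such self-describing metadata might be shaped as follows in Java; the field names are assumptions rather than part of the described system:

    import java.time.Instant;
    import java.util.Map;

    // Hypothetical statistic record: the measured value travels with self-describing
    // metadata, including its data type, so any kind of statistic can be stored.
    record StatisticRecord(
            String application,          // client application to which the measurement relates
            String unitOfWork,           // operation or group of operations measured
            Instant measuredAt,          // time at which the measurement was performed
            long magnitude,              // approximate size of the work, e.g. rows processed
            String valueType,            // "numeric", "string", "date", "time", ...
            String value,                // the measured value itself, serialised as text
            Map<String, String> extra) { // any further attributes the application chooses to log
    }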
The performance statistics measured by the application can, thus, reflect any performance measure that the application is configured to log and any number of additional attributes can be included with the measurement data that represents the performance statistic.
Each local performance logger 30 is able to record the performance related measurement data locally (e.g. in a plain text file, or 'flat file', using appropriate delimiters such as commas or tabs) or to send it to the remote performance logger 32 for remote persistence via the database server entity 18. The location (local or remote) to which the performance measurements are sent is configured externally to the client application 24 and can be flipped during execution if required.
The performance loggers 30, 32 of each statistics service 26 are arranged to receive requests to log measurement data from the associated application 24 and to place the request into a respective asynchronous pooled queue 33-1, 33-2, 33-3 so that the client application 24, for which performance is being measured, can continue execution without any further overhead associated with capturing the statistic.
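A minimal sketch of this routing and asynchronous hand-off, assuming a comma-delimited flat file for local logging and an in-memory queue that a separate background sender drains (all names hypothetical):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.atomic.AtomicBoolean;

    // Hypothetical local performance logger: routes each record either to a local
    // flat file or to an asynchronous queue destined for the remote logger. The
    // routing flag is held outside the client application and may be flipped at runtime.
    class LocalPerformanceLogger {
        private final AtomicBoolean logRemotely = new AtomicBoolean(true);
        private final BlockingQueue<String> remoteQueue = new LinkedBlockingQueue<>(); // drained by a background sender
        private final Path flatFile = Path.of("statistics.csv");

        void setLogRemotely(boolean remote) {     // set by external configuration, not by the application
            logRemotely.set(remote);
        }

        void log(String delimitedRecord) {
            if (logRemotely.get()) {
                remoteQueue.offer(delimitedRecord);   // hand off and return immediately
            } else {
                try {
                    Files.writeString(flatFile, delimitedRecord + System.lineSeparator(),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                } catch (IOException e) {
                    // local logging is best-effort in this sketch
                }
            }
        }
    }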
Client applications 24 are also provided with the ability to instruct the local performance logger 30 to initiate a suspension of sending measurement result messages to the statistics server entity 16, during which the measurement result messages are simply queued up for sending later. At some later point the client application 24 can instruct the release of the queued measurement result messages, thereby allowing them to be sent to the server. This facility can be used beneficially when a client application 24 requires the impact on its performance to be as small as possible.
Client applications 24 are also provided with the ability to turn off measurement logging capability completely for that application. Client applications 24 are further provided with the ability to instruct the local performance logger 30 to log some or all measurement results to a local file system without queuing them for sending to the statistics server entity 16 or to log some or all measurement results to the statistics server entity 16 via the remote performance logger 32.
It can be seen, therefore, that this represents an asynchronous messaging system that can take the measured performance statistics from the application 24 and add them to a separate queue 33. It is then this queue 33 that is responsible for sending the measured performance statistics data to the statistics server entity 16 for persistence at a time when the act of sending the data will not impact the execution of the application (or when any such impact will be minimised).
Beneficially, the remote performance logger 32 is operable to send measurement data from the asynchronous queue 33 at times when the transfer of the measurement data will have a minimal impact on the performance of the wider system (e.g. when the client application to which the measurement data relates is not particularly busy).
The statistics server entity 16 comprises a service logger 34 for receiving measurement data, from each remote performance logger 32, for persistence to the database server entity 18. Advantageously, despite the differences in the respective software platforms/frameworks on which the client applications 24 are implemented, the distributed system 10 uses a uniform API between each remote performance logger 32 and the service logger 34 on the statistics server entity 16. In this example, the uniform API comprises a representational state transfer (REST) service (RS) API which allows connection from applications using any language and application stack, via the hypertext transfer protocol (http), using appropriate request messages.
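By way of example only, and assuming an endpoint path and JSON payload that are not specified in the description, such a REST call from a Java-based remote performance logger might look like this:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Hypothetical REST delivery of one measurement record to the service logger over http.
    public class RestDelivery {
        public static void main(String[] args) throws Exception {
            String json = """
                    {"application":"orders-web","unitOfWork":"renderOrderScreen",
                     "elapsedMillis":142,"rowsProcessed":350}""";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://statistics-server.example/statistics"))  // assumed endpoint
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(json))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Service logger replied with status " + response.statusCode());
        }
    }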
The service logger 34 is operable to log measurement data received from the remote performance loggers 32 to the database server 18. The service logger 34 also provides a number of enquiry functions to allow remote or local applications to interrogate the metadata for definitions associated with the stored measurement data.
The database server 18 comprises a relational database in which the measurement data and associated metadata is stored. The database server 18, in this example, utilises the so-called 'Hibernate' framework, although it will be appreciated that any suitable framework may be used to provide mapping to the relational database.
In this embodiment, the viewer entity 20 comprises a separate standalone application, running in its own execution environment, which provides a viewer for the statistics held in the relational database of the database server 18. The viewer entity is configured to allow an authorised user, after logging into the system, to review the current statistics, view appropriate statistical graphs, view trends, analyse the data, perform comparisons or the like. Any metadata stored with the measurement data can be extracted and used, for the purposes of reviewing the statistics, to inform an authorised user of particular pertinent information relating to a measurement and/or to perform secondary analysis/manipulation on the measurement data (e.g. to present measured data on a graph against the time at which the data was collected, to normalise measured time periods against data set size, etc.).
The viewer entity 20, in this embodiment, is connected to the browser entity 23 via which the user can access the viewer entity 20 by means of a web browser or any other suitable viewer. This viewer entity 20 allows an authorised administrator to maintain the configuration of the system and monitor the overall statistics being logged into the database by means of the browser entity 23. In this example, the service logger 34 is shown as being deployed, as part of the statistics server entity 16, in conjunction with the viewer entity 20. It will be appreciated, however, that the viewer entity 20 and statistics server entity 16 may be deployed in isolation from one another.
It can be seen, therefore, that the proposed methods and apparatus for monitoring conditions prevailing in a distributed system provides a flexible way of capturing and monitoring performance statistics relating to the execution of various tasks by distributed client applications thereby aiding in the tracking of system performance measurements against appropriate criteria (e.g. criteria agreed by service level agreement) with minimal impact on the operation of the underlying client applications and hence on the operation of the wider system.
For example, having a central statistics server entity 16, and database server 18, helps to ensure that when there is a need to analyse statistics the resulting measurement results are all held in a common location even if individual components doing the logging are deployed to multiple disparate systems (possibly at different geographic locations).
Of particular benefit is the asynchronous queuing mechanism at the client application side that helps to ensure that the measuring of a particular performance statistic, and the logging of that performance statistic, has as small an impact on the performance of that client application as possible.
The ability to provide metadata / self-describing information together with the statistics to which the metadata relates is particularly beneficial because it provides additional flexibility for suitable measurements to be defined, at the client application side, without significant reconfiguring at the statistics server entity side. The ability to provide metadata / self-describing information together with the statistics to which the metadata relates also allows additional information to be stored that allows improved comparison of one set of measurement data with another. For instance, storing information indicating an approximate magnitude of the operation or group of operations to which a particular measured time period relates allows a better comparison of the measured time period with other measured time periods for an operation, or group of operations, with a different magnitude.
Measurement and Recordal Procedure - Client Application

A procedure employed by a client application 24 to log a particular performance measurement result will now be described, by way of example only, with reference to Figure 2(a) which shows a simplified flow chart illustrating typical steps performed by the client application 24 to record the result of the performance measurement.
In this example, the client application is a web application that is configured to store server side timings for responding to key screens. The client application 24 is a servlet based application that responds to browser interactions by processing the normal http get' and post' commands.
In this example the client application is configured to store server side timings for a number of the key screens that have been defined as a measurable criterion for assessing client application / system performance.
When a measurement is to be performed the client application 24 first creates an instance of the local performance logger 30 by instigating an associated call at S210.
This effectively extends the basic underlying statistics measuring capability and supports this underlying capability by providing a shortcut for the client application 24 to 'store' measured performance data (e.g. elapsed time statistics) locally by passing captured measurement data to the local performance logger 30. It is this local performance logger 30, rather than the client application itself, that determines how best to log the performance statistics.
After creating the instance of the performance logging statistics service 26, the client application 24 identifies the current time at S212 immediately before starting the unit of work (i.e. the operation or group of operations) which is to be timed at S214. Once the operation or group of operations which is to be timed is completed, the client application 24 once again identifies the current time at S216.
At S218, the client application calls the local performance logger 30 in order to initiate generation, locally, of an appropriate statistic record for the unit of work that has just been completed, including any associated metadata as required for that measurement. The statistics record includes, in this example: information identifying the client application 24 to which the measurement data relates; information identifying the function/unit of work being performed; information indicating an approximate magnitude of the operation or group of operations to which the measurement relates (such as a measure of the size of the dataset being processed, for example a measure of the number of bytes, rows, columns, pages, or the like that require processing); information identifying the time at which the measurement was performed (e.g. in the form of a timestamp or the like); and/or any other such metadata that the client application has been configured to log for the specific measurement being carried out and/or the specific client application 24 performing the measurement.
As far as the client application 24 is concerned this is all that is necessary to measure a performance statistic and persist the results, via the statistics server entity 16, at the database server entity 18. The location and configuration of the statistics server entity 16 are externalised from the client application 24 itself.
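A compact sketch of this flow (steps S210 to S218 of Figure 2(a)), with a stand-in for the local performance logger since its exact interface is not given, might be:

    // Hypothetical client-side flow: create the logger, note the start time, perform the
    // timed unit of work, note the end time, then hand the result and metadata to the logger.
    public class KeyScreenTiming {

        public static void main(String[] args) {
            PerformanceLogger logger = new PerformanceLogger();    // S210: create the logger instance

            long start = System.currentTimeMillis();               // S212: time immediately before the work
            int rowsRendered = renderKeyScreen();                   // S214: the unit of work being measured
            long end = System.currentTimeMillis();                  // S216: time immediately after the work

            // S218: generate the statistic record locally, with the metadata this application logs
            logger.record("orders-web", "renderOrderScreen", end - start, rowsRendered, end);
        }

        private static int renderKeyScreen() {
            return 350;   // stands in for the real server-side work behind a key screen
        }

        // Minimal stand-in for the local performance logger interface assumed above.
        static class PerformanceLogger {
            void record(String application, String unitOfWork, long elapsedMillis,
                        long magnitude, long measuredAtEpochMillis) {
                System.out.printf("%s %s took %d ms for %d rows%n",
                        application, unitOfWork, elapsedMillis, magnitude);
            }
        }
    }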
Measurement and Recordal Procedure - Long Running / Complex Events

It will be appreciated that there may be a need to log performance information for events comprising a plurality of operations, or a plurality of groups of operations, where each operation or operation group is performed by a different respective component at the client side. Such events may, for example, be a 'long running' or 'complex' task with multiple related information messages that need to be considered, from the perspective of the server, as a single event.
In this scenario the client 24 can create an event message with a unique key, and a plurality of client side components can add additional logging information to this event message via the key. Once all components have completed their operation(s) (which may be over several hours or even days) then completion of the event can be notified to the local performance logger 30, at which point the performance logger 30, 32 can schedule the event message for transfer to the server 16.
A procedure employed by a client application 24 to log performance measurement information for such events will now be described, by way of example only, with reference to Figure 2(b) which shows a simplified flow chart illustrating typical steps performed by the client application 24 to record the results of the performance measurement for the event.
In this example the client application 24 is configured to store server side timings for a number of operations, performed by different components (e.g. library functions), that have been defined as a measurable criterion for assessing client application / system performance.
When a measurement is to be performed the client application 24 first creates an instance of the local performance logger 30 for logging a long running / complex event, by instigating an associated call at S220. This effectively extends the basic underlying statistics measuring capability and supports this underlying capability by providing a shortcut for the client application 24 to 'store' measured performance data (e.g. elapsed time statistics), for an event comprising operations performed by different local components, locally by passing captured measurement data to the local performance logger 30. It is this local performance logger 30, rather than the client application itself, that determines how best to log the performance statistics.
After creating the instance of the performance logging statistics service 26, a locally unique key is generated for the event at S221 by the client. In this example, the key need not be unique across all systems but is instead 'locally' unique (unique to the client application 24) and can therefore be anything that makes sense to that client application 24. The key could, for example, be the next sequence number in a locally held incremental set of sequence numbers, could be timestamp based, or the like.
The client application 24 identifies the current time at S222 immediately before the first component that contributes to performance of the event initiates a first thread of the event at S230-1 and starts to perform the first operation or first group of operations of the event which is to be timed at S224-1. Once the first operation or first group of operations of the event for which measurement data is to be logged is completed, the client application 24 once again identifies the current time at S226-1.
At S228-1, the client application 24 (or the first component completing processing an event thread) then calls the local performance logger 30 in order to initiate generation, locally, of an appropriate event related statistic record (an 'event message') for the event that is in progress. This event related record is associated with the locally unique key and includes, in addition to the locally unique key, any associated metadata as required for that measurement. The statistics record includes, in this example: information identifying the client application 24 / component to which the measurement data relates; information identifying the function/unit of work being performed; information indicating an approximate magnitude of the operation or group of operations to which the measurement relates (such as a measure of the size of the dataset being processed, for example a measure of the number of bytes, rows, columns, pages, or the like that require processing); information identifying the time at which the measurement was performed (e.g. in the form of a timestamp or the like); and/or any other such metadata that the client application 24 / component has been configured to log for the specific event being processed and/or the specific client application 24 / component performing that aspect of the event.
A further event thread may be initiated in parallel with (or sequentially after) execution of the first thread by a further component. In the diagram this further event thread is shown as being initiated, independently of the first event thread, a short while after the first event thread starts. It will be appreciated, however, that a further event thread may be initiated at any time including substantially simultaneously with the first event thread. The further event thread may also be initiated dependent on completion of an operation of the first event thread rather than automatically in isolation. A further operation or further group of operations of the event is then performed by the further component at S224-2. Once the further operation or further group of operations of the event for which measurement data is to be logged is completed, the client application 24 once again identifies the current time at S226-2.
At S228-2, the client application 24 (or the component processing the event) then calls the local performance logger 30 in order to update the event related statistic record for the event that is in progress, using the locally unique key to identify the correct event message. The statistics record is updated, in this example, with: further information identifying the component to which the measurement data relates; further information identifying the function/unit of work being performed; information indicating an approximate magnitude of the operation or group of operations to which the measurement relates (such as a measure of the size of the dataset being processed, for example a measure of the number of bytes, rows, columns, pages, or the like that require processing); information identifying the time at which the measurement was performed (e.g. in the form of a timestamp or the like); and/or any other such metadata that the client application 24 / component has been configured to log for the specific event being processed and/or the specific client application 24 / component performing that aspect of the event.
In a similar manner, any number of further event threads may be initiated and performance data logged as illustrated by the dashed line 250.
If, at S229, it is determined that there are no more event threads to be completed, then the client application 24 indicates to the performance logger 30, at S232, that the event has completed and the event message may be logged appropriately (e.g. at a timing that does not affect overall system performance).
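A simplified, assumed sketch of this event-style logging, using an incrementing sequence number as the locally unique key:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical event logger: an event message is opened under a locally unique key,
    // any number of components append measurement lines against that key, and only on
    // completion is the whole message scheduled for transfer to the statistics server.
    class EventLogger {
        private final AtomicLong sequence = new AtomicLong();                         // source of locally unique keys
        private final Map<Long, List<String>> openEvents = new ConcurrentHashMap<>();

        long openEvent() {                                            // cf. S221: generate the key
            long key = sequence.incrementAndGet();
            openEvents.put(key, Collections.synchronizedList(new ArrayList<>()));
            return key;
        }

        void append(long key, String component, long elapsedMillis) { // cf. S228-1, S228-2, ...
            openEvents.get(key).add(component + "=" + elapsedMillis + "ms");
        }

        void complete(long key) {                                      // cf. S232: event finished
            List<String> lines = openEvents.remove(key);
            scheduleForTransfer(key, lines);
        }

        private void scheduleForTransfer(long key, List<String> lines) {
            // placeholder: the event message would now be queued for sending to the
            // statistics server entity at a convenient time
        }
    }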
In this example, the client application 24 may be executed on a mobile device and the event may be initiated when a user of the mobile device initiates a process for which logging is required (e.g. the time at which the process is started needs to be logged) whilst there is no network coverage. At some convenient point, following initiation of the event being processed, information created as a result of the process being performed could be sent to the server, for example, after network connectivity is re-established. In this situation there may be two distinct parts to the event: a part in which the event is created and associated operations performed (e.g. which generate the information to be sent to the server) and a part in which the information generated by performing the event is dispatched to the server. In this case, there are at least two components that contribute to performance of the overall event, one component that creates the event and performs the associated operations and one component for delivery of the information generated by the operations that go toward making up the event. These components may be different library functions that work on a common message packet. It will be appreciated that any number of components can add additional message lines to an overall statistical record for that event as long as each component uses the same locally unique key.
Measurement and Recordal Procedure - Correlated Events

It will be appreciated that, whilst the above procedure is particularly beneficial and efficient for logging performance information for events comprising a plurality of operations where the operations are all performed by different components of the same client application 24, there are situations in which the above procedure may not be appropriate.
For example, where a number of related operations / multi-operation events ('correlated events') are performed via a plurality of different client entities (e.g. a mobile device, a personal computer and/or the like) it may, nevertheless, be useful to be able to log performance information for all the related operations / multi-operation events in a single record or at least in a correlated manner such that a user can visualise, and analyse, the performance information for all the related operations / multi-operation events together.
Similarly, related operations / multi-operation events ('correlated events') may continue to occur at discrete intervals with natural breaks between them possibly without a natural definable endpoint. In this case it may, nevertheless, be beneficial for a user to be able to interrogate the statistics server 16 to visualise, and analyse, the performance information, for some or all the correlated events that have occurred prior to the interrogation, together in a correlated manner.
In order to achieve this, the client application 24 is beneficially able to associate a respective globally unique identifier ('GUID') with the performance measurements arising from each respective set of correlated events. This allows the client application 24 to send the correlated events via the performance logger 30 to the statistics server 16 for logging in association with the GUID. The GUID may be globally unique by virtue of being a randomly selected number that is so large that there is a negligible probability of the same GUID being selected by another application.
Alternatively, the GUID may be rendered globally unique by having an application specific portion that is known to be globally unique to that application and another portion that is randomly selected by the application from a set of possible values that are known, by the application, not to be in current use for events that it initiated.
The data for the performance measurements for these correlated events can thus be sent with relatively long time gaps between them but the statistics server 16 is still able to correlate them in the database using the GUID.
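A minimal sketch of this correlation approach is given below; the WorkflowContext and PerformanceLogger names are hypothetical stand-ins for the client application's own workflow state and for the performance logger 30, and the use of a random UUID reflects the first of the two GUID schemes described above:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch only: WorkflowContext and PerformanceLogger are hypothetical
// names standing in for the client application's own workflow state and for the
// performance logger 30 described above.
public class WorkflowContext {

    public interface PerformanceLogger {
        void log(Map<String, String> measurementMetadata);
    }

    // Generated once, when the first process of the multi-process workflow starts,
    // and then handed to each subsequent component as a property of the workflow.
    private final String guid = UUID.randomUUID().toString();

    public String guid() {
        return guid;
    }

    public void logStep(PerformanceLogger logger, String component, long elapsedMillis) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("guid", guid);                        // shared by all correlated events
        metadata.put("component", component);
        metadata.put("elapsedMillis", Long.toString(elapsedMillis));
        metadata.put("timestamp", Long.toString(System.currentTimeMillis()));
        logger.log(metadata);                              // queued and sent when resources allow
    }
}
```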
In normal operation, therefore, for a multi-process workflow initiated at one client application 24, a thread of work may start when a user of that client application 24 starts the first process of the multi-process workflow. The client application 24 then generates a GUID and, at an appropriate juncture, when the current operation or set of operations being performed in pursuit of the multi-process workflow by that client application 24 are completed (e.g. pending action by another user via a different client application and/or via a different client entity), a performance measurement message is created and sent to the statistics server 16 with some performance statistics and the associated client application generated GUID as described above.
The GUID is treated as one of the properties of each process making up the multi-process workflow. Hence, as the workflow is continued and work is handed over to other processes (potentially performed by different client applications, via different client entities and/or via different servers), the GUID generated by the client application that initiated the workflow is handed over to the next component performing the next part of the workflow as one of the properties of that process. This allows the next component performing part of the workflow to send additional performance data to the statistics server 16 and have it linked to the data stored from the first performance measurement message by virtue of the GUID that is passed between the processes.
Thus, for an end-to-end multi-process workflow, where natural breaks occur in the processing time, the above use of GUIDs allows correlated logs to be formed that allows a user to visualise, and analyse, all the correlated events for a single multi-process workflow.
Where different client entities (e.g. a mobile device and a personal computer) are involved in performing related operations or events, one of the entities (e.g. the mobile device) can create the GUID and send performance measurement messages for logging. The GUID may be passed to a server that provides the client application to the mobile device; this server may carry out more work and send additional performance measurement messages to the statistics server 16 for logging in association with the GUID. Performance messages for related operations or events that are carried out via a different client entity (e.g. a personal computer) may similarly be logged in association with the GUID. Hence, a user visualising or analysing an event log can see events from the PC based interfaces in line with actions performed on the mobile devices.
Transfer of Measurement Data - Client Application to Statistics Service

A procedure for transferring measurement data from the client application 24 to the asynchronous queue 33 for logging a particular performance measurement result will now be described in more detail, by way of example only, with reference to Figure 3, which shows a simplified sequence diagram illustrating typical steps performed by the various entities to transfer the result of the performance measurement.
The procedure starts, at S310, when the client application 24 initiates the logging of a measurement result by calling the local performance logger 30 (e.g. at S218 in Figure 2). The local performance logger 30 generates a measurement record comprising the measurement result and associated metadata and decides whether to store the information locally (e.g. in a flat file) or to send it to the remote performance logger 32 for remote persistence via the statistics server entity 16 and the database server entity 18.
The performance logger 30 determines whether the results should be stored locally or remotely. If, as shown in Figure 3, the measurements are to be stored remotely then a measurement result message comprising the measurement result(s) and any metadata is passed to the remote performance logger 32 at S312. At S314 the measurement result message is passed to a so-called 'statistics runner', which comprises the measurement queue 33 onto which the measurement result message is placed until it is sent to the statistics server entity 16.
The statistics runner is, in effect, a separate scheduler (or 'scheduler class') that uses a single background processing thread to perform the processing of each measurement result message added to it. It queues the incoming messages and processes them sequentially in a first in first out (FIFO) order. Advantageously, the statistics runner uses a processing thread that has deliberately had its scheduling priority lowered below that of general execution threads (e.g. those used by the client application 24) to help ensure that it yields to higher priority processing threads. The operating system will thus schedule this thread for execution when higher priority threads are not processing data and when the transmission of a measurement result message will therefore have minimal impact on performance.
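The following sketch illustrates the general shape of such a scheduler; the class and method names, the queue type and the exact priority value are assumptions chosen only to show how a single low priority background thread can drain a FIFO queue while yielding to the application's own threads:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the kind of scheduler described above; names are illustrative,
// not taken from the actual implementation.
public class StatisticsRunner {

    public interface MessageSender {
        void send(String message);
    }

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // FIFO by default
    private final Thread worker;

    public StatisticsRunner(MessageSender sender) {
        worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String message = queue.take();   // blocks until a message is available
                    sender.send(message);            // transmit to the statistics server
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "statistics-runner");
        // Deliberately lowered priority so this thread yields to the client
        // application's own, higher priority, execution threads.
        worker.setPriority(Thread.MIN_PRIORITY);
        worker.setDaemon(true);
        worker.start();
    }

    public void enqueue(String message) {
        queue.offer(message);
    }
}
```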
Advantageously, notwithstanding the 'normal' FIFO operation of the statistics runner / message queue 33, the statistics runner / message queue 33 can be configured, when appropriate, to send the measurement result messages in reverse order (i.e. last in first out - 'LIFO'). This LIFO/FIFO configuration is initially set (usually to FIFO) when the queue 33 is instantiated. However, in a particularly beneficial embodiment a client application is provided with the ability to reconfigure this LIFO/FIFO configuration dynamically to allow it to be changed on the fly.
During normal operation, for example, the order does not matter as performance measurements can be logged almost simultaneously with them being created.
However, if the client becomes partitioned from the statistics server 16 for a period of time (e.g. because of a loss of network coverage to a mobile communication device) then the backlog of measurement result messages may become large. In this scenario, once connectivity is established again, the statistics runner / message queue 33 can be configured to send the performance measurements to the statistics server 16 such that the most recent performance measurement is sent first (e.g. because it is now considered the most important).
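One way such a dynamically reconfigurable queue could be realised is sketched below; the deque-based implementation and the names used are illustrative assumptions rather than details taken from the actual statistics runner:

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: one way a queue could support on-the-fly switching between
// FIFO and LIFO draining, as described above.
public class ReorderableMessageQueue<T> {

    private final LinkedBlockingDeque<T> deque = new LinkedBlockingDeque<>();
    private final AtomicBoolean lifo = new AtomicBoolean(false); // FIFO when instantiated

    public void put(T message) throws InterruptedException {
        deque.putLast(message);                  // always appended at the tail
    }

    public T takeNext() throws InterruptedException {
        // After a long network partition the client can flip this flag so that the
        // most recent (and most relevant) measurements are sent first.
        return lifo.get() ? deque.takeLast() : deque.takeFirst();
    }

    public void setLifo(boolean enabled) {
        lifo.set(enabled);
    }
}
```

Messages are always appended at the tail, so flipping the flag back to FIFO once the backlog has drained restores the normal ordering.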
Moreover, each measurement result message placed on the message queue 33 can be associated, by a client application, with a message specific 'time-to-live' parameter which sets a period of time that the measurement result message should stay in the message queue before being ignored and discarded in the event that the measurement result message has not been sent to the server within the time-to-live period. A default time-to-live parameter may be initially set when the queue 33 is instantiated. This default time-to-live parameter may be applied to all messages for which a message specific time-to-live has not been set.
This allows a client application the flexibility to attribute relatively short message specific time-to-live periods to performance data messages for events that are only relevant for a relatively short period of time and therefore become redundant (and hence do not need to be persisted) if the client becomes isolated from the server for a longer period.
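A sketch of how a message specific time-to-live, falling back to the queue's default, might be evaluated before transmission is shown below (all names and the millisecond representation are illustrative assumptions):

```java
// Sketch of a time-to-live check of the kind described above; names are illustrative.
public class ExpiringMessage {

    private final String payload;
    private final long enqueuedAtMillis = System.currentTimeMillis();
    private final Long messageTtlMillis;         // null when no message specific TTL was set

    public ExpiringMessage(String payload, Long messageTtlMillis) {
        this.payload = payload;
        this.messageTtlMillis = messageTtlMillis;
    }

    // The queue applies its own default TTL whenever no message specific value exists.
    public boolean hasExpired(long defaultQueueTtlMillis) {
        long ttl = (messageTtlMillis != null) ? messageTtlMillis : defaultQueueTtlMillis;
        return System.currentTimeMillis() - enqueuedAtMillis > ttl;
    }

    public String payload() {
        return payload;
    }
}
```

Before transmitting an entry the queue would call hasExpired(...) and, where it returns true, simply discard the entry instead of sending it.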
Statistics Runner

The statistics runner 33 will now be described in more detail, by way of example only, with reference to Figure 4, which shows a simplified sequence diagram illustrating typical steps performed by the statistics runner 33 to queue and transfer the result of the performance measurement to the statistics server 16.
The statistics runner 33 creates a performance measurement storage object (termed 'StatisticsVO') 40 to hold details of the performance measurement (S410). The statistics runner 33 then populates this object 40 with at least some of the key attributes which apply to any performance measurement, including, for example, a group name (in the example set to 'PERF') (at S412) and a type name (in the example set to 'SUMMARY') (at S414) identifying a type of measurement to which the measurement data relates. A map object (termed 'HashMap') 42 is then created to hold all specific attributes associated with that measurement type (e.g. the measurement associated metadata), such as: information identifying the client application to which the measurement data relates (e.g. at S416); information identifying the function/unit of work being performed (e.g. at S418); information indicating an approximate magnitude of the operation or group of operations to which the measurement relates (e.g. a row count as illustrated at S420); information indicating the measured elapsed time (e.g. S421); and information identifying the time at which the measurement was performed (e.g. at S422). The specific attributes associated with that measurement type are held along with additional details associated with the captured measurement data (at S424). The map object 42 is then passed to the performance measurement storage object 40 (at S426).
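Purely for illustration, the population sequence S410 to S426 might look something like the following; this is a plain value holder written for the sketch and is not the actual 'StatisticsVO' class, so its field and method names are assumptions:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative value holder mirroring the population steps S410-S426 described above.
public class StatisticsVO {

    private String groupName;                    // e.g. "PERF"    (S412)
    private String typeName;                     // e.g. "SUMMARY" (S414)
    private Map<String, String> attributes = new HashMap<>();

    public static StatisticsVO summaryMeasurement(String application, String unitOfWork,
                                                  long rowCount, long elapsedMillis) {
        StatisticsVO vo = new StatisticsVO();    // S410
        vo.groupName = "PERF";
        vo.typeName = "SUMMARY";

        Map<String, String> map = new HashMap<>();                          // the map object 42
        map.put("application", application);                                // S416
        map.put("unitOfWork", unitOfWork);                                  // S418
        map.put("rowCount", Long.toString(rowCount));                       // S420
        map.put("elapsedMillis", Long.toString(elapsedMillis));             // S421
        map.put("measuredAt", Long.toString(System.currentTimeMillis()));   // S422

        vo.attributes = map;                                                // S426
        return vo;
    }
}
```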
The statistics runner 33 is now ready to send these statistics across the associated API to the service logger 34 for persistence into the database of the database server 18.
To perform this action the statistics runner 33 gets a client connection 44 (referred to as 'Statistics Client' in this example) to the statistics server entity 16 to instantiate (at S428), by means of an associated factory class (in this example a REST Service client factory class), a helper entity 46 (or 'class', referred to as 'StatisticsRS' in this example) that will facilitate performance of the transfer service. This instance of the helper class 46 is then used to send the performance measurement storage object across the associated API to the service logger 34, via the http protocol, for persistence into the database of the database server 18 using an insert statistic service (at S430). The helper entity 46 is essentially responsible for marshalling all the data in the performance measurement storage object 40 into the necessary http protocol format ready for transmission to the server 16. If the system detects errors during this process the system will attempt to retry and send the request again. After a number of attempts (e.g. three or any other appropriate number) the system will pause sending the messages to the service logger 34 for a predetermined length of time to help ensure that the processing thread used by the statistics runner does not waste resources attempting to send measurement reports when the server is probably not contactable.
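The retry-then-pause behaviour might be sketched as follows; the attempt count of three matches the example above, but the pause duration and all names are illustrative assumptions:

```java
// Sketch of the retry-then-pause behaviour described above; constants and names
// are illustrative assumptions rather than values taken from the implementation.
public class RetryingSender {

    private static final int MAX_ATTEMPTS = 3;
    private static final long PAUSE_AFTER_FAILURE_MILLIS = 60_000;

    public void sendWithRetry(Runnable sendOnce) throws InterruptedException {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                sendOnce.run();                  // e.g. the HTTP call to the insert statistic service
                return;                          // success: nothing further to do
            } catch (RuntimeException e) {
                if (attempt == MAX_ATTEMPTS) {
                    // Server is probably unreachable; pause so the low priority
                    // thread does not waste resources on repeated failed sends.
                    Thread.sleep(PAUSE_AFTER_FAILURE_MILLIS);
                }
            }
        }
    }
}
```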
At the statistics server entity end, the system waits for incoming requests to log measurement results and will process the insert statistic service by inserting the necessary records into the database of the database server 18. In this example, an Object Relational Mapping library called Hibernate is used to map the performance measurement object 40 into the necessary database table records.
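By way of illustration only, a Hibernate/JPA style mapping of a measurement record might resemble the sketch below; the actual table and column layout used by the statistics server is not given above, so every name here is an assumption:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

// Illustrative JPA/Hibernate style mapping of one measurement record; the real
// schema used by the database server 18 is not described in the text above.
@Entity
public class StatisticRecord {

    @Id
    @GeneratedValue
    private Long id;

    private String groupName;        // e.g. "PERF"
    private String typeName;         // e.g. "SUMMARY"
    private String application;      // client application the measurement relates to
    private String unitOfWork;       // function / unit of work being measured
    private long elapsedMillis;      // the measured elapsed time
    private long measuredAtMillis;   // when the measurement was performed
}
```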
Predictive Analysis

In the above description, the historical performance data relating to operations and events performed in the distributed system 10 is gathered and logged by the service logger 34, on the statistics server 16, essentially as the operations/events happen.
In a further enhancement the statistics server 16 is further able to perform predictive analysis of a particular client application's behaviour based on the historical performance data, in order to determine likely future characteristics of conditions prevailing in the distributed system thereby allowing possible problems to be highlighted early and corrective or preventative action to be taken if necessary.
This enhanced capability utilises a predictive model which is responsible for providing a prediction of the likely execution and response times for an operation, a group of operations, or multi-operation event, based on the context for the application performing the operation(s)/event and the historically collected performance data.
More specifically, the predictive model exploits patterns found in historical performance related data to predict the performance and response time of the client application (or a specific feature/functions of the application) performing a particular operation or group of operations.
The statistics server 16 performs the predictive analysis for a particular client application based on contextual information (referred to as the 'context') relating to that client application. The context may comprise several performance related parameters which are configurable, for each monitored operation or group of operations, to allow different parameters to be taken into account for different performance metrics. The context may include, for example: one or more user contextual parameters (e.g. user demographics such as the number of users by type/profile and/or a percentage of concurrent users); one or more application execution contextual parameters (e.g. server types and number and/or available resources such as processor (CPU) resources, memory, virtual memory, disk space etc.); and/or one or more application communication or 'access' contextual parameters (e.g. bandwidth, on-line/off-line, mobile/wireless/wifi/wired, browser version etc.).
For example, the performance of a particular operation or group of operations may depend on a number of parameters including, but not restricted to: the processor (CPU) usage at the time of execution; the memory usage at the time of execution; the time of execution; the number of users connected to the system; network bandwidth; and network throughput. Depending on the type of application, operation, or group of operations, therefore, the performance/response times can be estimated based on one or more context parameters configured for that particular type of application, operation, or group of operations, to ensure that performance can be predicted accurately based on an appropriate parameter or set of parameters.
The predictive analysis is not performed until there is a sizeable quantum of historical performance related data stored for a particular client application. The size of this quantum itself is configurable to allow predictions to be based on an optimised quantum of data.
By way of example, it may be possible to provide an acceptably precise prediction of the response times for a particular operation, or group of operations, performed by a particular client application if there is historical data for 100 previous executions of that particular operation, or group of operations. Hence, the quantum for performing predictive analysis for that operation, or group of operations by that client application may be set to ensure that the predictive analysis is not performed until after 100 historical performance related measurement records are available.
To allow the predictive analysis to be performed effectively, as the performance data representing a performance metric (e.g. a response time) for a particular operation/group of operations for which predictive analysis is to be performed is collected, the performance data representing the performance metric is stored (e.g. after every successful execution of that operation or group of operations) in association with corresponding values for each parameter in the context at the time that operation or group of operations was executed.
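An illustrative holder for one such stored observation (a measured response time together with a snapshot of the context parameter values at execution time) is sketched below; the names are assumptions, and the later prediction sketches reuse this class:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative holder for one historical observation: the measured metric together with
// the context parameter values that prevailed when it was captured (names are assumptions).
public class HistoricalObservation {

    private final double responseTimeMinutes;       // the response variable being predicted
    private final Map<String, Double> context;      // e.g. cpuUsage, virtualMemoryGb, activeProcesses

    public HistoricalObservation(double responseTimeMinutes, Map<String, Double> context) {
        this.responseTimeMinutes = responseTimeMinutes;
        this.context = new HashMap<>(context);      // snapshot of the context at execution time
    }

    public double responseTimeMinutes() {
        return responseTimeMinutes;
    }

    public Map<String, Double> context() {
        return context;
    }
}
```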
With the context and the historical performance metric data in place, predictive analysis can be performed either at the explicit request of a user (e.g. by clicking a button) or implicitly in which case the predictive analysis is performed automatically and the user informed of any resulting prediction, for example of a response time, by any appropriate means (e.g. visually on screen or by email). A user can then decide whether to proceed with execution of an operation (or group of operations) or not based on the prediction.
In order to perform the predictive analysis, a statistical 'predictive' algorithm is used, which is responsible for performing a prediction of a particular performance metric (e.g. a response time). This statistical algorithm uses, as its inputs to arrive at the prediction, the historical performance data together with the associated historical contexts and the current values of the context parameters.
Where the current values of the context parameters match those of one or more records in the historical data, the statistical algorithm can predict the performance metric to be equal to (or an average of) the historical performance metric when a historic context was the same as the current context. If this is not the case, then the statistical algorithm applies a statistical logic to interpolate between historical performance metrics for contexts close to the current context.
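The exact-match branch of that logic might, purely illustratively, look as follows (reusing the illustrative HistoricalObservation holder from the previous sketch; a null result signals that interpolation, for example the k-nearest neighbours sketch given later, is needed instead):

```java
import java.util.List;
import java.util.Map;

// Sketch of the simple match-or-interpolate decision described above.
public class ExactMatchPredictor {

    // Returns the average historical metric for records whose context exactly matches
    // the current context, or null when interpolation is needed instead.
    public Double predict(List<HistoricalObservation> history, Map<String, Double> currentContext) {
        double sum = 0;
        int matches = 0;
        for (HistoricalObservation obs : history) {
            if (obs.context().equals(currentContext)) {
                sum += obs.responseTimeMinutes();
                matches++;
            }
        }
        if (matches == 0) {
            return null;            // fall back to interpolation (e.g. k-NN)
        }
        return sum / matches;
    }
}
```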
There are a number of ways the predictive algorithm can be implemented. In a particularly beneficial example, however, the predictive algorithm is implemented using a model based approach.
Specifically, the predictive algorithm is configured to use, as an input, the historical performance data and context parameters, to generate (or, in effect, 'learn') a model that enables the statistics server 16 to predict the performance metric in question.
The generation of the model involves searching for, and establishing, meaningful relationships between one or more so-called 'explanatory' variables (also referred to as 'predictors'), which are defined by the context, and one or more so-called 'response' variables which represent the performance metric (or metrics) being predicted.
By way of illustrative example, it is generally known that a response variable representing a performance metric comprising a response time may be dependent on (e.g. inversely proportional to) a number of different predictors including, inter alia: CPU usage; virtual memory availability; and number of processors. However, the amount by which, and the manner in which, the response time is dependent on each predictor may vary depending on a number of factors.
The historical performance data is used to help generate a model representing the dependency between the predictor(s) and the response variable(s) using appropriate parametric or non-parametric statistical techniques and algorithms that will be familiar to those skilled in the art including, for example:

(1) Naive Bayes Classifiers or Bayesian statistics

Based on Bayes' Theorem, Naive Bayes Classification or Bayesian statistical techniques may be used to treat the effect of each predictor on each response variable independently. The independent, variable-by-variable analysis could then be aggregated to arrive at a final prediction.
(2) Support Vector Machine (SVM)

One or more support vector machine supervised learning models may be employed to infer a relationship between the predictor(s) and response variable(s) based on an algorithm that uses the historical performance data and context parameters as training data and analyses that training data to recognise patterns within it.
(3) Poisson distribution

A Poisson distribution may be used to determine the probability of an event occurring, based on a known average rate, to infer when the event may occur in the future and thereby predict a response variable such as response time.
(4) K-NN (also referred to as the k-nearest neighbours algorithm)

A k-nearest neighbours algorithm is a non-parametric technique that may be used to infer the value of a response variable for a given set of predictors based on the historical performance data for a pre-defined number k of examples with the closest historical context to the current context.
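A minimal k-nearest neighbours sketch for this non-parametric approach is given below; the Euclidean distance measure and the choice of k are illustrative assumptions rather than details prescribed above, and HistoricalObservation is again the illustrative holder sketched earlier:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Minimal k-nearest neighbours sketch; the distance measure and k are illustrative choices.
public class KnnPredictor {

    public double predict(List<HistoricalObservation> history,
                          Map<String, Double> currentContext, int k) {
        return history.stream()
                .sorted(Comparator.comparingDouble(
                        (HistoricalObservation obs) -> distance(obs.context(), currentContext)))
                .limit(k)                                          // the k closest historical contexts
                .mapToDouble(HistoricalObservation::responseTimeMinutes)
                .average()
                .orElseThrow(() -> new IllegalStateException("no historical data available"));
    }

    // Straight-line (Euclidean) distance over the parameters present in the current context.
    private double distance(Map<String, Double> historical, Map<String, Double> current) {
        double sum = 0;
        for (Map.Entry<String, Double> entry : current.entrySet()) {
            double diff = entry.getValue() - historical.getOrDefault(entry.getKey(), 0.0);
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

Called with the current context (e.g. CPU usage, virtual memory availability, active process count and number of processors) and, say, k = 5, this returns the average response time of the five most similar historical executions.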
Algorithms based on each of the above techniques (or a subset of them) may be provided as separate modules that can be used by the statistics server, as required, to generate and refine predictions of performance metrics. A user may also be provided with an option to select one or more of the algorithms to provide the prediction of the performance metric. The results from the algorithms could thus be used to provide a prediction in the form of a range of estimated performance metrics (e.g. using different statistical techniques) or to provide a prediction comprising a single estimated value for the performance metric based on a plurality of different estimates (e.g. a mean or modal average of them).
Advantageously, the predictive algorithm may also be employed to optimise the platform used for the distributed system or part thereof to ensure optimised performance.
For example, the following questions (amongst others) could be answered:
* What would the performance metric(s) be if the number of computer processors (CPUs) was increased and hence processor usage brought down?
* What would the performance metric(s) be if the number of concurrent users were increased?
* What would the performance metric(s) be if the number of concurrent users was brought down by employing a load balancer?
* What values of context parameters would result in a particular required performance metric being achieved?

Thus, the distributed system could be optimised cost effectively by providing an optimum balance between the various context parameters (for example by balancing the number of computer processors and the number of allowed concurrent users) to achieve a desired performance metric.
The predictive algorithm could also be used to provide pro-active alerts to users where the performance metric for a particular operation or group of operations could degrade in the future based on the prevailing contextual parameters (for example where one or more contextual parameters fails to meet a predetermined requirement, e.g. by exceeding, or falling below, a predetermined threshold).
Additionally, alerts may be issued when predicted values meet a user defined trigger value.
Predictive Analysis - Example

An example of the beneficial application of predictive analysis in a distributed system will now be described, by way of example only.
In this example, a client application manages maintenance contracts. Such contracts are typically drawn up to provide maintenance services to the customer over a period of time, say 3 to 5 years. The customers typically make payments based on a charge frequency, which could be monthly, quarterly, half-yearly, yearly etc. When the contract is created, the system automatically creates milestones based on the charge frequency so that the service provider can raise invoices on the prescribed dates.
However, in this example, the creation of milestones with appropriate payment to be made at those milestones is a long drawn-out process due to a complex algorithm that is used to achieve this purpose. Typically, this could take anywhere between 25 and 50 minutes based on the server on which it is deployed.
In order to allow predictive analysis to be performed with a view to optimising the response time between initiating and completing the group of operations associated with generating the milestones and related information, the following context parameters are configured:
1. CPU usage
2. Virtual memory availability
3. The number of active processes at the time of execution
4. The number of processors

In this example, the quantum of data required to initiate the predictive analysis is configured to be 100. Accordingly, the statistics server 16 will collect 100 historical records for the response time before performing the predictive analysis. The historical data collected could, for example, be in the format shown in Table 1.
Record Id | CPU Usage (percentage) | Virtual Memory Availability (GB) | # of Active Processes | # of Processors | Response time (minutes)
1   | 40  | 12  | 10  | 2   | 28.28
2   | 45  | 10  | 14  | 2   | 32.73
3   | 30  | 10  | 8   | 2   | 26.54
... | ... | ... | ... | ... | ...
102 | 80  | 5   | 22  | 2   | 46.13
103 | ... | ... | ... | ... | ...

Table 1

As can be seen from Table 1, there are 103 historical records that have been gathered for the group of operations related to calculating the milestones.
The statistics server thus has sufficient information to perform predictive analysis based on the prevailing context (i.e. the current available values of CPU usage, virtual memory (VM) availability, active process count and number of processors) to arrive at a predicted response time at that time.
It will be appreciated that the response time predicted could vary from the actual time taken (e.g. due to a change in context parameter values between the time a prediction is made and the time when an actual run happens).
This enhancement therefore advantageously allows the provision of a monitoring system that not only allows past and current conditions prevailing in the distributed system to be monitored but also allows potential future conditions prevailing in the distributed system to be predicted and appropriate optimisation to be performed to avoid problems if predicted performance metrics fail to meet requirements and/or to reduce energy consumption and costs associated with the distributed system should predicted performance metrics exceed requirements.
Summary
In one example of apparatus for monitoring conditions prevailing in a distributed system in which at least one client application is provided for access by a client device, the apparatus comprises: a client application environment in which the at least one client application and a measurement logging entity are provided; wherein the measurement logging entity comprises: an interface via which the measurement logging entity can receive, from each client application, measurement data representing a respective measure of performance for that client application; means for determining that said measurement data should be logged remotely from the local environment; and means for queuing, in a message queue, a message comprising said measurement data for sending to a primary logging environment for logging in a measurement database when said determining means determines that said measurement data should be logged remotely from the local environment; wherein said means for queuing is configured to send said message comprising said measurement data, from said message queue, to said primary logging environment at a time when sufficient resources are available to send said message without having a significant impact on execution of said at least one client application.
In one example of an application configured to operate as the client application of the apparatus, the application comprises: means for configuring said client application to perform a measurement of performance for the client application; means for performing a measurement of performance configured by said configuring means; and means for passing at least one result of the measurement of performance performed by said measurement performing means, as at least part of said measurement data, to said measurement logging entity for logging.
In one example of a method for monitoring conditions prevailing in a distributed system in which at least one client application is provided for access by a client device, the method comprises: a logging entity: receiving, via an interface, from a client application, measurement data representing a respective measure of performance for that client application; determining that said measurement data should be logged remotely from the local environment; and queuing, in a message queue, a message comprising said measurement data for sending to a primary logging environment for logging in a measurement database when said determining step determines that said measurement data should be logged remotely from the local environment; and sending said message comprising said measurement data, from said queue, to said primary logging environment at a time when sufficient resources are available to send said message without having a significant impact on execution of said at least one client application.
In one example of a method performed by a client application configured to operate as part of the apparatus, the method comprises: performing a measurement of performance in accordance with a measurement configuration; and passing at least one result of the measurement of performance, as at least part of said measurement data, to said measurement logging entity for logging.
Modifications and alternatives

In the above embodiments, a number of software modules were described.
It will be appreciated that whilst the distributed system is described as comprising a number of distinct entities that may be distributed geographically, all elements of the distributed system may be implemented in a single apparatus, for example having a number of autonomous processes corresponding to the different entities that interact with one another by means of message passing. Similarly the client entities may be located on one or more client machines that are separate to a server machine on which the various server entities are provided.
Moreover, there can be multiple server destinations, one of which may be a central server (e.g. based at a central office or headquarters). A particular client application, or group of such applications, may log performance related information to a local server allowing the logged information to be reviewed when required. At the same time, however, other information (e.g. information of particularly high importance) can be logged to the central server for operational or maintenance purposes.
It will be appreciated that the service 26 may be shared between multiple client applications 24 residing or executing within the same container 22. This means that if multiple client applications 24 are deployed to a single container 22, then they all share the same single instance of the service 26 thereby providing additional benefits in terms of minimising the overall impact of the performance logging on the wider system.
It will be appreciated that a plurality of service loggers 34 may be provided on different virtual or physical machines each of which writes to the same database server 18. Generally there will only be one viewer entity 20 even if a plurality of loggers 34 are deployed (although a plurality of viewers is not precluded). Providing a plurality of deployments of the service logger 34 can beneficially be used to provide greater load balancing and concurrency of updates to the database server 18.
It will be appreciated that the viewer entity or other similar entity may be configured to provide automated (e.g. real-time) alerts in dependence on the statistics being logged in the relational database. An alert may be issued when the data being accumulated in the database indicates that a particular system performance related issue has arisen or is about to arise. For example, an alert may be issued when the performance data being logged indicates that latency (or a similar parameter) in the system has exceeded, or is approaching, a predefined trigger level. The system may also be configured to make an automated response when the data being accumulated in the database indicates that a particular system performance related issue has arisen or is about to arise. The response may, for example, include taking preventative or corrective action such as providing more resources to an application that appears to be (or is about to be) experiencing a performance related issue and/or removing resources from (or shutting down) lower priority tasks or applications.
It will be appreciated that the service logger component may be enhanced to additionally support other messaging technologies such as Extensible Markup Language ('XML') Simple Object Access Protocol ('SOAP') messages, Java Message Service ('JMS'), Extensible Messaging and Presence Protocol ('XMPP') etc. This will allow greater flexibility as it allows a developer to choose the most appropriate transport to exchange the performance statistic details.
It will be appreciated that techniques may be provided for adding statistical values derived from the execution of the code without actually modifying the source code of an application. One such example in the Java world is to use cross cutting Aspect Oriented techniques to allow external definitions to be 'injected' into the executing code.
A store and forward mechanism could be provided to speed up the client application even further. This may comprise storing a local cache of the statistics on the client prior to delivery of them to the statistics server. Another store and forward approach may be to utilise a distributed database, such as a NoSQL database, that provides eventual consistency. This means the database is responsible for the exchange and eventual update of a server database ready for analysis via the viewer 20.
The set of language stacks that sit upon the uniform (e.g. REST) API may be expanded so that developers within that stack can more readily utilise the statistic gathering functionality.
The browser based statistics viewer may be adapted to incorporate alerts and alarms so that an early indication can be provided as to potential problems occurring within running applications or the wider system.
It will be appreciated that the principles and concepts disclosed herein may be extended to a mobile execution environment such that mobile applications can centrally capture and store mobile performance statistics with minimal impact on the mobile application to which the performance statistics relate and hence any mobile device on which the mobile application is provided.
As those skilled in the art will appreciate, the software modules may be provided in compiled or un-compiled form and may be supplied to the different devices (e.g. client machines and/or server machines) as a signal over a computer network, or on a recording medium. Further, the functionality performed by part or all of this software may be performed using one or more dedicated hardware circuits.
As those skilled in the art will appreciate, the apparatus and methods disclosed herein have many different applications to provide technical benefits in any of a number of different distinct fields.
The apparatus and/or methods disclosed herein may, for example, be provided for monitoring conditions prevailing in a distributed system for supporting health and/or social care services. Such services may comprise, for example, at least one of: community health and/or social care services; mobile health and/or social care services; health and/or social care services in a care recipient's home; health and/or social care services in a residential care home; health and/or social care services of an urgent and unplanned nature (e.g. accident and emergency and/or telephone services such as 111 or the like); mental health and/or social care services; palliative, hospice or end of life health and/or social care services; and health and/or social care services for those with learning disabilities (e.g. in a school and/or care home).

The apparatus and/or methods disclosed herein may, for example, be provided for monitoring conditions prevailing in a distributed system to: track response times of external integrations (e.g. integrations with external systems provided by other parties or by other groups or departments within the same organisation); track internal response times of subroutines and/or data retrieval; and/or track user decision making speed.
The apparatus and/or methods disclosed herein may, for example, be provided for monitoring conditions prevailing in a distributed system for supporting employer services (e.g. business services).
The apparatus and/or methods disclosed herein may, for example, be provided for monitoring conditions prevailing in a distributed system by means of centralised statistics depository for performance measurements across a range of employer services.
The employer services may, for example, comprise any of financial and accounting employer services, human resources employer services, payroll employer services, procurement employer services, document management employer services, supply chain management employer services, business analytics employer services, and/or business intelligence employer services.
The apparatus and/or methods disclosed herein may, for example, be provided for monitoring conditions prevailing in a distributed system for supporting employer services in any of a number of different sectors, for example: the public service sector; the private sector; and/or the not-for-profit or voluntary sector.
The apparatus and/or methods disclosed herein may, for example, be provided for monitoring conditions prevailing in a distributed system for supporting managed services. The managed services may, for example, comprise: cloud computing services and/or data centre services.
The apparatus and/or methods disclosed herein may, for example, be provided for monitoring conditions prevailing in a distributed system for supporting electronic learning services.
Various other modifications will be apparent to those skilled in the art and will not be described in further detail here.

Claims (67)

  1. 1. Apparatus for monitoring conditions prevailing in a distributed system in which at least one client application is provided for access by a client device, the apparatus comprising: a client application environment in which the at least one client application and a measurement logging entity are provided; wherein the measurement logging entity comprises: an interface via which the measurement logging entity can receive, from each client application, measurement data representing a respective measure of performance for that client application; means for determining that said measurement data should be logged remotely from the local environment; and means for queuing, in a message queue, a message comprising said measurement data for sending to a primary logging environment for logging in a measurement database when said determining means determines that said measurement data should be logged remotely from the local environment, wherein said message comprising said measurement data in said queue has associated therewith a time-to-live parameter setting a time period for which said message is to be retained in said queue; wherein said means for queuing is configured: (a) to send said message comprising said measurement data, from said message queue, to said primary logging environment at a time when sufficient resources are available to send said message without having a significant impact on execution of said at least one client application and said time-to-live parameter associated with that message has not expired; and (b) to remove, from said message queue, said message comprising said measurement data, without sending said message comprising said measurement data, when said time-to-live parameter associated with that message expires before that message would otherwise be sent.
  2. 2. Apparatus as claimed in claim 1 wherein said time-to-live parameter is a message queue specific time-to-live parameter that is common to all messages queued in said message queue.
  3. 3. Apparatus as claimed in claim 1 wherein said time-to-live parameter is a message specific time-to-live parameter that is respectively configurable for each message comprising said measurement data.
  4. 4. Apparatus as claimed in claim 3 wherein said message queue has associated therewith a message queue specific time-to-live parameter that is common to all messages queued in said message queue and wherein said means for queuing is configured: to remove, from said message queue, each message comprising said measurement data for which a message specific time-to-live parameter has not been configured, without sending that message comprising said measurement data for which a message specific time-to-live parameter has not been configured, when said message queue specific time-to-live parameter associated with that message expires before that message has been sent; and to remove, from said message queue, each message comprising said measurement data and for which a message specific time-to-live parameter has been configured, without sending that message comprising said measurement data for which a message specific time-to-live parameter has been configured, when said message specific time-to-live parameter configured for that message expires before that message has been sent regardless of a time period set by the message queue specific time-to-live parameter.
  5. 5. Apparatus as claimed in claim 3 or 4 wherein said message specific time-to-live parameter is adapted to be configured, on a message by message basis, by the at least one client application.
  6. 6. Apparatus as claimed in any preceding claim wherein said message queue has associated therewith a transmission order parameter which can be set: to a value that indicates that messages in said message queue should be sent in a first-in-first-out (FIFO) order; or to a value that indicates that messages in said message queue should be sent in a last-in-first-out (LIFO) order; and wherein said means for queuing is configured to send said messages in a FIFO order or LIFO order depending on the value of said parameter.
  7. 7. Apparatus as claimed in claim 6 wherein said transmission order parameter is adapted to be configured and reconfigured by the at least one client application.
  8. 8. Apparatus as claimed in any preceding claim wherein the measurement logging entity is adapted to receive via said interface, respective measurement data for a plurality of related operations or groups of operations; wherein respective measurement data for each of the plurality of related operations or groups of operations is provided in association with a shared key that is common to said related operations or groups of operations.
  9. 9. Apparatus as claimed in claim 8 wherein said means for queuing is operable to queue, in a single message in said message queue, said respective measurement data for each of the plurality of related operations or groups of operations that share said shared key.
  10. 10. Apparatus as claimed in any of claims 1 to 7 wherein the measurement logging entity is adapted to receive via said interface, respective measurement data for at least one of a plurality of related operations or groups of operations performed across the distributed system; wherein the respective measurement data for the at least one of the plurality of related operations or groups of operations is provided in association with a shared key that is common to said related operations or groups of operations.
  11. 11. Apparatus as claimed in claim 10 wherein said means for queuing is operable to queue, in said message queue, at least one message comprising said received measurement data for at least one of a plurality of related operations or groups of operations performed across the distributed system, wherein said at least one message comprising said received measurement data for at least one of a plurality of related operations or groups of operations performed across the distributed system further comprises said shared key.
  12. 12. Apparatus as claimed in any preceding claim wherein said measurement logging entity is operable to provide for each message comprising said measurement data, associated context information comprising at least one context parameter representing a condition prevailing in the distributed system at a time that the measurement data was acquired.
  13. 13. Apparatus as claimed in claim 12 further comprising means for requesting a prediction of a measure of performance based on current conditions prevailing in the distributed system and for receiving, responsive to said request, a prediction of a measure of performance based on the previously provided measurement data and the associated context information for the time that the measurement data was acquired.
  14. 14. Apparatus as claimed in any preceding claim wherein the measurement logging entity comprises a first ('local') measurement logging part and a second ('remote') measurement logging part wherein: the first measurement logging part comprises means for receiving measurement data via said interface, said means for determining that said measurement data should be logged remotely from the local environment, means for generating said message comprising said measurement data, and means for sending the generated message to the second measurement logging part; and the second measurement logging part comprises means for receiving said generated message from said first measurement logging part, and said means for queuing said message.
  15. 15. Apparatus as claimed in any preceding claim wherein the determining means is configured for determining whether said measurement data should be logged remotely from the local environment or logged within the local environment.
  16. 16. Apparatus as claimed in claim 15 comprising means for logging said measurement data locally when it is determined that said measurement data should be logged within the local environment.
  17. 17. Apparatus as claimed in any preceding claim wherein the measurement logging entity is configured to receive, from a client application, an indication that said measurement data should be logged locally and said determining means is configured for determining that said measurement data should be logged, for that client application, within the local environment responsive to receipt of said indication that said measurement data should be logged locally.
  18. 18. Apparatus as claimed in any preceding claim wherein the measurement logging entity is configured to receive, from a client application, an indication that remote logging of said measurement data should be suspended and said determining means is configured for determining that said measurement data, for that client application, should be logged within the local environment responsive to receipt of said indication that remote logging of said measurement data should be suspended.
  19. 19. Apparatus as claimed in any preceding claim wherein the measurement logging entity is configured to receive, from a client application, an indication that logging of said measurement data should cease and to disable logging of measurement data for that client application responsive to receipt of said indication that logging of said measurement data should cease.
  20. 20. Apparatus as claimed in any preceding claim wherein said queuing means is configured to send said message comprising said measurement data to said primary logging environment via an interface that is independent of a software platform or framework used to provide said client application.
  21. 21. Apparatus as claimed in claim 20 wherein said interface that is independent of a software platform or framework is a uniform application programming interface (API).
  22. 22. Apparatus as claimed in claim 21 wherein said API is a representational state transfer (REST) service (RS) API.
  23. 23. Apparatus as claimed in any preceding claim wherein said at least one client application comprises a plurality of client applications and wherein said measurement logging entity is configured to receive respective measurement data from each said client application.
  24. 24. Apparatus as claimed in any preceding claim further comprising a plurality of further client application environments, each further client application environment comprising a respective measurement logging entity.
  25. 25. Apparatus as claimed in any preceding claim further comprising the primary logging environment wherein said primary logging environment comprises means for receiving said message comprising measurement data from said queuing means and means for logging said measurement data accordingly.
  26. 26. Apparatus as claimed in claim 25 wherein said receiving means of said primary logging environment is configured to receive a message comprising measurement data from the respective measurement logging entity of each of a plurality of client application environments.
  27. 27. Apparatus as claimed in claim 25 or 26 wherein said receiving means of said primary logging environment is configured to receive a plurality of messages comprising measurement data for a plurality of related operations or groups of operations performed across the distributed system, wherein each of said plurality of messages comprises a shared key that is common to said related operations or groups of operations, and wherein said means for logging said measurement data is operable to log said measurement data provided in said plurality of messages in association with said shared key.
  28. 28. Apparatus as claimed in any of claims 25 to 27 wherein said receiving means of said primary logging environment is configured to receive with each message comprising said measurement data, associated context information comprising at least one context parameter representing a condition prevailing in the distributed system at a time the measurement data was acquired.
  29. 29. Apparatus as claimed in claim 28 further comprising means for receiving a request for a prediction of a measure of performance based on current conditions prevailing in the distributed system and for determining, responsive to said request, a prediction of a measure of performance based on previously logged measurement data and the associated context information for the time that the measurement data was acquired.
  30. 30. Apparatus as claimed in claim 25 or 26 wherein said primary logging environment further comprises a viewer entity for generating a visual display of stored measurement data.
  31. 31. Apparatus as claimed in any of claims 25 to 30 wherein said viewer entity is configured to provide an alert when said measurement data indicates that a predetermined criterion has been, or is about to be, met.
  32. 32. Apparatus as claimed in any preceding claim wherein said means for queuing is configured to operate a processing thread having a lower priority than a processing thread that the at least one client application uses whereby said message comprising said measurement data is sent to said primary logging environment at a time when sufficient resources are available to send said message without having a significant impact on execution of said at least one client application.
  33. 33. Apparatus as claimed in any preceding claim wherein said means for queuing comprises a scheduler that uses a background processing thread to process each message added to said message queue wherein said processing thread has a lower scheduling priority than that of a general execution thread used by the at least one application whereby said message comprising said measurement data is sent to said primary logging environment at a time when sufficient resources are available to send said message without having a significant impact on execution of said at least one client application.
  34. 34. Apparatus as claimed in any preceding claim wherein said means for queuing is configured to operate said message queue as a first in first out (FIFO) message queue.
  35. 35. Apparatus as claimed in any preceding claim configured for use in a mobile execution environment.
  36. 36. Apparatus as claimed in any of claims 1 to 35 configured for monitoring conditions prevailing in a distributed system for supporting health and/or social care services.
  37. 37. Apparatus as claimed in claim 36 configured for monitoring conditions prevailing in a distributed system for supporting community health and/or social care services.
  38. 38. Apparatus as claimed in claim 36 configured for monitoring conditions prevailing in a distributed system for supporting mobile health and/or social care services.
  39. 39. Apparatus as claimed in claim 36 configured for monitoring conditions prevailing in a distributed system for supporting health and/or social care services in a care recipient's home.
  40. 40. Apparatus as claimed in claim 36 configured for monitoring conditions prevailing in a distributed system for supporting health and/or social care services in a residential care home.
  41. 41. Apparatus as claimed in claim 36 configured for monitoring conditions prevailing in a distributed system for supporting health and/or social care services of an urgent and unplanned nature.
  42. 42. Apparatus as claimed in claim 36 configured for monitoring conditions prevailing in a distributed system for supporting mental health and/or social care services.
  43. 43. Apparatus as claimed in claim 36 configured for monitoring conditions prevailing in a distributed system for supporting palliative, hospice or end of life health and/or social care services.
  44. 44. Apparatus as claimed in claim 36 configured for monitoring conditions prevailing in a distributed system for supporting health and/or social care services, for those with learning disabilities, in a school and/or care home.
  45. 45. Apparatus as claimed in any of claims 36 to 44 configured for monitoring conditions prevailing in a distributed system to track response times of external integrations.
  46. 46. Apparatus as claimed in any of claims 36 to 45 configured for monitoring conditions prevailing in a distributed system to track internal response times of subroutines and/or data retrieval.
  47. 47. Apparatus as claimed in any of claims 36 to 46 configured for monitoring conditions prevailing in a distributed system to track user decision making speed.
  48. 48. Apparatus as claimed in any of claims 1 to 35 configured for monitoring conditions prevailing in a distributed system for supporting employer services.
  49. 49. Apparatus as claimed in claim 48 configured for monitoring conditions prevailing in a distributed system by means of centralised statistics depository for performance measurements across a range of said employer services.
  50. 50. Apparatus as claimed in claim 48 or 49 configured for monitoring conditions prevailing in a distributed system for providing at least one of financial and accounting employer services, human resources employer services, payroll employer services, procurement employer services, document management employer services, supply chain management employer services, business analytics employer services, and business intelligence employer services.
  51. 51. Apparatus as claimed in any of claims 48 to 50 configured for monitoring conditions prevailing in a distributed system for supporting employer services in the public service sector.
  52. Apparatus as claimed in any of claims 48 to 51 configured for monitoring conditions prevailing in a distributed system for supporting employer services in the private sector.
  53. Apparatus as claimed in any of claims 48 to 51 configured for monitoring conditions prevailing in a distributed system for supporting employer services in the not-for-profit or voluntary sector.
  54. Apparatus as claimed in any of claims 1 to 35 configured for monitoring conditions prevailing in a distributed system for supporting managed services.
  55. Apparatus as claimed in claim 54 configured for monitoring conditions prevailing in a distributed system for supporting managed services comprising cloud computing services.
  56. Apparatus as claimed in claim 54 or 55 configured for monitoring conditions prevailing in a distributed system for supporting managed services comprising data centre services.
  57. Apparatus as claimed in any of claims 1 to 35 configured for monitoring conditions prevailing in a distributed system for supporting electronic learning services.
  58. An application configured to operate as the client application of the apparatus of any preceding claim, the application comprising: means for configuring said client application to perform a measurement of performance for the client application; means for performing a measurement of performance configured by said configuring means; and means for passing at least one result of the measurement of performance performed by said measurement performing means, as at least part of said measurement data, to said measurement logging entity for logging in association with said time-to-live parameter.
  59. An application as claimed in claim 58 wherein said configuring means is operable to configure at least one start point and at least one end point for said measurement of performance.
  60. An application as claimed in claim 59 wherein said at least one result comprises an elapsed time beginning at said at least one start point and ending at said at least one end point.
  61. An application as claimed in any of claims 58 to 60 wherein said results passing means is configured for passing said result with associated metadata relating to said measurement, as at least part of said measurement data, to said measurement logging entity for logging.
  62. An application as claimed in claim 61 wherein said metadata comprises at least one of: information identifying the client application to which the measurement data relates; information identifying an operation or group of operations for which the measurement was performed; information identifying a time at which the measurement was performed; and/or information indicating an approximate magnitude of the operation or group of operations to which the measurement relates.
  63. An application as claimed in claim 61 or 62 wherein said metadata comprises information for identifying a data type of said measurement data (e.g. string, numeric, date and/or time related).
  64. A method for monitoring conditions prevailing in a distributed system in which at least one client application is provided for access by a client device, the method comprising: a logging entity: receiving, via an interface, from a client application, measurement data representing a respective measure of performance for that client application; determining that said measurement data should be logged remotely from the local environment; and queuing, in a message queue, a message comprising said measurement data for sending to a primary logging environment for logging in a measurement database when said determining step determines that said measurement data should be logged remotely from the local environment, wherein said message comprising said measurement data in said queue has associated therewith a time-to-live parameter setting a time period for which said message is to be retained in said queue; and sending said message comprising said measurement data, from said queue, to said primary logging environment at a time when sufficient resources are available to send said message without having a significant impact on execution of said at least one client application and said time-to-live parameter associated with that message has not expired; or removing, from said message queue, said message comprising said measurement data, without sending said message comprising said measurement data, when said time-to-live parameter associated with that message expires before that message would otherwise be sent.
  65. A method performed by a client application configured to operate as part of the apparatus of any of claims 1 to 57, the method comprising: performing a measurement of performance in accordance with a measurement configuration; and passing at least one result of the measurement of performance, as at least part of said measurement data, to said measurement logging entity for logging in association with said time-to-live parameter.
  66. A computer program product comprising computer implementable instructions which, when executed on a computer processing apparatus, cause said computer processing apparatus to become configured as the apparatus of any of claims 1 to 57 or as an application according to any of claims 58 to 63.
  67. A computer program product comprising computer implementable instructions which, when executed on a computer processing apparatus, cause said computer processing apparatus to perform a method according to claim 64 or 65.
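
For orientation, the measurement, queuing and time-to-live behaviour recited in claims 58 to 64 can be sketched roughly as follows in Java. This is a minimal illustrative sketch, not the claimed implementation: the class and member names (RemoteLogger, StatisticsClient, Message, exampleMeasurement) and the use of a minimum-priority daemon thread are assumptions introduced here for illustration only.

    import java.time.Instant;
    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Illustrative sketch of the flow recited in claims 58 to 64; names are assumed.
    public final class RemoteLogger {

        // A queued measurement message and its time-to-live (claim 64).
        static final class Message {
            final Map<String, String> metadata;  // e.g. application id, operation, timestamp (claim 62)
            final long elapsedMillis;            // measurement result: elapsed time (claim 60)
            final Instant expiresAt;             // entry is discarded once this instant passes

            Message(Map<String, String> metadata, long elapsedMillis, long ttlMillis) {
                this.metadata = metadata;
                this.elapsedMillis = elapsedMillis;
                this.expiresAt = Instant.now().plusMillis(ttlMillis);
            }

            boolean expired() {
                return Instant.now().isAfter(expiresAt);
            }
        }

        // Assumed transport to the primary logging environment (statistics server).
        interface StatisticsClient {
            void persist(Message m);
        }

        private final BlockingQueue<Message> queue = new LinkedBlockingQueue<>();
        private final StatisticsClient client;
        private final long defaultTtlMillis;

        RemoteLogger(StatisticsClient client, long defaultTtlMillis) {
            this.client = client;
            this.defaultTtlMillis = defaultTtlMillis;
            // Drain the queue on a daemon thread of lower priority than the client
            // application, so sending does not significantly impact its execution.
            Thread sender = new Thread(this::drain, "remote-logger");
            sender.setPriority(Thread.MIN_PRIORITY);
            sender.setDaemon(true);
            sender.start();
        }

        // Called once the local logger determines a result should be stored remotely.
        void enqueue(Map<String, String> metadata, long elapsedMillis) {
            queue.offer(new Message(metadata, elapsedMillis, defaultTtlMillis));
        }

        private void drain() {
            try {
                while (true) {
                    Message m = queue.take();   // FIFO order
                    if (m.expired()) {
                        continue;               // remove without sending (final limb of claim 64)
                    }
                    client.persist(m);          // send to the primary logging environment
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Client-side usage (claims 58 to 63): measure elapsed time between configured
        // start and end points and pass the result, with metadata, to the logger.
        static void exampleMeasurement(RemoteLogger logger, Runnable measuredOperation) {
            long start = System.nanoTime();                                  // configured start point
            measuredOperation.run();                                         // operation under measurement
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;    // configured end point
            logger.enqueue(Map.of(
                    "application", "client-app-1",                           // identifies the client application
                    "operation", "record-load",                              // identifies the measured operation
                    "measuredAt", Instant.now().toString()),                 // when the measurement was taken
                    elapsedMillis);
        }
    }

The essential behaviour reflected above is that an expired message is removed from the FIFO queue without being sent, and that the sending thread runs at lower priority than the client application so that remote logging does not have a significant impact on its execution.
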
GB1409563.2A 2013-05-29 2014-05-29 Methods and apparatus for monitoring conditions prevailing in a distributed system Expired - Fee Related GB2516357B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1309604.5A GB2514584A (en) 2013-05-29 2013-05-29 Methods and apparatus for monitoring conditions prevailing in a distributed system

Publications (3)

Publication Number Publication Date
GB201409563D0 GB201409563D0 (en) 2014-07-16
GB2516357A (en) 2015-01-21
GB2516357B GB2516357B (en) 2015-08-19

Family

ID=48784865

Family Applications (2)

Application Number Title Priority Date Filing Date
GB1309604.5A Withdrawn GB2514584A (en) 2013-05-29 2013-05-29 Methods and apparatus for monitoring conditions prevailing in a distributed system
GB1409563.2A Expired - Fee Related GB2516357B (en) 2013-05-29 2014-05-29 Methods and apparatus for monitoring conditions prevailing in a distributed system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GB1309604.5A Withdrawn GB2514584A (en) 2013-05-29 2013-05-29 Methods and apparatus for monitoring conditions prevailing in a distributed system

Country Status (1)

Country Link
GB (2) GB2514584A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10268714B2 (en) 2015-10-30 2019-04-23 International Business Machines Corporation Data processing in distributed computing

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105204972B (en) * 2015-09-09 2018-10-19 北京思特奇信息技术股份有限公司 A kind of unified method and system issued and manage of executable program
CN106708693A (en) * 2015-11-16 2017-05-24 亿阳信通股份有限公司 Alarm data processing method and device
US10416974B2 (en) 2017-10-06 2019-09-17 Chicago Mercantile Exchange Inc. Dynamic tracer message logging based on bottleneck detection
CN113661484A (en) * 2021-08-25 2021-11-16 商汤国际私人有限公司 Log recording method and device, electronic equipment and computer readable storage medium
WO2023026086A1 (en) * 2021-08-25 2023-03-02 Sensetime International Pte. Ltd. Logging method and apparatus, electronic device, and computer-readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004264970A (en) * 2003-02-28 2004-09-24 Hitachi Ltd Program, information processor, and method for outputting log data in information processor
JP2006085372A (en) * 2004-09-15 2006-03-30 Toshiba Corp Information processing system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050028171A1 (en) * 1999-11-12 2005-02-03 Panagiotis Kougiouris System and method enabling multiple processes to efficiently log events
US20020065948A1 (en) * 2000-11-30 2002-05-30 Morris Larry A. Operating system event tracker
US20060167951A1 (en) * 2005-01-21 2006-07-27 Vertes Marc P Semantic management method for logging or replaying non-deterministic operations within the execution of an application process
US20080222186A1 (en) * 2007-03-09 2008-09-11 Costin Cozianu System and method for on demand logging of document processing device status data

Also Published As

Publication number Publication date
GB2516357B (en) 2015-08-19
GB2514584A (en) 2014-12-03
GB201309604D0 (en) 2013-07-10
GB201409563D0 (en) 2014-07-16

Similar Documents

Publication Publication Date Title
US10348809B2 (en) Naming of distributed business transactions
Bhattacharjee et al. Barista: Efficient and scalable serverless serving system for deep learning prediction services
EP2882140B1 (en) Data partitioning in internet-of-things (IOT) network
US9037707B2 (en) Propagating a diagnostic session for business transactions across multiple servers
GB2516357A (en) Methods and apparatus for monitoring conditions prevailing in a distributed system
Canizo et al. Implementation of a large-scale platform for cyber-physical system real-time monitoring
US10783002B1 (en) Cost determination of a service call
US10230611B2 (en) Dynamic baseline determination for distributed business transaction
Koulouzis et al. Time‐critical data management in clouds: Challenges and a Dynamic Real‐Time Infrastructure Planner (DRIP) solution
Andrade et al. Dependability evaluation of a disaster recovery solution for IoT infrastructures
CN106104626B (en) The update of digital content based on analysis
US20210366268A1 (en) Automatic tuning of incident noise
JP7461696B2 (en) Method, system, and program for evaluating resources in a distributed processing system
WO2016176421A1 (en) Intelligent management of processing tasks on multi-tenant or other constrained data processing platform
KR20150118963A (en) Queue monitoring and visualization
CN112445583A (en) Task management method, task management system, electronic device, and storage medium
CN114207590A (en) Automated operational data management for quality of service criteria determination
US20230185674A1 (en) System and method for optimized scheduling of data backup/restore
US11438426B1 (en) Systems and methods for intelligent session recording
Sarathchandra et al. Resource aware scheduler for distributed stream processing in cloud native environments
Hasan Quota based access-control for Hops: Improving cluster utilization with Hops-YARN
Kalim Satisfying service level objectives in stream processing systems
Dong et al. A proactive cloud management architecture for private clouds
JP2024030310A (en) Customer behavior prediction device, customer behavior prediction method, and program
Zhuang et al. Choosing Optimal Maintenance Time for Stateless Data-Processing Clusters: A Case Study of Hadoop Cluster

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20230529