GB2612960A

GB2612960A - Authorisation system and methods

Info

Publication number: GB2612960A
Application number: GB2116165.8A
Authority: GB
Inventors: Alexander Bragg Michael; Suzukawa Keigo; Philippart Gilles; Smith Webb Jamie; De Barbeyrac Saint Maurice-Julien Bernard; Qu Wenjun; Zhong Lei; Raja Ajmal; Jakob Andor; Natarajan Bharadhwaj Mahesh; Dubey Tirthankar
Original assignee: Made To Do More Ltd
Current assignee: Made To Do More Ltd
Priority date: 2021-11-10
Filing date: 2021-11-10
Publication date: 2023-05-24

Abstract

A classification engine uses a data file comprising data relating to an entity to: determine a first classification in dependence on the data file; transmit a request for further data relating to the entity, wherein the data file is updated to include the further data; and determine a second classification using the updated data file; wherein the first and second classifications are each determined using one or more pure functions. An authorisation engine is configured to authorise the entity in dependence on the first and/or second classifications. Risk classification may also be carried out for the entities to determine a prediction of failure or success of a further entity using indicators. This may be applied to authorising driving licences, entry to premises, borders, persons or vehicles. Failure may relate to mechanical failure, to failure to execute required actions, malware, size or age of an app, or likelihood of repayment of a loan. Sensitive data may be encrypted, or the history of download or installation of an app noted.

Description

AUTHORISATION SYSTEM AND METHODS

Field of the Invention

The present disclosure relates to an authorisation system and methods. The invention also extends to a risk classification system and methods. The disclosure is particularly, but not exclusively, applicable to the authorisation of an entity for access to a resource.

Background to the Disclosure

Existing authorisation systems often have a complex structure, which may make the systems difficult to test and update. Further, the same authorisation system is often used for a long period of time, with no or only small changes that have little impact on the authorisation process, but which further increase the complexity and make future testing and updates to the system yet more difficult. Moreover, such complex authorisation systems typically collect large amounts of data relating to the entity to be authorised, thereby increasing memory usage, and making the authorisation process computationally intensive and slow.

The present disclosure seeks to at least partially alleviate the problems outlined above.

Summary of the Disclosure

Aspects of the disclosure are set out in the accompanying claims. These and other aspects and embodiments of the invention are also described herein.

In an aspect of the present disclosure, there is provided an authorisation system, comprising: a classification engine configured to: receive a data file comprising data relating to an entity; determine a first classification in relation to the entity in dependence on the data file; in dependence on the first classification, transmit a request for further data relating to the entity, wherein the data file is updated to include the further data; and determine a second classification in relation to the entity in dependence on the updated data file; wherein the first and second classifications are each determined using one or more pure functions; and an authorisation engine configured to authorise the entity in dependence on the first and/or second classifications.

The classification engine determines the first classification based on an initial data file and then updates the data file based on this first classification for use in determining a second classification, thus providing an iterative process in which the classification and data gathering steps are interleaved. This may allow the implementation of an adaptive authorisation system that can operate on an initially incomplete dataset and authorise an entity in a computationally efficient manner by fetching only the necessary further data. Further, the first and second classifications are determined using pure functions, which may provide improved traceability of the authorisation result, and easier testing of the first and second classifications (as well as of possible alternative classifications).

The classification engine may be configured to generate a first classification output, the first classification output comprising the first classification and the request for further data. This may provide further improved traceability of the authorisation result, as the first classification output associates the request for further data with the first classification, and so provides information as to exactly which classification triggered which request for further data. This may also provide reduced network usage as the classification engine can transmit (e.g. to a communication engine) both the first classification and the request for further data in one go.
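A classification output of this kind might be sketched as a single immutable record that carries both the classification and the request it triggered. The field names, the `first_classification` rule and the example attributes (`age`, `licence_number`) are hypothetical and chosen only for illustration; the disclosure does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class ClassificationOutput:
    """Bundles a classification with the request for further data it triggered."""
    classification: str
    requested_fields: Tuple[str, ...]  # empty tuple: no further data requested


def first_classification(data_file: Mapping[str, object]) -> ClassificationOutput:
    """Pure function: the output depends only on data_file (no I/O, no hidden state)."""
    if "age" not in data_file:
        return ClassificationOutput("incomplete", ("age",))
    if data_file["age"] >= 18:
        # Hypothetical rule: adults are asked for a licence number next.
        return ClassificationOutput("adult", ("licence_number",))
    return ClassificationOutput("minor", ())


output = first_classification({"name": "A. Example", "age": 30})
```

Because the record is immutable and produced by a pure function, it can be logged or transmitted as one unit, which is one way to keep each request traceable to the classification that caused it.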

The first classification output may be stored in memory. This may allow caching the first classification output so that it can subsequently be re-used.

The classification engine may be configured to: receive a further data file comprising data relating to a further entity; and determine a first classification in relation to the further entity in dependence on the further data file; wherein, when the data relating to the entity and the data relating to the further entity are the same, the determining the first classification in relation to the further entity comprises retrieving from the memory the first classification output. Determining the first classification using pure functions and caching the first classification output may allow the classification engine to reuse previous results and so increase computational efficiency.
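As a minimal sketch of this reuse, assuming the data file can be reduced to a hashable, order-independent form, a pure classification function can be wrapped in a cache so that a further entity with identical data is served from memory. The `freeze` helper, the `classify` rule and the field names are illustrative assumptions, and `functools.lru_cache` is only one possible caching mechanism.

```python
from functools import lru_cache
from typing import Dict, Tuple

# Hashable, immutable representation of a data file (an assumption of this sketch).
FrozenDataFile = Tuple[Tuple[str, object], ...]


def freeze(data: Dict[str, object]) -> FrozenDataFile:
    """Convert a dict into a key that is equal whenever the data is equal."""
    return tuple(sorted(data.items()))


@lru_cache(maxsize=None)
def classify(data_file: FrozenDataFile) -> str:
    """Pure function, so a cached result is always valid for the same input."""
    fields = dict(data_file)
    return "low_risk" if fields.get("years_licensed", 0) >= 5 else "needs_review"


# Two different entities whose data is the same: the second call is a cache hit.
first = classify(freeze({"years_licensed": 7, "region": "GB"}))
second = classify(freeze({"region": "GB", "years_licensed": 7}))
info = classify.cache_info()
```

Caching is safe here only because the function is pure; a function that read a clock or a database could return a stale result from the cache.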

The classification engine may be configured to determine iteratively a plurality of classifications in relation to the entity, each determination of a classification comprising: determining a classification in relation to the entity in dependence on the data file; and in dependence on the classification, transmitting a request for yet further data relating to the entity wherein the data file is updated to include the yet further data; wherein each of the plurality of classifications is determined using one or more pure functions.

Preferably, when it is determined that sufficient data relating to the entity has been collected for an authorisation decision to be made, the iterations are halted and the authorisation engine is configured to authorise the entity in dependence on the determined classification(s). Halting the authorisation process once sufficient data is collected to make the authorisation decision may allow the classification engine to collect only the data required to authorise the given entity and so improve computational efficiency. Further, this may provide an adaptive authorisation system where different data may be collected for entities with different attributes (i.e. the iterative process may be halted at different points for entities with different attributes), thereby allowing computationally-efficient authorisation of each entity.
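The iterative process with early halting might look like the following sketch. The `classify` rule, the `fetch` callback (standing in for the request for further data), the `max_rounds` guard and the vehicle-related field names are all assumptions introduced for illustration.

```python
from typing import Callable, Dict, List, Mapping, Sequence, Tuple

DataFile = Mapping[str, object]
Classification = Tuple[str, Tuple[str, ...]]  # (label, fields still needed)
Fetch = Callable[[Sequence[str]], Dict[str, object]]


def classify(data_file: DataFile) -> Classification:
    """Pure function: returns a label and the further data it requires, if any."""
    if "vehicle_type" not in data_file:
        return ("unknown", ("vehicle_type",))
    if data_file["vehicle_type"] == "hgv" and "hgv_certificate" not in data_file:
        return ("hgv_pending", ("hgv_certificate",))
    return ("complete", ())


def run_authorisation(initial: DataFile, fetch: Fetch,
                      max_rounds: int = 5) -> Tuple[bool, List[str]]:
    """Interleave classification and data gathering until no more data is needed."""
    current: Dict[str, object] = dict(initial)
    history: List[str] = []
    for _ in range(max_rounds):
        label, needed = classify(current)
        history.append(label)
        if not needed:  # sufficient data collected: halt the iterations
            break
        current = {**current, **fetch(needed)}  # updated data file
    return history[-1] == "complete", history


def make_fetch(source: Dict[str, object]) -> Fetch:
    """Stub for a communication engine that answers from a fixed source."""
    def fetch(fields: Sequence[str]) -> Dict[str, object]:
        return {name: source[name] for name in fields}
    return fetch


hgv_result = run_authorisation({}, make_fetch({"vehicle_type": "hgv", "hgv_certificate": "C123"}))
car_result = run_authorisation({}, make_fetch({"vehicle_type": "car"}))
```

In this sketch the car is authorised after two classifications and the HGV after three, which illustrates how the loop halts at different points for entities with different attributes.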

The authorisation engine may be configured to authorise the entity in dependence on one or more of the plurality of classifications, preferably at least in part in dependence on the last classification.

The classification engine may be configured to generate a classification output for each of the plurality of classifications, each classification output comprising the respective classification and request for yet further data.

The system may further comprise a processing engine configured to compare: the first and/or second classifications as determined using one or more pure functions; and at least one alternative classification, the at least one alternative classification being determined in dependence on the data file using one or more alternative pure functions. The alternative classification may relate to a test classification that is tested prior to being implemented as part of the authorisation process. Thus, a data file obtained as part of authorising an entity may subsequently be used to test alternative classifications and compare the authorisation results and/or classifications as determined using the currently used pure functions (which always return the same results for the same inputs and so can be re-run for testing purposes and return the same result) and alternative pure functions. Depending on this comparison, the authorisation system may replace the current classifications with the alternative classifications. This may provide for easier and more accurate testing of new pure functions and/or classifications based on historic data (files).
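A comparison of this kind could be sketched as re-running both sets of pure functions over stored data files and reporting where they disagree. The two rules, the `score` and `incidents` fields and the sample data are hypothetical placeholders, not values taken from the disclosure.

```python
from typing import Callable, List, Mapping, Sequence, Tuple

DataFile = Mapping[str, int]
Classifier = Callable[[DataFile], str]


def current_rule(data_file: DataFile) -> str:
    """Pure function currently used for classification (illustrative)."""
    return "approve" if data_file["score"] >= 600 else "decline"


def alternative_rule(data_file: DataFile) -> str:
    """Candidate pure function under test (illustrative)."""
    if data_file["score"] >= 600 and data_file["incidents"] == 0:
        return "approve"
    return "decline"


def compare(files: Sequence[DataFile], current: Classifier,
            alternative: Classifier) -> List[Tuple[int, str, str]]:
    """Return (index, current result, alternative result) wherever the two differ."""
    differences = []
    for index, data_file in enumerate(files):
        a, b = current(data_file), alternative(data_file)
        if a != b:
            differences.append((index, a, b))
    return differences


historic_files = [
    {"score": 650, "incidents": 0},
    {"score": 650, "incidents": 2},
    {"score": 500, "incidents": 0},
]
differences = compare(historic_files, current_rule, alternative_rule)
```

Since both rules are pure, replaying them over historic files reproduces the original decisions exactly, so any difference is attributable to the rule change alone.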

The authorisation engine may be configured to authorise the entity in dependence on the first and/or second classification and the at least one alternative classification.

The processing engine may be configured, for a plurality of entities, to compare: authorisation results for the entities based on the classifications determined using the one or more pure functions; and authorisation results for the entities based on alternative classifications determined using one or more alternative pure functions.

The system may be configured to replace the pure functions with the alternative pure functions in dependence on the comparison between the authorisation results.

Preferably, the data file is stored in memory, each time it is updated, for use by the classification engine. This may allow caching the data file and provide for efficient look-up of the data file at each classification step.

Preferably, the inputs and outputs of one or more of the pure functions are stored in memory. Thus, next time a pure function is to be evaluated for the same inputs, the stored output can be retrieved from memory (as opposed to the pure function being evaluated anew) which may allow improving computational efficiency of the authorisation process.

Preferably, the system comprises means for subscribing to a data feed, wherein the data file is received by the classification engine, and/or each classification is received by the authorisation engine, by subscribing to a data feed. This may allow efficient communication between the various engines of the system.

The classification engine may be further configured to transmit the data file to a database (preferably a data lake) for storage.

The classification engine may be further configured to transmit the first and second classifications and/or parameters of the pure functions to a database for storage.

The system may further comprise a communication engine configured to receive the request for further data from the classification engine, and to add the further data to the data file.

The communication engine may be configured to transmit one or more requests for said further data to a user device and/or one or more remote servers.

The communication engine may be configured to receive data relating to an entity, and to generate the data file comprising said data.

Preferably, adding the further data to the data file comprises: transmitting the further data to a database (preferably a data lake) for storage, and adding a reference to the further data to the data file.

The authorisation engine may be configured to authorise the entity for access to a resource; optionally wherein the data file further comprises data relating to the resource.
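Storing further data in a database and adding only a reference to the data file might be sketched as follows. The in-memory `data_lake` dictionary stands in for a real store, and deriving the reference from a SHA-256 hash of the content is an assumption of this sketch rather than something the disclosure specifies.

```python
import hashlib
import json
from typing import Dict

# Stand-in for a data lake: maps a reference to the stored bytes.
data_lake: Dict[str, bytes] = {}


def store(payload: dict) -> str:
    """Store the payload and return a reference to it (content hash, by assumption)."""
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    reference = hashlib.sha256(blob).hexdigest()
    data_lake[reference] = blob
    return reference


def add_further_data(data_file: dict, name: str, payload: dict) -> dict:
    """Return a new data file holding a reference to the payload, not the payload itself."""
    return {**data_file, name: {"ref": store(payload)}}


updated = add_further_data({"entity_id": "e1"}, "statement", {"balance": 100})
resolved = json.loads(data_lake[updated["statement"]["ref"]])
```

Keeping only references in the data file keeps it small at each classification step, while the full payload remains retrievable when a function actually needs it.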

The system may further comprise a parsing engine configured to parse the further data before it is added to the data file; preferably wherein the further data is received as image data and parsed into text data.

Preferably, the authorisation engine is configured to authorise the entity at least partly in dependence on the second (i.e. final/last) classification.

Preferably, the authorisation system is time-invariant.

Optionally, the system further comprises: a modelling engine which, for a plurality of entities, is configured to: classify each of the entities into one or more groupings in dependence on one or more attributes of the entities; and identify, for each grouping, in dependence on stored data relating to failure (or success) of the entities in the grouping, at least one indicator, the at least one indicator being correlated with failure (or success) of the entities; a risk classification engine which, in response to a request to classify risk relating to the entity, is configured to: transmit a request, to a user device, for data associated with the one or more attributes of the entity; determine, among the groupings, one or more groupings for the entity in dependence on said data; transmit a request, to the user device, for data associated with the at least one indicator for the determined grouping(s); and determine a prediction of failure (or success) of the entity at least partly in dependence on the data associated with the at least one indicator.

The determining of the first or second classifications may comprise determining a prediction of failure (or success) of the entity as aforementioned.

Optionally, the authorisation engine is configured to: receive the updated data file; determine, in dependence on a first subset of data in said updated data file, a risk classification for the entity based on risk relating to providing the service to the entity; determine, in dependence on a second subset of data in said updated data file, an acceptability classification for the entity based on a prediction of the acceptability of the service to the entity; determine, in dependence on the risk and acceptability classifications, a risk compensation required for providing a service to the entity; and authorise the entity for the service in dependence on the risk classification, comprising applying the risk compensation.

In another aspect of the present disclosure, there is provided a method of authorising an entity, comprising: receiving a data file comprising data relating to an entity; determining a first classification in relation to the entity in dependence on the data file; in dependence on the first classification, transmitting a request for further data relating to the entity, wherein the data file is updated to include the further data; determining a second classification in relation to the entity in dependence on the updated data file; and authorising the entity in dependence on the first and/or second classifications; wherein the first and second classifications are each determined using one or more pure functions.

Optionally, the method produces an output.

Optionally, the method presents the output. Preferably, the method presents the output on or to a display.

Optionally, the method further comprises producing an output.

Optionally, the method further comprises presenting the output.

Optionally, the method further comprises presenting the output on or to a display.

Optionally, the output is provided to a computing device, such as a user device and/or a server, or to another computer system via an application programming interface (API).

Optionally, the method is computer-implemented.

In another aspect of the present disclosure, there is provided a computer programme product comprising instructions which, when executed by a computer, cause the computer to authorise an entity, comprising: receiving a data file comprising data relating to an entity; determining a first classification in relation to the entity in dependence on the data file; in dependence on the first classification, transmitting a request for further data relating to the entity, wherein the data file is updated to include the further data; determining a second classification in relation to the entity in dependence on the updated data file; and authorising the entity in dependence on the first and/or second classifications; wherein the first and second classifications are each determined using one or more pure functions.

In another aspect of the present disclosure, there is provided a risk classification system comprising: a modelling engine which, for a plurality of entities, is configured to: classify each of the entities into one or more groupings in dependence on one or more attributes of the entities; and identify, for each grouping, in dependence on stored data relating to failure (or success) of the entities in the grouping, at least one indicator, the at least one indicator being correlated with failure (or success) of the entities; a risk classification engine which, in response to a request to classify risk relating to a further entity, is configured to: transmit a request, to a user device, for data associated with the one or more attributes of the further entity; determine, among the groupings, one or more groupings for the further entity in dependence on said data; transmit a request, to the user device, for data associated with the at least one indicator for the determined grouping(s); and determine a prediction of failure (or success) of the further entity at least partly in dependence on the data associated with the at least one indicator (thereby classifying the risk relating to the entity).

Thus, the risk classification engine collects data from the user device in two (or more) steps, which may allow improving the accuracy of the prediction of failure (or success) of the entity, since this is computed based on indicators determined to be correlated to failure (or success) for similar previous entities (i.e. entities in the same grouping), while reducing the amount of data collected from the user device, since only data needed to classify the entity into groupings and compute the prediction of failure for those specific groupings is collected, but no more.
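The two-step collection might be sketched as below, using an app as the entity. The groupings, the indicator names, the `ask_user` callback (standing in for requests to the user device) and the placeholder scoring rule are all hypothetical; in the disclosure the indicators would be identified by the modelling engine from stored data.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Indicators per grouping, assumed to have been identified beforehand.
INDICATORS: Dict[str, List[str]] = {
    "small_app": ["requests_network"],
    "large_app": ["requests_network", "unsigned_binary"],
}

AskUser = Callable[[Sequence[str]], Dict[str, object]]


def grouping_for(attributes: Dict[str, object]) -> str:
    """Assign a grouping from attributes (illustrative size cut-off)."""
    return "large_app" if attributes["size_mb"] > 100 else "small_app"


def classify_risk(ask_user: AskUser) -> Tuple[str, float]:
    attributes = ask_user(["size_mb"])                 # step 1: attributes only
    grouping = grouping_for(attributes)
    indicator_data = ask_user(INDICATORS[grouping])    # step 2: grouping-specific indicators
    # Placeholder prediction: fraction of indicators that are flagged.
    flagged = sum(1 for value in indicator_data.values() if value)
    return grouping, flagged / len(indicator_data)


answers = {"size_mb": 20, "requests_network": True, "unsigned_binary": False}
requested: List[List[str]] = []


def ask_user(fields: Sequence[str]) -> Dict[str, object]:
    """Stub user device that records what was requested."""
    requested.append(list(fields))
    return {name: answers[name] for name in fields}


result = classify_risk(ask_user)
```

For the small app in this example, `unsigned_binary` is never requested, which illustrates how only the data relevant to the determined grouping is collected.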

Preferably, if the prediction of failure (or success) is below (or above) a threshold, the risk classification engine is configured to: transmit, in dependence on the determined groupings, a request for additional data relating to the further entity to the user device and/or to a remote server; and redetermine the prediction of failure (or success) of the further entity at least partly in dependence on: said data associated with the at least one indicator for the determined groupings, and the additional data. Conversely, if the prediction of failure (or success) is above (or below) a threshold, the risk classification engine preferably does not redetermine the prediction and outputs the initial prediction. This may allow reducing the computational cost of determining the prediction of failure (or success) by terminating the method early for entities that have a high (or low) probability of failure (or success) and are unlikely to be authorised (i.e. only incurring the computation cost of redetermining the prediction for entities for which the prediction of failure (or success) is below (or above) a threshold). Preferably, if the prediction of failure (or success) is above (or below) the threshold, the entity is not authorised.
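The early-termination behaviour can be sketched as follows for a prediction of failure. The `fetch_additional` and `redetermine` callbacks, the numeric values and the treatment of a prediction exactly equal to the threshold (here treated as "above") are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Tuple


def predict_with_early_exit(
    initial_prediction: float,
    threshold: float,
    fetch_additional: Callable[[], Dict[str, int]],
    redetermine: Callable[[float, Dict[str, int]], float],
) -> Tuple[float, bool]:
    """Return (prediction of failure, whether it was redetermined)."""
    if initial_prediction >= threshold:
        # High probability of failure: output the initial prediction and stop early.
        return initial_prediction, False
    additional = fetch_additional()  # request additional data only when needed
    return redetermine(initial_prediction, additional), True


fetch_calls: List[str] = []


def fetch_additional() -> Dict[str, int]:
    """Stub for a request to the user device and/or a remote server."""
    fetch_calls.append("remote_server")
    return {"missed_payments": 1}


def redetermine(prediction: float, additional: Dict[str, int]) -> float:
    """Placeholder model: each missed payment raises the prediction by 0.25."""
    return min(1.0, prediction + 0.25 * additional["missed_payments"])


high = predict_with_early_exit(0.75, 0.5, fetch_additional, redetermine)
calls_after_high = len(fetch_calls)
low = predict_with_early_exit(0.25, 0.5, fetch_additional, redetermine)
```

The additional request and the second model evaluation are incurred only for the entity whose initial prediction falls below the threshold.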

The risk classification system may further comprise an authorisation engine configured to authorise the entity at least partly in dependence on the determined prediction of failure (or success).

The authorisation engine may be configured to: authorise the entity if the determined prediction of failure (or success) and the redetermined prediction of failure (or success) are both below (or above) a threshold; and/or deny authorisation to the entity if either the determined prediction of failure (or success) or the redetermined prediction of failure (or success) is above (or below) the threshold.

Preferably, the modelling engine is further configured to identify, for each grouping, in dependence on stored data relating to failure (or success) of the entities in the grouping, at least one further indicator not, or weakly, correlated with failure (or success) of an entity in the grouping.

Preferably, the risk classification engine is configured, prior to redetermining the prediction of failure (or success) of the further entity, to exclude, from the data associated with the one or more attributes of the further entity and/or the additional data, data associated with the at least one further indicator not, or weakly, correlated with failure (or success) of an entity. This may provide significant computational cost savings as data associated with the at least one further indicator not, or weakly, correlated with failure (or success) of an entity need not be processed (which processing may be computationally costly), while not, or only very slightly, reducing accuracy of the prediction.

The modelling engine may be configured periodically to: classify a different plurality of entities into one or more groupings in dependence on one or more attributes of the entities; identify, for each grouping, in dependence on stored data relating to failure (or success) of the entities in the grouping, at least one updated indicator, the at least one updated indicator being correlated with failure (or success) of the entities; and replace the at least one indicator with the at least one updated indicator. Periodically classifying entities into groupings and identifying indicators for each grouping may allow ensuring that the groupings and indicators are up-to-date (i.e. are determined taking into account the latest data). Optionally, a weighting factor may be applied to the data based on recency, such that more recent data is more heavily weighted in determining the groupings and/or indicators.

Preferably, the risk classification engine is configured to classify the entity into one of a plurality of predetermined bands in dependence on the determined prediction of failure (or success).
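Assigning a prediction to one of a plurality of predetermined bands might be sketched with a sorted list of band edges. The edge values and band names below are illustrative assumptions; the disclosure does not state how many bands there are or where their boundaries lie.

```python
from bisect import bisect_right

# Hypothetical upper edges separating four bands of predicted failure probability.
BAND_EDGES = [0.1, 0.3, 0.6]
BAND_NAMES = ["A", "B", "C", "D"]


def band_for(prediction: float) -> str:
    """Map a prediction in [0, 1] to a band; a value on an edge goes to the higher band."""
    if not 0.0 <= prediction <= 1.0:
        raise ValueError("prediction must be a probability between 0 and 1")
    return BAND_NAMES[bisect_right(BAND_EDGES, prediction)]


bands = [band_for(p) for p in (0.05, 0.1, 0.45, 0.9)]
```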

Preferably, the risk classification engine is further configured to parse the additional data to identify data associated with one or more attributes of the entity.

In another aspect of the present disclosure, there is provided a method of classifying risk, comprising: classifying each of a plurality of entities into one or more groupings in dependence on one or more attributes of the entities; identifying, for each grouping, in dependence on stored data relating to failure (or success) of the entities in the grouping, at least one indicator, the at least one indicator being correlated with failure (or success) of the entities; and in response to a request to classify risk relating to a further entity: transmitting a request, to a user device, for data associated with the one or more attributes of the further entity; determining, among the plurality of groupings, one or more groupings for the further entity in dependence on said data; transmitting a request, to the user device, for data associated with the at least one indicator for the determined groupings; and determining a prediction of failure (or success) of the further entity at least partly in dependence on the data associated with the at least one indicator for the determined groupings.

In another aspect of the present disclosure, there is provided a computer programme product comprising instructions which, when executed by a computer, cause the computer to classify risk, comprising: classifying each of a plurality of entities into one or more groupings in dependence on one or more attributes of the entities; identifying, for each grouping, in dependence on stored data relating to failure (or success) of the entities in the grouping, at least one indicator, the at least one indicator being correlated with failure (or success) of the entities; and in response to a request to classify risk relating to a further entity: transmitting a request, to a user device, for data associated with the one or more attributes of the further entity; determining, among the plurality of groupings, one or more groupings for the further entity in dependence on said data; transmitting a request, to the user device, for data associated with the at least one indicator for the determined groupings; and determining a prediction of failure (or success) of the further entity at least partly in dependence on the data associated with the at least one indicator for the determined groupings.

In another aspect of the present disclosure, there is provided a risk classification system comprising: a modelling engine configured to identify at least one indicator correlated with failure of an entity in dependence on stored data relating to failure of a plurality of entities, a communication engine configured to transmit, to a user device, a request for data associated with said at least one indicator, and a classification engine configured to determine a prediction of failure of the entity at least partly in dependence on said data associated with the at least one indicator.

Preferably, the communication engine is configured to: receive said data associated with the at least one indicator from the user device; and in dependence on the data received from the user device, transmit a request for further data relating to the entity to a remote server.

Preferably, the classification engine is configured to determine: a first prediction of failure of an entity at least partly in dependence on said data associated with at least one indicator, and a second prediction at least partly in dependence on the first prediction and said further data.

In another aspect of the present disclosure, there is provided a method of classifying risk, comprising: identifying at least one indicator correlated with failure of an entity in dependence on stored data relating to failure of a plurality of entities; transmitting a request for data associated with said at least one indicator to a user device; and determining a prediction of failure of the entity at least partly in dependence on said data associated with the at least one indicator.

In another aspect of the present disclosure, there is provided a system for authorising an entity for a service, comprising an authorisation engine configured to: receive data relating to the entity; determine, in dependence on a first subset of said received data, a first classification for the entity based on risk relating to providing the service to the entity; determine, in dependence on a second subset of said received data, a second classification for the entity based on a prediction of the acceptability of the service to the entity; determine, in dependence on the first and second classifications, a risk compensation required for providing the service to the entity; and authorise the entity for the service in dependence on the first classification, comprising applying the risk compensation.

Determining a risk classification based on risk relating to providing the service to the entity and on a prediction of the acceptability of the service to the entity, and subsequently applying the risk compensation when authorising the entity for the service may provide increased flexibility to the authorisation process and allow entity-specific authorisation measures to be implemented. For example, entities of somewhat higher risk can still be authorised in a safe manner with the risk posed by authorising the entity being compensated for (rather than e.g. authorising only entities that match very stringent requirements and not applying any risk compensations).

Further, determining and applying the risk compensation may allow improving the balance between the risk relating to providing the service to the entity (assessed via the first classification) and the sensitivity of the entity to risk compensation actions (assessed via the second classification), and may allow ensuring that the service is acceptable to the entity while being safe to the service provider.

The system for authorising an entity for a service may further comprise a modelling engine configured to identify, in dependence on stored data relating to the provision of the service to, and the acceptance of the service by, a plurality of entities: the first subset of data, the first subset of data being correlated with the risk relating to providing the service to the entities; and/or the second subset of data, the second subset of data being correlated with the acceptability of the service to the entities.

The system for authorising an entity for a service may further comprise a communication engine configured to transmit requests for the first and/or second subsets of data to the entity.

Preferably, the first and second classifications are determined in parallel. This may allow reducing computation time of authorising an entity for a service.
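Determining the two classifications in parallel might be sketched with a thread pool, since each classification reads only its own subset of the received data. The two rules, the field names and the use of `ThreadPoolExecutor` are assumptions of this sketch; for CPU-bound classifiers in CPython a process pool may be a better fit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Tuple


def risk_classification(first_subset: Dict[str, int]) -> str:
    """First classification: risk of providing the service (illustrative rule)."""
    return "high" if first_subset["incidents"] > 2 else "low"


def acceptability_classification(second_subset: Dict[str, int]) -> str:
    """Second classification: acceptability of the service (illustrative rule)."""
    return "sensitive" if second_subset["budget"] < 100 else "tolerant"


def classify_in_parallel(received: Dict[str, int]) -> Tuple[str, str]:
    # Each classification receives only the subset of data it depends on.
    first_subset = {"incidents": received["incidents"]}
    second_subset = {"budget": received["budget"]}
    with ThreadPoolExecutor(max_workers=2) as pool:
        risk = pool.submit(risk_classification, first_subset)
        acceptability = pool.submit(acceptability_classification, second_subset)
        return risk.result(), acceptability.result()


classifications = classify_in_parallel({"incidents": 3, "budget": 50})
```

Running the two in parallel is possible here because neither classification takes the other as an input; the sequential variant would instead pass the first result into the second function.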

Optionally, the first and second classifications are determined sequentially, preferably wherein the first classification is used as an input when determining the second classification. This may allow increasing the accuracy of the second classification.

Preferably, each classification comprises assigning the entity to one of a plurality of bands.

Preferably, determining the first classification comprises: determining a prediction of the probability of failure (or success) of the entity; and scaling said prediction based on one or more attributes of the entity. Scaling the prediction of failure may improve the accuracy of the prediction.

The modelling engine may be configured to determine the one or more attributes of the entity used for scaling the prediction, wherein the one or more attributes preferably comprise attributes determined to be most closely correlated with probability of failure (or success) of the entity.

In another aspect of the present disclosure, there is provided a method of authorising an entity for a service, comprising: receiving data relating to the entity; determining, in dependence on a first subset of said received data, a first classification for the entity based on risk relating to providing the service to the entity; determining, in dependence on a second subset of said received data, a second classification for the entity based on a prediction of the acceptability of the service to the entity; determining, in dependence on the first and second classifications, a risk compensation required for providing the service to the entity; and authorising the entity for the service in dependence on the first classification, comprising applying the risk compensation.

In another aspect of the present disclosure, there is provided a computer programme product comprising instructions which, when executed by a computer, cause the computer to authorise an entity for a service, comprising: receiving data relating to the entity; determining, in dependence on a first subset of said received data, a first classification for the entity based on risk relating to providing the service to the entity; determining, in dependence on a second subset of said received data, a second classification for the entity based on a prediction of the acceptability of the service to the entity; determining, in dependence on the first and second classifications, a risk compensation required for providing the service to the entity; and authorising the entity for the service in dependence on the first classification, comprising applying the risk compensation.

In another aspect of the present disclosure, there is provided an authorisation system as aforementioned, the authorisation system further comprising: the risk classification system as aforementioned and/or the system for authorising an entity for a service as aforementioned.

Preferably, the risk classification system is configured to determine the first and/or second classifications in relation to the entity in dependence on the data file.

Preferably, the authorisation engine of the authorisation system comprises the system for authorising an entity for a service, the system for authorising an entity for a service being configured to authorise the entity in dependence on the first and/or second classifications.

The methods disclosed herein can be implemented, at least in part, using computer program code. According to another aspect of the present disclosure, there is therefore provided computer software or computer program code adapted to carry out the methods described above when processed by a computer processing means. The computer software or computer program code can be carried by a computer readable medium, and in particular a non-transitory computer readable medium, that is a medium on which computer code may be stored permanently or until it is overwritten. The medium may be a physical storage medium such as a Read Only Memory (ROM) chip. Alternatively, it may be a disk, such as a Digital Video Disk (DVD-ROM), or a non-volatile memory card, e.g. a flash drive or mini/micro Secure Digital (SD) card. It could also be a signal such as an electronic signal over wires, an optical signal or a radio signal such as over a mobile telecommunication network, a terrestrial broadcast network or via a satellite or the like. The disclosure also extends to a processor running the software or code, e.g. a computer configured to carry out the methods described above.

Furthermore, features implemented in hardware may be implemented in software, and vice versa. Any reference to software and hardware features herein should be construed accordingly.

Any apparatus or device feature as described herein may also be provided as a method feature, and vice versa. As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure, such as a suitably programmed processor and associated memory.

It should also be appreciated that particular combinations of the various features described and defined in any aspects of the disclosure can be implemented and/or supplied and/or used independently. The disclosure also provides a computer program and a computer program product comprising software code adapted, when executed on a data processing apparatus, to perform any of the methods described herein, including any or all of their component steps.

The disclosure also provides a computer program and a computer program product comprising software code which, when executed on a data processing apparatus, comprises any of the apparatus features described herein.

The disclosure also provides a computer program and a computer program product having an operating system which supports a computer program for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein.

The disclosure also provides a computer readable medium having stored thereon the computer program as aforesaid.

The disclosure also provides a signal carrying the computer program as aforesaid, and a method of transmitting such a signal.

The disclosure extends to methods and/or apparatus substantially as herein described with reference to the accompanying drawings.

Each of the aspects above may comprise any one or more features mentioned in respect of the other aspects above.

In this specification the word 'or' can be interpreted in the exclusive or inclusive sense unless stated otherwise.

Use of the words "apparatus", "server", "device", "processor", "communication interface", "engine" and so on is intended to be general rather than specific. Whilst these features of the disclosure may be implemented using an individual component, such as a computer or a central processing unit (CPU), they can equally well be implemented using other suitable components or a combination of components. For example, they could be implemented using a hard-wired circuit or circuits, e.g. an integrated circuit, using embedded software, and/or software engine(s) / module(s) including a function, API interface, or SDK. Further, they may be more than just a singular component. For example, a server may include not only a single hardware device but also a system of microservices or a serverless architecture, either of which is configured to operate in the same or a similar way to that described for the singular server.

As used herein, the term "pure function" preferably connotes a function that returns identical outputs for identical inputs (e.g. the outputs do not vary with random variables, input streams, or mutable reference arguments) and/or has no side effects (e.g. the function does not cause any mutation of non-local variables or local static variables). More preferably, the term "pure function" connotes a function that returns identical outputs for identical inputs and has no side effects.
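By way of illustration only, the following Python sketch contrasts a pure classification function with an impure one. The function names, the `age_days` field and the threshold of 30 are hypothetical and are not taken from the disclosure; the sketch merely shows the two properties named in the definition above.

```python
import random


def classify_pure(data_file: dict) -> str:
    """Pure: the result depends only on the argument and nothing is mutated."""
    return "low_risk" if data_file.get("age_days", 0) >= 30 else "high_risk"


_call_count = 0  # non-local state, touched only by the impure variant


def classify_impure(data_file: dict) -> str:
    """Impure: has side effects and may return different outputs for identical inputs."""
    global _call_count
    _call_count += 1                # side effect: mutates a non-local variable
    data_file["seen"] = True        # side effect: mutates a mutable argument
    jitter = random.randint(0, 10)  # output varies with a random variable
    return "low_risk" if data_file.get("age_days", 0) + jitter >= 30 else "high_risk"


snapshot = {"age_days": 45}
first = classify_pure(snapshot)
second = classify_pure(snapshot)  # same input, same output, snapshot unchanged
```

Under these assumptions, re-running `classify_pure` on a stored data file would be expected to reproduce the classification originally obtained from that file.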

As used herein, the term "data file" preferably connotes a computer file storing data for use by a computer program.

As used herein, the term "risk compensation" preferably connotes an action and/or parameter setting that may compensate (preferably reduce) the risk relating to an entity.

As used herein, references to the "probability of failure" of an entity, or similar terms, implicitly also refer to the corresponding "probability of success" of an entity. For example, if an attribute of an entity is described as being correlated with the failure of an entity, this implies also that that attribute is (inversely) correlated with the success of the entity.

It should be noted that the term "comprising" as used in this document means "consisting at least in part of". So, when interpreting statements in this document that include the term "comprising", features other than that or those prefaced by the term may also be present. Related terms such as "comprise" and "comprises" are to be interpreted in the same manner. As used herein, "(s)" following a noun means the plural and/or singular forms of the noun.

The invention extends to methods and/or apparatus substantially as herein described and/or as illustrated in the accompanying drawings.

The invention extends to any novel aspects or features described and/or illustrated herein. In addition, device aspects may be applied to method aspects, and vice versa. Furthermore, any, some and/or all features in one aspect can be applied to any, some and/or all features in any other aspect, in any appropriate combination.

The disclosure will now be described, by way of example only, with reference to the accompanying drawings.

Brief Description of the Drawings

The disclosure will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figures 1 to 3 are block diagrams showing exemplary hardware architecture of the authorisation system;
Figures 4a and 4b are block diagrams showing exemplary software architecture of the authorisation system;
Figure 5 is a flow diagram showing a method of authorising an entity by the authorisation system;
Figures 6a and 6b are simplified flow diagrams showing the method of Figure 5, where Figure 6a shows the method for a first entity for which the authorisation result is positive and Figure 6b shows the method for a second entity for which the authorisation result is negative;
Figure 7 is a flow diagram of a method for authorising a plurality of entities by the authorisation system;
Figure 8 is a flow diagram of a method for testing a new authorisation process;
Figure 9a is a flow diagram showing a method of classifying risk;
Figure 9b is a flow diagram showing a method of determining a prediction of failure of an entity;
Figure 10 is a flow diagram showing a method of authorising an entity for a service;
Figure 11 is a flow diagram showing a method of determining a classification for an entity based on risk relating to providing a service to the entity; and
Figure 12 shows a method for determining a risk compensation.

Detailed Description of Preferred Examples

The present disclosure is described with particular reference to authorisation of software applications for access to resources such as computational resources on a user device. The applications serve as example entities to be authorised by the described authorisation system. The described system may be particularly well suited to authorising applications about which little data is initially available by iteratively determining classifications based on the incomplete data and requesting further data based on those classifications. The present system may also work well for authorising applications of various risk levels, by determining a risk compensation corresponding to the risk level posed by the application.

The present disclosure could equally well be applied to any other authorisation system. For example, it could be applied to the authorisation of identification documents (e.g. driving licences), e.g. by determining classifications relating to the documents' authenticity. Or, the present disclosure could be applied to the authorisation of entities (e.g. persons or vehicles) for entry to a premises (e.g. building or country (i.e. border control)) based on information initially provided by the entities that is subsequently augmented by the authorisation system.

Hardware architecture

According to an example, authorisation of an entity is requested by a user device via a web application that runs in a web browser on the user device. In other examples, authorisation may be requested in other ways, such as via a standalone application downloaded onto the user device or instead by a further web server.

Referring to Figure 1, according to an example, in a communication network 100 a user device 102 is configured to communicate with a web server 104. In particular, the user device 102 is configured to request authorisation for an entity from the web server 104 and/or to provide data relating to the entity to the web server 104. In the present example, these requests and submissions are transmitted via connections 112 established via the Internet 110. Also present in the communication network 100 are a database server 106, and one or more remote server(s) 108, all of which are configured to establish connections 112 via the Internet 110. Alternatively, the database server 106 and the web server 104 are connected via a local connection, such as via a Local Area Network (LAN) connection.

One or more software modules are running on the web server 104 (these modules are described with reference to Figure 4), and it is these modules that determine whether or not to authorise the entity.

The database server 106 stores a range of information that may be requested by the web server 104. For example, the database server stores historical data relating to authorisation results for a plurality of entities. Further, the web server 104 transmits data to the database server 106 for storage on the database server 106. For example, the web server 104 receives (and adds to) a data file comprising data relating to an entity and transmits this data file to the database server 106 for storage.

In the present example, the remote server(s) 108 comprise one or more web servers configured to communicate with the web server 104 and to provide data relating to an entity to the web server 104. For example, the remote server(s) 108 may comprise an Application Store (i.e. AppStore) server (e.g. Apple® App Store, or Google® Play Store), and/or a Company Register Server (e.g. the UK Companies House server). It should be appreciated that, by being connectable to the Internet 110 via their various communication interfaces, the user device 102, web server 104, database server 106, and the remote server(s) 108 are all configured to potentially establish communication links with each other.

Referring to Figure 2, a user device 102 is a computer device which comprises a Central Processing Unit (CPU) 202, memory 204, storage 206, removable storage 208, Internet communication module 210 and a user interface 212 coupled to one another by a bus 214.

The user interface 212 comprises a display 216 and an input/output (I/O) device which in this example is a keyboard 218 and a mouse 220. In other examples, the input/output device may comprise a touchscreen, or other appropriate display. The user interface is arranged to provide indications to the user, under the control of the CPU 202, and to receive inputs from the user, and to convey these inputs to the CPU 202 via the communications bus 214.

The CPU 202 is a computer processor, e.g. a microprocessor. It is arranged to execute instructions in the form of computer executable code, including instructions stored in the memory 204, the storage 206 and/or removable storage 208. The instructions executed by the CPU 202 include instructions for coordinating operation of the other components of the user device 102, such as instructions for controlling the communication module 210 as well as other features of a user device 102 such as a user interface 212 and audio system (not shown). A browser application 250 may be installed on the user device 102, for example the browser may be an app installed on a smartphone. The browser application 250 has associated instructions that are also in the form of computer executable code, stored in the memory 204, the storage 206 and/or removable storage 208. The browser application 250 also has instructions for operating, receiving, and/or sending data via the communication module 210. The browser application 250 is a software application that can be used for exchanging (i.e. receiving and/or transmitting) data on the Internet 110. The browser application 250 could be, for example, Google® Chrome®. In an alternative example, application 250 is a standalone or bespoke application on the user device 102 (as opposed to a browser application), and authorisation of an entity is requested by the user device 102 using the standalone or bespoke application 250.

The memory 204 stores instructions and other information for use by the CPU 202. The memory 204 is the main memory of the user device 102. It usually comprises both Random Access Memory (RAM) and Read Only Memory (ROM). The memory 204 is arranged to store the instructions processed by the CPU 202, in the form of computer executable code. Typically, only selected elements of the computer executable code are stored by the memory 204 at any one time, which selected elements define the instructions essential to the operations of the user device 102 being carried out at the particular time. In other words, the computer executable code is stored transiently in the memory 204 whilst some particular process is handled by the CPU 202.

The storage 206 provides mass storage for the user device 102. In different implementations, the storage 206 is an integral storage device in the form of a hard disk device, a flash memory or some other similar solid-state memory device, or an array of such devices. In other implementations the storage 206 is remote from the user device 102 and comprises a network storage device or a cloud-based storage device (e.g. provided by Amazon Web Services (AWS®)).

The storage 206 stores computer executable code defining the instructions processed by the CPU 202. The storage 206 stores the computer executable code permanently or semi-permanently, e.g. until overwritten. That is, the computer executable code is stored in the storage 206 non-transiently.

Typically, the computer executable code stored by the storage 206 relates to instructions fundamental to the operation of the CPU 202, Internet communication module 210, user interface 212 and other installed applications or software modules (such as the browser application 250).

The removable storage 208 provides auxiliary storage for the user device 102. In different implementations, the removable storage 208 is a storage medium for a removable storage device, such as an optical disk, for example a Digital Versatile Disk (DVD), a portable flash drive or some other similar portable solid state memory device, or an array of such devices. In other examples, the removable storage 208 is remote from the user device 102 and comprises a network storage device or a cloud-based storage device (e.g. provided by Amazon Web Services (AWS®)).

The Internet communications module 210 is configured to establish the connection(s) 112 with the web server 104 shown in the communication network 100. The Internet communications module 210 typically comprises an Ethernet network adaptor coupling the bus 214 to an Ethernet socket. The Ethernet socket is coupled to a network.

As mentioned, in the present example, a request for authorisation of an entity (e.g. for authorisation for an action such as accessing a resource or network) is made by the user device 102 via user-based software which is implemented as a computer program product, which is stored, at different stages, in the memory 204, storage device 206, and/or removable storage 208. The storage of the computer program product is non-transitory, except when instructions included in the computer program product are being executed by the CPU 202, in which case the instructions are sometimes stored temporarily in the CPU 202, or memory 204. It should also be noted that the removable storage 208 is removable from the user device 102, such that the computer program product may be held separately from the user device 102 from time to time. Alternatively, the request for authorisation of an entity could originate from a separate computer system (not shown) on behalf of the entity. Preferably, the separate computer system and/or the communication network 100 comprises an API to facilitate interaction between the separate computer system and the communication network 100. In this way the separate computer system can transmit via the API data on which an authorisation decision is to be based, and the authorisation result is transmitted back to the separate computer system via the (or another) API.

In the present example, the user device 102 is a personal computer such as a laptop computer or desktop computer. In an alternative example, the user device 102 is a smartphone.

Referring to Figure 3, a web server (i.e. computing device) 104 comprises a Central Processing Unit (CPU) 302, memory 304, storage 306, removable storage 308, and an Internet communication module 310 coupled to one another by a bus 314. Also shown are web service(s) 316 running on the web server 104. The web service(s) 316 would typically be implemented as one or more web services with different or overlapping functions as required for authorising an entity and/or for any other required purposes (e.g. classifying risk related to the entity). Storage 306 of the web server may be local storage at the web server, or remote storage remote from the web server such as a network storage device or a cloud-based storage device (e.g. provided by Amazon Web Services (AWS®)).

The modules shown in Figure 3 with the same name as those described with reference to Figure 2 function in substantially the same way. Since the web server 104 may perform various computationally intensive tasks (e.g. training of machine learning models), it optionally comprises further specialised hardware.

Software architecture

Referring to Figures 4a and 4b, example software architecture 400 of the system used for authorising an entity (i.e. authorisation system) is shown. Figure 4a shows various modules 402, 404, 406, 408, 410 running on the web server 104, and Figure 4b shows further details of the communication engine 402 module and its interactions with the user device 102, remote server(s) 108, and database server 106.

The modules comprise: a communication engine 402, a classification engine 404, an authorisation engine 406, a processing engine 408 and a modelling engine 410. In simplified overview, the communication engine 402 generates and updates a data file comprising data relating to an entity. The classification engine 404 iteratively determines one or more classifications at least in part based on this data file (e.g. determines a prediction of failure of the entity) and transmits requests for more data to the communication engine 402 which adds this further data to the data file. Once this iterative process of building up the data file and determining classifications based on the data file is complete, the authorisation engine 406 determines whether or not to authorise the entity based on one or more of the classifications determined by the classification engine 404. It should be appreciated that this process is typically conducted for a plurality of entities, each having its corresponding data file.

In more detail, the communication engine 402 is responsible for communicating with servers external to the web server 104, in particular with the user device 102, database server 106 and remote server(s) 108. The other engines 404, 406, 408, 410 transmit requests for data from external servers to the communication engine 402 which fetches the data from these external servers and transmits it to the respective engines, thereby acting as an intermediary for communication between engines 404, 406, 408, 410 and the external servers. For example, the classification engine 404 transmits a request 420 for data (e.g. user personal information) from a user device 102 to the communication engine 402, the communication engine 402 communicates with the user device 102 (e.g. via a UI sub-engine 442 as shown in Figure 4b) to fetch this data and transmits 422 it to the classification engine 404. Or, as a further example, the processing engine 408 communicates 426 with the communication engine 402 to query data (e.g. historical data on authorisation results) from the database server 106 via the communication engine 402.

The communication engine 402 is also responsible for handling requests for authorisation of an entity (e.g. for receiving an authorisation request from the user device 102 and transmitting the corresponding authorisation result to the user device 102), and for managing and building up a data file comprising data relating to the entity that is used as input for various classification steps in the authorisation method (described in further detail in sections below). As shown in Figure 4b, the communication engine 402 comprises: a data gathering sub-engine 440, a UI (i.e. User Interface) sub-engine 442, a tracker sub-engine 444, and optionally a parsing sub-engine 446.

The tracker sub-engine 444 is responsible for managing the process of building up (i.e. updating) the data file. The tracker sub-engine 444 is thus responsible for bringing together the data provided to it by other sub-engines and adding that data to the correct data file (i.e. corresponding to the appropriate entity). The tracker sub-engine 444: receives requests 420 for data from the classification engine 404; transmits a request for said data to the relevant sub-engine (e.g. the UI sub-engine 442 if said data needs to be fetched from a user device 102, or the data gathering sub-engine 440 if the data needs to be fetched from a remote server 108); once received, adds the data to the data file; and transmits 422 the updated data file to the classification engine 404. The tracker sub-engine 444 also transmits various data to the database server 106 for storage, e.g. the data file and/or the "raw" data received from the other sub-engines. Optionally (e.g. for larger amounts of data), rather than adding the data received from a sub-engine directly to the data file, the tracker sub-engine 444 adds the data to the database server 106 and only adds a reference (e.g. pointer) to the data to the data file. This may reduce the size of the data file and improve memory efficiency and bandwidth usage (in particular, since the updated data file is often transmitted to the classification engine 404 multiple times in each authorisation process).
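The optional storage of a reference in place of larger data may be sketched as follows. This is a minimal Python illustration only: the `Tracker` and `Reference` classes, the in-memory dictionary standing in for the database server 106, and the 1024-byte threshold are all assumptions made for the example and are not specified in the disclosure.

```python
import json
import uuid
from dataclasses import dataclass

SIZE_THRESHOLD_BYTES = 1024  # hypothetical cut-off for "larger amounts of data"


@dataclass(frozen=True)
class Reference:
    """Pointer to data held outside the data file (hypothetical type)."""
    key: str


class Tracker:
    """Toy stand-in for the tracker sub-engine 444 (illustrative only)."""

    def __init__(self) -> None:
        self.database = {}    # stands in for the database server 106
        self.data_files = {}  # one data file per entity

    def add(self, entity_id: str, name: str, value) -> dict:
        """Add data to the entity's data file, or a reference if the data is large."""
        data_file = self.data_files.setdefault(entity_id, {})
        size = len(json.dumps(value).encode("utf-8"))
        if size > SIZE_THRESHOLD_BYTES:
            ref = Reference(str(uuid.uuid4()))
            self.database[ref.key] = value  # the bulk data lives in the database
            data_file[name] = ref           # only the pointer goes in the data file
        else:
            data_file[name] = value
        return data_file

    def resolve(self, entity_id: str, name: str):
        """Return the underlying value, following a reference if one is stored."""
        entry = self.data_files[entity_id][name]
        return self.database[entry.key] if isinstance(entry, Reference) else entry


tracker = Tracker()
tracker.add("app-1", "name", "Example App")
tracker.add("app-1", "parsed_text", "x" * 5000)
data_file = tracker.data_files["app-1"]
```

In this sketch the data file transmitted to the classification engine carries only a short key for `parsed_text`, while small values such as `name` are stored inline.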

The UI sub-engine 442 is responsible for communication with the user device 102, in particular for receiving a request for authorising an entity from the user device 102, and for fetching data from the user device 102. The UI sub-engine may fetch data from the user device 102 in dependence on a user input (e.g. information entered by the user using the keyboard 218 and/or mouse 220; or a file uploaded by the user), or automatically by communicating with one or more applications on the user device 102 (e.g. to receive a current geographical location of the user device 102).

The data gathering sub-engine 440 performs similar functions to the UI sub-engine, but with respect to the remote server(s) 108 as opposed to the user device 102. That is, the data gathering sub-engine 440 communicates with one or more remote servers 108 to fetch data as requested by the tracker sub-engine 444. In some examples, the data gathering sub-engine 440 may also receive requests for authorisation from the remote server(s) 108 (instead of, or in addition to requests from the user device 102 received by the UI sub-engine 442) and/or transmit authorisation results to the remote server(s) 108.

The data received by the UI sub-engine from the user device 102 (or the data received from the remote server(s) by the data gathering sub-engine 440) may not be machine-readable or may require preprocessing before being added to the data file (and/or to the database server 106); in which case the data is transmitted to the parsing sub-engine 446 for further processing. For example, image data may be transmitted to the parsing sub-engine 446 which performs optical character recognition on the image data and transmits the resulting machine-readable text data to the tracker sub-engine 444.

Once the process of building up the data file is complete, the communication engine 402 transmits the data file to the database server 106 for storage. The data file comprises all data on the entity that was used to arrive at a given authorisation result. Accordingly, stored data files represent 'entity snapshots' that can be usefully queried for further use, e.g. to test a new classification model and compare the result to that obtained via the current model operating on the same data file.

The classification engine 404 is responsible for determining one or more classifications in relation to the entity based on the data file provided by the communication engine 402 (more specifically, the tracker sub-engine 444). For example, the entity may be classified based on its probability of failure (e.g. mechanical failure, or failure to execute required actions (e.g. relinquish location permission)) or based on its security level. Based on a given classification determined by the classification engine 404, the engine 404 may determine that in order to determine the next classification, further data is required. In this case, the classification engine 404 requests this further data from the communication engine 402 by transmitting 420 a request to the tracker sub-engine 444 which fetches the required data and adds it to the data file as described above. Once the data file is updated, the communication engine 402 transmits a notification and/or the updated data file to the classification engine 404, which can subsequently determine the next classification. This iterative (i.e. looping) process of determining classifications and building up the data file is repeated until all the classifications required by the authorisation engine 406 to determine whether or not to authorise the entity are determined. At this point, the classification engine 404 transmits the classification(s) to the authorisation engine 406.
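The iterative process described above may be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation of the classification engine 404: the step functions, the field names (`downloads`, `publisher_registered`), the thresholds, the `fetch` callback standing in for the tracker sub-engine 444, and the `max_requests` guard are all hypothetical.

```python
def maturity_step(data_file: dict, so_far: dict):
    """First classification, using only the initially available data (hypothetical rule)."""
    return ("established" if data_file["downloads"] >= 1000 else "new"), None


def risk_step(data_file: dict, so_far: dict):
    """Second classification; may ask for further data depending on the first."""
    if so_far["maturity"] == "established":
        return "low_risk", None
    if "publisher_registered" not in data_file:
        return None, "publisher_registered"  # request further data
    return ("low_risk" if data_file["publisher_registered"] else "high_risk"), None


def run_classifications(data_file: dict, steps, fetch, max_requests: int = 10):
    """Determine each classification in turn, fetching further data when requested."""
    data_file = dict(data_file)  # work on a copy; the caller's data file is untouched
    so_far = {}
    requests_made = 0
    for name, step in steps:
        result, missing = step(data_file, so_far)
        while missing is not None:
            if requests_made >= max_requests:
                raise RuntimeError("too many requests for further data")
            data_file[missing] = fetch(missing)  # stands in for request 420 / reply 422
            requests_made += 1
            result, missing = step(data_file, so_far)
        so_far[name] = result
    return so_far, data_file


remote_data = {"publisher_registered": True}  # stands in for a remote server 108
requested = []


def fetch(field: str):
    requested.append(field)
    return remote_data[field]


classifications, updated_file = run_classifications(
    {"downloads": 12},
    [("maturity", maturity_step), ("risk", risk_step)],
    fetch,
)
```

In this sketch each step function reads the data file without modifying it, and further data is fetched only when an earlier classification indicates that it is needed.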

One or more of the classifications determined by the classification engine 404 may require the use of computational models. These models are fetched 428 by the classification engine 404 from the modelling engine 410, which trains and stores the models.

The classification engine 404 also transmits data (e.g. classification results and/or classification model parameters) to the database server 106 for storage, either directly (as shown in Figure 4b) or via the communication engine 402.

The authorisation engine 406 is responsible for determining whether or not to authorise the entity based on classifications determined by the classification engine 404. The authorisation process may require the use of further computational models, in which case these are fetched 430 from the modelling engine 410. The authorisation engine 406 may only determine the authorisation result (for use by a further authorising system), or directly authorise the entity, e.g. grant the entity access to a computational resource (e.g. to memory or location services on the user device) or physical resource (e.g. to an entrance to a facility).

In an example of the present invention, in addition to authorising the entity, the authorisation engine 406 may determine and/or apply a risk compensation (action) alongside the authorisation. For example, if the entity to be authorised is a new application on the user device 102, the authorisation engine 406 may determine that the application should be allowed access to memory and/or location services on the user device 102, but apply a risk compensation in the form of automatically removing access to location services after a predetermined time period and/or encrypting sensitive files in the memory or keeping a watch on whether the application tries to access the sensitive files and removing (i.e. uninstalling) the application if it does (or implementing a firewall around the secure files). Thus, the risk compensation provides greater flexibility to the authorisation system in that entities of somewhat higher risk can still be authorised in a safe manner with the risk posed by authorising the entity being compensated for (rather than e.g. authorising only entities that match very stringent requirements and not applying any risk compensation actions).
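A minimal Python sketch of an authorisation result accompanied by risk compensation actions is given below. The three risk levels, the action names and the 24-hour period are hypothetical choices made for the example; the disclosure does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorisationResult:
    authorised: bool
    compensations: tuple = ()  # risk compensation actions to apply, if any


def authorise(risk_classification: str) -> AuthorisationResult:
    """Map a risk classification to a decision plus compensation (hypothetical policy)."""
    if risk_classification == "low":
        return AuthorisationResult(authorised=True)
    if risk_classification == "medium":
        # Authorise, but compensate for the somewhat higher risk.
        return AuthorisationResult(
            authorised=True,
            compensations=(
                "revoke_location_access_after_24h",
                "encrypt_sensitive_files",
            ),
        )
    return AuthorisationResult(authorised=False)


result = authorise("medium")
```

Under this assumed policy, an entity of intermediate risk is still authorised, with the compensation actions carried alongside the decision so that a downstream component can apply them.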

The modelling engine 410 is responsible for training and managing computational models used by the various other engines, in particular by the classification engine 404 and authorisation engine 406. These models are provided as needed to the various other engines.

The processing engine 408 is responsible for performing analytics on historical (i.e. stored) data relating to past authorisation results, and for maintenance and release of new authorisation processes, in particular for testing and debugging the processes. To test a new authorisation process (e.g. using new classification models) prior to release, the processing engine 408 queries a plurality of data files each corresponding to an entity, runs the new process on the data files, and compares the authorisation results obtained using the old and new processes. The plurality of data files may for example be the data files corresponding to entity authorisation requests received over a given time period (e.g. in the past month). Thus, the processing engine 408 reprocesses a number of historical 'entity snapshots' through both the current and upcoming versions of the authorisation process. The processing engine 408 then analyses the difference between each authorisation result. This enables the potential impact of complex upcoming changes to the authorisation process to be understood, for example by assessing the authorisation results that would have been obtained had the new process been utilised a month ago.
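The comparison of the current and new processes over stored entity snapshots may be sketched in Python as follows. The two process functions, the field names and the sample snapshots are invented for illustration; only the replay-and-compare structure reflects the description above.

```python
def current_process(data_file: dict) -> bool:
    """Stand-in for the current authorisation process (hypothetical rule)."""
    return data_file["downloads"] >= 1000


def new_process(data_file: dict) -> bool:
    """Stand-in for the upcoming authorisation process (hypothetical, stricter rule)."""
    return data_file["downloads"] >= 1000 and data_file["publisher_registered"]


def compare_processes(snapshots: dict, old, new) -> list:
    """Replay each stored data file through both processes and list the differences."""
    differences = []
    for entity_id, data_file in snapshots.items():
        old_result = old(data_file)
        new_result = new(data_file)
        if old_result != new_result:
            differences.append((entity_id, old_result, new_result))
    return differences


# Invented snapshots standing in for data files queried from the database server 106.
snapshots = {
    "app-1": {"downloads": 5000, "publisher_registered": True},
    "app-2": {"downloads": 5000, "publisher_registered": False},
    "app-3": {"downloads": 10, "publisher_registered": True},
}

differences = compare_processes(snapshots, current_process, new_process)
```

Because both processes operate on the same stored data files, any difference in the list is attributable to the change in process rather than to a change in the input data.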

The inner workings of the classification engine 404, authorisation engine 406, and modelling engine 410 are described in further detail in sections below.

Although Figures 4a and 4b show specific (e.g. uni- or bi-directional, or non-existent) communications/connections between engines / servers / databases, it should be appreciated that these are purely exemplary. For example, the processing engine 408 may communicate directly with the classification engine 404 and/or the database server 106, rather than communicating with them via the communication engine 402 as shown in Figures 4a and 4b. Similarly, the classification engine 404 may make requests for further data directly to the user device 102 and/or remote server(s) 108 rather than going via the communication engine 402.

It should also be appreciated that each engine or sub-engine described above may in fact comprise a number of engines or sub-engines with differing or overlapping functionality. Further, the functionalities of two or more of the engines 402, 404, 406, 408, 410 described with reference to Figures 4a and 4b may be encapsulated within a single engine, or the functionalities may be split into yet more engines.

Likewise, it should be appreciated that any of the functions described above could be performed on the web server 104 or the database server 106, and that the specific division of functions described above is purely an example.

Authorisation

Referring to Figure 5, an example method 500 of authorising an entity is shown. In the present example, the entity is an application on the user device 102 which requests authorisation to access a resource on the user device (e.g. one or more files on the user device 102 and/or the camera and/or services on the user device 102 (e.g. location tracking)). In the present example, the entity (i.e. the application requesting authorisation) is authorised in dependence on whether the authorisation method 500 determines that the application is safe or not (e.g. whether it comprises malware). The authorisation method 500 may be triggered automatically upon a user request to download the application (e.g. from an AppStore application installed on the user device 102). Alternatively, the user may transmit a request using the I/O device on the user device 102 - e.g. a user may wish to check that an application is secure before downloading it from a previously unknown website. In a further alternative example, the request for authorisation is generated/triggered by or from within the application installed on the user device. For example, activation of a feature or a request for activation of a feature within the application (e.g. the use of a new, or previously unused, feature, or a request to use a new, or previously unused, feature of the application, such as sports tracking) triggers the request for authorisation.

As the first step of method 500, a request for authorisation of the entity is received 502, along with input data relating to the entity. In the present example, the request and input data are received by the communication engine 402 from the user device 102. The input data may be entered by a user on the user device 102 and/or transmitted automatically by the user device 102 without a user's action. In the present example, where the entity to be authorised is an application on the user device, the input data may e.g. comprise data about the application entered by the user (e.g. intended purpose of the application (e.g. fitness tracking or listening to music)) and/or data about the application and/or user device transmitted by the user device (e.g. data relating to the device's operating system, and/or the web location from which the application is to be downloaded (e.g. AppStore or a given web address)). Next, the communication engine 402 generates 504 a data file comprising the received data relating to the entity. For example, the communication engine 402 may parse the received data into a dictionary data structure and convert it to a JSON file (the JSON file thereby forming the data file). Using a dictionary data structure may allow efficient look-up of data in the data file, and the use of a JSON file may enable efficient execution and parsing. In the present example, the data file is a text file. Alternatively, the data file may be a binary file.
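By way of illustration only, step 504 might be sketched as follows. The function name `generate_data_file` and the field names (`entity_id`, `purpose`, `source`) are assumptions made for this sketch and are not specified in the disclosure.

```python
import json


def generate_data_file(entity_id, received):
    """Build a data file (here, a JSON string) from received input data.

    `received` is a dictionary of raw input fields; its keys are
    illustrative assumptions only.
    """
    data = {"entity_id": entity_id, "data": dict(received)}
    # sort_keys gives the same serialisation for the same inputs,
    # regardless of the order in which the fields were received.
    return json.dumps(data, sort_keys=True)


data_file = generate_data_file(
    "456", {"purpose": "fitness tracking", "source": "AppStore"}
)
parsed = json.loads(data_file)  # dictionary look-up on the parsed data file
```

Serialising with sorted keys is one possible design choice: it makes the text of the data file depend only on its contents, which is convenient when stored data files are later compared or re-processed.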

In a preferred example, the data file comprises the received data. Alternatively, the received data may be stored (e.g. on the database server 106, which is preferably a data lake) and the data file may only comprise references to the stored data such that the stored data can be efficiently fetched but need not be stored in memory of and/or passed between the communication engine 402 and classification engine 404.

Once the data file is generated, it is provided to the classification engine 404. The classification engine subsequently determines 506 a classification based on the data file. The classifications 1, 2, ..., n determined by the classification engine 404 generally relate to risk related to the entity. In the present example, where the entity to be authorised is an application on the user device, the classifications may for example relate to the security risk posed by the application to the user device.

The classifications are determined using one or more pure functions - i.e. functions that return identical outputs for identical inputs and/or have no side effects. Pure functions also preferably do not vary over time. To further illustrate the concept of pure functions, a simple example of a pure function is f1(a, b) = { c = a + b; return c; }, which returns the sum of two input numbers. Function f1 is pure because each time it is run with the same inputs (and no matter how many times it is run), it will always provide the same answer (i.e. return an identical output). No external variables are used in function f1 that are not passed in as inputs, and function f1 does not use any random processes and has no side effects. In contrast, a simple example of a function that is not pure is f2(year-of-birth) = { age = year-now - year-of-birth; return age; }, which tells you how old you are. Function f2 is not pure because it will give a different answer (i.e. return different outputs) depending on the time at which it is run.
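The two functions above may be written out as runnable code as follows. The third function, `f2_pure`, is not part of the disclosure; it is a hypothetical variant added to show how the same calculation becomes pure once the current year is passed in as an input.

```python
from datetime import date


def f1(a, b):
    """Pure: the output depends only on the inputs."""
    c = a + b
    return c


def f2(year_of_birth):
    """Not pure: reads the current date, which is not passed in as an input."""
    age = date.today().year - year_of_birth
    return age


def f2_pure(year_of_birth, year_now):
    """Hypothetical pure variant: the current year is an explicit input."""
    return year_now - year_of_birth
```

With `f2_pure`, any time dependence is confined to the inputs, so a stored set of inputs yields the same output whenever the function is re-run.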

Determining the classifications using one or more pure functions may provide a number of advantages as described in further detail below.

In order to determine the classification, the classification engine 404 may determine one or more sub-classifications. For example, the classification engine 404 may determine the overall classification using the sub-classifications as inputs, and e.g. determine the overall classification based on logic operations on Boolean (i.e. pass / fail) sub-classifications (e.g. A AND B AND C...). These sub-classifications / rules may be deterministic (e.g. logic-based) or probabilistic (e.g. machine learning based). For example, a sub-classification may be determined based on a machine learning classification model, a logic condition, or a machine learning regression model and a logic condition (e.g. on the basis of whether the value predicted using the regression model is greater than or less than a threshold value).

Further, various types of sub-classifications may be combined in determining the overall classification - e.g. sub-classification i being a logic Boolean check (e.g. whether or not the company providing the application is registered in the Apple® App Store; and/or whether the operating system (and/or antimalware software) on the user device 102 is up-to-date - yes/no), and sub-classification i+1 (i.e. a further sub-classification) being based on machine learning regression (e.g. the probability that the application comprises malware). A further example sub-classification may be a prediction of whether an application is fraudulent based on natural language processing of the description of the application provided on the provider website / AppStore. It should be appreciated that sub-classifications may comprise further sub-components (e.g. rules as shown in Table 1).
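A minimal sketch of combining a Boolean sub-classification with a regression-based sub-classification is given below. The field names `provider_registered` and `malware_probability`, and the threshold of 0.2, are assumptions for illustration; the probability is assumed to have been predicted by a regression model and stored in the data file beforehand.

```python
def provider_registered(data_file):
    """Sub-classification i: a Boolean logic check."""
    return bool(data_file["provider_registered"])


def malware_unlikely(data_file, threshold=0.2):
    """Sub-classification i+1: a regression output compared with a threshold.

    The threshold value is illustrative only.
    """
    return data_file["malware_probability"] < threshold


def classify(data_file):
    """Overall classification: A AND B over the Boolean sub-classifications."""
    return provider_registered(data_file) and malware_unlikely(data_file)
```

Each function depends only on the data file passed to it, so the overall classification is itself a pure function of the data file.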

Next, the classification 506 output is used to determine 508 whether (and what) further data is required to reach an authorisation decision. For example, further data may be required because some of the input data required for one or more of the sub-classifications is not currently available. Or, for example, if a classification model used for the sub-classification and/or classification has a large confidence interval and the value predicted by the model is near the threshold for rejecting/approving the authorisation request, then further data may be required to reduce this confidence interval to an acceptable range.

In a preferred example, the classification output 506 itself comprises an indication of whether any further data is required to reach an authorisation decision -i.e. the classification comprises one or more requests for further data. This may enable improved traceability to understand at which point and why given data was requested to reach an authorisation decision -in other words, which classification triggered the request for the given data. This in turn may allow improved analysis and debugging of the authorisation method 500. Further, since the communication engine 402 preferably receives the classification independently of requesting further data (to store the classification on the database server 106), embedding the requests in the classification itself may reduce network bandwidth usage in that only the classification needs to be transmitted to the communication engine 402 (as opposed to transmitting the classification separate to one or more requests for further data).

In the present example, the classification output is a JSON output, an example of which is shown in Table 1 below. The output represents the structure of the classification comprising various sub-classifications. Further, the output comprises one or more requests for further data associated with the result of the classification. In a preferred example, the full (i.e. complete) JSON classification output is transmitted to the database server 106 for storage. Partial outputs are then transmitted to the communication engine 402 (e.g. a subset of the output relating to requests for further data) and to the authorisation engine 406 (e.g. a subset of the output relating to the 'top-level' classification result (e.g. status of pass or fail)).

If at step 508 it is determined that further data is required, at step 510 the request for further data (preferably, the classification itself comprising the request) is transmitted to the communication engine 402 which requests the data from the user device 102 and/or remote server(s) 108 as described above. For example, in the present example, further data relating to the provider of the application may be fetched from remote server(s) 108 corresponding to a company register and/or a website security checker (e.g. Google® Transparency Report). This further data can be used to augment the classification 506 in the next iteration and allows a more accurate authorisation decision to be made.

If the further data is requested from multiple servers (e.g. a user device 102 and a plurality of remote servers 108), the requests are preferably made in parallel for efficiency. The communication engine 402 monitors the progress of each request via one or more handlers implemented as part of the tracker sub-engine 444.

Next, once the further data is received and processed by the communication engine 402, the tracker sub-engine 444 adds 512 the further data to the data file. Optionally, in particular if the further data is large (in memory terms), the communication engine 402 adds only a reference (e.g. a pointer) to the further data (or elements thereof) to the data file, and transmits the further data to the database server 106 for storage. The further data is then preferably cached on the database server 106 to allow efficient access to it by the classification engine 404.

The updated data file is subsequently transmitted to the classification engine 404, which determines 506 a further classification based on the data file (which now comprises additional data and so may enable the determination of a more accurate and/or reliable classification). Preferably, the classification engine 404 receives a notification that the data file has been updated - e.g. the classification engine 404 subscribes to a feed comprising the data file and monitors the feed for updates to the data file.

Steps 506 to 512 are then repeated until, at step 508, the classification engine 404 determines that no further data is needed to make an authorisation decision. When that occurs, one or more of the classifications determined at step(s) 506 (i.e. over one or more iterations of step 506) are transmitted to the authorisation engine 406 and the method 500 moves to step 514.
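The iteration over steps 506 to 512 may be sketched as a simple loop. This is an illustrative sketch under stated assumptions rather than the disclosed implementation: the two checks, the field names, and the `fetch` callable (standing in for the communication engine 402) are all hypothetical.

```python
def classify(data_file):
    """Pure function returning (status, requests) from the data file alone."""
    if "os_up_to_date" not in data_file:
        return "pending", ["os_up_to_date"]
    if not data_file["os_up_to_date"]:
        return "fail", []
    if "provider_registered" not in data_file:
        return "pending", ["provider_registered"]
    return ("pass" if data_file["provider_registered"] else "fail"), []


def authorise(data_file, fetch):
    """Repeat steps 506-512 until no further data is required."""
    data_file = dict(data_file)  # work on a copy of the initial data file
    history = []
    while True:
        status, requests = classify(data_file)  # step 506
        history.append(status)
        if not requests:  # step 508: no further data needed
            return status == "pass", data_file, history  # step 514
        for key in requests:  # steps 510 and 512
            data_file[key] = fetch(key)


remote = {"os_up_to_date": True, "provider_registered": True}
ok, final_file, history = authorise({"entity_id": "456"}, remote.__getitem__)
```

In this sketch the classification embeds its own requests for further data, and the second item is only fetched if the first check passes, so no more data is gathered than the classification path requires.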

At step 514, the authorisation engine 406 determines whether or not to authorise the entity (i.e. determines the authorisation result / decision) based on one or more of the classifications determined in the iterative process 506 to 512. Preferably, the authorisation is determined at least in part based on the final (i.e. last) classification (e.g. the nth classification in Figure 5). The final classification is determined based on the most-complete data file (built up based on preceding classifications) and so this may provide an improved authorisation result. In the present example, the final classification determines the authorisation result - i.e. the output of the final classification is the authorisation result. In an alternative example, the authorisation result is determined based on a plurality of classifications - e.g. by determining a "2nd order" classification using the classifications determined at steps 506 as inputs. For example, for the authorisation result to be positive (i.e. a 'pass'), all classifications determined at steps 506 must also have a predetermined output and/or an output in a predetermined range.

If the authorisation engine 406 decides not to authorise the entity - i.e. if the authorisation result is negative (e.g. if the application (i.e. entity) presents a security risk), the authorisation request is rejected 516. A corresponding notification is sent to the user device 102 and optionally to a remote server 108 (e.g. to notify a third party that the application (i.e. entity) is unsafe).

If the authorisation engine 406 decides to authorise the entity -i.e. if the authorisation result is positive, the authorisation request is accepted 518. The authorisation engine 406 subsequently executes one or more actions corresponding to the authorisation -e.g. in the present example, grants the application (i.e. entity) access to the resource on the user device. A corresponding notification is also sent to the user device 102.

In a preferred example, communication between the communication engine 402 and classification engine 404 in steps 504 to 512 of method 500 is performed by each engine placing (i.e. publishing) data on a (data) feed and the other engine subscribing to that feed (e.g. monitoring it for updates). In other words, the engines 402, 404 communicate by publishing and/or subscribing to one or more data feeds.

At step 506, once the classification engine 404 has determined a classification, the classification engine 404: transmits a full classification result in the form of a JSON file to the database server 106 for storage (so that it is persisted in the database server 106 and may subsequently be analysed); and transmits a partial result onto a classification feed monitored by the communication engine 402. The partial result comprises data relating to the further data needed to determine the next classification. The sub-engines of the communication engine 402 then consume (i.e. subscribe to) this feed and fetch the corresponding further data -i.e. the sub-engines act as consumers of the feed.

Table 1 shows an example structure of the full result JSON encoded in Apache Avro™. This example structure comprises:

* A 'top-level' classification output - e.g. a status of one of: pass, fail, pending (pending further data - i.e. a further classification iteration is required), or error
* A vector of sub-classifications
    * Each sub-classification having an output - e.g. a status of one of: pass, fail, pending, or error
    * And each sub-classification having a vector of 'rules' (in this example, sub-classifications are determined using a rules sub-engine)
        * E.g. a rule may comprise checking whether the operating system (and/or antimalware software) on the user device 102 is up-to-date
        * Each rule has a status of either 'passed:false' or 'passed:true'
* A vector of actions (e.g. requests for further data), each action having one or more attributes

    {
      classification_id: "123"
      entity_id: "456"
      status: pending
      sub-classifications: [
        {
          type: "sub-classification type A"
          status: pass
          rules: [
            { type: "rule type A", passed: true }
            { type: "rule type B", passed: true }
          ]
        }
        {
          type: "sub-classification type B"
          status: pending
        }
      ]
      action: [
        { type: "action type A", attributes: { } }
        { type: "action type B", attributes: { } }
      ]
    }

Table 1

In this example, the action vector is placed on the feed. The communication engine 402 sub-engines subscribe to this feed and this is how actions are received by various sub-engines that subsequently fulfil each action.

As shown in Table 1, each action has attributes. These attributes comprise: a type attribute (e.g. "get-developer-company-info" or "get-requested-permissions") as well as any further attributes relevant to the action type (e.g. an action of type "get-developer-company-info" requires an attribute that identifies (e.g. via a unique ID, such as company number) the application developer company in order to be fulfilled).

For each action type, there's a consumer/subscriber (i.e. communication engine 402 sub-engine) that knows how to fulfil that action. The consumers subscribe to the classification feed, and once a consumer/sub-engine detects/receives an action, it fulfils it (e.g. fetches and/or processes further data), and then transmits the further data to the tracker sub-engine 444 that adds the further data to the relevant data file (as per step 512 in method 500).

For example, the data-gathering sub-engine 440 subscribes to (i.e. is a consumer of) actions relating to requests for further data from remote server(s) 108. Or, for example, the UI sub-engine 442 subscribes to the classification feed and monitors for an action of type "get-user-personal-info". Once the UI sub-engine 442 receives the action, it prompts the UI on the user device 102 to request the corresponding information from the user (e.g. to display a form). When this information is received from the user device 102, the UI sub-engine 442 transmits it to the tracker sub-engine 444, which adds the newly acquired information to the data file.

In this example, the 'classification' feed is used by the classification engine 404 to make requests for further data. Alternatively, or in addition, a 'data file' feed may be used to notify the classification engine 404 when the data file is updated. For example, the data file may be published onto the feed. The classification engine 404 then subscribes to that feed, and whenever a change is made (i.e. the data file is updated), a new classification step 506 is triggered.

The iterative steps 506-512 of method 500 follow the broad pattern of: 1) the classification engine 404 determining 506 a classification and requesting further data (e.g. by transmitting the request to the feed as per this example); 2) the relevant communication engine 402 sub-engine fulfilling the request/action (e.g. by subscribing to the feed); 3) the tracker sub-engine 444 adding the further data 512 to the data file; 4) the classification engine 404 determining a further classification based on the updated data file and requesting yet further data; etc. Each sub-engine of the communication engine 402 communicates with the classification engine 404 in the same way (e.g. in this example, they all subscribe to the same classification feed and adhere to the protocol - receive an action, fulfil it, and report to the tracker sub-engine 444). This allows the sub-engines to be developed independently of one another and optimised for specific tasks (e.g. to fulfil specific actions), and a microservice architecture to be implemented for the communication engine 402 (each sub-engine acting as an independent microservice).
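The protocol described above (receive an action, fulfil it, report to the tracker) may be sketched with a toy in-process feed. The classes `Feed` and `Tracker`, the helper `make_company_info_consumer`, the `company_number` attribute, and the `lookup` callable are hypothetical names introduced for this sketch; a deployed system would more likely use a message broker than in-memory callbacks.

```python
from collections import defaultdict


class Feed:
    """Toy classification feed: consumers register a handler per action type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, action_type, handler):
        self._handlers[action_type].append(handler)

    def publish(self, actions):
        # Deliver each action to every consumer subscribed to its type.
        for action in actions:
            for handler in self._handlers[action["type"]]:
                handler(action)


class Tracker:
    """Stands in for the tracker sub-engine 444: adds data to the data file."""

    def __init__(self, data_file):
        self.data_file = data_file

    def add(self, key, value):
        self.data_file[key] = value


def make_company_info_consumer(tracker, lookup):
    """Consumer that fulfils 'get-developer-company-info' actions."""

    def consume(action):
        company_id = action["attributes"]["company_number"]
        tracker.add("developer_company_info", lookup(company_id))

    return consume


tracker = Tracker({"entity_id": "456"})
feed = Feed()
feed.subscribe(
    "get-developer-company-info",
    make_company_info_consumer(
        tracker, lambda cid: {"company_number": cid, "registered": True}
    ),
)
feed.publish([
    {"type": "get-developer-company-info",
     "attributes": {"company_number": "0001"}},
    {"type": "get-user-personal-info", "attributes": {}},  # no consumer here
])
```

Because each consumer only needs to know its own action type and the tracker interface, further consumers can be added without changing the feed or the other consumers.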

Referring to Figures 6a and 6b, simplified flow charts of example methods 600, 650 implementing authorisation method 500 are shown. Flow charts 600, 650 show the steps taken in method 500 over time (shown in the horizontal axis). Figure 6a shows an example authorisation method 600 for a first entity, for which the authorisation result is positive; and Figure 6b shows an example authorisation method 650 for a second entity, for which the authorisation result is negative. In the examples shown in Figures 6a and 6b, the classification step 506 is implemented using a rules sub-engine that is part of the classification engine 404, e.g. using a forward-chaining inference engine that evaluates rules against the data file, e.g. based on the Rete algorithm. Using a rules sub-engine may allow the implementation of a scalable and efficient classification system, particularly if a significant amount of conditional logic is performed. The rules represent sub-classifications (alternatively, multiple rules may be grouped into each sub-classification as shown in Table 1). In the examples shown in Figures 6a and 6b, an entity is authorised if all rules required to obtain a classification are satisfied (i.e. in the simplified 'tree' representation of Figures 6a and 6b -if a leaf node is reached and the leaf node and intermediate nodes are all activated). In Figures 6a (and 6b), activated (i.e. passed) rules are marked as 'A', failed rules as 'F', and rules for which further data is needed to determine whether they are passed are marked as '?'.

In method 600, firstly, an input data file 602-A is received 502 by the classification engine 404. The classification engine 404 finds all rules with input conditions that are satisfied (i.e. 'activated' rules) and 'fires' them off and implements their corresponding actions, thereby determining the first classification 604-A. However, not all rules are activated or 'failed' (not met). For example, rule 620 is neither activated nor failed because the required data is missing from the data file.

Since rules marked as '?' are present, the classification engine 404 transmits a request for further data to the communication engine 402, which fetches that data and returns an updated data file 602-B. Based on the updated data file 602-B, rule 620 is now activated and the classification process progresses to further rules (shown as nodes in the trees of Figures 6a and 6b). However, the current data file comprises insufficient data to activate (or fail) rule 622. Therefore, the process is repeated and a further request for more data is transmitted to the communication engine 402 which returns a further updated data file 602-C.

Based on the further updated data file 602-C, all rules required to determine the classification (including rules 622 and 624) are activated and the classification result is returned. In Figure 6a, since all required rules (i.e. the leaf node/rule and preceding, intermediate nodes) are activated (and passed/satisfied), the authorisation result is positive and the authorisation request from the first entity is accepted/approved 518.

Method 650 shown in Figure 6b is identical to method 600 described with reference to Figure 6a, except where explained below, and corresponding reference numerals are used to refer to similar features. The second entity has different attributes from the first entity (the authorisation of which is described with reference to Figure 6a) so the path down the classification tree is different in each of methods 600, 650. For example, the first entity may be a social media application that requests access to photos on the user device 102, and the second entity may be an anti-malware application that requests more extensive access to the user device 102 and so needs to pass more thorough checks (e.g. implemented via the rules sub-engine) to be authorised for download to the user device 102. Accordingly, in method 650, rule 622 need not be evaluated to determine the classification but instead rule 626 is. Since the data required to evaluate rule 626 is not in the data file, the classification engine 404 requests further data to be added to the data file. The communication engine 402 fetches this further data, adds it to the data file, and transmits the updated data file 602-C to the classification engine. Based on the updated data file 602-C, rule 626 is now activated and the classification process progresses to rule 628. However, rule 628 is failed (i.e. not passed), so the authorisation result is negative and the authorisation request from the second entity is rejected 516.
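The three rule markings used in Figures 6a and 6b ('A', 'F' and '?') may be illustrated with the following sketch. It walks a single root-to-leaf path of rules in order and is deliberately much simpler than a forward-chaining inference engine or the Rete algorithm; the rule conditions and field names are assumptions chosen for illustration and are not the actual rules 620 to 628.

```python
def evaluate_rule(rule, data_file):
    """Return 'A' (activated), 'F' (failed) or '?' (required data missing)."""
    if any(key not in data_file for key in rule["needs"]):
        return "?"
    return "A" if rule["test"](data_file) else "F"


def evaluate_path(rules, data_file):
    """Evaluate rules from root to leaf, stopping at the first non-activated rule."""
    marks = {}
    for rule in rules:
        marks[rule["name"]] = evaluate_rule(rule, data_file)
        if marks[rule["name"]] != "A":
            break
    return marks


# Hypothetical two-rule path; names and conditions are illustrative only.
path = [
    {"name": "first", "needs": ["os_up_to_date"],
     "test": lambda d: d["os_up_to_date"]},
    {"name": "second", "needs": ["provider_registered"],
     "test": lambda d: d["provider_registered"]},
]

marks_initial = evaluate_path(path, {})
marks_updated = evaluate_path(path, {"os_up_to_date": True})
marks_final = evaluate_path(
    path, {"os_up_to_date": True, "provider_registered": True}
)
```

A '?' mark corresponds to a request for further data, an 'F' mark to a rejected request, and a path whose rules are all marked 'A' to an accepted request.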

The iterative process of determining a classification based on a data file and building up the data file described with reference to Figures 5, 6a, and 6b has a number of associated advantages. It may enable the implementation of an adaptive authorisation system that can operate on an initially incomplete dataset and fetches only the necessary further data. At the start, when only the request to authorise an entity is received 502, it is unclear what exact data will be required to authorise the entity.

In conventional authorisation systems, this problem would be resolved by fetching all data that may ever be needed before the authorisation process begins to ensure that all needed data is available, thereby inefficiently fetching significant amounts of data that may be unnecessary for the authorisation process. Fetching this data may be both computationally costly (in terms of memory and power usage) and monetarily costly, particularly if data is fetched from external, paid servers.

In contrast, authorisation method 500 iteratively determines classifications based on the currently available data (i.e. the data file), and only fetches further data as required by the next classification step(s) 506. Thus, by interleaving the classification and data gathering steps, data is only fetched as needed (thereby ensuring that the minimum required amount of data is fetched) and the typically slower (and so speed-determining) classification process can begin right away based on an initially incomplete data file, thereby increasing the speed of the overall authorisation process. Further, as shown in Figures 6a and 6b, different classification steps (e.g. different paths along the trees shown in Figures 6a and 6b) may be required to authorise an entity in dependence on the attributes of the entity (as stored in the data file). Thus, building up the data file in the iterative manner of method 500 ensures that only the data required for the classification 'path' for the given entity is fetched.

Moreover, important advantages result from the use of pure functions to determine the classification results in step 506. Pure functions do not rely on information or variables that are subject to change. This gives rise to a number of advantages including: (a) improved traceability to understand why certain authorisation decisions were made, (b) easier de-bugging and testing, (c) the ability more easily to 're-run' previous authorisation decisions using new classification models and thus test the new models, and (d) the ability to cache certain results of calculations rather than calling external modules/engines to execute the calculations. Furthermore, pure functions are independent from one another and so may be evaluated efficiently in parallel via the use of parallel processing (e.g. across a plurality of CPUs).
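Advantage (d) may be illustrated with a memoisation sketch using `functools.lru_cache` from the Python standard library. The scoring function `permission_risk` and its weights are made up for this sketch; the point is only that caching is safe because the function is pure.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def permission_risk(permissions):
    """Pure scoring of a tuple of requested permissions.

    The weights are illustrative assumptions. The argument must be
    hashable (hence a tuple) so that it can be used as a cache key.
    """
    weights = {"camera": 0.3, "location": 0.2, "files": 0.4}
    return round(sum(weights.get(p, 0.1) for p in permissions), 2)


first = permission_risk(("camera", "location"))   # computed
second = permission_risk(("camera", "location"))  # served from the cache
info = permission_risk.cache_info()
```

The `maxsize` argument bounds the memory used by the cache; least recently used entries are discarded once the bound is reached.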

As a result of each of the classifications 1, 2, ..., n being determined using one or more pure functions, the authorisation result (i.e. decision) is stateless (i.e. has no previous data to address). Accordingly, the authorisation decision is in itself a pure function (since it is comprised of pure functions) of the current data file (current in the sense that the authorisation decision does not know, and is not affected by, previous authorisation decisions). Moreover, as a result of the use of time-invariant pure functions, the authorisation system is time-invariant (i.e. it is only time-dependent insofar as its inputs are). This is particularly useful for effectively batch testing historical decisions. Further, it makes the decision repeatable/predictable based on the inputs and thus much easier to trace (as per advantage (a)) -thus enabling easier discovery of faults in the authorisation process and easier development/improvement of the authorisation process.

Method 500 effectively provides two outputs - an authorisation result for an entity, and a data file containing all the information required to determine the authorisation result for the entity. Storing this data file enables many advantages, including (a)-(d) above. The data file generated for a given entity can be re-used multiple times for de-bugging and testing (as per advantage (b)) or to test new models (as per advantage (c)), and, since the classification step 506 uses pure functions, the authorisation results are repeatable and the current and/or new model can be reliably tested.

Referring to Figure 7, an example flow diagram of a method 700 for authorising a plurality of entities is shown. Referring to Figure 8, an example flow diagram of a method 800 for testing a new authorisation process is shown. Figures 7 and 8 further demonstrate the described advantages of using pure functions.

In method 700, the communication engine 402 ingests data files 602 (x1, x2, ..., xn) corresponding to entities 1, 2, ..., n the instant they arrive on the queue, generating new classifications 604 for each entity. These classifications 604 are subsequently transmitted to the authorisation engine 406 which determines the corresponding authorisation results 606-1, 606-2, ..., 606-n based on given classification models (e.g. by version 1 of the models as denoted by the "1" in f1(x) and y..,1 in Figure 7). The determined authorisation decision is derived purely from the data contained in the data file. The pure functions used for the one or more classifications e.g. do not depend on the current time and do not use random processes. Thus, any historical data file for a given entity can be fetched from the database server 106 and method 500 will return the same authorisation result at any point in time. Once built up via method 500, the data files are stored, as are parameters used in the classification 506 and authorisation 514 steps (e.g. in the database server 106 or on a version control system such as Git), which enables easier testing and debugging of the authorisation process 500.

Method 800, as shown in Figure 8, relates to a process for testing an authorisation process k+1 by comparing it to another (e.g. a previous and/or current) version k of the authorisation process (process k+1 e.g. using different classification models than process k and/or different (i.e. alternative) pure functions). The authorisation results 606-1, ..., 606-n obtained via each version k and k+1 based on the same data files 602 are compared and the differences (diff1, ..., diffn) analysed to evaluate each respective version (and e.g. determine which performs better, e.g. with regard to recall of applications containing malware). For example, whenever a change is made to the rules sub-engine within the classification engine 404, a plurality of historical data files can be re-ingested through both the current and upcoming versions of the authorisation process. The difference between each pair of decisions can then be analysed to understand the potential impact of complex upcoming changes to the rules sub-engine, by analysing the decisions made by the new process k+1 (which uses alternative pure functions to determine the classifications) and comparing them to the decisions made by the existing process k.
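The comparison of method 800 may be sketched as below. The two process versions, their thresholds, and the snapshot fields are assumptions made for illustration; in practice each version would be the full authorisation process applied to a stored data file.

```python
def process_k(data_file):
    """Current version k (illustrative threshold)."""
    return data_file["malware_probability"] < 0.5


def process_k_plus_1(data_file):
    """Upcoming version k+1: stricter threshold plus an additional check."""
    return data_file["malware_probability"] < 0.2 and data_file["os_up_to_date"]


def compare(snapshots, old, new):
    """Return {entity_id: (old_result, new_result)} where the results differ."""
    diffs = {}
    for snapshot in snapshots:
        old_result, new_result = old(snapshot), new(snapshot)
        if old_result != new_result:
            diffs[snapshot["entity_id"]] = (old_result, new_result)
    return diffs


# Hypothetical historical 'entity snapshots' (stored data files).
snapshots = [
    {"entity_id": "1", "malware_probability": 0.1, "os_up_to_date": True},
    {"entity_id": "2", "malware_probability": 0.3, "os_up_to_date": True},
    {"entity_id": "3", "malware_probability": 0.9, "os_up_to_date": False},
]
diffs = compare(snapshots, process_k, process_k_plus_1)
```

Because both versions are pure functions of the stored data file, re-running the comparison at a later date on the same snapshots yields the same differences.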

The use of pure functions may also enable caching certain results of calculations rather than calling external modules/engines to execute the calculations. In particular, for a given entity / data file, the inputs and outputs of the one or more pure functions used to determine the classification(s) may be cached. Then, for subsequent entities / data files, if the inputs to a given pure function are the same as for the previous entity, then the output as computed for the previous entity is retrieved from the cache and output directly rather than the (often computationally costly) computation of the pure function for the subsequent entity being repeated. This may significantly reduce the computational cost of determining classifications for a plurality of entities as pure function results determined for preceding entities can be re-used for following entities with the same pure function inputs, at relatively little cost in fast access memory usage. It should be appreciated that some or all of the pure function inputs and outputs may be cached - e.g. only the inputs and outputs for the most commonly used (e.g. based on a counter over the previous month of use of the authorisation process 500) pure functions may be cached. Likewise, it should be appreciated that a multi-level cache architecture may be used with different subsets of pure function inputs and outputs being stored in each cache level.

Risk classification

Referring to Figure 9a, an example method 900 of classifying risk is shown. Method 900 can be implemented using the authorisation system 400 described with reference to Figures 1 to 8, which can thus also act as a risk classification system. The risk classification system comprises a risk classification engine that is responsible for communication with the user device 102 and/or remote server(s) 108 and classifying the risk relating to an entity by determining a prediction of failure of the entity. Alternatively, the risk classification engine determines a prediction of success of the entity; however, the system will be described with reference to predicting a failure of the entity. In the present example, the risk classification engine comprises the communication engine 402 and the classification engine 404, the communication engine 402 being responsible for communication with external entities and the classification engine 404 being responsible for operating on received data to determine the prediction of failure. In the present example, the entity is an application on the user device 102 and the prediction of failure relates to a prediction that the application comprises malware.

Method 900 comprises two sets of steps 910, 930. Modelling steps 910 are performed for a plurality of entities by the modelling engine 410, whereas classification steps 930 are performed for each given entity by the risk classification engine.

Referring to the modelling steps 910, first, the modelling engine 410 queries 912 stored (i.e. historical) data relating to a plurality of entities from the database server 106. Based on this queried data, the modelling engine 410 classifies 914 each entity into one or more groupings in dependence on attributes of the entities. For example, the modelling engine 410 may classify the entities into groupings using a clustering algorithm such as k-means clustering or mean shift clustering. Each entity may be classified into multiple groupings based on its different attributes, e.g. an application that has an age attribute (that is, a measure of how long ago the app was developed and first released) above a first threshold, and a memory size attribute below a second threshold, may be classified into 'old app' and 'small app' groupings. Further, the modelling engine 410 outputs and stores a model for classifying further entities into one or more of the groupings based on their attributes.
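
As a non-limiting sketch of the 'old app' / 'small app' example above, an entity might be assigned to multiple groupings as follows. Fixed thresholds are used here as a simple stand-in for the model output by the modelling engine 410 (this is not a clustering algorithm); the threshold values and the attribute names `age_days` and `memory_size_mb` are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical thresholds; in the described system the stored model
# output at step 914 would supply the grouping boundaries.
AGE_THRESHOLD_DAYS = 3 * 365
SIZE_THRESHOLD_MB = 50.0


def assign_groupings(attributes: Dict[str, float]) -> List[str]:
    """Return every grouping whose condition the entity's attributes meet."""
    groupings = []
    if attributes["age_days"] > AGE_THRESHOLD_DAYS:
        groupings.append("old app")
    if attributes["memory_size_mb"] < SIZE_THRESHOLD_MB:
        groupings.append("small app")
    return groupings


print(assign_groupings({"age_days": 2000, "memory_size_mb": 12.0}))
print(assign_groupings({"age_days": 100, "memory_size_mb": 300.0}))
```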

Next, at step 916, for each grouping, based on queried data relating to failure of entities in a given grouping, the modelling engine 410 identifies indicator(s) correlated with failure of entities in that grouping (i.e. entity attributes indicative of failure of the entity). The modelling engine 410 preferably identifies indicators both positively and negatively correlated with failure of entities. An attribute is classed as an indicator if the measured absolute value of its correlation (e.g. Pearson correlation, Kendall rank correlation, Spearman correlation, and/or Point-Biserial correlation) with failure of entities in the given grouping is above a threshold x. For example, in the 'old app' grouping, the modelling engine 410 may determine that the 'latest update date' attribute is strongly negatively correlated with failure of the entities and so identifies this attribute as an indicator.

Optionally, in parallel to step 916, at step 918, the modelling engine 410 identifies further indicator(s) not (or weakly) correlated with failure of entities in each grouping. For example, within each grouping, the modelling engine 410 identifies entity attributes for which the measured absolute value of the correlation is below a threshold y (y being much smaller than x, such that most attributes are not identified as indicators at either of steps 916, 918). These attributes not identified as indicators therefore represent data that is irrelevant to determining the prediction of failure of the entity. The use of this data is described in further detail with reference to Figure 9b.
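
The thresholding of steps 916 and 918 might, in one illustrative sketch, be implemented as below using the Pearson correlation. The attribute names, the sample values and the threshold values x = 0.7 and y = 0.05 are assumptions chosen only to make the example concrete.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def split_attributes(
    attribute_values: Dict[str, Sequence[float]],
    failed: Sequence[float],
    x: float,
    y: float,
) -> Dict[str, List[str]]:
    """Step 916: |r| above x gives an indicator. Step 918: |r| below y gives
    an attribute that is not (or only weakly) correlated with failure."""
    strong, weak = [], []
    for name, values in attribute_values.items():
        r = abs(pearson(values, failed))
        if r > x:
            strong.append(name)
        elif r < y:
            weak.append(name)
    return {"indicators": strong, "weakly_correlated": weak}


# Hypothetical historical data for four entities in one grouping (1 = failed).
failed = [1, 1, 0, 0]
attribute_values = {
    "latest_update_year": [2015, 2016, 2021, 2022],  # strongly negative
    "install_count": [10, 30, 20, 40],               # between y and x
    "icon_colour_code": [1, 2, 2, 1],                # uncorrelated
}
result = split_attributes(attribute_values, failed, x=0.7, y=0.05)
print(result)
```

Taking the absolute value of the coefficient means that both positively and negatively correlated attributes qualify as indicators, as described for step 916.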

The groupings and indicators determined at steps 914, 916, and 918 are updated periodically as the historic data evolves, thereby improving the quality of authorisation decisions made over time.

Turning to the classification steps 930, first, at step 932, the risk classification engine receives a request (e.g. from a user device 102 or the authorisation engine 406 on the web server 104) to classify risk relating to an entity. In response to this request, the risk classification engine transmits 934 a request to a user device 102 (or to another data store or data provider) for data associated with attribute(s) of the entity (i.e. for attribute data). Based on the attribute data received at step 934 and the model output by the modelling engine 410 at step 914, the risk classification engine determines 936 one or more groupings for the entity.

Next, at step 938, the risk classification engine queries the modelling engine 410 for the indicator(s) (as determined at step 916) correlated with failure of entities in the one or more groupings for the entity determined at step 936. The risk classification engine then transmits a request to the user device 102 (or to another data store or data provider) for data associated with these indicator(s) (i.e. for indicator data).

Finally, at step 940, the risk classification engine determines a prediction of failure of the entity at least partly in dependence on the data associated with the at least one indicator for the determined groupings. The prediction of failure is determined using a statistical model generated and trained by the modelling engine 410. In dependence on the determined prediction of failure, the risk classification engine then classifies the entity into one of a plurality of predetermined (risk) bands.

The 'two-step' collection of data from the user device via steps 934 and 938 in method 900 may improve the accuracy of the prediction of failure of the entity, since this is computed based on indicators determined to be correlated with failure for similar previous entities (i.e. entities in the same grouping), while reducing the amount of data collected from the user device 102, since only the data needed to classify the entity into groupings and compute the prediction of failure for those specific groupings is collected, and no more. Reducing the amount of collected data may reduce network bandwidth usage and the computational costs of determining the prediction of failure (since computational resources are not wasted in operating on data not correlated with failure). Further, if less data is entered by a user of the user device 102, the user-journey for the user may be improved.
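
By way of a hedged illustration of the 'two-step' collection, the sketch below requests attribute data first, determines the groupings, and only then requests the indicator data for those groupings. The names `GROUPING_ATTRIBUTES`, `INDICATORS_BY_GROUPING`, `fetch_from_device` and all attribute names and values are hypothetical; the `fetch` callable stands in for a request to the user device 102 or another data provider.

```python
from typing import Callable, Dict, List, Sequence

Fetch = Callable[[Sequence[str]], Dict[str, float]]

# Hypothetical outputs of the modelling steps 910.
GROUPING_ATTRIBUTES = ["age_days", "memory_size_mb"]
INDICATORS_BY_GROUPING = {
    "old app": ["days_since_update"],
    "small app": ["permission_count"],
}


def determine_groupings(attributes: Dict[str, float]) -> List[str]:
    """Step 936, using hypothetical threshold-based groupings."""
    groupings = []
    if attributes["age_days"] > 1000:
        groupings.append("old app")
    if attributes["memory_size_mb"] < 50:
        groupings.append("small app")
    return groupings


def collect_in_two_steps(fetch: Fetch) -> Dict[str, float]:
    attribute_data = fetch(GROUPING_ATTRIBUTES)            # step 934
    groupings = determine_groupings(attribute_data)        # step 936
    wanted = [name for g in groupings for name in INDICATORS_BY_GROUPING[g]]
    return fetch(wanted)                                   # step 938


# Hypothetical data held on the user device, including fields never requested.
device = {
    "age_days": 2000,
    "memory_size_mb": 120,
    "days_since_update": 900,
    "permission_count": 14,
    "icon_colour_code": 3,
}
requested: List[str] = []


def fetch_from_device(names: Sequence[str]) -> Dict[str, float]:
    requested.extend(names)  # record what was actually asked for
    return {name: device[name] for name in names}


indicator_data = collect_in_two_steps(fetch_from_device)
print(indicator_data, requested)
```

In this run only three of the five fields held on the device are ever requested, which is the data-minimisation effect described above.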

Referring to Figure 9b, an example method 950 of determining a prediction of failure of an entity is shown. Method 950 corresponds to an example implementation of step 940 in method 900.

First, the risk classification engine receives 952 data associated with the indicator(s) for the determined groupings for the entity (i.e. indicator data), as collected by the risk classification engine at step 938 of method 900.

Next, the risk classification engine determines 954 a first prediction of failure of the entity at least partly in dependence on the indicator data. In method 900, this first prediction may be output as the prediction of failure at step 940. In turn, method 950 uses the first prediction as an initial filtering stage and determines a second prediction that may provide a more accurate prediction of failure of the entity, as detailed below.

Subsequently, at step 956, the risk classification engine determines whether or not the first prediction (i.e. the predicted probability of failure P(F)) is below a predetermined threshold. If it is not (i.e. if P(F) is high), no further prediction is determined and the first prediction is output as the (overall) prediction of failure. Further, where authorisation of the entity is based on the prediction of failure, if the first prediction is above the threshold, the entity's authorisation request is rejected without any further processing. Thus, step 956 may reduce the computational cost of determining the prediction of failure (and/or authorisation) by terminating the method early for entities that have a high probability of failure and are unlikely to be authorised.

If P(F) is below the threshold, then, at step 958, the risk classification engine transmits, in dependence on the groupings determined at step 936, a request for further data relating to the entity to the user device 102 and/or to remote server(s) 108. For example, for an application in the 'young app' grouping, the risk classification engine may request a report on the developer of the application from a remote server. By collecting data based on the groupings, computational and memory use may be reduced as unnecessary data is not collected.

Optionally, data associated with the indicator(s) identified to be not, or weakly, correlated with failure of entities in the given grouping is removed from the further data set requested at step 958. This removal of data reduces the computational and memory resources used to determine a prediction of failure of the entity, as it eliminates irrelevant data such that it is not collected or subsequently processed by the risk classification engine. Thus, the data inputs for the second classification may be optimised in that the classification is based on information known to be correlated with failure of entities, and not on information known not to be correlated with failure of entities. This is particularly important when the further data needs to be processed (e.g. parsed) to extract attributes of the entity, e.g. when the further data is received as an image scan of a document. Processing such data to extract entity attributes can be computationally intensive, so removing this data from the inputs into the computation (i.e. not collecting or processing it when it is determined not to be a relevant indicator of the likelihood of failure) provides significant computational cost savings.

Finally, at step 960, the risk classification engine determines a second prediction at least partly in dependence on: the indicator data collected at step 938, and the further data relating to the entity collected at step 958. The risk classification engine then outputs this second prediction as the overall prediction of failure of the entity. The second prediction provides a more accurate prediction of failure of the entity because it is based on a larger data set.
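
Steps 954 to 960 might be sketched, under stated assumptions, as the two-stage function below. The two model functions, the threshold value of 0.6 and the `developer_risk` field are hypothetical placeholders for the statistical models and further data described above; the point illustrated is only that the further data is requested, and the second prediction computed, when the first prediction is below the threshold.

```python
from typing import Callable, Dict, List

THRESHOLD = 0.6  # hypothetical predetermined threshold for step 956


def first_model(indicator_data: Dict[str, float]) -> float:
    """Hypothetical cheap model: staleness of the last update, capped at 1."""
    return min(1.0, indicator_data["days_since_update"] / 1000)


def second_model(
    indicator_data: Dict[str, float], further_data: Dict[str, float]
) -> float:
    """Hypothetical richer model combining indicator data and further data."""
    return (first_model(indicator_data) + further_data["developer_risk"]) / 2


def predict_failure(
    indicator_data: Dict[str, float],
    fetch_further: Callable[[], Dict[str, float]],
) -> float:
    p_first = first_model(indicator_data)              # step 954
    if not p_first < THRESHOLD:                        # step 956
        return p_first                                 # early exit, no further data
    further_data = fetch_further()                     # step 958
    return second_model(indicator_data, further_data)  # step 960


fetch_calls: List[str] = []


def fetch_developer_report() -> Dict[str, float]:
    """Stand-in for a request to a remote server; records that it was called."""
    fetch_calls.append("developer_report")
    return {"developer_risk": 0.75}


high_risk = predict_failure({"days_since_update": 900}, fetch_developer_report)
low_risk = predict_failure({"days_since_update": 250}, fetch_developer_report)
print(high_risk, low_risk, fetch_calls)
```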

It should be appreciated that, at steps 934 and 938, the risk classification engine may transmit request(s) to the web server 104 and/or the remote server(s) 108, in addition to or instead of the requests to the user device 102, in order to obtain attribute data and indicator data. This may provide an easier user-journey for the user of the user device 102 because user action is not required to obtain the data.

Risk compensation

Referring to Figure 10, an example method 1000 of authorising an entity for a service is shown. In addition to authorising the entity, method 1000 comprises determining and applying a risk compensation action alongside the authorisation. This provides greater flexibility to the authorisation system 400 and enables it to implement entity-specific authorisation measures. For example, entities of somewhat higher risk may nonetheless be safely authorised with the risk posed by authorising the entity being compensated for (rather than e.g. authorising only entities that match very stringent requirements and not applying any risk compensation actions). In the present example, the entity is an application on the user device 102, and the service is a computing resource on the user device 102 (e.g. a CPU or memory).

First, at step 1002, the authorisation engine 406 receives data relating to the entity to be authorised. For example, this data may be received in the form of the data file, the process of building up of which is described with reference to Figure 5. The authorisation engine 406 also receives data relating to the service.

Next, at step 1004, the authorisation engine 406 determines, in dependence on a first subset of the data received at step 1002, a first classification for the entity based on risk relating to providing the service to the entity. In the present example, the authorisation engine 406 determines the risk (e.g. probability) that the application that has requested access to a service on the user device 102 is malicious and/or would result in the user device 102 not functioning properly (e.g. if the application is excessively computationally-intensive).

The first classification is determined using one or more statistical models and/or using one or more machine learning models (e.g. artificial neural networks). Preferably, the first classification is a risk classification as described above with reference to Figures 9a and 9b. In a preferred example, step 1004 comprises classifying the entity into a risk band (i.e. class), the risk band being the output of the first classification step. Further details of how the first classification is determined are described with reference to Figure 11. For each risk band there is a range of associated risk compensation actions; these risk compensation actions mitigate the risk posed by the entity, as is discussed below.

In parallel, at step 1006, the authorisation engine 406 determines, in dependence on a second subset of the data received at step 1002, a second classification for the entity based on a prediction of the sensitivity of the entity to potential risk compensation actions. The second classification relates to the probability that the entity (i.e. application) will be impeded from properly using the service depending on the level of the risk compensation. Some entities may be more sensitive than others to risk compensation actions, and might for example be impeded from properly using the service if a high level of risk compensation is implemented. In other words, the second classification provides a prediction of the acceptability of the service to the entity.

In a preferred example, step 1006 comprises classifying the entity into a 'sensitivity' band (i.e. class), the sensitivity band being the output of the second classification step.

The first and second subsets of data used in steps 1004 and 1006 are identified by the modelling engine 410 based on statistical analysis of historic data related to, respectively, risk relating to providing services to a plurality of entities and acceptability of services to a plurality of entities. The first and second subsets may be distinct or partially overlap. Various statistical models may be used for this, such as multivariate regression. For example, the modelling engine 410 is configured to identify which types of data relating to an entity are typically correlated with the risk posed by the entity and the sensitivity of the entity to risk compensation actions; these identified data types are thus comprised in the input of the method 1000.

Next, at step 1008, the authorisation engine 406 determines, in dependence on the first and second classifications, a risk compensation (e.g. a risk compensation action) required for providing the service to the entity. The risk compensation balances the risk relating to providing the service to the entity (assessed via the first classification) with the sensitivity of the entity to risk compensation actions (assessed via the second classification) to ensure that the service is acceptable to the entity while being safe to the service provider. As explained above, there is a range of risk compensation actions associated with each risk band; the level of risk compensation is therefore determined within a risk band based on the sensitivity of the entity to the risk compensation action. For example, if the entity is very sensitive to risk compensation actions, the risk compensation determined by the authorisation engine 406 will be at the lower end of the range, whereas if the entity is not sensitive to risk compensation actions the risk compensation determined by the authorisation engine 406 will be at the upper end of the range.

Taking one example, if an application running on the user device 102 (i.e. the entity) requests access to the memory resources on the user device 102 (i.e. the service), data relating to the application is requested (i.e. step 1002 of method 1000). Based on this data the application might be classified as high risk at step 1004 (for example because the data indicates that the application is 'young', that is, it was only recently developed and released, or because the credentials of the application do not match information retrieved from a company register). As the risk relates to accessing memory on the user device 102, a suitable range of risk compensation actions might be to encrypt a range of data files in the memory to protect the sensitive information. However, if the application requires a high level of access to the memory resource in order to function properly (e.g. if the application is an antivirus application), then the application would be classified at step 1006 as having a high sensitivity to the risk compensation action. On the basis of these classifications, the risk compensation action determined at step 1008 would be at the lower end of the range of risk compensation actions associated with the risk band. That is, files on the user device 102 would be encrypted, but the encryption would be limited to certain types of files, or to files in certain directories on the user device, in order to protect the user device (by encrypting the most sensitive information) while still allowing the application the requisite access to function properly.
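
As an illustrative, non-limiting sketch of step 1008, the selection of a risk compensation action within the range associated with a risk band might look as follows. The band names, the listed actions (ordered from the lower to the upper end of each range) and the mapping from sensitivity band to position in the range are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical ranges of actions per risk band, ordered from the lower
# end of the range (least restrictive) to the upper end (most restrictive).
RISK_COMPENSATION_RANGES: Dict[str, List[str]] = {
    "low": ["no action", "encrypt credentials directory"],
    "high": [
        "encrypt selected directories",
        "encrypt selected file types",
        "encrypt all user files",
    ],
}

# A highly sensitive entity receives the lower end of the range.
POSITION_BY_SENSITIVITY = {"high": 0.0, "medium": 0.5, "low": 1.0}


def choose_risk_compensation(risk_band: str, sensitivity_band: str) -> str:
    """Pick an action within the risk band's range based on sensitivity."""
    actions = RISK_COMPENSATION_RANGES[risk_band]
    position = POSITION_BY_SENSITIVITY[sensitivity_band]
    return actions[round(position * (len(actions) - 1))]


# The antivirus example: high risk band, high sensitivity.
print(choose_risk_compensation("high", "high"))
print(choose_risk_compensation("high", "low"))
```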

Finally, at step 1010, the authorisation engine 406 authorises the entity for the service in dependence on the first classification, and applies the risk compensation determined at step 1008.

It should be appreciated that steps 1004 and 1006 may be executed in parallel (as shown in Figure 10) or sequentially. If executed sequentially, the second classification, determined at step 1006, may optionally be determined at least partly in dependence on the first classification determined at step 1004. For example, separate 'acceptability' (or 'sensitivity') bands may be determined within each 'risk' band.

Referring to Figure 11, an example method 1100 for determining a classification for an entity based on risk relating to providing a service to the entity is shown. Method 1100 corresponds to an example implementation of step 1004 in method 1000 described with reference to Figure 10.

First, at step 1102, the authorisation engine 406 receives data (d) relating to the entity (i.e. the first subset of data in method 1000). For example, the authorisation engine 406 receives, from the communication engine 402, the data file corresponding to the entity.

Next, at step 1104, the probability of failure (PF) of the entity is determined based on the data received at step 1102. The probability of failure is determined using one or more statistical models maintained and trained by the modelling engine 410. For example, the statistical models may comprise artificial neural networks trained on historic stored data related to failure of a plurality of entities. The authorisation engine 406 transmits the data received at step 1102 to the modelling engine 410 (e.g. via an HTTP call), which returns the probability of failure of the entity.

Next, at step 1106, the authorisation engine 406 determines an adjusted probability of failure (PF') of the entity based on the probability of failure (PF) determined at step 1104 and one or more attributes of the entity. The adjustment of the probability of failure is a scaling of the probability of failure based on certain attributes of the entities. The attributes used to determine PF' are determined by the modelling engine 410 based on historic data related to failure of a plurality of entities. In the example where the entity is a software application being downloaded to a user device, these attributes are, for example, the source of the download of the software; if the source of the application download is an official app store the probability of failure is scaled down (since an app downloaded from an official app store is unlikely to contain malware) whereas if the source of the application download is a website running in a web browser the probability of failure is scaled up. In turn, step 1106 allows adjusting (i.e. scaling) this probability based on attributes that are most closely correlated with probability of failure. In another example, based on univariate analysis of the attributes' impact on probability of failure, it may be determined that a given attribute (e.g. number of downloads of the application) is particularly closely correlated with probability of failure. Accordingly, at step 1106, PF may be adjusted (i.e. scaled) based on the value of this given attribute to determine PF'. The adjusted probability of failure (PF') may be determined at least partly in dependence on the grouping(s) determined for the entity in method 900, e.g. the attributes used to determine PF' may be selected based on the grouping of the entity.
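
The scaling of step 1106 might, in one simple sketch, take the form below. The scaling factors and source labels are hypothetical (in the described system the relevant attributes are determined by the modelling engine 410), and clamping the result to the interval [0, 1] is an assumption made so that PF' remains a valid probability.

```python
from typing import Dict

# Hypothetical scaling factors keyed by the source of the download.
SOURCE_SCALING: Dict[str, float] = {
    "official_app_store": 0.5,  # scaled down
    "web_browser": 1.5,         # scaled up
}


def adjusted_probability_of_failure(pf: float, download_source: str) -> float:
    """Scale PF by an attribute-dependent factor and clamp to [0, 1]."""
    factor = SOURCE_SCALING.get(download_source, 1.0)  # unknown source: unchanged
    return min(1.0, max(0.0, pf * factor))


print(adjusted_probability_of_failure(0.4, "official_app_store"))
print(adjusted_probability_of_failure(0.8, "web_browser"))
```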

Finally, at step 1108, the authorisation engine 406 classifies the entity into one of a plurality of risk bands based on the adjusted probability of failure (PF'). For example, each risk band may correspond to a given range of PF' values. Preferably, the risk band is determined further based on one or more attributes of the entity. These attributes are identified by the modelling engine 410 as correlated to the more general risk posed by the entity (e.g. risks other than the risk of failure as quantified via PF').

Referring to Figure 12, an example method 1200 for determining a risk compensation (RC) is shown. Method 1200 shown in Figure 12 is identical to methods 1000 and 1100 described with reference to Figures 10 and 11, except where explained below, and corresponding reference numerals are used to refer to similar features.

At steps 1104 and 1106, the probability of failure (PF) and adjusted probability of failure (PF') are determined as described with reference to Figure 11.

Next, at step 1108, the risk band (RB) for the entity is determined. In method 1200, the RB is determined based on both the PF' and one or more given attributes (d) of the entity. Determining the risk band in this way allows factoring in risks other than the risk of failure of the entity (as estimated via PF'). The given attributes used in determining RB are correlated to other risks posed by the entity and are identified by the modelling engine 410. For example, when the entity is an application for the user device 102, PF may represent the probability that the application is malware, and the given attributes may be correlated to a further risk posed by the application (e.g. that it has a security flaw that could be exploited). Thus, the risk band determined in this way may provide an overall measure of the risk posed by the entity, including but not limited to the probability of failure of the entity.
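
A minimal sketch of step 1108 as used in method 1200 is given below, assuming three risk bands delimited by hypothetical PF' boundaries and a single hypothetical boolean attribute `has_known_security_flaw` representing a further risk. Raising the band by one level when that attribute is set is an illustrative rule only, not a rule taken from the described system.

```python
from bisect import bisect_right

# Hypothetical band boundaries: PF' below 0.2 is 'low', from 0.2 up to
# (but not including) 0.5 is 'medium', and 0.5 or above is 'high'.
BAND_BOUNDARIES = [0.2, 0.5]
BANDS = ["low", "medium", "high"]


def risk_band(pf_adjusted: float, has_known_security_flaw: bool) -> str:
    """Determine RB from PF' and one further attribute of the entity."""
    index = bisect_right(BAND_BOUNDARIES, pf_adjusted)
    if has_known_security_flaw:
        # Illustrative rule: a further risk raises the band by one level.
        index = min(index + 1, len(BANDS) - 1)
    return BANDS[index]


print(risk_band(0.1, False))
print(risk_band(0.1, True))
print(risk_band(0.9, True))
```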

Finally, the authorisation engine 406 determines the risk compensation (RC) based on the risk band (RB) determined at step 1108. In method 1200, the second classification relating to the acceptability of the service (i.e. the sensitivity of the entity) is determined as part of this step based on data relating to the entity (i.e. the second subset of data in method 1000). Thus, the final step of method 1200 corresponds to steps 1006 and 1008 of method 1000 described with reference to Figure 10.

Alternative Examples and Embodiments

A person skilled in the art will appreciate that many different combinations of embodiments and examples described with reference to Figures 1 to 12 may be used alone unmodified or in combination with each other.

For example, it should be appreciated that methods 500, 900, and 1000 described with reference to Figures 5 to 12 may be implemented together in any appropriate combination. For example, method 900 may be used to determine one or more of the classifications determined at step 506 in method 500; and/or method 1000 may be used to determine the authorisation result at step 514 in method 500 based on a fully built up data file (the data received at step 1002 in method 1000 being the built up data file).

The described examples of the invention are only examples of how the invention may be implemented. Modifications, variations and changes to the described examples will occur to those having appropriate skills and knowledge. These modifications, variations and changes may be made without departure from the scope of the claims.

In an alternative example, the present disclosure is applied to the process of authorising loans. The example methods and systems described with reference to Figures 1 to 12 are used in the same way as described above in this alternative example, except where explained below, and corresponding reference numerals are used to refer to similar features.

In this alternative example, an entity (e.g. company or private individual) applies for a loan via a request from a user device 102, and the authorisation system 400 determines whether or not to approve the loan application. In other words, the authorisation system 400 authorises the entity for access to a loan. The example methods and systems described with reference to Figures 1 to 12 are applied in effectively the same way in this alternative example, except that the nature of the data operated on is different, and certain operations specific to the alternative example are performed.

Thus, in method 500 of authorising an entity described with reference to Figures, the input data received at step 502 and the further data requested at step 510 relate to a loan application made by the entity, and comprise data associated with the entity (i.e. prospective loanee) and associated with the loan (e.g. the requested amount and/or term of the loan). Example collected data associated with the entity include: number of employees, turnover, segment (i.e. industry), and/or data associated with directors of the entity. The further data requested at step 510 is requested from remote server(s) 108, such as a remote server that provides a credit report on the entity. In turn, the classifications determined at step 506 relate to the risk that the entity defaults (i.e. does not pay back the loan).

Accordingly, an example iterative process for authorising an entity for a loan as depicted in Figure 6a may be as follows:
* The communication engine 402 receives 502 data input by the user (i.e. prospective loanee) via the user device 102 and generates the input data file 602-A;
* The classification engine 404 determines a first classification 604-A based on the data file 602-A;
* Based on the first classification 604-A, the classification engine 404 requests further data in the form of a credit report on the entity;
* This further data is then added to the data file to generate the updated data file 602-B;
* The classification engine 404 determines a second classification 604-B based on the data file 602-B;
* Based on the second classification 604-B, the classification engine 404 requests further data in the form of a bank statement from the entity;
* The bank statement is parsed to extract attributes of the entity and the extracted data is then added to the data file to generate the updated data file 602-C;
* The classification engine 404 determines a third classification 604-C based on the data file 602-C; and
* Since a leaf node of the classification 'tree' has been reached and the corresponding 'rule' passed, the authorisation engine 408 authorises 518 the entity for the loan.

In this alternative example, in method 900 of classifying risk described with reference to Figure 9a and method 950 of determining a prediction of failure of an entity described with reference to Figure 9b, the prediction of failure relates to a prediction that the entity defaults on the loan (i.e. to the probability of default). Example groupings determined by the modelling engine 410 may include: 'large' vs 'small' entities (e.g. as determined based on the entity's 'size' attribute (e.g. as quantified via the number of employees)), or 'young' vs 'old' entities (e.g. as determined based on the entity's 'date incorporated' attribute).

As described with reference to Figures 6a and 6b, the classifications are determined using only pure functions. This gives rise to a number of advantages including: (a) improved traceability to understand why certain loan decisions were made, which improves regulatory reporting; (b) easier de-bugging and testing; (c) the ability more easily to 're-run' previous loan decisions using new classification models and thus test the new models, since the pure functions have no side effects; and (d) the ability to cache certain results of calculations rather than calling external modules/engines to execute the calculations.
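
Advantage (d) might be illustrated, in a hedged sketch, using Python's standard `functools.lru_cache` decorator as a stand-in for the cache. The function `size_classification`, its inputs and its thresholds are hypothetical. Because the function is pure, returning a cached output for repeated inputs cannot change the resulting classification.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def size_classification(employee_count: int, turnover: int) -> str:
    """Hypothetical pure function: the output depends only on its arguments."""
    if employee_count >= 250 or turnover >= 50_000_000:
        return "large"
    return "small"


first = size_classification(12, 500_000)       # computed, then cached
repeat = size_classification(12, 500_000)      # same inputs: served from the cache
other = size_classification(400, 90_000_000)   # different inputs: computed
info = size_classification.cache_info()
print(first, repeat, other, info.hits, info.misses)
```

The `maxsize` bound reflects the trade-off noted earlier between the computation saved and the fast access memory used by the cache.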

The further data collected at step 958 may comprise banking data for the entity (e.g. inflows and outflows on the entity's account). This data is often received in an unprocessed format, such as an image or PDF of a bank statement, and the process of extracting attributes of the entity from this data can be computationally costly. Thus, the use of two classification steps as per method 950, the first being used to filter out loan applicants that have a low chance of getting approved, may be particularly advantageous in this alternative example.

The prediction of default may be determined as part of the authorisation method 500 (e.g. as part of one of the classifications determined at step 506). The determined prediction of default may lead to different authorisation results for the entity, e.g. approval or rejection of the loan application (e.g. rejection if the prediction of default is above a threshold); setting of a maximum permitted loan amount or duration; and/or a change in the interest rate for the loan.

The prediction of default is determined using a statistical model trained by modelling engine 410 based on historic data related to previous authorisation requests (i.e. previous loan applications). By analysing the behaviour of previous loan applications, the modelling engine 410 aims to establish a link between attributes of the entity/application and its future credit performance.

The prediction of default may be determined based on data from several sources (e.g. remote server(s) 108); this data may comprise:
* Commercial bureau data: e.g. payment and credit behaviour of the company (i.e. entity);
* Consumer bureau data: e.g. payment and credit behaviour of the directors of the company;
* Banking data: e.g. inflows and outflows on the company's current account;
* Application data: e.g. data provided by the user on the user device 102 during the application process;
* Financial data: e.g. balance sheet and profit & loss data; and/or
* Behavioural data: e.g. data relating to payment behaviour of the company on previous loans (e.g. for existing customers/loanees).

As described with reference to Figure 9b, the risk classification engine determines a first classification as soon as the online application is completed by the user (i.e. based on indicator data provided by the user). This first classification is used to reject the applications with a low likelihood of being funded. To improve computational efficiency, the first classification is determined on a smaller set of data than the subsequent second classification. For example, the first classification may be determined based on: application, commercial bureau, consumer bureau and behavioural data only. If the first classification is 'passed' (e.g. if P(F) is below a threshold), then the process continues and the second classification is determined to obtain a more accurate prediction of default using a larger data set. For example, the second classification may be determined based on: application, commercial bureau, consumer bureau, behavioural, financial and banking data. The use of two classifications in this way reduces the computational resources used because the second classification, which is more computationally costly due to the use of a larger data set, is only performed for a subset of the loan applications.

Optionally, different data sets/sources are used to determine the probability of default depending on the grouping (e.g. segment) determined for the entity. The data sources used for each grouping are determined by the modelling engine 410 based on analysis of historic data. For example, the modelling engine 410 may determine that certain data sources are not, or are only weakly, correlated with probability of default for entities in one grouping, and thus stop using those data sources to determine the prediction of default for entities in that grouping to reduce computational resource usage. For example, for loan applications in some groupings (e.g. young companies), financial information may not be used to determine the first and/or second classifications if it is found that it does not significantly improve the accuracy of the prediction (e.g. because a young company has minimal financial history). Equally, the modelling engine 410 may determine that certain data sources are strongly correlated with probability of default, and thus the prediction of default will be based on those data sources. Further, the data sources may be tailored more specifically based on the attributes of the entity (i.e. of the loan application) in addition or alternatively to groupings.

The use of different data sources for different groups of entities (i.e. loan applicants) means that a loan applicant will only need to provide data for which it is already known that there is a correlation with the probability of default. This not only simplifies the loan application process for the applicant, but also minimises the data inputs for the various classifications such that the computational resource usage when calculating those classifications is also minimised.
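By way of example only, the selection of data sources per grouping may be sketched as below. The disclosure does not specify the statistical measure used by the modelling engine 410; this sketch assumes a Pearson correlation between one numeric indicator per data source and the observed default outcome, an assumed cut-off of 0.3, and a simplification in which missing indicators are treated as 0.0. All function and parameter names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def sources_by_grouping(history, min_abs_corr=0.3):
    """Map each grouping to the data sources worth requesting.

    history: iterable of (grouping, {source: numeric indicator}, defaulted).
    A source is kept for a grouping only if its indicator is at least
    min_abs_corr correlated (in absolute value) with default.
    """
    by_group = defaultdict(list)
    for grouping, indicators, defaulted in history:
        by_group[grouping].append((indicators, 1.0 if defaulted else 0.0))

    selected = {}
    for grouping, rows in by_group.items():
        outcomes = [outcome for _, outcome in rows]
        names = sorted({name for indicators, _ in rows for name in indicators})
        selected[grouping] = [
            name for name in names
            if abs(pearson([ind.get(name, 0.0) for ind, _ in rows],
                           outcomes)) >= min_abs_corr
        ]
    return selected
```

For a grouping in which an indicator carries no information (e.g. a constant financial indicator for young companies), the corresponding source is dropped and would therefore not be requested from the applicant.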

Each set of data may be sourced from a plurality of sources. For example, banking information can be collected: from the user device 102, by requesting bank statements from the applicant and then converting the PDF bank statements into data via the parsing sub-engine 446; from a first remote server 108, by requesting aggregated data from a commercial bureau (e.g. under the Commercial Credit Data Sharing (CCDS) program); and/or from a second remote server 108 corresponding to a bank server, using Open Banking technology.

Accordingly, for some loan applications, the risk classification engine may be able to 'instantly' (i.e. without seeking further data from the user) determine the prediction of default. For example, for loan applications where financial data is not deemed necessary and banking data is provided through Open Banking or CCDS, a prediction of default can be computed instantly, since no additional data is needed from the loan applicant (i.e. user). Thus, if the loan application also passes any other classifications (e.g. rules) required in authorisation method 500, an instant authorisation decision can be made. This provides a particularly quick and seamless method of obtaining funding for users.
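As a minimal sketch, the check for whether an 'instant' prediction is possible may be expressed as follows. The function names, the data set labels and the channel labels ("open_banking", "ccds", "bureau_api") are hypothetical; the sketch assumes that a required data set can be obtained without the user whenever at least one remote channel is available for it.

```python
from typing import Iterable, List, Mapping, Set


def sources_needing_user_input(required: Iterable[str],
                               remote_channels: Mapping[str, Set[str]]) -> List[str]:
    """Return the required data sets for which no remote channel is available."""
    return [source for source in required if not remote_channels.get(source)]


def is_instant(required: Iterable[str],
               remote_channels: Mapping[str, Set[str]]) -> bool:
    """True when every required data set can be obtained without the user."""
    return not sources_needing_user_input(required, remote_channels)
```

Under these assumptions, an application for which financial data is not required and banking data is reachable through Open Banking or CCDS yields an empty list of outstanding requests, so the prediction can be computed without returning to the applicant.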

In this alternative example, in method 1000 of authorising an entity for a service described with reference to Figure 10, said service is a loan. The first classification determined at step 1004 relates to the risk that the entity defaults on the loan. Preferably, the authorisation engine 406 determines a probability of default and classifies the entity into a risk band based on this probability and optionally other attributes of the entity as per method 1100 described with reference to Figure 11.

The second classification determined at step 1006 relates to the probability that the entity will accept a loan offer (if/when it is authorised for the loan). The second classification preferably corresponds to a prediction of the entity's price sensitivity for the loan (i.e. service). A price sensitivity score is determined for the entity using a statistical model trained by the modelling engine 410. The entity is then classified into an 'acceptability' (or 'sensitivity') band based on this score. The modelling engine 410 analyses historic loan application data (e.g. the corresponding stored data files) and the acceptance rate of loan offers to identify attributes of the entities (e.g. number of directors) that are indicative of price sensitivity and generate the statistical model used to determine the price sensitivity score.
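The classification of a score into a band, as used for both the risk band and the 'acceptability' band, may be sketched as below. The band boundaries and labels are assumptions made for illustration only; the disclosure does not give numerical values.

```python
from bisect import bisect_right
from typing import Sequence


def to_band(score: float,
            upper_bounds: Sequence[float],
            labels: Sequence[str]) -> str:
    """Assign a score to a band.

    upper_bounds must be sorted ascending; a score equal to a bound falls
    into the next band up. labels needs one more entry than upper_bounds.
    """
    if len(labels) != len(upper_bounds) + 1:
        raise ValueError("labels must have exactly one more entry than upper_bounds")
    return labels[bisect_right(upper_bounds, score)]


# Hypothetical boundaries for a price sensitivity score in the range [0, 1].
SENSITIVITY_BOUNDS = (0.3, 0.7)
SENSITIVITY_LABELS = ("low", "medium", "high")
```

For instance, `to_band(0.15, SENSITIVITY_BOUNDS, SENSITIVITY_LABELS)` places the entity in the "low" band under these assumed boundaries.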

In turn, the risk compensation determined at step 1008 is the price of the loan (in particular, the interest rate for the loan). Determining the price based on both the risk of default of the entity and the price sensitivity of the entity may improve the balance between expected yield on a given loan (taking into account the risk of default) and acceptance rate (i.e. customer retention), and may allow both variables to be optimised so as to provide loanees with competitive pricing while not sacrificing overall yield (for a plurality of entities). These variables are desirably both maximised but to a certain extent counteract each other (e.g. low interest rates improving customer retention but reducing yield), so integrating both in a single price determination step 1008 may provide an improved pricing method. Since the risk compensation (i.e. price/interest rate) is determined in dependence on the risk and price sensitivity, the price of the loan for two loanees may differ even if they present the same or similar risk (e.g. are in the same risk band).
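The trade-off between yield and acceptance rate may be illustrated with the following sketch. The disclosure does not state how the price is computed; the logistic acceptance curve, the reference rate of 10%, the loss given default of 1.0 and the grid of candidate rates are all assumptions introduced here purely to show how the same risk of default can lead to different prices for entities with different price sensitivity.

```python
from math import exp
from typing import Sequence


def acceptance_probability(rate: float, sensitivity: float,
                           reference_rate: float = 0.10) -> float:
    """Assumed logistic curve: acceptance falls as the rate rises above the
    reference rate, more steeply for more price-sensitive entities."""
    return 1.0 / (1.0 + exp(sensitivity * (rate - reference_rate)))


def expected_yield(rate: float, p_default: float, sensitivity: float,
                   loss_given_default: float = 1.0) -> float:
    """Expected return per unit lent, weighted by the chance the offer is accepted."""
    margin = rate * (1.0 - p_default) - p_default * loss_given_default
    return acceptance_probability(rate, sensitivity) * margin


def choose_rate(p_default: float, sensitivity: float,
                candidate_rates: Sequence[float]) -> float:
    """Pick the candidate rate with the highest expected yield."""
    return max(candidate_rates,
               key=lambda rate: expected_yield(rate, p_default, sensitivity))


CANDIDATE_RATES = (0.05, 0.10, 0.15, 0.20, 0.25, 0.30)  # hypothetical price grid
```

With these assumed curves, two entities with the same probability of default of 0.03 receive different rates: the highly price-sensitive entity (sensitivity 60) is offered a lower rate than the less sensitive one (sensitivity 5), consistent with the statement that loanees in the same risk band may be priced differently.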

The processing engine 408 is responsible for regularly monitoring the prices set for loans, determining appropriate prices across risk and/or acceptability bands, and regularly updating the prices in response to changing market conditions and/or the loaner's desired risk and yield ranges.

Finally, at step 1010, the authorisation engine 406 transmits, to the user device 102, either an offer for the loan at the price determined at step 1008 (i.e. applying the risk compensation determined at step 1008) or a rejection of the loan application.

In this alternative example of the present disclosure, there is provided a method of pricing a service for an entity (preferably the provision of a loan to the entity), comprising: receiving data relating to the entity; determining, in dependence on a first subset of said received data, a first classification for the entity based on risk relating to providing the service to the entity (preferably a risk that the entity fails to pay for the service, more preferably a risk of default); determining, in dependence on a second subset of said received data, a second classification for the entity based on a prediction of the acceptability of the service to the entity (the second classification preferably relating to a prediction of the price sensitivity of the entity's demand for the service); and determining, in dependence on the first and second classifications, a price for providing the service to the entity (preferably an interest rate for the loan).

It will be understood that the invention has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.

Claims

1. An authorisation system, comprising: a classification engine configured to: receive a data file comprising data relating to an entity; determine a first classification in relation to the entity in dependence on the data file; in dependence on the first classification, transmit a request for further data relating to the entity, wherein the data file is updated to include the further data; and determine a second classification in relation to the entity in dependence on the updated data file; wherein the first and second classifications are each determined using one or more pure functions; and an authorisation engine configured to authorise the entity in dependence on the first and/or second classifications.

2. A system according to claim 1, wherein the classification engine is configured to generate a first classification output, the first classification output comprising the first classification and the request for further data.

3. A system according to claim 2, wherein the first classification output is stored in memory.

4. A system according to claim 3, wherein the classification engine is configured to: receive a further data file comprising data relating to a further entity; and determine a first classification in relation to the further entity in dependence on the further data file; wherein, when the data relating to the entity and the data relating to the further entity are the same, the determining the first classification in relation to the further entity comprises retrieving from the memory the first classification output.

5. A system according to any preceding claim, wherein the classification engine is configured to determine iteratively a plurality of classifications in relation to the entity, each determination of a classification comprising: determining a classification in relation to the entity in dependence on the data file; and in dependence on the classification, transmitting a request for yet further data relating to the entity, wherein the data file is updated to include the yet further data; wherein each of the plurality of classifications is determined using one or more pure functions.

6. A system according to claim 5, wherein, when it is determined that sufficient data relating to the entity has been collected for an authorisation decision to be made, the iterations are halted and the authorisation engine is configured to authorise the entity in dependence on the determined classification(s).

7. A system according to claim 5 or 6, wherein the authorisation engine is configured to authorise the entity in dependence on one or more of the plurality of classifications, preferably at least in part in dependence on the last classification.

8. A system according to any preceding claim, further comprising a processing engine configured to compare: the first and/or second classifications as determined using one or more pure functions; and at least one alternative classification, the at least one alternative classification being determined in dependence on the data file using one or more alternative pure functions.

9. A system according to claim 8, wherein the authorisation engine is configured to authorise the entity in dependence on the first and/or second classification and the at least one alternative classification.

10. A system according to claim 8 or 9, wherein the processing engine is configured, for a plurality of entities, to compare: authorisation results for the entities based on the classifications determined using the one or more pure functions; and authorisation results for the entities based on alternative classifications determined using one or more alternative pure functions.

11. A system according to claim 10, wherein the system is configured to replace the pure functions with the alternative pure functions in dependence on the comparison between the authorisation results.

12. A system according to any preceding claim, wherein the data file is stored in memory, each time it is updated, for use by the classification engine.

13. A system according to any preceding claim, wherein the inputs and outputs of one or more of the pure functions are stored in memory.

14. A system according to any preceding claim, wherein the system comprises means for subscribing to a data feed, and wherein the data file is received by the classification engine, and/or each classification is received by the authorisation engine, by subscribing to a data feed.

15. A system according to any preceding claim, wherein the classification engine is further configured to transmit the data file to a database for storage.

16. A system according to any preceding claim, wherein the classification engine is further configured to transmit the first and second classifications and/or parameters of the pure functions to a database for storage.

17. A system according to any preceding claim, further comprising a communication engine configured to receive the request for further data from the classification engine, and to add the further data to the data file.

18. A system according to claim 17, wherein the communication engine is configured to transmit one or more requests for said further data to a user device and/or one or more remote servers.

19. A system according to claim 17 or 18, wherein the communication engine is configured to receive data relating to an entity, and to generate the data file comprising said data.

20. A system according to any preceding claim, wherein adding the further data to the data file comprises: transmitting the further data to a database (preferably a data lake) for storage, and adding a reference to the further data to the data file.

21. A system according to any preceding claim, wherein the authorisation engine is configured to authorise the entity for access to a resource; optionally wherein the data file further comprises data relating to the resource.

22. A system according to any preceding claim, further comprising a parsing engine configured to parse the further data before it is added to the data file; preferably wherein the further data is received as image data and parsed into text data.

23. A method of authorising an entity, comprising: receiving a data file comprising data relating to an entity; determining a first classification in relation to the entity in dependence on the data file; in dependence on the first classification, transmitting a request for further data relating to the entity, wherein the data file is updated to include the further data; determining a second classification in relation to the entity in dependence on the updated data file; and authorising the entity in dependence on the first and/or second classifications; wherein the first and second classifications are each determined using one or more pure functions.

24. A risk classification system comprising: a modelling engine which, for a plurality of entities, is configured to: classify each of the entities into one or more groupings in dependence on one or more attributes of the entities; and identify, for each grouping, in dependence on stored data relating to failure (or success) of the entities in the grouping, at least one indicator, the at least one indicator being correlated with failure (or success) of the entities; and a risk classification engine which, in response to a request to classify risk relating to a further entity, is configured to: transmit a request, to a user device, for data associated with the one or more attributes of the further entity; determine, among the groupings, one or more groupings for the further entity in dependence on said data; transmit a request, to the user device, for data associated with the at least one indicator for the determined grouping(s); and determine a prediction of failure (or success) of the further entity at least partly in dependence on the data associated with the at least one indicator.

25. A system according to claim 24, wherein, if the prediction of failure (or success) is below (or above) a threshold, the risk classification engine is configured to: transmit, in dependence on the determined groupings, a request for additional data relating to the further entity to the user device and/or to a remote server; and redetermine the prediction of failure (or success) of the further entity at least partly in dependence on: said data associated with the at least one indicator for the determined groupings, and the additional data.

26. A system according to claim 24 or 25, further comprising an authorisation engine configured to authorise the entity at least partly in dependence on the determined prediction of failure (or success).

27. A system according to claims 25 and 26, wherein the authorisation engine is configured to: authorise the entity if the determined prediction of failure (or success) and the redetermined prediction of failure (or success) are below (or above) a threshold; and/or deny authorisation to the entity if either the determined prediction of failure (or success) or the redetermined prediction of failure (or success) is above (or below) the threshold.

28. A system according to any of claims 24 to 27, wherein the modelling engine is further configured to identify, for each grouping, in dependence on stored data relating to failure (or success) of the entities in the grouping, at least one further indicator not, or weakly, correlated with failure (or success) of an entity in the grouping.

29. A system according to claim 28 when dependent on claim 25, wherein the risk classification engine is configured, prior to redetermining the prediction of failure (or success) of the further entity, to exclude, from the data associated with the one or more attributes of the further entity and/or the additional data, data associated with the at least one further indicator not, or weakly, correlated with failure (or success) of an entity.

30. A system according to any of claims 24 to 29, wherein the modelling engine is configured periodically to: classify a different plurality of entities into one or more groupings in dependence on one or more attributes of the entities; identify, for each grouping, in dependence on stored data relating to failure (or success) of the entities in the grouping, at least one updated indicator, the at least one updated indicator being correlated with failure (or success) of the entities; and replace the at least one indicator with the at least one updated indicator.

31. A system according to any of claims 24 to 30, wherein the risk classification engine is configured to classify the entity into one of a plurality of predetermined bands in dependence on the determined prediction of failure (or success).

32. A system according to claim 24 or 31, wherein the risk classification engine is further configured to parse the additional data to identify data associated with one or more attributes of the entity.

33. A method of classifying risk, comprising: classifying each of a plurality of entities into one or more groupings in dependence on one or more attributes of the entities; identifying, for each grouping, in dependence on stored data relating to failure (or success) of the entities in the grouping, at least one indicator, the at least one indicator being correlated with failure (or success) of the entities; and in response to a request to classify risk relating to a further entity: transmitting a request, to a user device, for data associated with the one or more attributes of the further entity; determining, among the plurality of groupings, one or more groupings for the further entity in dependence on said data; transmitting a request, to the user device, for data associated with the at least one indicator for the determined groupings; and determining a prediction of failure (or success) of the further entity at least partly in dependence on the data associated with the at least one indicator for the determined groupings.

34. A system for authorising an entity for a service, comprising an authorisation engine configured to: receive data relating to the entity; determine, in dependence on a first subset of said received data, a first classification for the entity based on risk relating to providing the service to the entity; determine, in dependence on a second subset of said received data, a second classification for the entity based on a prediction of the acceptability of the service to the entity; determine, in dependence on the first and second classifications, a risk compensation required for providing the service to the entity; and authorise the entity for the service in dependence on the first classification, comprising applying the risk compensation.

35. A system according to claim 34, further comprising a modelling engine configured to identify, in dependence on stored data relating to the provision of the service to, and the acceptance of the service by, a plurality of entities: the first subset of data, the first subset of data being correlated with the risk relating to providing the service to the entities; and/or the second subset of data, the second subset of data being correlated with the acceptability of the service to the entities.

36. A system according to claim 35, further comprising a communication engine configured to transmit requests for the first and/or second subsets of data to the entity.

37. A system according to any of claims 34 to 36, wherein the first and second classifications are determined in parallel.

38. A system according to any of claims 34 to 36, wherein the first and second classifications are determined sequentially, preferably wherein the first classification is used as an input when determining the second classification.

39. A system according to any of claims 34 to 38, wherein each classification comprises assigning the entity to one of a plurality of bands.

40. A system according to any of claims 34 to 39, wherein determining the first classification comprises: determining a prediction of the probability of failure (or success) of the entity; and scaling said prediction based on one or more attributes of the entity.

41. A method of authorising an entity for a service, comprising: receiving data relating to the entity; determining, in dependence on a first subset of said received data, a first classification for the entity based on risk relating to providing the service to the entity; determining, in dependence on a second subset of said received data, a second classification for the entity based on a prediction of the acceptability of the service to the entity; determining, in dependence on the first and second classifications, a risk compensation required for providing the service to the entity; and authorising the entity for the service in dependence on the first classification, comprising applying the risk compensation.