WO2007042837A1 - Anomalous behaviour detection system - Google Patents

Anomalous behaviour detection system

Info

Publication number
WO2007042837A1
WO2007042837A1 (PCT/GB2006/050299)
Authority
WO
WIPO (PCT)
Prior art keywords
anomalous
data records
probability
profile
call
Prior art date
Application number
PCT/GB2006/050299
Other languages
French (fr)
Inventor
Mark Girolami
Iain Ross Drummond
Ian D. Hall
Original Assignee
Memex Technology Ltd
Priority date
Filing date
Publication date
Application filed by Memex Technology Ltd filed Critical Memex Technology Ltd
Priority to GB0808625A priority Critical patent/GB2445142A/en
Publication of WO2007042837A1 publication Critical patent/WO2007042837A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0604Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M15/00Arrangements for metering, time-control or time indication ; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M15/00Arrangements for metering, time-control or time indication ; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP
    • H04M15/41Billing record details, i.e. parameters, identifiers, structure of call data record [CDR]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M15/00Arrangements for metering, time-control or time indication ; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP
    • H04M15/47Fraud detection or prevention means
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M15/00Arrangements for metering, time-control or time indication ; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP
    • H04M15/70Administration or customization aspects; Counter-checking correct charges
    • H04M15/74Backing up
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/22Arrangements for supervision, monitoring or testing
    • H04M3/2281Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/4228Systems providing special services or facilities to subscribers in networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q3/00Selecting arrangements
    • H04Q3/0016Arrangements providing connection between exchanges
    • H04Q3/0062Provisions for network management
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2203/00Aspects of automatic or semi-automatic exchanges
    • H04M2203/60Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
    • H04M2203/6027Fraud preventions
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2215/00Metering arrangements; Time controlling arrangements; Time indicating arrangements
    • H04M2215/01Details of billing arrangements
    • H04M2215/0148Fraud detection or prevention means
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2215/00Metering arrangements; Time controlling arrangements; Time indicating arrangements
    • H04M2215/01Details of billing arrangements
    • H04M2215/0164Billing record, e.g. Call Data Record [CDR], Toll Ticket[TT], Automatic Message Accounting [AMA], Call Line Identifier [CLI], details, i.e. parameters, identifiers, structure
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2215/00Metering arrangements; Time controlling arrangements; Time indicating arrangements
    • H04M2215/70Administration aspects, modify settings or limits or counter-check correct charges
    • H04M2215/709Backup

Definitions

  • the present invention relates to an anomalous behaviour detection system and particularly, but not exclusively, to an anomalous behaviour detection and scoring system for telephone calls.
  • In "Signature Based Methods for Data Streams" (C. Cortes and D. Pregibon, Data Mining and Knowledge Discovery, 5, pp 167-182, 2001) and "Detecting Fraud in the Real World" (M. Cahill, D. Lambert, J. Pinheiro and D. Sun, Handbook of Massive Data Sets, pp 911-929, 2002) signature-based methods are proposed.
  • a method for detecting anomalous behaviour comprising the steps of: generating a non-anomalous profile of a plurality of data records; calculating a first probability that one or more new data records belong to the non-anomalous profile; calculating a likelihood value, based on the first probability, that the one or more new data records do not belong to the non-anomalous profile.
  • the method further comprises the step of generating an anomalous profile of a plurality of anomalous data records and calculating a second probability that one or more new data records belong to an anomalous profile.
  • the likelihood value is based on the second probability as well as the first probability.
  • the method further comprises the step of comparing the likelihood value to a predetermined threshold value, wherein the one or more new data records are classified as anomalous if the likelihood value is greater than the threshold value.
  • the threshold value is calculated by simulating data records according to the non-anomalous profile and generating a simulated distribution, the threshold value being taken from the simulated distribution.
  • the non-anomalous profile is a non-anomalous probability distribution of the plurality of data records corresponding to non-anomalous behaviour.
  • the non-anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.
  • the feature probability distributions are derived from a Dirichlet prior probability distribution.
  • the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.
  • the anomalous profile is an anomalous probability distribution of the plurality of anomalous data records corresponding to anomalous behaviour.
  • the anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.
  • the feature probability distributions are derived from a Dirichlet prior probability distribution.
  • the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.
  • each of a plurality of users has an associated plurality of data records and the likelihood value is calculated for each user.
  • the threshold value is calculated for each user.
  • for a new user, the associated plurality of data records is taken from one or more other users and the non-anomalous profile is generated accordingly.
  • the data records are call data records.
  • the characterising features of the call data records are one or more of the following: day of call, time call initiated, destination of call and duration of call.
  • the method further comprises the step of generating an alarm to alert one or more operators when the one or more new data records have a likelihood value above the threshold value.
  • an anomalous behaviour detection system comprising: a plurality of data records; a non-anomalous profile generation means enabled to generate a non-anomalous profile from the plurality of data records; a probability calculation means enabled to calculate a first probability that one or more new data records belong to the non-anomalous profile; a likelihood calculation means enabled to calculate a likelihood value, based on the first probability, that the one or more new data records do not belong to the non-anomalous profile.
  • the system further comprises an anomalous profile generation means enabled to generate an anomalous profile from a plurality of anomalous data records and wherein the probability calculation means is further enabled to calculate a second probability that one or more new data records belong to an anomalous profile.
  • the likelihood value is based on the second probability as well as the first probability.
  • the system further comprises a likelihood comparison means enabled to compare the likelihood value to a predetermined threshold value, wherein the one or more new data records are classified as anomalous if the likelihood value is greater than the threshold value.
  • the system further comprises a threshold calculation means enabled to calculate the threshold value by simulating data records according to the non-anomalous profile and generating a simulated distribution, the threshold value being taken from the simulated distribution.
  • the non-anomalous profile is a non-anomalous probability distribution of the plurality of data records corresponding to non-anomalous behaviour.
  • the non-anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.
  • the feature probability distributions are derived from a Dirichlet prior probability distribution. Further preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.
  • the anomalous profile is an anomalous probability distribution of the plurality of anomalous data records corresponding to anomalous behaviour.
  • the anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.
  • the feature probability distributions are derived from a Dirichlet prior probability distribution.
  • the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.
  • each of a plurality of users has an associated plurality of data records and the likelihood value is calculated for each user.
  • the threshold value is calculated for each user.
  • for a new user, the associated plurality of data records is taken from one or more other users and the non-anomalous profile is generated accordingly.
  • the data records are call data records.
  • the characterising features of the call data records are one or more of the following: day of call, time call initiated, destination of call and duration of call.
  • the system further comprises an alarm generation means enabled to alert one or more operators when the one or more new data records have a likelihood value above the threshold value.
  • a computer program product directly loadable into the internal memory of a digital computer comprising software code portions for performing the method of the first aspect of the present invention.
  • Fig. 1 illustrates a flow diagram of an anomalous behaviour detection and scoring system according to the present invention.
  • the invention can be applied to detecting anomalous behaviour in other applications and can be applied to any streams of data originating from multiple sources where the behavioural patterns may vary both between the sources, and each source behaviour pattern itself may vary over time.
  • Examples of potential applications include, but are not restricted to: • telephone call data from mobile networks;
  • the present invention provides the ability for a group of system owners to share profile information, anonymously or otherwise, between themselves to facilitate the identification of any users who migrate between systems.
  • an anomalous behaviour detection and scoring system 10 having a non-anomalous profile generator 12 and an anomalous profile generator 14.
  • the non-anomalous profile generator 12 has a non-anomalous Call Data Record database 16 for non-anomalous data records from user accounts.
  • a profile generator 18 generates an account profile for each user corresponding to non-anomalous usage and stores the account profiles in a non-anomalous profile database 20.
  • the anomalous profile generator 14 has an anomalous Call Data Record database 22 for anomalous data records from identified fraudulent usage.
  • An anomalous profile generator 24 generates a fraudulent profile corresponding to anomalous usage and stores the fraudulent profile in an anomalous profile database 26.
  • Account profile generation can be a computationally intensive exercise, dependent on the number of call data records, but need only be performed at the outset of deployment of the system.
  • a probability calculator 28 calculates a probability distribution function in respect of new data records 30 for a particular account profile and for the fraudulent profile.
  • a likelihood calculator 32 determines the likelihood that the new data records 30 belong to the particular account profile.
  • a threshold calculator 34 determines a particular threshold value for the account profile and the likelihood value is then compared to the threshold value in a comparator 36. If the likelihood value is greater than the threshold value then an alert 38 is generated to handle the anomalous data records accordingly.
  • a new data adder 40 may then add the identified anomalous data records to the call data records for the fraudulent profile.
  • the new data adder may add the new data records to the call data records for the account profile.
  • the account profile can then be updated at regular intervals based on the new usage, thereby increasing the accuracy of the account profile.
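  • By way of illustration only (the function and variable names below are ours, not drawn from the patent), updating an account profile with newly accepted records amounts to incrementing its stored event counts, i.e. the sufficient statistics:

```python
from collections import Counter

def update_profile(profile_counts, new_records, binner):
    """Fold newly accepted call data records into an account profile.

    profile_counts: Counter of event-bin counts (the sufficient statistics).
    new_records: iterable of raw records; binner maps a record to its event bin.
    """
    profile_counts.update(binner(r) for r in new_records)
    return profile_counts

# Illustrative usage: event bins here are (day, time-band) pairs.
profile = Counter({("Mon", "morning"): 12, ("Mon", "evening"): 3})
new = [{"day": "Mon", "band": "morning"}]
update_profile(profile, new, lambda r: (r["day"], r["band"]))
```

Because the profile is just a table of counts, this periodic update is cheap compared with the initial profile generation.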
  • an operator defined profile 42 may be added to either the anomalous or non-anomalous profile database 20, 26.
  • profiles may be added which are not necessarily drawn from data record history.
  • the profiles could define infrequently occurring but large risk fraud in call records.
  • the steps for generation of the account profile, fraudulent profile, probability, likelihood value and threshold value will now be described. It should be noted that the method employed in generating a decision on the call data records is particularly computationally efficient. For example, it is possible using the present invention to process a city's telephone calls, in this case approximately 4 million calls, in real-time on a relatively basic desktop personal computer (of 2005 standards).
  • Bayesian networks are directed acyclic graphs in which nodes represent variables of interest and the links represent causal influence amongst the variables.
  • To each node in a Bayesian network corresponds a conditional probability, where conditional variables are parents for the node.
  • Bayesian networks allow information to be obtained about associations in the network and the reasons which caused a given result to be observed. This information is defined by the network topology, domain knowledge of the data and the testing data. Depending on the direction of traversal in the network, it is possible to infer either the consequences or the causes of events.
  • Bayesian networks also have a high degree of flexibility and ability to adapt to changes in environment.
  • a network can be expanded to accommodate additional conditional variables and relations. If the incoming data shows that a variable is insignificant, or that a relation between variables is weak, the network complexity can be reduced, which in turn reduces the required amount of computation. At the same time, changes in network configuration do not affect the whole network but are conducted only locally.
  • the null hypothesis H₀ is that the call has genuinely originated from the customer's account and is consistent with all previous patterns of calling behaviour from the account.
  • the alternate hypothesis H₁ regarding the telephone calls is that they were not made by the owner of the account but in fact originate from another account, which we will denote as F₁.
  • TYPE I errors correspond to genuine calls from an account being labelled as fraudulent or as not having originated from the customer.
  • the TYPE I error rate has to be very carefully controlled and kept low to ensure that the number of calls exceeding the threshold is kept to a manageable level for the operators who may be required to process the calls which raise alarms.
  • the TYPE II error rate indicates the number of deviant, and possibly fraudulent, telephone calls which are classified as normal.
  • the TYPE II error rate also needs to be kept to a very small level to ensure that the test is particularly sensitive to deviations from normal patterns of usage which may be highly indicative of fraudulent behaviour.
  • the practical reality of such a fraud detection system is that the false rejection rate (TYPE I error rate) will have to be controlled.
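  • As an illustrative aside (not part of the original disclosure; all names are ours), the two error rates described above can be estimated empirically from a labelled test set:

```python
def error_rates(labels, flagged):
    """Estimate Type I and Type II error rates from labelled outcomes.

    labels:  list of booleans, True if the call is genuinely fraudulent.
    flagged: list of booleans, True if the system raised an alarm.
    """
    genuine = sum(1 for l in labels if not l)
    fraud = sum(1 for l in labels if l)
    # Type I: genuine calls wrongly flagged (false rejection rate).
    type_i = sum(1 for l, f in zip(labels, flagged) if not l and f) / genuine
    # Type II: fraudulent calls passed as normal (false acceptance rate).
    type_ii = sum(1 for l, f in zip(labels, flagged) if l and not f) / fraud
    return type_i, type_ii
```

Raising the alarm threshold trades a lower Type I rate for a higher Type II rate, which is why the threshold must be calibrated per account.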
  • each of whom has an account with the telecom provider.
  • each c m defines the counts of the number of times that each of the events which defines a telephone call has occurred.
  • the account for the m th customer will be characterised by a consistent pattern of service usage over a given period of time 1, ..., T, that is, from account initiation (time point 1) until the time points prior to the set of telephone calls initiated during the period T, ..., T+.
  • This will be reflected in a set of sufficient statistics, a m , describing the number of times, for this account, that a particular event related to the initiation and completion of a particular telephone service has occurred.
  • This set of sufficient statistics, a m will consist of, for example, the number of times a call is initiated in the morning between 6.00 am and midday, or the number of times that a call lasted longer than 15 minutes given that the call was international.
  • the set of sufficient statistics, a m are the counts of the number of times that events are observed. For example, the number of times that a telephone number dialled falls into one of the predefined categories.
  • Histogram binning is the discretisation of continuous values, such as time, or the agglomeration of a large number of discrete values, such as actual telephone numbers, into a smaller number of values.
  • a "bin” is a single point representing a range of values which are covered by a discrete probability value.
  • the corpus of calls has been mathematically decomposed to ascertain the boundary conditions most appropriate for the bins, and the dependencies which most affect the results. These results were used to define the sets of "bins" used. For example, there are relevant bins for day of the week, time of day, call type (essentially destination), call duration and also for some combinations of these. Histograms are the simplest way of visualising the values stored in the relevant bins.
  • the parameters of the Dirichlet can be written as a product of a normalised measure (over, for example, S in the case of Start Time) and a positive real value, that is:
  • the non-anomalous profiles for each account comprise the sufficient statistics (counts of each event) and the estimated values of the parameters of the Dirichlet priors, which require little storage overhead.
  • the scoring of the series of calls amounts simply to the iterated application of the Gamma function in each term of equation (7) defining
  • the probability in the case of dependent variables should be calculated as a product of conditional probabilities. For example:
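  • The worked example itself is not reproduced in this extract; an illustrative chain-rule decomposition for two dependent call features (feature names and probability values are ours, purely for illustration) might look like:

```python
def joint_prob(day, time_band, p_day, p_time_given_day):
    """P(day, time) = P(day) * P(time | day) for dependent call features."""
    return p_day[day] * p_time_given_day[day][time_band]

# Illustrative distributions: time-of-day usage depends on the day.
p_day = {"Mon": 0.2, "Sat": 0.1}
p_time = {"Mon": {"morning": 0.5}, "Sat": {"morning": 0.1}}
# P(Mon, morning) = 0.2 * 0.5
```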
  • the system has the ability to process all data records for a customer or user, thereby building an accurate pattern of behaviour rather than simply looking at high-value calls, allowing a level of discrimination not achievable when only a portion of a customer's behaviour is considered;
  • the system is also insensitive to the nature of the calls being made; that is, it does not matter whether the calls were to or from a mobile cellular phone or an internet connection: all data is processed and no pre-filtering of call information is necessary;

Abstract

A method and system for detecting anomalous behaviour is disclosed, comprising the steps of: generating a non-anomalous profile of a plurality of data records; calculating a first probability that one or more new data records belong to the non-anomalous profile; and calculating a likelihood value, based on the first probability, that the one or more new data records do not belong to the non-anomalous profile.

Description

Anomalous behaviour detection system
The present invention relates to an anomalous behaviour detection system and particularly, but not exclusively, to an anomalous behaviour detection and scoring system for telephone calls.
Each year in the telecommunications sector, fraudulent transactions account for a substantial loss of annual revenue for telecom providers. The detection of such fraudulent activity is an arduous task and presents a significant challenge to researchers and practitioners alike. This is due to the nature of the telecommunications domain where a high volume of transactional call data is produced.
In fact, only a very small percentage of call transactions are actually fraudulent and to detect these in real time compounds the problem.
Various solutions have been proposed. For example, in "Detection of Fraud in Mobile Telecommunications" (Shawe-Taylor, J., Howker, K. and Burge, P., Information Security Technical Report, 4(1), pp. 16-28, 1999) a system comprising rule-based and artificial neural network components is developed, whilst in "Signature Based Methods for Data Streams" (C. Cortes and D. Pregibon, Data Mining and Knowledge Discovery, 5, pp 167-182, 2001) and "Detecting Fraud in the Real World" (M. Cahill, D. Lambert, J. Pinheiro and D. Sun, Handbook of Massive Data Sets, pp 911-929, 2002) signature-based methods are proposed.
Typically, these solutions have drawbacks which limit their suitability in many of the areas in which they might be employed.
Limitations of these and other similar solutions include: • current systems have dependencies on additional data from other sources (e.g. billing systems), such links are expensive to engineer and maintain;
• solutions comprising Neural Networks are not necessarily sensitive to behaviour which has not been "trained" - a new type of behaviour will often not be detected, or simply be mislabelled by the system;
• the complexity of the existing underlying computation is generally such that it mandates the use of moderately, or extremely, expensive computer hardware to fulfil the objective of processing a day of observed behaviour within a day;
• when the modes of behaviour between the most divergent elements of the observed population are notionally close together (for example, for a set of telephone calls from a residential area), existing systems can oscillate chaotically between zero output and returning most of the input; as a result, existing systems often require constant "tuning" of internal parameters to limit this behaviour;
• presentation to an operator of the rationale for a "detected" condition is often very difficult (or indeed impossible) with current systems - neural networks in particular do not lend themselves to explaining how a result has actually been arrived at;
• existing systems are not easily able to model and refine hypothetical patterns of behaviour, which can then be immediately detected;
• many existing systems disregard entire classes of input on the grounds that they are too similar, too expensive to process or simply do not conform to some other criterion that the system requires in order to enable processing, even though removing elements of observed behaviour from the process will often reduce the efficacy of the results; and
• the systems have limited or no ability to handle new users or those with small numbers of historically observed events.
According to a first aspect of the present invention there is provided a method for detecting anomalous behaviour comprising the steps of: generating a non-anomalous profile of a plurality of data records; calculating a first probability that one or more new data records belong to the non-anomalous profile; calculating a likelihood value, based on the first probability, that the one or more new data records do not belong to the non-anomalous profile.
Preferably, the method further comprises the step of generating an anomalous profile of a plurality of anomalous data records and calculating a second probability that one or more new data records belong to an anomalous profile.
Preferably, the likelihood value is based on the second probability as well as the first probability.
Preferably, the method further comprises the step of comparing the likelihood value to a predetermined threshold value, wherein the one or more new data records are classified as anomalous if the likelihood value is greater than the threshold value.
Preferably, the threshold value is calculated by simulating data records according to the non-anomalous profile and generating a simulated distribution, the threshold value being taken from the simulated distribution.
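By way of illustration only (the function and parameter names are ours, and the 99th-percentile choice is an assumption rather than a value given in the patent), this simulated-threshold calibration can be sketched as:

```python
import random

def simulate_threshold(profile_probs, n_calls, score_fn,
                       n_sims=1000, quantile=0.99, seed=0):
    """Calibrate a per-account threshold by Monte Carlo simulation.

    Draw batches of call events from the non-anomalous profile
    (profile_probs: event bin -> probability), score each batch with
    score_fn, and take a high quantile of the resulting distribution;
    this quantile directly controls the Type I error rate.
    """
    rng = random.Random(seed)
    bins = list(profile_probs)
    weights = [profile_probs[b] for b in bins]
    scores = []
    for _ in range(n_sims):
        batch = rng.choices(bins, weights=weights, k=n_calls)
        scores.append(score_fn(batch))
    scores.sort()
    return scores[int(quantile * (len(scores) - 1))]
```

Because the simulation uses only the account's own profile, each account receives a threshold matched to its own variability.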
Preferably, the non-anomalous profile is a non-anomalous probability distribution of the plurality of data records corresponding to non-anomalous behaviour.
Preferably, the non-anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.
Preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution.
Further preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.
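A profile built from a Dirichlet prior together with its multinomial likelihood is typically scored through the Dirichlet-multinomial marginal likelihood, whose terms are iterated Gamma functions. The following is a hedged sketch under that standard construction (it is not a reproduction of the patent's own equations), using log-Gamma throughout for numerical stability:

```python
from math import lgamma

def log_marginal_likelihood(counts, alphas):
    """Log Dirichlet-multinomial marginal likelihood of observed event counts.

    counts: observed counts n_k for each event bin in the new records;
    alphas: Dirichlet prior parameters alpha_k (the account profile).
    """
    n = sum(counts)
    a = sum(alphas)
    score = lgamma(a) - lgamma(a + n)
    for n_k, a_k in zip(counts, alphas):
        score += lgamma(a_k + n_k) - lgamma(a_k)
    return score
```

Scoring a series of calls then reduces to a handful of log-Gamma evaluations per feature, which is what makes real-time processing of millions of calls feasible on modest hardware.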
Preferably, the anomalous profile is an anomalous probability distribution of the plurality of anomalous data records corresponding to anomalous behaviour.
Preferably, the anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.
Preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution.
Further preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.
Preferably, each of a plurality of users has an associated plurality of data records and the likelihood value is calculated for each user.
Preferably, the threshold value is calculated for each user.
Preferably, for a new user, the associated plurality of data records is taken from one or more other users and the non-anomalous profile is generated accordingly.
Preferably, the data records are call data records.
Preferably, the characterising features of the call data records are one or more of the following: day of call, time call initiated, destination of call and duration of call.
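Such characterising features are obtained by histogram binning of each call data record. An illustrative sketch follows; the bin boundaries and field names below are our assumptions, not the boundaries the patent derives from its corpus of calls:

```python
def bin_call(record):
    """Map a raw call data record onto discrete feature bins.

    record: dict with 'day' (0=Monday .. 6=Sunday), 'start_hour' (0-23),
            'duration_min' (float) and 'number' (dialled digits as a string).
    """
    day_bin = "weekend" if record["day"] >= 5 else "weekday"
    # Four six-hour bands: 0-5 night, 6-11 morning, 12-17 afternoon, 18-23 evening.
    time_bin = ("night", "morning", "afternoon", "evening")[record["start_hour"] // 6]
    d = record["duration_min"]
    duration_bin = "short" if d < 1 else "medium" if d < 15 else "long"
    # Agglomerate actual dialled numbers into a small set of destination classes.
    dest_bin = "international" if record["number"].startswith("00") else "national"
    return (day_bin, time_bin, duration_bin, dest_bin)
```

The tuple returned is the discrete event whose occurrence counts form the sufficient statistics of an account profile.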
Preferably, the method further comprises the step of generating an alarm to alert one or more operators when the one or more new data records have a likelihood value above the threshold value.
According to a second aspect of the present invention there is provided an anomalous behaviour detection system comprising: a plurality of data records; a non-anomalous profile generation means enabled to generate a non-anomalous profile from the plurality of data records; a probability calculation means enabled to calculate a first probability that one or more new data records belong to the non- anomalous profile; a likelihood calculation means enabled to calculate a likelihood value, based on the first probability, that the one or more new data records do not belong to the non-anomalous profile.
Preferably, the system further comprises an anomalous profile generation means enabled to generate an anomalous profile from a plurality of anomalous data records and wherein the probability calculation means is further enabled to calculate a second probability that one or more new data records belong to an anomalous profile.
Preferably, the likelihood value is based on the second probability as well as the first probability.
Preferably, the system further comprises a likelihood comparison means enabled to compare the likelihood value to a predetermined threshold value, wherein the one or more new data records are classified as anomalous if the likelihood value is greater than the threshold value.
Preferably, the system further comprises a threshold calculation means enabled to calculate the threshold value by simulating data records according to the non-anomalous profile and generating a simulated distribution, the threshold value being taken from the simulated distribution.
Preferably, the non-anomalous profile is a non-anomalous probability distribution of the plurality of data records corresponding to non-anomalous behaviour.
Preferably, the non-anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.
Preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution. Further preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.
Preferably, the anomalous profile is an anomalous probability distribution of the plurality of anomalous data records corresponding to anomalous behaviour.
Preferably, the anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.
Preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution.
Further preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.
Preferably, each of a plurality of users has an associated plurality of data records and the likelihood value is calculated for each user.
Preferably, the threshold value is calculated for each user.
Preferably, for a new user, the associated plurality of data records is taken from one or more other users and the non-anomalous profile is generated accordingly.
Preferably, the data records are call data records. Preferably, the characterising features of the call data records are one or more of the following: day of call, time call initiated, destination of call and duration of call.
Preferably, the system further comprises an alarm generation means enabled to alert one or more operators when the one or more new data records have a likelihood value above the threshold value.
According to a third aspect of the present invention there is provided a computer program product directly loadable into the internal memory of a digital computer comprising software code portions for performing the method of the first aspect of the present invention.
Embodiments of the present invention will now be described with reference to the accompanying drawings, in which:
Fig. 1 illustrates a flow diagram of an anomalous behaviour detection and scoring system according to the present invention.
The following description of the working of the present invention and the associated examples are made with reference to detecting anomalous behaviour in telephone usage.
It should be appreciated that the invention can be applied to detecting anomalous behaviour in other applications and can be applied to any streams of data originating from multiple sources where the behavioural patterns may vary both between the sources, and each source behaviour pattern itself may vary over time.
Examples of potential applications include, but are not restricted to:
• telephone call data from mobile networks;
• telecommunications and network infrastructure monitoring for failure and intrusion;
• software application usage;
• user internet browsing behaviour;
• transaction streams for any banking, credit card, debit card, loyalty/reward card or similar scheme;
• freight/package distribution tracking;
• asset tracking;
• insurance fraud;
• health-care fraud;
• internet transactions; and
• intrusion detection.
Additionally, the present invention provides the ability for a group of system owners to share profile information, anonymously or otherwise, between themselves to facilitate the identification of any users who migrate between systems.
With reference to Fig. 1, an anomalous behaviour detection and scoring system 10 is shown having a non-anomalous profile generator 12 and an anomalous profile generator 14.
The non-anomalous profile generator 12 has a non-anomalous Call Data Record database 16 for non-anomalous data records from user accounts. A profile generator 18 generates an account profile for each user corresponding to non-anomalous usage and stores the account profiles in a non-anomalous profile database 20. The anomalous profile generator 14 has an anomalous Call Data Record database 22 for anomalous data records from identified fraudulent usage. An anomalous profile generator 24 generates a fraudulent profile corresponding to anomalous usage and stores the fraudulent profile in an anomalous profile database 26.
Account profile generation can be a computationally intensive exercise, dependent on the number of call data records, but need only be performed at the outset of deployment of the system.
A probability calculator 28 calculates a probability distribution function in respect of new data records 30 for a particular account profile and for the fraudulent profile.
A likelihood calculator 32 determines the likelihood that the new data records 30 belong to the particular account profile.
A threshold calculator 34 determines a particular threshold value for the account profile and the likelihood value is then compared to the threshold value in a comparator 36. If the likelihood value is greater than the threshold value then an alert 38 is generated to handle the anomalous data records accordingly.
A new data adder 40 may then add the identified anomalous data records to the call data records for the fraudulent profile.
If the likelihood value is less than the threshold value then the new data adder may add the new data records to the call data records for the account profile. The account profile can then be updated at regular intervals based on the new usage, thereby increasing the accuracy of the account profile.
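The decision flow just described can be sketched in outline as follows. This is a minimal illustration with toy single-feature profiles; all names and probability values are assumptions for the sketch, not taken from the patent:

```python
import math

def process_new_records(records, account, fraud, threshold, account_counts):
    """Sketch of the Fig. 1 decision flow: score the new records, raise an
    alert if the likelihood value exceeds the account's threshold, otherwise
    fold the records back into the account's counts so its profile can be
    refreshed at regular intervals."""
    # Likelihood value: log-ratio of fraud-profile fit to account-profile fit.
    value = sum(math.log(fraud[r]) - math.log(account[r]) for r in records)
    if value > threshold:
        return "alert", value            # handled as anomalous data records
    for r in records:                    # new data adder: update the account
        account_counts[r] = account_counts.get(r, 0) + 1
    return "updated", value

account = {"local": 0.9, "premium": 0.1}   # illustrative profiles only
fraud = {"local": 0.2, "premium": 0.8}
counts = {}
genuine, _ = process_new_records(["local", "local"], account, fraud, 0.0, counts)
suspect, _ = process_new_records(["premium"] * 5, account, fraud, 0.0, counts)
```

In this sketch a genuine-looking series is folded into the account counts, while a premium-heavy burst raises an alert and leaves the account profile untouched.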
Furthermore, an operator defined profile 42 may be added to either the anomalous or non-anomalous profile database 20, 26. This allows profiles to be added which are not necessarily drawn from data record history. For example, the profiles could define infrequently occurring but large risk fraud in call records.
The steps for generation of the account profile, fraudulent profile, probability, likelihood value and threshold value will now be described. It should be noted that the method employed in generating a decision on the call data records is particularly computationally efficient. For example, it is possible using the present invention to process a city's telephone calls, in this case approximately 4 million calls, in real-time on a relatively basic desktop personal computer (of 2005 standards).
The approach of the present invention is based on Bayesian networks. Bayesian networks are directed acyclic graphs in which nodes represent variables of interest and links represent causal influence amongst the variables. Each node in a Bayesian network has an associated conditional probability, whose conditioning variables are the parents of the node. Bayesian networks allow information to be obtained about associations in the network and allow the reasons which caused a given result to be observed. This information is defined by the network topology, domain knowledge about the data, and test data. Depending on the direction of movement in the network, it is possible to identify either the consequences or the causes of events. Bayesian networks also have a high degree of flexibility and an ability to adapt to changes in the environment. A network can be expanded to accommodate additional conditional variables and relations. If the incoming data shows that a variable is insignificant, or that a relation between variables is slight, the network complexity can be reduced, which in turn reduces the required amount of computation. At the same time, changes in network configuration do not affect the whole network, but are conducted locally.
Given a customer's telephone service usage (non-anomalous) profile and a series of recent telephone calls attributed to the customer account, a test is required to assess whether the logged transaction data provides sufficient evidence to accept that the calling activity genuinely originated from the account.
Having logged a series of telephone calls which have supposedly been made by a customer, the null hypothesis H0 is that the calls have genuinely originated from the customer's account and are consistent with all previous patterns of calling behaviour from the account. The alternate hypothesis H1 regarding the telephone calls is that they were not made by the owner of the account but in fact originate from another source, which we will denote as F.
From classical hypothesis testing there will be a rate of TYPE I errors made for any test procedure.
In this case, TYPE I errors (rejection of H0 when it is true) correspond to genuine calls from an account being labelled as fraudulent, or as not having originated from the customer. Given that the number of telephone calls being tested in a 24-hour period can be in the order of tens of millions, the TYPE I error rate has to be very carefully controlled and kept low to ensure that the number of calls exceeding the threshold is kept to a manageable level for the operators who may be required to process the calls which raise alarms. On the other hand, the TYPE II error rate (acceptance of H0 when it is false) indicates the number of deviant, and possibly fraudulent, telephone calls which are classified as normal. The TYPE II error rate also needs to be kept very small to ensure that the test is particularly sensitive to deviations from normal patterns of usage which may be highly indicative of fraudulent behaviour. The practical reality of such a fraud detection system is that the false rejection rate (the TYPE I error rate) will have to be controlled.
Consider a number of independent and identically distributed random vectors C1, C2, ..., CN denoting the representation of N logged telephone calls, and suppose these have a probability distribution P(C = c | am) under the null hypothesis, that is, they are generated from a customer account am.
Furthermore, there is a probability distribution, P(C = c | F), defining the distribution of telephone calls under the alternate hypothesis that another signature, possibly a fraudulent one, is responsible for the generation of the phone calls. The Neyman-Pearson lemma then states that the most powerful test (that which maximises the power of the test, 1 − β) for a fixed significance level α is obtained by using the likelihood ratio as the test statistic:

λ = P(C1, C2, ..., CN | am) / P(C1, C2, ..., CN | F)

The null hypothesis will be rejected when the value of the test statistic λ is smaller than a critical value c_crit such that Prob(λ < c_crit : H0 is true) = α. The definition of the probability distributions under both H0 and H1, and the definition of c_crit to set the level of significance of the test, i.e. the false rejection rate, now follow.
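This likelihood-ratio statistic can be sketched numerically as follows, working in log space for numerical stability. The two per-call distributions are toy values, not derived from real call data:

```python
import math

def log_test_statistic(calls, p_account, p_fraud):
    """log(lambda) = log P(calls | a_m) - log P(calls | F) for a series of
    i.i.d. calls; H0 is rejected when this falls below a critical value."""
    return sum(math.log(p_account[c]) - math.log(p_fraud[c]) for c in calls)

# Toy per-call distributions over two call categories (illustrative only).
p_account = {"local": 0.9, "premium": 0.1}
p_fraud = {"local": 0.2, "premium": 0.8}

# A premium-heavy burst drives lambda down, towards rejection of H0;
# a run of local calls drives it up, consistent with the account.
burst = log_test_statistic(["premium"] * 4, p_account, p_fraud)
usual = log_test_statistic(["local"] * 4, p_account, p_fraud)
```

The sign of the log statistic alone indicates which hypothesis the series favours; the actual rejection threshold c_crit is set per account, as described later.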
Consider a population of customers, denoted by the set A, each of whom has an account with the telecom provider. The mth customer makes a series of Nm telephone calls during a given period T, ..., T+τ, defined by C = [c1, c2, ..., cNm], where each cn defines the counts of the number of times that each of the events which defines a telephone call has occurred.
The account for the mth customer will be characterised by a consistent pattern of service usage over a given period of time 1, ..., T−1, from account initiation (time point 1) until the time point prior to the set of telephone calls initiated during the period T, ..., T+τ. This will be reflected in a set of sufficient statistics, am, describing the number of times, for this account, that a particular event related to the initiation and completion of a particular telephone service has occurred. This set of sufficient statistics, am, will record, for example, the number of times a call is initiated in the morning between 6.00 am and midday, or the number of times that a call lasted longer than 15 minutes given that the call was international.
In this example, the set of sufficient statistics, am, are the counts of the number of times that events are observed. For example, the number of times that a telephone number dialled falls into one of the predefined categories.
The sufficient statistics are generated through histogram binning. Histogram binning is the discretisation of continuous values, such as time, or the agglomeration of a large number of discrete values, such as actual telephone numbers, into a smaller number of values. A "bin" is a single point representing a range of values which are covered by a discrete probability value. In this example, the corpus of calls has been mathematically decomposed to ascertain the boundary conditions most appropriate for the bins, and the dependencies which most affect the results. These results were used to define the sets of "bins" used. For example, there are relevant bins for day of the week, time of day, call type (essentially destination), call duration and also for some combinations of these. Histograms are the simplest way of visualising the values stored in the relevant bins.
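The binning step can be illustrated as follows. The bin boundaries used here are assumptions for the sketch, not the boundaries derived from the corpus decomposition described above:

```python
from collections import Counter

def bin_call(day, start_hour, duration_min, call_type):
    """Map one call data record onto discrete bins (illustrative boundaries):
    continuous values such as start time and duration are discretised, and
    large discrete sets such as dialled numbers are agglomerated into a
    small number of categories."""
    return (
        "weekend" if day in ("Sat", "Sun") else "weekday",
        "morning" if 6 <= start_hour < 12 else "other",
        "long" if duration_min > 15 else "short",
        call_type,
    )

# The sufficient statistics a_m are simply event counts over binned records.
calls = [("Mon", 9, 3, "local"), ("Sat", 23, 40, "international"),
         ("Tue", 7, 2, "local")]
a_m = Counter(bin_call(*c) for c in calls)
```

A histogram of `a_m` then visualises the account's usage pattern per bin, as the text notes.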
In this example, four independent sets of events, or features, are used to define a simple account model. These are: Day of Week (W); Call Start Time (S); Call Destination (D); and Call Duration (L), each denoted as: w ∈ W, s ∈ S, d ∈ D, l ∈ L.
It should be noted that the present invention is not in any way limited to the sets of events described above, nor does the assumption of statistical independence between events need to be made. These assumptions (independence between the four events defined) are used to illustrate the overall concept. It should be appreciated that other dependent or independent sets of events could be used for this example and would inevitably be used in other models using the present invention.
If the number of possible values for each event is defined as |W|, |S|, |D| and |L| (for example, with seven days of the week on which to make telephone calls, |W| = 7), then am = [am,w=1, ..., am,w=|W|, am,s=1, ..., am,s=|S|, am,d=1, ..., am,d=|D|, am,l=1, ..., am,l=|L|]^T ∈ N^(|W|+|S|+|D|+|L|). The definition for the telephone calls made during the period T, ..., T+τ follows in a similar manner, each cn being the corresponding vector of event counts for the nth call.
It would be possible to consider further conditional events, such as Call Duration GIVEN Call Destination and Call Destination GIVEN Start Time, but for the purposes of this example we will assume that each event is independent.
For the purposes of this example, the assumption is made that, given the customer account, all call-related events are independent of each other, i.e.:

P(w, s, d, l | am) = P(w | am) P(s | am) P(d | am) P(l | am)
The series of telephone calls, C, made during time period T, ..., T+τ are made with probability P(C | am); in other words, this series of calls was likely to have been made by the mth customer account with probability P(C | am). For the simplest case, where it is assumed that each call is independent of all previous calls made from the customer account, then:

P(C | am) = ∏n=1..N P(cn | am)

Now each P(cn | am) will be defined by the distributions over the available features, i.e. the day that the call was made (w), the time that the call was initiated (s), the destination of the call (d), and the duration of the call (l). Assuming conditional independence then:

P(cn | am) = P(wn | am) P(sn | am) P(dn | am) P(ln | am)
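Under these independence assumptions the per-call probability is simply a product of the four feature probabilities. A minimal sketch, with toy feature distributions (all values illustrative):

```python
def call_probability(call, profile):
    """P(c | a_m) = P(w|a_m) P(s|a_m) P(d|a_m) P(l|a_m) under the
    conditional-independence assumption; 'profile' holds one categorical
    distribution per feature."""
    w, s, d, l = call
    return (profile["day"][w] * profile["start"][s]
            * profile["dest"][d] * profile["dur"][l])

profile = {"day": {"weekday": 0.8, "weekend": 0.2},
           "start": {"morning": 0.5, "other": 0.5},
           "dest": {"local": 0.9, "international": 0.1},
           "dur": {"short": 0.7, "long": 0.3}}

p = call_probability(("weekday", "morning", "local", "short"), profile)
```

For a series of independent calls, the per-call probabilities are multiplied together (or their logs summed) to give P(C | am).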
What is now required is a representation of each account in terms of a set of parameters which define each of the conditional probability distributions employed in each P(C | am). Assuming independence of the features, each account, m, is then defined by the following set of multinomial parameters:

θm = {θm,w,1, ..., θm,w,|W|, θm,s,1, ..., θm,s,|S|, θm,d,1, ..., θm,d,|D|, θm,l,1, ..., θm,l,|L|}

where the strictly positive parameters define the multinomial distributions such that, for each feature, for example Start Time:

Σi=1..|S| θm,s,i = 1 and θm,s,i > 0
The distribution over each of the required multinomial parameters will be defined with a Dirichlet prior probability distribution, which is the conjugate of the multinomial likelihood, such that, for example, the Start Time parameters have a Dirichlet prior distribution defined as:

P(θm,s | αs) = [Γ(Σi αs,i) / ∏i Γ(αs,i)] ∏i θm,s,i^(αs,i − 1)

where Γ(·) denotes the Gamma function. The corresponding multinomial likelihood for the start time is:

P(am,s | θm,s) ∝ ∏i θm,s,i^(am,s,i)

where am,s,i is the number of calls from the account initiated in the ith start-time bin.
Then the marginal distribution for the account based on, for example, Start Time alone is:

P(am,s | αs) = ∫ P(am,s | θm,s) P(θm,s | αs) dθm,s

This marginal is dependent only on the Dirichlet parameters:

P(am,s | αs) = [Γ(Σi αs,i) / Γ(Σi αs,i + Nm)] ∏i [Γ(αs,i + am,s,i) / Γ(αs,i)]

as the multinomial parameters have been integrated out due to the conjugacy of the Multinomial-Dirichlet distributions.
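This closed-form Multinomial-Dirichlet marginal is conveniently evaluated in log space with the log-Gamma function. A minimal numerical sketch, with toy counts and a uniform prior:

```python
import math

def log_marginal_likelihood(counts, alpha):
    """log P(a_m | alpha) for one feature after integrating out the
    multinomial parameters against their conjugate Dirichlet prior:
    Gamma(sum a) / Gamma(sum a + N) * prod_i Gamma(a_i + n_i) / Gamma(a_i),
    computed via lgamma for numerical stability."""
    n = sum(counts)
    a0 = sum(alpha)
    return (math.lgamma(a0) - math.lgamma(a0 + n)
            + sum(math.lgamma(a + c) - math.lgamma(a)
                  for a, c in zip(alpha, counts)))

# With a uniform Dirichlet(1, 1) prior over two start-time bins, a single
# call landing in either bin has marginal probability 1/2.
p_first = math.exp(log_marginal_likelihood([1, 0], [1.0, 1.0]))
p_second = math.exp(log_marginal_likelihood([0, 1], [1.0, 1.0]))
```

The same function scores the counts of any feature (day, destination, duration) against its own Dirichlet parameters.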
The parameters of the Dirichlet can be written as a product of a normalised measure over, for example, S in the case of Start Time, and a positive real value, that is:

αs,i = μm,s πs,i, where Σi=1..|S| πs,i = 1 and μm,s > 0
The values of the parameters of the Dirichlet prior probabilities have a direct effect on the predictive probability assigned to a series of calls given a particular account. For the case where the prior parameter values for all variables are set to the value of one, it is implicitly assumed that all parameter values are equally likely a priori. In particular, given that the mth account is new and has made no calls, the predictive probability of each possible event is then uniform, meaning that all behaviours or modes of service usage are equally likely to emerge. The form of prior probability just discussed is particularly naive in that, given the existing population of customer accounts A, it ignores all the information available regarding specific characteristics of service usage from the population, or from market-segmented parts of the population.
Therefore, in the absence of account-specific information, that is, for a brand new account, our prior should be guided by the population average signature (or by that of the part of the market segment the new customer is attributed to), in which case each normalised measure is set from the population, so that, for example, πs,i equals the probability of a call being initiated at the ith start-time event (for example, Morning) given the whole population of accounts A. The values of the scalar coefficients μm,w, μm,s, μm,d and μm,l will be account specific and can be identified via some form of grid search. Alternatively, to obtain these scalar values we can employ Empirical Bayes (Type-II maximum likelihood) such that, for example:

μm,s = argmaxμ log P(am,s | μ πs)
Now, denoting the log marginal likelihood for, for example, Start Time by L(μ) = log P(am,s | μ πs), and denoting the digamma function by Ψ(·), the first derivative with respect to μ is:

L′(μ) = Ψ(μ) − Ψ(μ + Nm) + Σi πs,i [Ψ(μ πs,i + am,s,i) − Ψ(μ πs,i)]

with the second derivative L″(μ) following in the same manner in terms of the trigamma function. These expressions can be employed in a Newton iteration:

μ ← μ − L′(μ) / L″(μ)
for each attribute of every account.
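A numerical sketch of this Empirical Bayes fit follows. For simplicity it uses plain gradient ascent in place of the Newton iteration, and approximates the digamma function by a finite difference of the log-Gamma function; all counts and values are illustrative:

```python
import math

def digamma(x, h=1e-5):
    # Numerical digamma via a central difference of lgamma (sketch only).
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def fit_precision(counts, measure, mu=1.0, steps=200, lr=0.1):
    """Type-II maximum-likelihood estimate of the scalar mu in
    alpha_i = mu * pi_i, by gradient ascent on the log marginal likelihood
    (a simple stand-in for the Newton iteration described in the text)."""
    n = sum(counts)
    for _ in range(steps):
        grad = (digamma(mu) - digamma(mu + n)
                + sum(p * (digamma(mu * p + c) - digamma(mu * p))
                      for p, c in zip(measure, counts)))
        mu = max(mu + lr * grad, 1e-3)   # keep mu strictly positive
    return mu

# Account counts that match the population measure support a larger mu
# (a stronger prior) than counts that contradict it.
pi = [0.5, 0.5]
mu_match = fit_precision([50, 50], pi)
mu_clash = fit_precision([95, 5], pi)
```

The fitted scalar controls how strongly the population-level measure is weighted against the account's own counts when forming its predictive distribution.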
We now have to consider how to assign the required probabilities to a new sequence of calls originating from the accounts.
The posterior probability over the parameters follows as:

P(θm,s | am,s, αs) = Dirichlet(αs,1 + am,s,1, ..., αs,|S| + am,s,|S|)

Now, for N calls made during the new period T, ..., T+τ, we require the following probability:

P(C | am) = ∫ P(C | θm) P(θm | am, α) dθm

where the integral is once more available in closed form through the Multinomial-Dirichlet conjugacy. Defining α′s,i = αs,i + am,s,i, and letting ns,i denote the number of times the ith start-time event occurs in the new series of calls, then:

P(s | am) = [Γ(Σi α′s,i) / Γ(Σi α′s,i + N)] ∏i [Γ(α′s,i + ns,i) / Γ(α′s,i)]     (7)

and this follows for the other terms required, that is P(w | am), P(d | am) and P(l | am), to give the predictive likelihood of the series of calls originating from the specific account:

P(C | am) = P(w | am) P(s | am) P(d | am) P(l | am)
The non-anomalous profiles for each account comprise the sufficient statistics (counts of each event) and the estimated values of the parameters of the Dirichlet priors, which require little storage overhead. The scoring of a series of calls amounts simply to the iterated application of the Gamma function in each term of equation (7) defining P(C | am).

Account-specific thresholds are also required to capitalise on the individual descriptive statistics. For a given level of test significance α, each account will require a corresponding critical value c_crit,m such that:

Prob(λ < c_crit,m : H0 is true) = α
This has important practical consequences in that the False Rejection rate (the number of calls which are actually genuine being rejected by the system as inconsistent with the current profile) will be controlled by this value.
To this end, a form of Parametric Bootstrap is employed by using the above predictive distributions to repeatedly simulate series of calls from each account, compute their associated scores and then obtain the empirical distribution of the scores. These account-specific empirical distributions can then be used to obtain the account-specific threshold scores which will yield the required test significance levels (i.e. TYPE I error rates). As mentioned previously, the example above assumes that the set of events used to describe the model are independent of each other. That is, for three independent variables x1, x2, x3 the probability is defined as:

p(x) = p(x1) · p(x2) · p(x3)
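The parametric bootstrap thresholding can be sketched as follows. This is a toy single-feature account profile, with a simplified score (the negative log-probability of the call series under the account's own profile) standing in for the full likelihood-ratio statistic:

```python
import math
import random

def score(calls, profile):
    """Per-account score: negative log-probability of a call series under
    the account's (toy, single-feature) profile; higher means less typical."""
    return -sum(math.log(profile[c]) for c in calls)

def bootstrap_threshold(profile, n_calls, alpha=0.01, n_sims=5000, seed=0):
    """Parametric bootstrap: repeatedly simulate call series from the
    account's own predictive distribution, score them, and take the
    empirical (1 - alpha) quantile as that account's alarm threshold,
    so the TYPE I error rate is approximately alpha."""
    rng = random.Random(seed)
    cats, probs = zip(*profile.items())
    scores = sorted(
        score(rng.choices(cats, weights=probs, k=n_calls), profile)
        for _ in range(n_sims))
    return scores[int((1 - alpha) * n_sims)]

profile = {"local": 0.9, "premium": 0.1}
t = bootstrap_threshold(profile, n_calls=20)
# A genuine-looking series scores below the threshold; a premium-heavy
# burst exceeds it and would raise an alarm.
low = score(["local"] * 20, profile)
high = score(["premium"] * 20, profile)
```

Each account gets its own threshold from its own simulated score distribution, which is what keeps the false rejection rate controlled per account.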
The probability in the case of dependent variables should be calculated as a product of conditional probabilities. For example:

p(x) = p(x1) · p(x2 | x1) · p(x3 | x1, x2)

The present invention has a number of distinct advantages over prior solutions, including:
• There is no empirical reliance on external data other than call data records, that is the system can work in isolation and no expensive integration with other systems is mandated. Given that many companies have a number of different billing systems for different services, this is not a trivial point;
• the system has the ability to identify classes of anomalous behaviour which have not been seen before, simply because they are not what the user normally does, as opposed to only being able to identify patterns the system has already seen;
• there is immediate result scoring against new, or entirely speculative, fraud patterns and no reliance on updates to the system to cope with the new fraud pattern;
• the system has the ability to process all data records for a customer or user, thereby building an accurate pattern of behaviour, rather than simply looking at high-value calls; this allows a level of discrimination not achievable when considering only a portion of a customer's behaviour;
• the system is also insensitive to the nature of calls being made, that is, it does not matter whether the calls were to or from a mobile cellular phone or an internet connection, all data is processed and no pre-filtering of call information is necessary;
• as more call data for a given account is processed, the accuracy of the system improves since the account signature will become more and more refined; and
• cost of ownership is reduced due to reduced hardware costs and real-time processing.
Improvements and modifications may be incorporated without departing from the scope of the present invention.

Claims

1. A method for detecting anomalous behaviour comprising the steps of: generating a non-anomalous profile of a plurality of data records; calculating a first probability that one or more new data records belong to the non-anomalous profile; calculating a likelihood value, based on the first probability, that the one or more new data records do not belong to the non-anomalous profile.
2. A method as claimed in claim 1, further comprising the step of generating an anomalous profile of a plurality of anomalous data records and calculating a second probability that one or more new data records belong to an anomalous profile.
3. A method as claimed in claim 2, wherein the likelihood value is based on the second probability as well as the first probability.
4. A method as claimed in any of claims 1 to 3, further comprising the step of comparing the likelihood value to a predetermined threshold value, wherein the one or more new data records are classified as anomalous if the likelihood value is greater than the threshold value.
5. A method as claimed in any of claims 1 to 4, wherein the threshold value is calculated by simulating data records according to the non-anomalous profile and generating a simulated distribution, the threshold value being taken from the simulated distribution.
6. A method as claimed in any of claims 1 to 5, wherein the non-anomalous profile is a non-anomalous probability distribution of the plurality of data records corresponding to non-anomalous behaviour.
7. A method as claimed in claim 6, wherein the non-anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.
8. A method as claimed in claim 7, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution.
9. A method as claimed in claim 7, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.
10. A method as claimed in any of claims 1 to 9, wherein the anomalous profile is an anomalous probability distribution of the plurality of anomalous data records corresponding to anomalous behaviour.
11. A method as claimed in claim 10, wherein the anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.
12. A method as claimed in claim 11, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution.
13. A method as claimed in claim 11, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.
14. A method as claimed in any of claims 1 to 13, wherein each of a plurality of users has an associated plurality of data records and the likelihood value is calculated for each user.
15. A method as claimed in claim 14, wherein the threshold value is calculated for each user.
16. A method as claimed in any of claims 1 to 15, wherein, for a new user, the associated plurality of data records is taken from one or more other users and the non-anomalous profile is generated accordingly.
17. A method as claimed in any of claims 1 to 16, wherein the data records are call data records.
18. A method as claimed in claim 17, wherein the characterising features of the call data records are one or more of the following: day of call, time call initiated, destination of call and duration of call.
19. A method as claimed in any of claims 1 to 18, further comprising the step of generating an alarm to alert one or more operators when the one or more new data records have a likelihood value above the threshold value.
20. An anomalous behaviour detection system comprising: a plurality of data records; a non-anomalous profile generation means enabled to generate a non-anomalous profile from the plurality of data records; a probability calculation means enabled to calculate a first probability that one or more new data records belong to the non-anomalous profile; a likelihood calculation means enabled to calculate a likelihood value, based on the first probability, that the one or more new data records do not belong to the non-anomalous profile.
21. A system as claimed in claim 20, further comprising an anomalous profile generation means enabled to generate an anomalous profile from a plurality of anomalous data records and wherein the probability calculation means is further enabled to calculate a second probability that one or more new data records belong to an anomalous profile.
22. A system as claimed in claim 21, wherein the likelihood value is based on the second probability as well as the first probability.
23. A system as claimed in any of claims 20 to 22, further comprising a likelihood comparison means enabled to compare the likelihood value to a predetermined threshold value, wherein the one or more new data records are classified as anomalous if the likelihood value is greater than the threshold value.
24. A system as claimed in any of claims 20 to 23, further comprising a threshold calculation means enabled to calculate the threshold value by simulating data records according to the non-anomalous profile and generating a simulated distribution, the threshold value being taken from the simulated distribution.
25. A system as claimed in any of claims 20 to 24, wherein the non-anomalous profile is a non-anomalous probability distribution of the plurality of data records corresponding to non-anomalous behaviour.
26. A system as claimed in claim 25, wherein the non-anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.
27. A system as claimed in claim 26, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution.
28. A system as claimed in claim 26, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.
29. A system as claimed in any of claims 20 to 28, wherein the anomalous profile is an anomalous probability distribution of the plurality of anomalous data records corresponding to anomalous behaviour.
30. A system as claimed in claim 29, wherein the anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.
31. A system as claimed in claim 30, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution.
32. A system as claimed in claim 30, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.
33. A system as claimed in any of claims 20 to 32, wherein each of a plurality of users has an associated plurality of data records and the likelihood value is calculated for each user.
34. A system as claimed in claim 33, wherein the threshold value is calculated for each user.
35. A system as claimed in any of claims 20 to 34, wherein, for a new user, the associated plurality of data records is taken from one or more other users and the non-anomalous profile is generated accordingly.
36. A system as claimed in any of claims 20 to 35, wherein the data records are call data records.
37. A system as claimed in claim 36, wherein the characterising features of the call data records are one or more of the following: day of call, time call initiated, destination of call and duration of call.
38. A system as claimed in any of claims 20 to 37, wherein the system further comprises an alarm generation means enabled to alert one or more operators when the one or more new data records have a likelihood value above the threshold value.
39. A computer program product directly loadable into the internal memory of a digital computer comprising software code portions for performing the method of any of claims 1 to 19.
PCT/GB2006/050299 2005-10-13 2006-09-21 Anomalous behaviour detection system WO2007042837A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0808625A GB2445142A (en) 2005-10-13 2006-09-21 Anomalous behaviour detection system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0520789.9 2005-10-13
GB0520789A GB2431255A (en) 2005-10-13 2005-10-13 Anomalous behaviour detection system

Publications (1)

Publication Number Publication Date
WO2007042837A1 true WO2007042837A1 (en) 2007-04-19

Family

ID=35451645

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2006/050299 WO2007042837A1 (en) 2005-10-13 2006-09-21 Anomalous behaviour detection system

Country Status (2)

Country Link
GB (2) GB2431255A (en)
WO (1) WO2007042837A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567788A (en) * 2010-12-28 2012-07-11 中国移动通信集团重庆有限公司 Real-time identification system and real-time identification method for fraudulent practice in communication services
US9910882B2 (en) 2014-12-19 2018-03-06 International Business Machines Corporation Isolation anomaly quantification through heuristical pattern detection
US9922071B2 (en) 2014-12-19 2018-03-20 International Business Machines Corporation Isolation anomaly quantification through heuristical pattern detection

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2936670B1 (en) * 2008-09-30 2011-02-25 Alcatel Lucent Detection of abnormal behaviours of mobile terminal users in a telecommunications network
EP2973141A4 (en) * 2013-03-15 2016-10-26 Cyberricade Inc Cyber security

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2303275A (en) * 1995-07-13 1997-02-12 Northern Telecom Ltd Detecting mobile telephone misuse
GB2321362A (en) * 1997-01-21 1998-07-22 Northern Telecom Ltd Generic processing capability
US5790645A (en) * 1996-08-01 1998-08-04 Nynex Science & Technology, Inc. Automatic design of fraud detection systems
US20040111305A1 (en) * 1995-04-21 2004-06-10 Worldcom, Inc. System and method for detecting and managing fraud

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5375244A (en) * 1992-05-29 1994-12-20 At&T Corp. System and method for granting access to a resource
EP1589716A1 (en) * 2004-04-20 2005-10-26 Ecole Polytechnique Fédérale de Lausanne (EPFL) Method of detecting anomalous behaviour in a computer network

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567788A (en) * 2010-12-28 2012-07-11 中国移动通信集团重庆有限公司 Real-time identification system and real-time identification method for fraudulent practice in communication services
US9910882B2 (en) 2014-12-19 2018-03-06 International Business Machines Corporation Isolation anomaly quantification through heuristical pattern detection
US9922071B2 (en) 2014-12-19 2018-03-20 International Business Machines Corporation Isolation anomaly quantification through heuristical pattern detection
US10649977B2 (en) 2014-12-19 2020-05-12 International Business Machines Corporation Isolation anomaly quantification through heuristical pattern detection

Also Published As

Publication number Publication date
GB0520789D0 (en) 2005-11-23
GB0808625D0 (en) 2008-06-18
GB2431255A (en) 2007-04-18
GB2445142A (en) 2008-06-25

Similar Documents

Publication Publication Date Title
US11734692B2 (en) Data breach detection
EP0894378B1 (en) Signature based fraud detection system
US11875355B2 (en) Fast access vectors in real-time behavioral profiling in fraudulent financial transactions
EP0897566B1 (en) Monitoring and retraining neural network
US6038555A (en) Generic processing capability
US20190164181A1 (en) Reducing false positives with transaction behavior forecasting
US20190279218A1 (en) Behavior tracking smart agents for artificial intelligence fraud protection and management
Cahill et al. Detecting fraud in the real world
US7403931B2 (en) Artificial intelligence trending system
US20150046332A1 (en) Behavior tracking smart agents for artificial intelligence fraud protection and management
US8645301B2 (en) Automated entity identification for efficient profiling in an event probability prediction system
Hellerstein et al. A statistical approach to predictive detection
EP1975869A1 (en) Enhanced fraud detection with terminal transaction-sequence processing
Irarrázaval et al. Telecom traffic pumping analytics via explainable data science
WO2007042837A1 (en) Anomalous behaviour detection system
Qayyum et al. Fraudulent call detection for mobile networks
Wanchai Customer churn analysis: A case study on the telecommunication industry of Thailand
Lopes et al. Applying user signatures on fraud detection in telecommunications networks
Moudani et al. Fraud detection in mobile telecommunication
Nonyelum Fraud Detection in Mobile Communications Using Rule-Based and Neural Network System.
Narciso Debt Analytics: Proactive prediction of debtors in the telecommunications industry

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 0808625; Country of ref document: GB; Kind code of ref document: A; Free format text: PCT FILING DATE = 20060921)
WWE Wipo information: entry into national phase (Ref document number: 0808625.8; Ref document number: 808625; Country of ref document: GB)
122 Ep: pct application non-entry in european phase (Ref document number: 06794977; Country of ref document: EP; Kind code of ref document: A1)