WAVELET BASED FRAUD DETECTION SYSTEM
Field of the Invention
The present invention relates to fraud detection and in particular to a system of detecting fraud.
The invention has been developed primarily for use in detecting utility, telecommunications or credit card fraud and will be described hereinafter with reference to these applications. However, it will be appreciated that the invention is not limited to these particular fields of use.
Background of the Invention
Telecommunication and utility companies, such as electricity and gas suppliers, and credit card providers have extensive distribution networks. Due at least in part to their network size, these companies have found it extremely difficult to constantly monitor the total consumption of their services.
In large utility networks, for example, losses due to inefficiencies in the system are prevalent. For electricity suppliers, these losses include resistive power transmission losses and transformer inefficiencies; for gas utilities, they include leakages. Other losses occur through unregulated supply, fraudulent activity and faults in the systems, for example.
Using the electricity utilities as an example, fraudulent misappropriation of electricity which is unaccounted for causes significant loss to the electricity utility. A utility is aware that the total demand for its product, electricity in the present example, will vary depending on climatic factors or consumer demand. For example, on unusually hot days, the electricity load increases as more people switch on coolers and air-conditioners. Consumer demand also varies depending on the time of day; for example, minimal amounts of electricity are used late at night as compared to when people are up and about or at work.
In so far as fraudulently appropriated gas or electricity is small compared to the amount provided through the entire utilities network, it is very difficult to detect the small variations due to fraud. The electricity utilities, for example, are physically unable to monitor each part of their network to determine power consumption so as to detect fraudulent appropriation.
Generally, the utilities retrospectively review power consumption data in some or all parts of their distribution network and look for unusual or irregular usage to provide an indicator of the possibility of the presence of fraud. It is the case that a substantial amount of electricity fraud is detected by physical evidence left at the source from where the power is fraudulently removed, for example by a utility employee observing such evidence when monitoring or repairing part of the distribution network.
In other fields, such as credit card networks, fraud is also a major source of loss for credit card companies and merchant providers. For example, if a credit card is stolen or its number fraudulently appropriated, financial transactions against the credit card can be made before an alert is raised. Credit card companies are known to monitor and manipulate data indicative of transactions that have been recorded to determine if any irregular transaction patterns appear.
For example, an irregularity may be indicated by purchases made at unusual places, such as in a country other than that of the card holder, or at times of day when transactions are not normally conducted by the card holder. Such fraud detection systems, much like the utility company methods, are generally retrospective, being based on data recorded for events which have already occurred, unless a major irregularity in transactions occurs, such as transactions for unusually large amounts.
A similar system for detecting telecommunications fraud is also known. For example, the telecommunication service provider can be made aware when irregular telecommunications patterns appear, for example on a mobile phone or other network. When fraudulent telecommunications or data transactions occur, any irregularities in these patterns, including volume or location, can be used to trigger a fraud detector.
In all known fraud detection systems only historical data can be processed to determine if fraud has occurred, except in the presence of exceptional circumstances such as very large transactions or utility consumption.
US Patent No. 6,029,144 discloses a system and method for checking expense entries in a knowledge based system. More particularly, expense entries provided by employees are checked for compliance with predetermined policy rules to detect the possibility that fraud is occurring. The system includes a knowledge based system to determine expense entry compliance with policy and to determine fraud. An auditor workflow system operating in unison with the policy checker guides manual audits of those expense entries that do not comply with the policy rules.
A data pattern analyser for detecting behaviour patterns is also employed to indicate the presence of fraud. As part of the system, a prioritiser ascribes weightings to order or rank any violations of the predetermined rules. The analyser and prioritiser are linearly interrelated in an automated system which also generates reports. This system is disadvantageous in that it can only employ historical data on the basis of patterns of non-compliance with rules.
US Patent No. 6,094,693 is directed toward a system and method for detecting credit card fraud. The system operates on the premise that fraudulent credit card activity will reflect itself by the appearance of clustered groups of suspicious transactions. Particular transactions are ranked by assigning weights to the individual transactions for use in identifying the suspicious transactions. Indicators such as the geographic region of a transaction or transactions and the time the transaction occurs are considered by the system. That is, this system employs historical weighted data to determine the presence of fraud.
International Patent Application No. WO99/04329 discloses a method of employing evolving classifier programs for signal processing and control. A software program, or evolver, is used to examine a large number of features which may be from multiple data sources to create a "classifier" program.
The output of the classifier program is compared to a desired output and one or more classifier programs are then created and optimised by the evolver program by means of genetic programming. The desired output is then again compared to the actual classifier program output and the difference is used to measure the fitness to guide the evolution of the classifier program. This system is only suited to signal processing and control manipulation and detection, for example in classifying myoelectric signals for the control of a prosthesis or classifying remotely sensed spectra for the identification of numerals. The system is not applicable to other fields, such as utility, credit card or telecommunications fraud.
Computer systems, such as those in the above prior art International Patent Application, are a crude form of neural network in which a number of processes are interconnected in a manner analogous to the connection between neurons in a human brain. These systems are able to 'learn' by a process of trial and error. Such neural network techniques have gained acceptance as a useful problem solving tool in the utility supply industry, particularly the electricity industry, with feature selection and extraction being a critical component in achieving good learning abilities and generalised performance in the neural network. Previously published works such as Bishop, "Neural networks for pattern recognition", Oxford University Press, 1995; Neal, "Bayesian learning for neural networks, Lecture Notes in Statistics", Springer Press, 1994; and El-Sharkawi, "Neural network and its ancillary techniques as applied to power systems", 1995, have discussed these topics. Unfortunately, these principles rely primarily on stationary and time invariant properties, whereas the patterns of utility fraud, for example electricity, gas or water theft, are non-stationary, noisy, time varying and have multi-scale properties. That is, fraudulent activities are randomly initiated and terminated, and the methods, quantity and frequency of the fraudulent activities change over time. As a result, it is very difficult to analyse such data using conventional techniques.
These techniques are also equally applicable for determining the presence of fraud in other fields, such as credit card networks, telecommunications networks or the like. There are no current approaches for detecting such fraud in the utility networks, and those known systems for use in detecting credit card or telecommunications fraud are not applicable to fields other than their own. In very recent years, wavelets have been successfully introduced as an efficient tool for use in various time series analyses, such as those by Abry et al, IEEE Trans. on Information Theory, 44(1), pp. 2-15, 1998 and Meyer, Proc. International Conference Wavelets and Applications, France, 1989.
The wavelet analysis technique is suited not only for wave forms that are smooth and well behaved, but also for those with abrupt changes, transients or other irregularities, due to the localisation and multi-resolution analysis of the wavelets. Furthermore, wavelets can be used to determine the time of smooth or irregular changes, the type of change by determining the first or second derivatives of the wavelets, and the amplitude of the changes. As a result, a more accurate model has been developed for use in credit card fraud. Different features or predictors can be fitted to a wavelet related sub-series and be valuably combined. However, these known wavelet techniques are inefficient in detecting fraud.
Object of the Invention
It is an object of the invention to overcome or substantially ameliorate any one of the above disadvantages of the prior art or to provide a useful alternative.
Summary of the Invention
According to a first aspect of the invention there is provided a method for fraud detection including the steps of:
entering data profiles into a feature extractor;
entering the data profiles into a wavelet transformer and providing a wavelet decomposition;
entering the wavelet decomposition into a processor and providing processed wavelet co-efficients;
combining the processed wavelet co-efficients with the raw data profiles and assembling an extracted feature data output;
entering the extracted features and data indicative of fraud history and customer labels into a model generator;
allocating weights to the data entered into the model generator by means of an allocator;
combining the results and validating the combined classified results with a validator;
passing the model data from the model generator to the fraud detector together with fraud history data and the feature output data provided by the feature extractor;
allocating weights to the extracted features and linearly combining them to provide a linearly combined output; and
cross-combining the linearly combined data with the raw fraud history data and providing an output which is indicative of the probability of fraud being present in the inputted data.
According to a second aspect of the invention there is provided a fraud detection system including:
a feature extractor receiving input data profiles and providing extracted features;
a model generator receiving input of data indicative of customer labels and input data indicative of fraud history together with the extracted features, the model generator providing a model output and an accuracy rate output; and
a fraud detector receiving the model from the model generator, together with the extracted features and data indicative of the fraud history;
wherein the fraud detector provides data output indicative of the probability of fraud.
Brief Description of the Drawings
A preferred embodiment of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
FIG 1 is a schematic overview of the embodiment;
FIG 2 is a schematic illustration of the fraud detector of FIG 1;
FIG 3 is a schematic representation of the feature extractor of FIG 2;
FIG 4 is a schematic representation of the model generator of FIG 2;
FIG 5 is a schematic representation of a fraud detector of FIG 2; and
FIG 6 is a schematic illustration of the embodiment of FIG 2 employed in a network.
Detailed Description of the Preferred Embodiment
In wavelet analysis under certain conditions (Chui, "An Introduction to Wavelets", Academic Press, 1995), a function f in the signal domain can be transformed into the wavelet domain by applying a discrete wavelet transform (WT):
$$W_{jk} f = \int f(x)\, \psi_{jk}(x)\, dx$$
where $\psi$ is the mother wavelet and its derived forms are given by:
$$\psi_{jk}(x) = 2^{j/2}\, \psi(2^{j} x - k)$$
where j is the dilation and k is the translation. $W_{jk} f$ contains the information about the function f near the time point $2^{-j} k$ and near the frequency proportional to $2^{j}$. At each time and frequency point, the wavelets encode the details, leaving the scaling function to encode an image of the signal at half resolution. The stationary wavelet transform (SWT) is popular because of its property of shift invariance: the new sequences have the same length as the original sequence. The slight change in $\psi_{jk}(x)$ is:
$$\psi_{jk}(x) = 2^{j/2}\, \psi\big(2^{j}(x - k)\big), \quad j \in \mathbb{Z},\ k \in \mathbb{Z}$$
That is, for all integers k ≥ 0 and j ≥ 0, coefficients at all resolution levels j appear at all positions k.
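The length-preserving and shift-invariant behaviour of a stationary transform can be illustrated with a minimal sketch. It assumes a Haar filter and periodic boundary handling, neither of which is taken from the specification, and the function name `stationary_haar` and the sample profile are hypothetical.

```python
def stationary_haar(signal, levels):
    """Undecimated (stationary) Haar transform with periodic boundaries.

    Returns (approximation, details), where details[j] holds the detail
    coefficients of level j + 1. No downsampling is performed, so every
    returned sequence has the same length as the input.
    """
    n = len(signal)
    approx = [float(v) for v in signal]
    details = []
    for j in range(levels):
        step = 2 ** j  # the filter taps are spread further apart at each level
        shifted = [approx[(i + step) % n] for i in range(n)]
        details.append([(a - b) / 2.0 for a, b in zip(approx, shifted)])
        approx = [(a + b) / 2.0 for a, b in zip(approx, shifted)]
    return approx, details


# Hypothetical consumption profile with an abrupt drop half way through.
profile = [10.0] * 8 + [4.0] * 8
approx, details = stationary_haar(profile, levels=3)

# Circularly shifting the input shifts the coefficients by the same amount.
rotated = profile[1:] + profile[:1]
_, rotated_details = stationary_haar(rotated, levels=3)
```

In this sketch the first-level detail coefficients are zero everywhere except at the two positions where the profile jumps, which is the kind of localised response the description relies on for locating abrupt changes.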
Referring now to Fig. 1, there is illustrated a fraud detection system 2 which can be used to receive and process real time or historical data. A fraud detector 1 is in communication with a database 1 having historical data indicative of profiles, customer labels, fraud history (if applicable), as well as any other features such as demographic or business information. The fraud detector receives the data from the database and provides data to the database indicative of the probability of fraud occurring based on the stored data.
Fig. 2 illustrates the system for using the data to detect fraud. Three aspects of data: fraud history (if applicable), data profiles and customer labels are provided to the detector. Data profiles are entered into a feature extractor 3 which extracts relevant features from the time series data profiles and provides a feature output 21. A model generator 4 receives the feature output 21 from the feature extractor 3 as well as the fraud history data and customer labels data.
The model generator 4 then provides a fraud detection model as well as an accuracy rate of the model. The model data output from the model generator 4 and the feature data output of the feature extractor 3 are entered into a fraud detector 5 together with the fraud history data. The model generator 4 uses its input to train and test the model to be used, analogous to a neural type network. The detector 5 uses the model data to analyse the feature data 21 and determine the fraud probability.
The feature extractor 3, model generator 4 and fraud detector 5 are described further below. The dataset is entered into a wavelet transformer 6 which provides a wavelet decomposition output denoted Wj. It is noted, however, that a number of other wavelets can be used.
Referring to Fig. 3, there is illustrated one implementation of the feature extractor 3 wherein a "Daub 6" or a "Sym 8" filter is used for detecting abrupt and gradual changes respectively in the wavelet decomposition. Wavelet analysis provides a means of implementing waveform recognition, pattern identification, pattern change location and spectral distribution in both the time and wavelet domains.
In the preferred embodiment, a total of eight levels of wavelet decomposition are employed; however, this number can vary. After the wavelet transformation has been applied to the time series data, the resulting co-efficients Wj are processed by a processor 7. The processing performed by the processor 7 includes removing or minimising any noise in the signal, spectrum processing, wavelet shrinkage or any other desired process. The processed co-efficients are input to a local and global statistical analyser 8. The analyser 8 applies the local and global statistical analysis to the processed co-efficients, denoted Ak, as well as to the raw data profiles, denoted Ai. The outputs of the analyser 8 are then combined at 9.
The resulting features of the combined analyser outputs are assembled in a matrix for all profiles in the data profiles batch. This assembled matrix output provides the feature extractor output for use in the model generator 4 and fraud detector 5.
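The processing, analysis, combination and assembly stages described above can be sketched as follows. This is a minimal illustration only: the soft-threshold form of shrinkage, the particular statistics, the threshold and window values, and all function names are assumptions rather than details taken from the specification. The wavelet co-efficients are taken as given, one sequence per decomposition level.

```python
import statistics


def soft_threshold(coeffs, threshold):
    """Wavelet shrinkage: pull each co-efficient towards zero by `threshold`."""
    shrunk = []
    for c in coeffs:
        magnitude = max(abs(c) - threshold, 0.0)
        shrunk.append(magnitude if c >= 0 else -magnitude)
    return shrunk


def global_stats(values):
    """Global statistics over a whole sequence (illustrative choice)."""
    return [statistics.fmean(values), statistics.pstdev(values)]


def local_stats(values, window):
    """Local statistic: largest mean absolute value over any sliding window.

    Assumes len(values) >= window.
    """
    sums = [sum(abs(v) for v in values[i:i + window])
            for i in range(len(values) - window + 1)]
    return [max(sums) / window]


def extract_features(profile, detail_levels, threshold=0.5, window=4):
    """Combine time-domain and wavelet-domain statistics into one feature row."""
    row = global_stats(profile) + local_stats(profile, window)
    for coeffs in detail_levels:
        processed = soft_threshold(coeffs, threshold)
        row += global_stats(processed) + local_stats(processed, window)
    return row


def assemble(batch):
    """Assemble one feature row per profile into a feature matrix."""
    return [extract_features(profile, levels) for profile, levels in batch]


# Hypothetical batch: (raw profile, [co-efficients per level]) pairs.
batch = [
    ([1.0] * 8, [[0.0] * 8, [0.2] * 8]),
    ([1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0],
     [[0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0], [0.0] * 8]),
]
matrix = assemble(batch)
```

Each row of `matrix` corresponds to one profile, so the result has the shape expected by a downstream model generator or detector: one record per profile and one column per extracted feature.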
It is known that a local label spectrum can be defined that measures the contribution to the total energy from the vicinity of a point x at a resolution level j, and unveils the frequency components of a subseries of that level. This is useful for detecting the spectral changes in the wavelet domain.
Most of the available data is usually collected for record and book keeping purposes rather than for fraud detection and often includes some level of noise. In the present embodiment, the data is normalised and compressed through wavelet shrinkage, as described above. The notation is as follows:
J is the total number of decomposition levels;
$W_j$ is the wavelet decomposition co-efficients or spectrum in the j-th resolution level;
$x_l$ $(1 \le l \le L)$ is the l-th feature for a dataset;
$P_j$ is the processing;
$A_i$ $(1 \le i \le I)$ is the analysis in the time domain;
$A_k$ $(1 \le k \le K)$ is the analysis in the wavelet domain;
+ is the multiple combination; and
→ is the assembly.
The wavelet shrinkages are based on known processes.
Referring to Fig. 4, there is schematically illustrated a model generator 4. The generator 4 includes a model training portion 15 and a model testing portion 11. The model training portion 15 and the model testing portion 11 each include two classifiers, denoted 17 and 12 respectively. The classifiers receive input from the feature extractor as well as the customer labels data and fraud history. The arrangement of the classifiers 12 and 17 forms a Bayes and neural network (multiple layer perceptrons); however, any number of classifiers may be used.
For model training, weightings used by the classifiers 12 and 17 in classifying data are assigned by allocators 13 and 16 respectively. The allocator provides an output to the classifiers 17 which also receive input from the customer label data and feature extractor to generate a new model. The results are combined at a combiner denoted 18, and are then tested by model tester 11 using a similar structure to the model trainer 15. Classification of the data occurs by a pair of classifiers 12 whose output is input to an allocator 13 which assigns weights to fraud history data and provides a comparison by comparator 14 of known customer data labels with the test results.
The model is validated at a validator 19 and the output together with the associated accuracy rate is provided, or a new iteration starts if the accuracy rate is not satisfactory.
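The train, combine, test and validate loop described above can be sketched in outline. This is an illustration under stated assumptions, not the embodiment's method: simple one-feature threshold classifiers stand in for the Bayes and multiple layer perceptron classifiers, classifier weights are set from training accuracy, and each new iteration simply draws a new random split. All function names, parameters and data are hypothetical.

```python
import random


def train_stump(features, labels, index):
    """Stand-in classifier: threshold one feature midway between class means."""
    fraud = [f[index] for f, y in zip(features, labels) if y == 1]
    honest = [f[index] for f, y in zip(features, labels) if y == 0]
    mean_fraud = sum(fraud) / len(fraud)
    mean_honest = sum(honest) / len(honest)
    cut = (mean_fraud + mean_honest) / 2.0
    fraud_is_high = mean_fraud >= mean_honest
    return lambda x: int((x[index] >= cut) == fraud_is_high)


def accuracy(predict, features, labels):
    """Comparator: fraction of predictions that match the known labels."""
    hits = sum(predict(x) == y for x, y in zip(features, labels))
    return hits / len(labels)


def generate_model(features, labels, target=0.9, max_iterations=10, seed=0):
    """Return (model, accuracy rate), iterating until the rate is acceptable."""
    rng = random.Random(seed)
    rows = list(zip(features, labels))
    best = None
    for _ in range(max_iterations):
        rng.shuffle(rows)
        half = len(rows) // 2
        train_x, train_y = zip(*rows[:half])
        test_x, test_y = zip(*rows[half:])
        if len(set(train_y)) < 2:
            continue  # both classes are needed to train the stand-in classifiers
        classifiers = [train_stump(train_x, train_y, i)
                       for i in range(len(train_x[0]))]
        # Allocator: weight each classifier by its training accuracy.
        weights = [accuracy(c, train_x, train_y) for c in classifiers]
        total = sum(weights) or 1.0

        def model(x, classifiers=classifiers, weights=weights, total=total):
            # Combiner: weighted vote of the individual classifiers.
            vote = sum(w * c(x) for w, c in zip(weights, classifiers)) / total
            return int(vote >= 0.5)

        rate = accuracy(model, test_x, test_y)  # tested on held-out data
        if best is None or rate > best[1]:
            best = (model, rate)
        if rate >= target:  # Validator: accept, otherwise start a new iteration.
            break
    return best


# Hypothetical feature rows: the first feature separates the classes.
honest = [[1.0 + 0.1 * k, 5.0] for k in range(6)]
fraud = [[3.0 + 0.1 * k, 5.0] for k in range(6)]
model, rate = generate_model(honest + fraud, [0] * 6 + [1] * 6)
```

The function returns both the model and its accuracy rate, mirroring the two outputs of the model generator; the `target` threshold plays the role of the "satisfactory" accuracy criterion, whose actual value the specification does not state.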
Referring to Fig. 5, there is illustrated the fraud detector 5. The extracted features and any number of other features such as demographic data or personal information for example, are inputted into the fraud detector denoted by 20 and 21. Allocator 22 ascribes weights to the extracted features and linearly combines them at linear combiner 23.
The output of the linear combiner 23 together with the raw and weighted features provided from the feature extractor and data indicative of the fraud history are classified by classifiers 24 to provide an output which is cross combined by a cross combiner 25 to produce a fraud probability.
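The weight allocation and linear combination steps can be illustrated with a short sketch. How the allocator derives its weights is not stated in the specification, so the sketch assumes they are obtained by normalising hypothetical per-feature relevance scores; the function names are likewise illustrative.

```python
def allocate_weights(relevance):
    """Allocator: normalise non-negative relevance scores so they sum to 1."""
    total = sum(relevance)
    if total <= 0:
        raise ValueError("at least one feature needs a positive relevance score")
    return [r / total for r in relevance]


def linear_combine(features, weights):
    """Linear combiner: weighted sum of the extracted features."""
    if len(features) != len(weights):
        raise ValueError("one weight is required per feature")
    return sum(w * f for w, f in zip(weights, features))


weights = allocate_weights([3.0, 1.0])           # [0.75, 0.25]
combined = linear_combine([2.0, 4.0], weights)   # 0.75 * 2.0 + 0.25 * 4.0 = 2.5
```

The combined value would then be passed to the classifiers alongside the raw and weighted features, as described above.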
The cross combiner 25 is configured to receive the plurality of classified outputs provided by the classifiers 24. In the event that there exists a probabilistic relationship between the raw data profile, time series data, and a set of features, a soft competition algorithm can be applied to select optimal feature vectors from all of the extracted features provided by the feature extractor 3.
By way of example, for a case where there are M classes $C_1, \ldots, C_M$ and N classifiers $CL_1, \ldots, CL_N$, the probability that a data sample D belongs to the m-th class is:
$$P(x_l) = P(D \in C_m \mid O_l = 1)$$
where $P(O_l = 1)$ is the probability that the l-th feature vector $\{x_l\}$ is optimal.
Considering a feature matrix X, with I records and L features, the probability that the i-th sample belongs to class $C_m$ classified by the classifier $CL_n$ in terms of the input feature vector $x_{il}$ $(1 \le i \le I;\ 1 \le l \le L)$ is:
If $g_n$ is the weight function for the n-th classifier, then the probability of the i-th data set belonging to the m-th class is:
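One plausible way to combine the outputs of several classifiers using per-classifier weights is a normalised weighted sum of their class probabilities. The sketch below assumes that form; it is not taken from the specification, and the function name and example values are hypothetical.

```python
def combine_classifiers(class_probs, classifier_weights):
    """Combine per-classifier class probabilities into one distribution.

    class_probs[n][m] is the probability, according to classifier n, that
    the sample belongs to class m. classifier_weights[n] is the weight of
    classifier n. The weights are normalised so that the result is again
    a probability distribution over the classes.
    """
    total_weight = sum(classifier_weights)
    n_classes = len(class_probs[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(class_probs, classifier_weights):
        for m, p in enumerate(probs):
            combined[m] += weight * p / total_weight
    return combined


# Two hypothetical classifiers, two classes (honest, fraud).
class_probs = [[0.2, 0.8], [0.6, 0.4]]
combined = combine_classifiers(class_probs, classifier_weights=[3.0, 1.0])
```

With these example values the more heavily weighted classifier dominates, giving a combined fraud probability of approximately 0.7.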
As can be seen, either historical or real-time data can be provided to the fraud detection system. Fig. 6 illustrates the fraud detector used in an electrical utility network. The fraud detector 30 receives data 29 from a controller 27 in communication with the fraud detector 30. The controller 27 includes a database 28 and communicates with an electricity meter and concentrator network. Data from the metering concentrator network, indicative of electricity use, is communicated to the controller 27. This information is processed by the controller 27, stored in the database 28 and transmitted to the fraud detector, which calculates the probability of fraud based on the system data. It is noted that the data 29 transmitted to the fraud detector 30 can be real-time data.
Evaluation criteria
If there are N classification labels, the number of error types equals $N^2 - N$. By assigning $e_{pq}$ to the cost of an error made by wrongly classifying p into q, where $p \neq q$, and $c_{pq}$ to the number of errors of the corresponding type, the total cost can be expressed as $c = \sum_{p} \sum_{q \neq p} e_{pq}\, c_{pq}$. Errors do not impact the performance equally; the question is which error is more significant. False positive identification wrongly identifies honest customers as fraudulent and may cause unnecessary investigation and the loss of honest customers. False negative identification ignores genuine fraudulent activities and results in loss of revenue for the utilities.
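The cost-weighted error count can be illustrated with a confusion matrix. The sketch assumes that the total cost is the sum, over every pair of distinct labels, of the per-error cost multiplied by the number of such errors; the matrix layout, the cost values and the function name are hypothetical.

```python
def total_cost(confusion, cost):
    """Sum cost[p][q] * confusion[p][q] over all off-diagonal pairs (p != q).

    confusion[p][q] is the number of samples of true class p classified as q.
    """
    n = len(confusion)
    return sum(cost[p][q] * confusion[p][q]
               for p in range(n) for q in range(n) if p != q)


# Two labels (0 = honest, 1 = fraud), hence 2 * 2 - 2 = 2 error types.
confusion = [[90, 5],   # 5 honest customers wrongly flagged (false positives)
             [2, 3]]    # 2 fraud cases missed (false negatives)
cost = [[0.0, 1.0],
        [4.0, 0.0]]     # assumed: a missed fraud costs four times a false alarm
result = total_cost(confusion, cost)  # 5 * 1.0 + 2 * 4.0 = 13.0
```

Making the two off-diagonal costs unequal is what lets the evaluation express a preference between false positives and false negatives.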
Presently, the acceptance accuracy rate is defined as the ratio of the number of correctly reported instances of fraud to the total number of reported cases of fraud, given a dataset X with a total of I meter readings and L analysed features:
$$R(\text{correct} \mid x_{IL}, \text{report}) = \frac{N(\text{correct} \cap \text{report})}{N(\text{report})} \qquad (7)$$
where $N(\text{report})$ is the total number of cases of fraud that are reported, and $N(\text{correct} \cap \text{report})$ is the total number of cases of fraud that are reported and correctly identified. In the next section we give some quantitative experimental results based on the feature analysis scheme with decisions made using the indicator variables $\eta_1$ and $\eta_2$ from classification by application of the following rules:
$\text{Pattern}_i$ is accepted if $\eta_1 > p(x_i)$ AND $\eta_2 > R(\text{correct} \mid x_{IL}, \text{report})$;
$\text{Pattern}_i$ is rejected if $\eta_1 < p(x_i)$ OR $\eta_2 < R(\text{correct} \mid x_{IL}, \text{report})$.
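The acceptance accuracy rate defined above, the ratio of correctly reported fraud cases to all reported cases, can be computed as in the following sketch. The function name and the meter identifiers are hypothetical.

```python
def acceptance_accuracy_rate(reported, confirmed):
    """Ratio of correctly reported fraud cases to all reported fraud cases.

    `reported` holds the identifiers flagged as fraud by the detector and
    `confirmed` holds the identifiers known to be genuinely fraudulent.
    """
    reported = set(reported)
    if not reported:
        raise ValueError("no cases were reported")
    return len(reported & set(confirmed)) / len(reported)


reported = ["m1", "m2", "m3", "m4"]   # cases flagged by the detector
confirmed = ["m2", "m4", "m9"]        # cases confirmed as fraud
rate = acceptance_accuracy_rate(reported, confirmed)  # 2 of 4 reports correct
```

Note that a confirmed case that was never reported ("m9" here) does not affect this rate; it measures the reliability of the reports rather than the proportion of fraud that is caught.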
The fraud detector then processes the received data, being indicative of fraud history, data profiles and customer labels and provides a fraud probability output which is communicated back to the system controller 27.
The foregoing describes only a preferred embodiment of the present invention and modifications, obvious to those skilled in the art, can be made thereto without departing from the scope of the present invention.