WO2003038666A1 - Wavelet based fraud detection system - Google Patents

Info

Publication number
WO2003038666A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
fraud
model
output
wavelet
Prior art date
Application number
PCT/AU2002/001472
Other languages
French (fr)
Inventor
Rong Jiang
Original Assignee
Inovatech Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inovatech Limited
Publication of WO2003038666A1

Classifications

    • G - PHYSICS
    • G07 - CHECKING-DEVICES
    • G07F - COIN-FREED OR LIKE APPARATUS
    • G07F7/00 - Mechanisms actuated by objects other than coins to free or to actuate vending, hiring, coin or paper currency dispensing or refunding apparatus
    • G07F7/08 - Mechanisms actuated by objects other than coins to free or to actuate vending, hiring, coin or paper currency dispensing or refunding apparatus by coded identity card or credit card or other personal identification means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 - Detecting local intrusion or implementing counter-measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 - Payment architectures, schemes or protocols
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 - Payment architectures, schemes or protocols
    • G06Q20/30 - Payment architectures, schemes or protocols characterised by the use of specific devices or networks
    • G06Q20/34 - Payment architectures, schemes or protocols characterised by the use of specific devices or networks using cards, e.g. integrated circuit [IC] cards or magnetic cards
    • G06Q20/342 - Cards defining paid or billed services or quantities
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 - Payment architectures, schemes or protocols
    • G06Q20/38 - Payment protocols; Details thereof
    • G06Q20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 - Transaction verification
    • G06Q20/4016 - Transaction verification involving fraud or risk level assessment in transaction processing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 - Payment architectures, schemes or protocols
    • G06Q20/38 - Payment protocols; Details thereof
    • G06Q20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/403 - Solvency checks
    • G - PHYSICS
    • G07 - CHECKING-DEVICES
    • G07F - COIN-FREED OR LIKE APPARATUS
    • G07F7/00 - Mechanisms actuated by objects other than coins to free or to actuate vending, hiring, coin or paper currency dispensing or refunding apparatus
    • G07F7/02 - Mechanisms actuated by objects other than coins to free or to actuate vending, hiring, coin or paper currency dispensing or refunding apparatus by keys or other credit registering devices
    • G07F7/025 - Mechanisms actuated by objects other than coins to free or to actuate vending, hiring, coin or paper currency dispensing or refunding apparatus by keys or other credit registering devices by means, e.g. cards, providing billing information at the time of purchase, e.g. identification of seller or purchaser, quantity of goods delivered or to be delivered

Definitions

  • the present invention relates to fraud detection and in particular to a system of detecting fraud.
  • the invention has been developed primarily for use in detecting utility, telecommunications or credit card fraud and will be described hereinafter with reference to these applications. However, it will be appreciated that the invention is not limited to these particular fields of use.
  • Telecommunication and utility companies such as electricity and gas suppliers, and credit card providers have extensive distribution networks. Due at least in part to their network size, these companies have found it extremely difficult to constantly monitor the total consumption of their services.
  • a utility is aware that the total demand for their product, electricity in the present example, will vary depending on climatic factors or consumer demand. For example, on unusually hot days, the electricity load increases to compensate for more people switching on coolers and air-conditioners. Consumer demand also varies depending on time of day where, for example, minimal amounts of electricity are used late at night as compared to when people are up and about or at work.
  • the utilities retrospectively review power consumption data in some or all parts of their distribution network and look for unusual or irregular usage to provide an indicator of the possibility of the presence of fraud. It is the case that a substantial amount of electricity fraud is detected by physical evidence left at the source from where the power is fraudulently removed, for example by a utility employee observing such evidence when monitoring or repairing part of the distribution network.
  • fraud is also a major source of loss for credit card companies and merchant providers. For example, if a credit card is stolen or its number fraudulently appropriated, financial transactions against the credit card can be made before an alert is raised. Credit card companies are known to monitor and manipulate data indicative of transactions that have been recorded to determine if any irregular transaction patterns appear.
  • a similar system for detecting telecommunications fraud is also known.
  • the telecommunication service provider can be made aware when irregular telecommunications patterns are provided for example by mobile phone or other network. When fraudulent telecommunications or data transactions occur, any irregularities in these patterns, including volume or location, can be used to trigger a fraud detector.
  • US Patent No. 6,029,144 discloses a system and method for checking expense entries in a knowledge based system. More particularly, expense entries provided by employees are checked for compliance with predetermined policy rules to detect the possibility that fraud is occurring. The system includes a knowledge based system to determine expense entry compliance with policy and to determine fraud. An auditor workflow system operating in unison with the policy checker guides manual audits of those expense entries that do not comply with the policy rules.
  • a data pattern analyser for detecting behaviour patterns is also employed to indicate the presence of fraud.
  • a prioritiser ascribes weightings to order or rank any violations of the predetermined rules.
  • the analyser and prioritiser are linearly interrelated in an automated system which also generates reports. This system is disadvantageous in that it can only employ historical data on the basis of patterns of non-compliance with rules.
  • US Patent No. 6,094,693 is directed toward a system and method for detecting credit card fraud.
  • the system operates on the premise that fraudulent credit card activity will reflect itself by the appearance of clustered groups of suspicious transactions. Particular transactions are ranked by assigning weights to the individual transactions for use in identifying the suspicious transactions. Indicators such as the geographic region of a transaction or transactions and the time the transaction occurs are considered by the system. That is, this system employs historical weighted data to determine the presence of fraud.
  • the output of the classifier program is compared to a desired output and one or more classifier programs are then created and optimised by the evolver program by means of genetic programming.
  • the desired output is then again compared to the actual classifier program output and the difference is used to measure the fitness to guide the evolution of the classifier program.
  • This system is only suited to signal processing and control manipulation and detection, for example in classifying myoelectric signals for the control of a prosthesis or classifying remotely sensed spectra for the identification of numerals.
  • the system is not applicable to other fields, such as utility, credit card or telecommunications fraud.
  • Computer systems such as those in the above prior art International Patent application are a crude form of neural networks in which a number of processes are interconnected in a manner analogous to the connection between neurons in a human brain. These systems are able to 'learn' by a process of trial and error.
  • Such neural network techniques have gained acceptance as a useful problem solving tool in the utility supply industry, particularly the electricity industry, with feature selection and extraction being a critical component to achieving good learning abilities and generalised performance in the neural network.
  • the wavelet analysis technique is suited not only for wave forms that are smooth and well behaved, but also for those with abrupt changes, transients or other irregularities due to the localisation and multi-resolution analysis of the wavelets. Furthermore, wavelets can be used to determine the time of smooth or irregular changes, the type of change by determining the first or second derivatives of the wavelets and the amplitude of the changes. As a result, a more accurate model has been developed for use in credit card fraud. Different features or predictors can be fitted to a wavelet related sub-series and be valuably combined. However, these known wavelet techniques are inefficient in detecting fraud.
  • a method for fraud detection including the steps of: entering data profiles into a feature extractor; entering the data profiles into a wavelet transformer and providing a wavelet decomposition; entering the wavelet decomposition into a processor and providing processed wavelet co-efficients; combining the processed wavelet co-efficients with the raw data profiles and assembling an extracted feature data output; entering the extracted features and data indicative of fraud history and customer labels into a model generator; allocating weights to the data entered into the model generator by means of an allocator; combining the results and validating the combined classified results with a validator; passing the model data from the model generator to the fraud detector together with fraud history data and the feature output data provided by the feature extractor; allocating weights to the extracted features and linearly combining them to provide a linearly combined output; cross-combining the linearly combined data with the raw fraud history data and providing an output which is indicative of the probability of fraud being present in the inputted data.
  • a fraud detection system including: a feature extractor receiving input data profiles and providing features; a model generator receiving input of data indicative of customer labels and input data indicative of fraud history together with the extracted features, the model generator providing a model output and an accuracy rate output; and entering the model from the model generator into a fraud detector, together with the extracted features and data indicative of the fraud history; and wherein the fraud detector provides data output indicative of the probability of fraud.
  • FIG 1 is a schematic overview of the embodiment;
  • FIG 2 is a schematic illustration of the fraud detector of FIG 1;
  • FIG 3 is a schematic representation of the feature extractor of FIG 2;
  • FIG 4 is a schematic representation of the model generator of FIG 2;
  • FIG 5 is a schematic representation of a fraud detector of FIG 2;
  • FIG 6 is a schematic illustration of the embodiment of FIG 2 employed in a network.
  • ψ is the mother wavelet and its derived forms ψ_j,k are obtained from it, where j is the dilation and k is the translation. The transform W f contains the information about the function f near the time point set by k and near the frequency proportional to 2^j.
  • the wavelets encode the details, while the scaling function encodes an image of the signal at half resolution.
  • the stationary wavelet transform (SWT) is popular because of its property of shift invariance: the new sequences have the same length as the original sequence.
  • a fraud detection system 2 which can be used to receive and process real time or historical data.
  • a fraud detector 1 is in communication with a database 1 having historical data indicative of profiles, customer labels, fraud history (if applicable), as well as any other features such as demographic or business information.
  • the fraud detector receives the data from the database and provides data to the database indicative of the probability of fraud occurring based on the stored data.
  • Fig. 2 illustrates the system for using the data to detect fraud.
  • Three aspects of data, namely fraud history (if applicable), data profiles and customer labels, are provided to the detector.
  • Data profiles are entered into a feature extractor 3 which extracts relevant features from the time series data profiles and provides a feature output 21.
  • a model generator 4 receives the feature output 21 from the feature extractor 3 as well as the fraud history data and customer labels data.
  • the model generator 4 then provides a fraud detection model as well as an accuracy rate of the model.
  • the model data output from the model generator 4 and the feature data output of the feature extractor 3 are entered into a fraud detector 5 together with the fraud history data.
  • the model generator 4 uses input to train and test the model to be used, analogous to a neural type network.
  • the detector 5 uses the model data to analyse the feature data 21 and determine the fraud probability.
  • the feature extractor 3, model generator 4 and fraud detector 5 are described further below.
  • the dataset is entered into a wavelet transformer 6 which provides a wavelet decomposition output denoted W_j. It is noted, however, that a number of other wavelets can be used.
  • a "Daub 6" or a "Sym 8" filter is used for detecting abrupt and gradual changes respectively in wavelet decomposition.
  • Wavelet analysis provides a means of implementing waveform recognition, pattern identification, pattern change location and spectral distribution in both the time and wavelet domains. In the preferred embodiment, a total of eight levels of wavelet decomposition are employed; however, this number can vary.
  • the resulting co-efficients W_j are processed by a processor 7.
  • the processing performed by the processor 7 includes removing or minimising any noise in the signal, spectrum processing, wavelet shrinkage or any other desired process.
  • the co-efficients output by the processor 7 are input into a local and global statistical analyser 8, which processes the processed co-efficients W_j.
  • the analyser 8 applies the local and global statistical analysis to the processed co-efficients, denoted A_t, as well as to the raw data profiles, denoted A_i.
  • the output of the analyser 8 is then combined at 9.
  • the resulting features of the combined analyser outputs are assembled in a matrix for all profiles in the data profiles batch.
  • This assembled matrix output provides the feature extractor output for use in the model generator 4 and fraud detector 5.
  • a local label spectrum can be defined that measures the contribution to the total energy from the vicinity at point x at a resolution level j, and unveils the frequency components of a subseries of that level. This is useful for detecting the spectral changes in the wavelet domain.
  • the data is normalised and compressed through wavelet shrinkage, as described above. That is, J is the number of the total decomposition levels; W_j the wavelet decomposition co-efficients or spectrum in the j-th resolution level; x_l (1 ≤ l ≤ L) the l-th feature for a dataset; P_j processing; A_i (1 ≤ i ≤ I) analysis in the time domain; A_k (1 ≤ k ≤ K) analysis in the wavelet domain, a
  • the generator 4 includes a model training portion 15 and a model testing portion 11.
  • the model training 15 and model testing 11 portions each include two classifiers 12 and 17.
  • the classifiers receive input from the feature extractor as well as the customer labels data and fraud history.
  • the classifiers 12 and 17 are arranged as a Bayes classifier and a neural network (multiple layer perceptron); however, any number of classifiers may be used.
  • weightings used by the classifiers 12 and 17 in classifying data are assigned by allocators 13 and 16 respectively.
  • the allocator provides an output to the classifiers 17 which also receive input from the customer label data and feature extractor to generate a new model.
  • the results are combined at a combiner denoted 18, and are then tested by model tester 11 using a similar structure to the model trainer 15.
  • Classification of the data occurs by a pair of classifiers 12 whose output is input to an allocator 13 which assigns weights to fraud history data and provides a comparison by comparator 14 of known customer data labels with the test results.
  • the model is validated at a validator 19 and the output together with the associated accuracy rate is provided, or a new iteration starts if the accuracy rate is not satisfactory.
  • In Fig. 5 there is illustrated the fraud detector 5.
  • the extracted features and any number of other features such as demographic data or personal information for example, are inputted into the fraud detector denoted by 20 and 21.
  • Allocator 22 ascribes weights to the extracted features and linearly combines them at linear combiner 23.
  • the output of the linear combiner 23 together with the raw and weighted features provided from the feature extractor and data indicative of the fraud history are classified by classifiers 24 to provide an output which is cross combined by a cross combiner 25 to produce a fraud probability.
  • the cross combiner 25 is configured to receive the outputs of the plurality of classified outputs provided by the classifier 24.
  • a soft competition algorithm can be applied to select optimal feature vectors from all of the extracted features provided by the feature extractor 3.
  • Fig. 6 illustrates the fraud detector used in an electrical utility network.
  • the fraud detector 30 receives data 29 from a controller 27 in communication with the fraud detector 30.
  • the controller 27 includes a database 28 in communication with the controller, which communicates with an electricity meter and concentrator network.
  • the metering concentrator network data indicative of electricity use is communicated to the controller database 28.
  • This information is processed by the controller 27 and stored in database 28 and transmitted to the fraud detector which calculates the likely probability of fraud based on the system data.
  • the data 29 transmitted to the fraud detector 30 can be real-time data.
  • Evaluation criteria:
  • the number of error types equals N² - N.
  • e_pq is the cost of an error made by wrongly classifying p into q, where p ≠ q, and c_n is the corresponding type of error.
  • acceptance accuracy rate is defined as the ratio of the number of the correctly reported instances of fraud to the total number of reported cases of fraud, given a dataset X with total I meter readings and L analyzed features: R(correct | x_IL, report) = N(correct ∩ report) / N(report), where
  • N(report) is the total number of cases of fraud that are reported, and
  • N(correct ∩ report) is the total number of cases of fraud that are reported and correctly identified.
  • Pattern_i is accepted if η_1 > p(x_i) AND η_2 > R(correct | x_IL, report); Pattern_i is rejected if η_1 ≤ p(x_i) OR η_2 ≤ R(correct | x_IL, report).
  • the fraud detector then processes the received data, being indicative of fraud history, data profiles and customer labels and provides a fraud probability output which is communicated back to the system controller 27.
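
As a minimal sketch of the evaluation criteria above, the acceptance accuracy rate can be computed as the share of reported fraud cases that turn out to be correct. The function names, the use of meter identifiers as set members, the convention of returning 0.0 when nothing is reported, and the reading of the error-type count as the number of ordered class pairs (p, q) with p ≠ q are all assumptions made for illustration; none of them is taken from the patent text.

```python
def acceptance_accuracy_rate(reported, confirmed):
    """Return N(correct and reported) / N(reported).

    `reported` holds the cases flagged as fraud; `confirmed` holds the cases
    known to be fraud. Both are iterables of hashable identifiers.
    """
    reported_set = set(reported)
    if not reported_set:
        # Assumption: with no reported cases the rate is taken to be 0.0.
        return 0.0
    correct_and_reported = reported_set & set(confirmed)
    return len(correct_and_reported) / len(reported_set)


def count_error_types(num_classes):
    """Count ordered pairs (p, q) with p != q, i.e. N*N - N (assumed reading)."""
    return num_classes * num_classes - num_classes


if __name__ == "__main__":
    # Hypothetical meter identifiers, used only to exercise the functions.
    reported = ["meter-01", "meter-02", "meter-03", "meter-04"]
    confirmed = ["meter-02", "meter-04", "meter-09"]
    print(acceptance_accuracy_rate(reported, confirmed))
    print(count_error_types(2))
```

Note that a confirmed case which was never reported (here the hypothetical "meter-09") does not lower this rate, since the ratio is taken over reported cases only.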

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Computer Security & Cryptography (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Finance (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A wavelet based method for fraud detection is disclosed. The method includes entering time series data profiles into a feature extractor wavelet transformer (3), providing a wavelet decomposition, entering the wavelet decomposition into a processor (7), combining the processed wavelet co-efficients with the raw data profiles and assembling an extracted feature data output (21). The extracted data and data indicative of fraud history and customer labels are entered into a model generator (4) to be classified (12) and then validated (19). Model data is then passed to an input of the fraud detector (5) together with fraud history data and the extracted feature data. Weights are allocated (22) to the extracted features and linearly combined (23). The combined output is classified together with the raw data of the fraud history to provide an output that is a cross combination (25) of the raw data and the data which is indicative of the probability of fraud.
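
The feature-extraction stage summarised in the abstract can be illustrated with a small, dependency-free sketch. It is not the patented implementation: a Haar filter is swapped in for the filters named in the patent, boundaries are handled periodically, and the chosen statistics (mean, standard deviation, per-level energy and peak magnitude) as well as the function names `haar_swt` and `extract_features` are assumptions made for illustration. The transform is undecimated, so each output sequence has the same length as the input profile.

```python
from statistics import mean, pstdev


def haar_swt(signal, levels):
    """Undecimated (stationary) Haar transform with periodic boundaries.

    Returns (details, approx): one detail sequence per level plus the final
    approximation, each with the same length as the input.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    n = len(signal)
    approx = [float(x) for x in signal]
    details = []
    for j in range(levels):
        step = 2 ** j  # the filter is dilated per level instead of downsampling
        shifted = [approx[(i + step) % n] for i in range(n)]
        details.append([(a - b) / 2.0 for a, b in zip(approx, shifted)])
        approx = [(a + b) / 2.0 for a, b in zip(approx, shifted)]
    return details, approx


def extract_features(profile, levels=3):
    """Combine raw-profile statistics with per-level wavelet statistics."""
    details, _ = haar_swt(profile, levels)
    features = [mean(profile), pstdev(profile)]  # statistics of the raw profile
    for d in details:
        energy = sum(c * c for c in d) / len(d)  # mean energy at this level
        peak = max(abs(c) for c in d)            # largest local change
        features.extend([energy, peak])
    return features


if __name__ == "__main__":
    # Hypothetical consumption profile with an abrupt step half way through.
    profile = [1, 1, 1, 1, 5, 5, 5, 5]
    print(extract_features(profile, levels=3))
```

Because each level splits the current approximation into an average and a difference, the input can be recovered by adding the final approximation to the sum of all detail sequences, which is a convenient sanity check for the sketch.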

Description

WAVELET BASED FRAUD DETECTION SYSTEM
Field of the Invention
The present invention relates to fraud detection and in particular to a system of detecting fraud.
The invention has been developed primarily for use in detecting utility, telecommunications or credit card fraud and will be described hereinafter with reference to these applications. However, it will be appreciated that the invention is not limited to these particular fields of use.
Background of the Invention
Telecommunication and utility companies, such as electricity and gas suppliers, and credit card providers have extensive distribution networks. Due at least in part to their network size, these companies have found it extremely difficult to constantly monitor the total consumption of their services.
In the large utility networks for example, losses due to inefficiencies in the system are prevalent. For electricity suppliers, these losses include resistive power transmission losses and transformer inefficiencies; for gas utilities, they include leakages. Other losses by the utilities occur through unregulated supply, fraudulent activity and faults in the systems for example.
Using the electricity utilities as an example, it is the case that fraudulent misappropriation of electricity which is unaccounted for causes significant loss to the electricity utility. A utility is aware that the total demand for their product, electricity in the present example, will vary depending on climatic factors or consumer demand. For example, on unusually hot days, the electricity load increases to compensate for more people switching on coolers and air-conditioners. Consumer demand also varies depending on time of day where, for example, minimal amounts of electricity are used late at night as compared to when people are up and about or at work.
In so far as fraudulently appropriated gas or electricity is small compared to the amount provided through the entire utilities network, it is very difficult to detect the small variations due to fraud. The electricity utilities for example are physically unable to monitor each part of their network to determine power consumption so as to detect fraudulent appropriation.
Generally, the utilities retrospectively review power consumption data in some or all parts of their distribution network and look for unusual or irregular usage to provide an indicator of the possibility of the presence of fraud. It is the case that a substantial amount of electricity fraud is detected by physical evidence left at the source from where the power is fraudulently removed, for example by a utility employee observing such evidence when monitoring or repairing part of the distribution network.
In other fields, such as credit card networks, fraud is also a major source of loss for credit card companies and merchant providers. For example, if a credit card is stolen or its number fraudulently appropriated, financial transactions against the credit card can be made before an alert is raised. Credit card companies are known to monitor and manipulate data indicative of transactions that have been recorded to determine if any irregular transaction patterns appear.
Irregular patterns include, for example, purchases made at unusual places, such as in a different country to the card holder, or at times of day when transactions are not normally conducted by the card holder. Such fraud detection systems, much like the utility company methods, are generally retrospective, based on data recorded for events which have occurred, unless a major irregularity in transactions occurs, such as transactions for unusually large amounts.
A similar system for detecting telecommunications fraud is also known. For example, the telecommunication service provider can be made aware when irregular telecommunications patterns are provided for example by mobile phone or other network. When fraudulent telecommunications or data transactions occur, any irregularities in these patterns, including volume or location, can be used to trigger a fraud detector.
In all known fraud detection systems only historical data can be processed to determine if fraud has occurred except in the presence of exceptional circumstances such as very large transactions or utility consumption. US Patent No. 6,029,144 discloses a system and method for checking expense entries in a knowledge based system. More particularly, expense entries provided by employees are checked for compliance with predetermined policy rules to detect the possibility that fraud is occurring. The system includes a knowledge based system to determine expense entry compliance with policy and to determine fraud. An auditor workflow system operating in unison with the policy checker guides manual audits of those expense entries that do not comply with the policy rules.
A data pattern analyser for detecting behaviour patterns is also employed to indicate the presence of fraud. As part of the system, a prioritiser ascribes weightings to order or rank any violations of the predetermined rules. The analyser and prioritiser are linearly interrelated in an automated system which also generates reports. This system is disadvantageous in that it can only employ historical data on the basis of patterns of non-compliance with rules.
US Patent No. 6,094,693 is directed toward a system and method for detecting credit card fraud. The system operates on the premise that fraudulent credit card activity will reflect itself by the appearance of clustered groups of suspicious transactions. Particular transactions are ranked by assigning weights to the individual transactions for use in identifying the suspicious transactions. Indicators such as the geographic region of a transaction or transactions and the time the transaction occurs are considered by the system. That is, this system employs historical weighted data to determine the presence of fraud.
International Patent Application No. WO99/04329 discloses a method of employing evolving classifier programs for signal processing and control. A software program, or evolver, is used to examine a large number of features which may be from multiple data sources to create a "classifier" program.
The output of the classifier program is compared to a desired output and one or more classifier programs are then created and optimised by the evolver program by means of genetic programming. The desired output is then again compared to the actual classifier program output and the difference is used to measure the fitness to guide the evolution of the classifier program. This system is only suited to signal processing and control manipulation and detection, for example in classifying myoelectric signals for the control of a prosthesis or classifying remotely sensed spectra for the identification of numerals. The system is not applicable to other fields, such as utility, credit card or telecommunications fraud.
Computer systems, such as those in the above prior art International Patent application, are a crude form of neural networks in which a number of processes are interconnected in a manner analogous to the connection between neurons in a human brain. These systems are able to 'learn' by a process of trial and error. Such neural network techniques have gained acceptance as a useful problem solving tool in the utility supply industry, particularly the electricity industry, with feature selection and extraction being a critical component to achieving good learning abilities and generalised performance in the neural network. Previously published technical papers such as those by Bishop "Neural networks for pattern recognition", Oxford University Press, 1995; Neal "Bayesian learning for neural networks, Lecture Notes in Statistics", Springer Press, 1994; and El-Sharkawi, "Neural network and its ancillary techniques as applied to power systems", 1995 have discussed these topics. Unfortunately, these principles rely primarily on stationary and time invariant properties whereas the patterns of utility fraud, for example, electricity, gas or water theft, are non-stationary, noisy, time varying and have multi-scale properties.
That is, fraudulent activities are randomly initiated and terminated and the methods, quantity and frequency of the fraudulent activities change over time. As a result, it is very difficult to analyse such data using conventional techniques.
These techniques are also equally applicable for determining the presence of fraud in other fields, such as credit card networks, telecommunications networks or the like. There are no current approaches for detecting such fraud in utility networks, and those known systems for use in detecting credit card or telecommunications fraud are not applicable to fields other than their own. In very recent years, wavelets have been successfully introduced as an efficient tool for use in various time series analyses, such as those by Abry et al, IEEE Trans. on Information Theory, 44(1), pp 2-15, 1998 and Meyer, Proc. International Conference Wavelets and Applications, France, 1989.
The wavelet analysis technique is suited not only to waveforms that are smooth and well behaved, but also to those with abrupt changes, transients or other irregularities, owing to the localisation and multi-resolution analysis of the wavelets. Furthermore, wavelets can be used to determine the time of smooth or irregular changes, the type of change (by determining the first or second derivatives of the wavelets) and the amplitude of the changes. As a result, a more accurate model has been developed for use in credit card fraud. Different features or predictors can be fitted to a wavelet related sub-series and be valuably combined. However, these known wavelet techniques are inefficient in detecting fraud.
Object of the Invention
It is an object of the invention to overcome or substantially ameliorate any one of the above disadvantages of the prior art or to provide a useful alternative.
Summary of the Invention
According to a first aspect of the invention there is provided a method for fraud detection including the steps of: entering data profiles into a feature extractor; entering the data profiles into a wavelet transformer and providing a wavelet decomposition; entering the wavelet decomposition into a processor and providing processed wavelet co-efficients; combining the processed wavelet co-efficients with the raw data profiles and assembling an extracted feature data output; entering the extracted features and data indicative of fraud history and customer labels into a model generator; allocating weights to the data entered into the model generator by means of an allocator; combining the results and validating the combined classified results with a validator; passing the model data from the model generator to the fraud detector together with fraud history data and the feature output data provided by the feature extractor; allocating weights to the extracted features and linearly combining them to provide a linearly combined output; cross-combining the linearly combined data with the raw fraud history data and providing an output which is indicative of the probability of fraud being present in the inputted data.
According to a second aspect of the invention there is provided a fraud detection system including: a feature extractor receiving input data profiles and providing features; a model generator receiving input of data indicative of customer labels and input data indicative of fraud history together with the extracted features, the model generator providing a model output and an accuracy rate output; and entering the model from the model generator into a fraud detector, together with the extracted features and data indicative of the fraud history; and wherein the fraud detector provides data output indicative of the probability of fraud.
Brief Description of the Drawings
A preferred embodiment of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
FIG 1 is a schematic overview of the embodiment;
FIG 2 is a schematic illustration of the fraud detector of FIG 1;
FIG 3 is a schematic representation of the feature extractor of FIG 2;
FIG 4 is a schematic representation of the model generator of FIG 2;
FIG 5 is a schematic representation of a fraud detector of FIG 2; and
FIG 6 is a schematic illustration of the embodiment of FIG 2 employed in a network.
Detailed Description of the Preferred Embodiment
In wavelet analysis, under certain conditions (Chui, "An Introduction to Wavelets", Academic Press, 1995), a function $f$ in the signal domain can be transformed into the wavelet domain by applying a discrete wavelet transform (WT):
Figure imgf000007_0001
where $\psi$ is the mother wavelet and its derived forms are given by:
Figure imgf000007_0002
where $j$ is the dilation and $k$ is the translation. $W_{jk}f$ contains the information about the function $f$ near the time point $2^{-j}k$ and near the frequency proportional to $2^{j}$. At each time and frequency point, the wavelets encode the details, leaving the scaling function to encode an image of the signal at half the resolution. The stationary wavelet transform (SWT) is popular because of its property of shift invariance: the new sequences have the same length as the original sequence. The slight change in $\psi_{jk}(x)$ is $\{\psi_{jk}(x)\} = 2^{j/2}\,\psi\big(2^{j}(x-k)\big)$, $j \in \mathbb{Z}$, $k \in \mathbb{Z}$. That is, for all integers $k \ge 0$ and $j \ge 0$, coefficients at all resolution levels $j$ appear at all positions $k$.
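The same-length and shift-invariance properties of a stationary transform can be illustrated with a small sketch. The code below is only an assumption-laden example, not the transform of the embodiment: it uses an unnormalised Haar filter (local average and half-difference) with wrap-around boundaries rather than the filters named in the specification, and the function name `haar_swt` and the sample readings are hypothetical.

```python
def haar_swt(signal, levels):
    """Undecimated Haar transform with periodic (wrap-around) boundaries.

    Returns (approximation, details); details[j] holds the detail
    coefficients of level j + 1. Every sequence has len(signal) entries.
    """
    n = len(signal)
    approx = [float(v) for v in signal]
    details = []
    for level in range(levels):
        step = 2 ** level  # filter taps sit 2**level samples apart
        partner = [approx[(i + step) % n] for i in range(n)]
        # Half-difference keeps the detail, average keeps the smooth part.
        details.append([(a - b) / 2.0 for a, b in zip(approx, partner)])
        approx = [(a + b) / 2.0 for a, b in zip(approx, partner)]
    return approx, details


profile = [4, 4, 4, 4, 10, 10, 4, 4]  # made-up meter readings
approx, details = haar_swt(profile, levels=2)
```

Because no sub-sampling takes place, each level keeps one coefficient per position, and circularly shifting the input simply shifts every coefficient sequence by the same amount.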
Referring now to Fig. 1, there is illustrated a fraud detection system 2 which can be used to receive and process real time or historical data. A fraud detector 1 is in communication with a database 1 having historical data indicative of profiles, customer labels, fraud history (if applicable), as well as any other features such as demographic or business information. The fraud detector receives the data from the database and provides data to the database indicative of the probability of fraud occurring based on the stored data.
Fig. 2 illustrates the system for using the data to detect fraud. Three aspects of data: fraud history (if applicable), data profiles and customer labels are provided to the detector. Data profiles are entered into a feature extractor 3 which extracts relevant features from the time series data profiles and provides a feature output 21. A model generator 4 receives the feature output 21 from the feature extractor 3 as well as the fraud history data and customer labels data.
The model generator 4 then provides a fraud detection model as well as an accuracy rate of the model. The model data output from the model generator 4 and the feature data output of the feature extractor 3 are entered into a fraud detector 5 together with the fraud history data. The model generator 4 uses its input to train and test the model to be used, analogous to a neural type network. The detector 5 uses the model data to analyse the feature data 21 and determine the fraud probability.
The feature extractor 3, model generator 4 and fraud detector 5 are described further below. The dataset is entered into a wavelet transformer 6 which provides a wavelet decomposition output denoted W'j. It is noted, however, that a number of other wavelets can be used.
Referring to Fig. 3, there is illustrated one implementation of the feature extractor 3, wherein a "Daub 6" or a "Syn 8" filter is used for detecting abrupt and gradual changes respectively in the wavelet decomposition. Wavelet analysis provides a means of implementing waveform recognition, pattern identification, pattern change location and spectral distribution in both the time and wavelet domains. In the preferred embodiment, a total of eight levels of wavelet decomposition are employed; however, this number can vary. After the wavelet transformation has been applied to the time series data, the resulting co-efficients W'j are processed by a processor 7. The processing performed by the processor 7 includes removing or minimising any noise in the signal, spectrum processing, wavelet shrinkage or any other desired process. The processed co-efficients are input into a local and global statistical analyser 8. The analyser 8 applies the local and global statistical analysis to the processed co-efficients (analysis Ak, in the wavelet domain) as well as to the raw data profiles (analysis Ai, in the time domain). The output of the analyser 8 is then combined at 9.
The resulting features of the combined analyser outputs are assembled in a matrix for all profiles in the data profiles batch. This assembled matrix output provides the feature extractor output for use in the model generator 4 and fraud detector 5.
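By way of illustration only, the assembly of a feature matrix from raw profiles and processed co-efficients might be sketched as follows. The particular statistics (mean, standard deviation, range, and the largest jump between window means) are assumptions chosen for the example; the specification does not list which local and global statistics the analyser 8 computes, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def global_stats(series):
    """Whole-series summary: mean, population standard deviation, range."""
    return [mean(series), pstdev(series), max(series) - min(series)]


def local_stats(series, window):
    """Largest jump between means of consecutive non-overlapping windows."""
    means = [mean(series[i:i + window])
             for i in range(0, len(series) - window + 1, window)]
    jumps = [abs(b - a) for a, b in zip(means, means[1:])]
    return [max(jumps) if jumps else 0.0]


def extract_features(profile, coeff_levels, window=4):
    """One feature row: time-domain statistics, then per-level statistics."""
    row = global_stats(profile) + local_stats(profile, window)
    for coeffs in coeff_levels:
        row += global_stats(coeffs)
    return row


def feature_matrix(profiles, coeffs_per_profile, window=4):
    """One row per profile in the batch."""
    return [extract_features(p, c, window)
            for p, c in zip(profiles, coeffs_per_profile)]


profiles = [[1, 1, 1, 1, 5, 5, 5, 5]]         # made-up readings
coeffs = [[[0, 0, 0, -2, 0, 0, 0, 2]]]        # one made-up detail level
matrix = feature_matrix(profiles, coeffs)
```

Each row concatenates the time-domain features of one profile with the features of each of its co-efficient levels, so every profile in a batch yields a row of the same length provided the number of levels is fixed.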
It is known that a local label spectrum can be defined that measures the contribution to the total energy from the vicinity at point x at a resolution level j, and unveils the frequency components of a subseries of that level. This is useful for detecting the spectral changes in the wavelet domain.
Most of the available data is usually collected for record and book keeping purposes rather than for fraud detection, and often includes some level of noise. In the present embodiment, the data is normalised and compressed through wavelet shrinkage, as described above. That is, $J$ is the total number of decomposition levels; $W_j$ the wavelet decomposition co-efficients or spectrum in the $j$-th resolution level; $x_l$ ($1 \le l \le L$) the $l$-th feature for a dataset; $P_j$ the processing; $A_i$ ($1 \le i \le I$) the analysis in the time domain; $A_k$ ($1 \le k \le K$) the analysis in the wavelet domain; + the multiple combination; and -> the assembly. The wavelet shrinkages are based on known processes.
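The specification does not say which shrinkage process is used. One widely known option is soft thresholding with the universal threshold of Donoho and Johnstone; the sketch below assumes that choice purely for illustration, and the function names are hypothetical.

```python
import math
from statistics import median


def universal_threshold(coeffs):
    """Universal threshold sigma * sqrt(2 ln n), with sigma estimated
    from the median absolute value of the co-efficients (MAD / 0.6745)."""
    sigma = median(abs(c) for c in coeffs) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(len(coeffs)))


def soft_threshold(coeffs, threshold):
    """Shrink every co-efficient towards zero by `threshold`;
    co-efficients smaller in magnitude than the threshold become zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


detail = [3.0, -0.5, 0.2, -4.0]           # made-up detail co-efficients
shrunk = soft_threshold(detail, 1.0)      # fixed threshold for the example
```

Small co-efficients, which are more likely to be noise, are removed, while large co-efficients that mark abrupt changes are kept with reduced magnitude.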
Referring to Fig. 4, there is schematically illustrated a model generator 4. The generator 4 includes a model training portion 15 and a model testing portion 11. The model training portion 15 and the model testing portion 11 each include two classifiers, denoted 17 and 12 respectively. The classifiers receive input from the feature extractor as well as the customer labels data and fraud history. The classifiers 12 and 17 are arranged to form a Bayes network and a neural network (multiple layer perceptrons); however, any number of classifiers may be used.
For model training, weightings used by the classifiers 12 and 17 in classifying data are assigned by allocators 13 and 16 respectively. The allocator 16 provides an output to the classifiers 17, which also receive input from the customer label data and feature extractor to generate a new model. The results are combined at a combiner denoted 18, and are then tested by the model tester 11, which uses a similar structure to the model trainer 15. Classification of the data is performed by a pair of classifiers 12 whose output is input to an allocator 13, which assigns weights to fraud history data, and a comparator 14 compares the known customer data labels with the test results.
The model is validated at a validator 19 and the output, together with the associated accuracy rate, is provided, or a new iteration starts if the accuracy rate is not satisfactory.
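The combine, test and validate cycle can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the classifiers are taken to be already trained callables returning a fraud score between 0 and 1, the allocator is reduced to a list of candidate weightings, and every name here (`generate_model`, `required`, and so on) is hypothetical rather than taken from the specification.

```python
def combine(weights, scores):
    """Weighted average of the classifier scores (the combiner)."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def accuracy(classifiers, weights, samples, labels, cutoff=0.5):
    """Fraction of test samples whose combined score matches the label."""
    hits = 0
    for x, label in zip(samples, labels):
        score = combine(weights, [clf(x) for clf in classifiers])
        hits += int((score >= cutoff) == bool(label))
    return hits / len(samples)


def generate_model(classifiers, candidate_weights, samples, labels, required):
    """Try each weighting in turn; stop once the accuracy is satisfactory.

    Returns (weights, accuracy_rate, validated). candidate_weights must
    contain at least one weighting.
    """
    best = None
    for weights in candidate_weights:  # one pass = one iteration
        rate = accuracy(classifiers, weights, samples, labels)
        if best is None or rate > best[1]:
            best = (weights, rate)
        if rate >= required:
            return weights, rate, True
    return best[0], best[1], False


# Two toy "classifiers", each looking at one feature of a made-up sample.
clf_a = lambda x: 1.0 if x[0] > 5 else 0.0
clf_b = lambda x: 1.0 if x[1] > 5 else 0.0
samples = [(6, 0), (0, 6), (6, 6), (0, 0)]
labels = [1, 0, 1, 0]
model = generate_model([clf_a, clf_b], [(0.0, 1.0), (1.0, 0.0)],
                       samples, labels, required=0.9)
```

In this toy run the first weighting reaches only half accuracy, so a second iteration is started; the second weighting satisfies the required rate and is returned together with its accuracy rate.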
Referring to Fig. 5, there is illustrated the fraud detector 5. The extracted features and any number of other features such as demographic data or personal information for example, are inputted into the fraud detector denoted by 20 and 21. Allocator 22 ascribes weights to the extracted features and linearly combines them at linear combiner 23.
The output of the linear combiner 23 together with the raw and weighted features provided from the feature extractor and data indicative of the fraud history are classified by classifiers 24 to provide an output which is cross combined by a cross combiner 25 to produce a fraud probability.
The cross combiner 25 is configured to receive the plurality of classified outputs provided by the classifiers 24. In the event that there exists a probabilistic relationship between the raw data profile, time series data, and a set of features, a soft competition algorithm can be applied to select optimal feature vectors from all of the extracted features provided by the feature extractor 3.
By way of example, for a case where there are $M$ classes $C_1, \ldots, C_M$ and $N$ classifiers $CL_1, \ldots, CL_N$, the probability that a data sample $D$ belongs to the $m$-th class is: $P(x_l) = P(D \in C_m \mid O_l = 1)$
where $P(O_l = 1)$ is the probability that the $l$-th feature vector $\{x_l\}$ is optimal.
Considering a feature matrix $x$ with $I$ records and $L$ features, the probability that the $i$-th sample belongs to class $C_m$, as classified by the classifier $CL_n$ in terms of the input feature vector $x_{il}$ ($1 \le i \le I$; $1 \le l \le L$), is:
Figure imgf000011_0001
If $g_n$ is the weight function for the $n$-th classifier, then the probability of the $i$-th set of data belonging to the $m$-th class is:
Figure imgf000011_0002
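The equation images are not reproduced in this text, so the following is only a sketch of one plausible reading: the per-classifier class probabilities are mixed using the classifier weights, normalised so that they sum to one. The normalisation and the name `combine_classifiers` are assumptions made for the example.

```python
def combine_classifiers(class_probs, gate_weights):
    """Mix per-classifier class probabilities into one distribution.

    class_probs[n][m] is the probability, according to classifier n,
    that the sample belongs to class m; gate_weights[n] is the weight
    given to classifier n (normalised here to sum to one).
    """
    total = sum(gate_weights)
    g = [w / total for w in gate_weights]
    n_classes = len(class_probs[0])
    return [sum(g[n] * class_probs[n][m] for n in range(len(class_probs)))
            for m in range(n_classes)]


# Two classifiers, two classes (fraud, no fraud); values are made up.
combined = combine_classifiers([[0.9, 0.1], [0.5, 0.5]], [3.0, 1.0])
```

With normalised weights, the combined values again form a probability distribution over the classes whenever each classifier's own outputs do.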
As can be seen, either historical or real-time data can be provided to the fraud detection system. Fig. 6 illustrates the fraud detector used in an electrical utility network. The fraud detector 30 receives data 29 from a controller 27 in communication with the fraud detector 30. The controller 27 includes a database 28 and communicates with an electricity meter and concentrator network. Data from the metering concentrator network indicative of electricity use is communicated to the controller 27. This information is processed by the controller 27, stored in the database 28 and transmitted to the fraud detector, which calculates the probability of fraud based on the system data. It is noted that the data 29 transmitted to the fraud detector 30 can be real-time data.
Evaluation criteria
If there are $N$ classification labels, the number of error types equals $N^2 - N$. By assigning $e_{pq}$ to the cost of an error made by wrongly classifying $p$ into $q$, where $p \neq q$, and $c_{pq}$ to the number of errors of the corresponding type, the total cost can be expressed as $c = \sum_{p}\sum_{q \neq p} e_{pq}\, c_{pq}$. Errors do not equally impact the performance. The question is which error is more significant. False positive identification wrongly identifies honest customers as fraudulent and may cause unnecessary investigation and the loss of honest customers. False negative identification ignores genuine fraudulent activities and results in loss of revenue for the utilities.
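A cost-weighted error total of this kind can be computed from a cost matrix and a matrix of error counts. The sketch below assumes that reading; the cost and count values are invented for the example and the function names are hypothetical.

```python
def error_types(n_labels):
    """Number of distinct misclassification types for n_labels classes."""
    return n_labels * n_labels - n_labels


def total_cost(cost, counts):
    """Sum of cost[p][q] * counts[p][q] over all p != q.

    cost[p][q] is the cost of classifying a true p as q, and
    counts[p][q] is how many times that happened. Diagonal entries
    (correct classifications) are ignored.
    """
    n = len(cost)
    return sum(cost[p][q] * counts[p][q]
               for p in range(n) for q in range(n) if p != q)


# Label 0 = honest, label 1 = fraud. A missed fraud (1 -> 0) is assumed
# here to cost five times as much as a false alarm (0 -> 1).
cost = [[0, 1],
        [5, 0]]
counts = [[90, 4],
          [2, 4]]
c = total_cost(cost, counts)
```

Making the two costs unequal is what lets an operator express whether false positives or false negatives matter more for a given utility.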
Presently, the acceptance accuracy rate is defined as the ratio of the number of correctly reported instances of fraud to the total number of reported cases of fraud, given a dataset $X$ with a total of $I$ meter readings and $L$ analyzed features:
$$R(\text{correct} \mid x_{IL}, \text{report}) = \frac{N(\text{correct} \cap \text{report})}{N(\text{report})} \qquad (7)$$
where $N(\text{report})$ is the total number of cases of fraud that are reported, and $N(\text{correct} \cap \text{report})$ is the total number of cases of fraud that are reported and correctly identified. In the next section we give some quantitative experimental results based on the feature analysis scheme, with decisions made using the indicator variables $\eta_1$ and $\eta_2$ from classification by application of the following rules:
Pattern$_i$ is accepted if $\eta_1 > p(x_i)$ AND $\eta_2 > R(\text{correct} \mid x_{IL}, \text{report})$; Pattern$_i$ is rejected if $\eta_1 < p(x_i)$ OR $\eta_2 < R(\text{correct} \mid x_{IL}, \text{report})$.
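As a rough illustration, the acceptance accuracy rate and the accept/reject rule can be written as two small functions. The inequalities are implemented exactly as stated in the rule; returning 0.0 when nothing is reported is an assumption added for the example, and the function names and sample flags are hypothetical.

```python
def acceptance_accuracy(reported, actual):
    """Correctly reported fraud cases divided by all reported cases.

    reported[i] and actual[i] are truthy when case i was reported as
    fraud and when it really was fraud, respectively.
    """
    n_report = sum(1 for r in reported if r)
    n_correct = sum(1 for r, a in zip(reported, actual) if r and a)
    return n_correct / n_report if n_report else 0.0


def accept_pattern(eta1, eta2, probability, rate):
    """Accept only when both indicator variables exceed their references;
    failing either comparison rejects the pattern."""
    return eta1 > probability and eta2 > rate


reported = [1, 1, 1, 0, 1]   # made-up report flags
actual = [1, 0, 1, 1, 1]     # made-up ground truth
rate = acceptance_accuracy(reported, actual)
```

Note that the rule as written leaves the case of exact equality undefined; this sketch treats equality as a rejection.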
The fraud detector then processes the received data, being indicative of fraud history, data profiles and customer labels and provides a fraud probability output which is communicated back to the system controller 27.
The foregoing describes only a preferred embodiment of the present invention and modifications, obvious to those skilled in the art, can be made thereto without departing from the scope of the present invention.

Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:-
1. A method for fraud detection including the steps of: entering data profiles into a feature extractor; entering the time-series data profiles into a wavelet transformer and providing a wavelet decomposition; entering the wavelet decomposition into a processor and providing processed wavelet co-efficients; combining the processed wavelet co-efficients with the raw data profiles and assembling an extracted feature data output; entering the extracted features and data indicative of fraud history and customer labels into a model generator; allocating weights to the data entered into the model generator by means of an allocator; combining the results and validating the combined classified results with a validator; passing the model data from the model generator to the fraud detector together with fraud history data and the feature output data provided by the feature extractor; allocating weights to the extracted features and linearly combining them to provide a linearly combined output; cross combining the linearly combined data with the raw fraud history data and providing an output which is indicative of the probability of fraud being present in the inputted data.
2. A method according to claim 1 including the step of providing an accuracy data output from the model generator.
3. A method according to claim 1 or 2 wherein the step of model training and model testing by using two classifiers includes the step of placing the classifiers in a Bayes or statistical analysis network and in a neural network.
4. A method according to any one of claims 1 to 3 wherein the step of validating the model in the model generator includes the step of repeating the model generation step if the model is not of a predetermined accuracy.
5. A method according to any one of claims 1 to 4 where data profiles are extracted from a remote database.
6. A method according to claim 1 or claim 4 wherein the step of cross-combining the linearly combined data with the raw fraud history data includes providing a soft competition algorithm when there exists a probabilistic relationship between the raw and linearly combined data, the algorithm being applied to select optimal vectors from all extracted features.
7. A fraud detection system including: a feature extractor receiving time-series data input data profiles and providing features; a model generator receiving input of data indicative of customer labels and input data indicative of fraud history together with the extracted features, the model generator providing a model output and an accuracy rate output; and entering the model from the model generator into a fraud detector, together with the extracted features and data indicative of the fraud history; and wherein the fraud detector provides data output indicative of the probability of fraud.
8. A system according to claim 7 wherein time series data profiles, customer labels, fraud history and other data are provided from a database in communication with the fraud detection system.
9. A system according to claim 7 or 8 wherein the data source is a remotely located data warehouse in communication with the fraud detection system, the database receiving data indicative of utility meter and concentrator network performance.
10. A system according to any one of claims 7 to 9 where the model generator employs a Bayes neural network for use in providing the extracted feature output.
PCT/AU2002/001472 2001-11-01 2002-11-01 Wavelet based fraud detection system WO2003038666A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AUPR8630A AUPR863001A0 (en) 2001-11-01 2001-11-01 Wavelet based fraud detection
AUPR8630 2001-11-01

Publications (1)

Publication Number Publication Date
WO2003038666A1 true WO2003038666A1 (en) 2003-05-08

Family

ID=3832452

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2002/001472 WO2003038666A1 (en) 2001-11-01 2002-11-01 Wavelet based fraud detection system

Country Status (2)

Country Link
AU (1) AUPR863001A0 (en)
WO (1) WO2003038666A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8036967B2 (en) 2007-01-12 2011-10-11 Allegacy Federal Credit Union Bank card fraud detection and/or prevention methods
US8352315B2 (en) 2009-05-04 2013-01-08 Visa International Service Association Pre-authorization of a transaction using predictive modeling
CN103245861A (en) * 2013-05-03 2013-08-14 云南电力试验研究院(集团)有限公司电力研究院 Transformer fault diagnosis method based on Bayesian network
CN105260615A (en) * 2015-10-29 2016-01-20 河南工业大学 Grain consumption forecasting method
CN109657890A (en) * 2018-09-14 2019-04-19 阿里巴巴集团控股有限公司 A kind of risk for fraud of transferring accounts determines method and device
CN110084603A (en) * 2018-01-26 2019-08-02 阿里巴巴集团控股有限公司 Method, detection method and the corresponding intrument of training fraudulent trading detection model
US10656190B2 (en) 2017-04-13 2020-05-19 Oracle International Corporation Non-parametric statistical behavioral identification ecosystem for electricity fraud detection
CN111626322A (en) * 2020-04-08 2020-09-04 中南大学 Application activity identification method of encrypted flow based on wavelet transformation
CN111800546A (en) * 2020-07-07 2020-10-20 中国工商银行股份有限公司 Method, device and system for constructing recognition model and recognizing and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5819226A (en) * 1992-09-08 1998-10-06 Hnc Software Inc. Fraud detection using predictive modeling
US6029144A (en) * 1997-08-29 2000-02-22 International Business Machines Corporation Compliance-to-policy detection method and system
WO2001031420A2 (en) * 1999-10-25 2001-05-03 Visa International Service Association Features generation for use in computer network intrusion detection
WO2001035301A1 (en) * 1999-11-09 2001-05-17 Fraud-Check.Com, Inc. Method and system for detecting fraud in non-personal transactions
US6281814B1 (en) * 1997-07-31 2001-08-28 Yamatake Corporation Data conversion method, data converter, and program storage medium
US6290654B1 (en) * 1998-10-08 2001-09-18 Sleep Solutions, Inc. Obstructive sleep apnea detection apparatus and method using pattern recognition

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8036967B2 (en) 2007-01-12 2011-10-11 Allegacy Federal Credit Union Bank card fraud detection and/or prevention methods
US9984379B2 (en) 2009-05-04 2018-05-29 Visa International Service Association Determining targeted incentives based on consumer transaction history
US9727868B2 (en) 2009-05-04 2017-08-08 Visa International Service Association Determining targeted incentives based on consumer transaction history
US8352315B2 (en) 2009-05-04 2013-01-08 Visa International Service Association Pre-authorization of a transaction using predictive modeling
US9489674B2 (en) 2009-05-04 2016-11-08 Visa International Service Association Frequency-based transaction prediction and processing
US9773246B2 (en) 2009-05-04 2017-09-26 Visa International Service Association Pre-authorization of a transaction using predictive modeling
CN103245861A (en) * 2013-05-03 2013-08-14 云南电力试验研究院(集团)有限公司电力研究院 Transformer fault diagnosis method based on Bayesian network
CN105260615B (en) * 2015-10-29 2018-04-17 河南工业大学 A kind of grain consumption Forecasting Methodology
CN105260615A (en) * 2015-10-29 2016-01-20 河南工业大学 Grain consumption forecasting method
US10656190B2 (en) 2017-04-13 2020-05-19 Oracle International Corporation Non-parametric statistical behavioral identification ecosystem for electricity fraud detection
US10948526B2 (en) 2017-04-13 2021-03-16 Oracle International Corporation Non-parametric statistical behavioral identification ecosystem for electricity fraud detection
CN110084603A (en) * 2018-01-26 2019-08-02 阿里巴巴集团控股有限公司 Method, detection method and the corresponding intrument of training fraudulent trading detection model
CN110084603B (en) * 2018-01-26 2020-06-16 阿里巴巴集团控股有限公司 Method for training fraud transaction detection model, detection method and corresponding device
CN109657890A (en) * 2018-09-14 2019-04-19 阿里巴巴集团控股有限公司 A kind of risk for fraud of transferring accounts determines method and device
CN109657890B (en) * 2018-09-14 2023-04-25 蚂蚁金服(杭州)网络技术有限公司 Method and device for determining risk of money transfer fraud
CN111626322A (en) * 2020-04-08 2020-09-04 中南大学 Application activity identification method of encrypted flow based on wavelet transformation
CN111626322B (en) * 2020-04-08 2024-01-05 中南大学 Application activity recognition method for encrypted traffic based on wavelet transformation
CN111800546A (en) * 2020-07-07 2020-10-20 中国工商银行股份有限公司 Method, device and system for constructing recognition model and recognizing and electronic equipment

Also Published As

Publication number Publication date
AUPR863001A0 (en) 2001-11-29

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP