CN110322357A - Data anomaly assessment method, apparatus, computer device, and medium
- Publication number
- CN110322357A; CN201910463901.7A
- Authority
- CN
- China
- Prior art keywords
- data
- feature
- value
- model
- first feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
Abstract
This application provides a data anomaly assessment method, apparatus, computer device, and medium. The method obtains the test data to be evaluated and imports the test data into a detection system; extracts the feature data of the claims data, processes it with the scoring model of the detection system, and calculates a test data score; and compares the test data score with the model data score to obtain a risk value and a risk result. By using PCA, an unsupervised learning algorithm, the application learns the overall distribution profile of normal data. Because it follows the anomaly detection approach, it does not need to consider the distribution or variation of historical abnormal data, and its accuracy is high.
Description
Technical field
This application relates to the field of insurance anti-fraud, and in particular to a data anomaly assessment method, apparatus, computer device, and medium.
Background
Current insurance anti-fraud risk-control scoring models have the following pain point: fraud records are rare in the historical claims data of most insurance companies, so the ratio of abundant normal data to scarce abnormal data is extremely unbalanced. As a result, many supervised machine learning risk-control models either cannot be used or learn only a narrow set of patterns and perform poorly. A method is therefore needed that can identify fraudulent data by reference to a large amount of normal data.
Summary of the invention
The main purpose of this application is to provide a data anomaly assessment method, apparatus, computer device, and medium that address the above problem.
To achieve the above purpose, this application provides a data anomaly assessment method, comprising the steps of:
obtaining the normal data in a preset quantity of historical test data;
performing feature screening on the normal data to obtain all necessary features of the normal data and the multiple first feature data corresponding to each necessary feature;
performing feature restoration on the multiple first feature data to obtain multiple historical restored data;
calculating the first difference values between the multiple first feature data and the multiple historical restored data;
substituting the multiple first difference values into a sigmoid function to map them to (0, 1), amplifying the results by a preset multiple to obtain multiple risk scores of the normal data, and taking their maximum as the model data score S_model;
obtaining the test data to be evaluated;
performing feature screening on the test data according to the necessary features of the normal data to obtain all necessary features of the test data and the second feature data corresponding to each necessary feature;
calculating the second difference value between the second feature data and the historical restored data;
substituting the second difference value into the sigmoid function to map it to (0, 1), amplifying the result by the preset multiple to obtain multiple risk scores of the test data, and taking their maximum as the test data score S_test;
comparing S_model and S_test by preset rules to obtain a risk result.
Further, the step of performing feature restoration on the first feature data to obtain the historical restored data comprises:
normalizing the first feature data to obtain a normalized model of the historical data;
converting the normalized model of the historical data into a first feature matrix;
performing feature restoration on the first feature matrix by the PCA inverse transform to obtain the historical restored data.
Further, the step of performing feature screening on the normal data to obtain all necessary features of the normal data and the first feature data corresponding to each necessary feature comprises:
identifying all features of the normal data;
if the number of feature values of a feature is less than or equal to 3, determining the feature to be an unnecessary feature;
if the number of feature values of a feature is greater than 3, determining the feature to be a necessary feature;
removing the unnecessary features to obtain all necessary features of the normal data and the multiple first feature data corresponding to each necessary feature.
Further, the step of comparing S_model and S_test by preset rules to obtain the risk result comprises:
if S_test > S_model, determining that risk exists;
if S_model * 90% < S_test < S_model, determining that risk may exist;
if S_test < S_model * 90%, determining that no risk exists.
Further, the step of normalizing the first feature data to obtain the normalized model of the historical data comprises:
obtaining the maximum and minimum values of a given feature and calculating the difference between them;
subtracting the minimum value from each data item of that feature in turn and dividing the result by the difference to obtain the normalized feature value;
obtaining the normalized feature values for all data of all features, which gives the normalized model.
Further, the step of converting the normalized model of the historical data into the first feature matrix comprises:
extracting the principal components whose contribution rate in the normalized model of the historical data exceeds 95% to obtain the first feature matrix composed of eigenvectors.
Further, the contribution rate is given by a formula (not reproduced here) in which contrib is the contribution rate and si is an eigenvalue.
This application also provides a data anomaly assessment apparatus, comprising:
a first acquisition unit, configured to obtain the normal data in a preset quantity of historical test data;
a first screening unit, configured to perform feature screening on the normal data to obtain all necessary features of the normal data and the multiple first feature data corresponding to each necessary feature;
a restoration unit, configured to perform feature restoration on the multiple first feature data to obtain multiple historical restored data;
a first calculation unit, configured to calculate the first difference values between the multiple first feature data and the multiple historical restored data, substitute the multiple first difference values into a sigmoid function to map them to (0, 1), amplify the results by a preset multiple to obtain multiple risk scores of the normal data, and take their maximum as the model data score S_model;
a second acquisition unit, configured to obtain the test data to be evaluated;
a second screening unit, configured to perform feature screening on the test data according to the necessary features of the normal data to obtain all necessary features of the test data and the second feature data corresponding to each necessary feature;
a second calculation unit, configured to calculate the second difference value between the second feature data and the historical restored data, substitute the second difference value into the sigmoid function to map it to (0, 1), amplify the result by the preset multiple to obtain multiple risk scores of the test data, and take their maximum as the test data score S_test;
a judging unit, configured to compare S_model and S_test by preset rules to obtain a risk result.
This application also provides a computer device comprising a memory and a processor, wherein the memory stores a computer program and the processor implements the steps of any of the methods described above when executing the computer program.
This application also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the steps of any of the methods described above when executed by a processor.
In the data anomaly assessment method, apparatus, computer device, and medium provided by this application, the method obtains the claims data to be evaluated and imports the claims data into a detection system; extracts the feature data of the claims data, processes it with the scoring model of the detection system, and calculates a claims data score; and compares the claims data score with the model data score to obtain a risk value and a risk result. By using PCA, an unsupervised learning algorithm, the application learns the overall distribution profile of normal data. Because it follows the anomaly detection approach, it does not need to consider the distribution or variation of historical fraud data, and its accuracy is high.
Description of the drawings
Fig. 1 is a schematic diagram of the steps of the data anomaly assessment method in an embodiment of this application;
Fig. 2 is a schematic diagram of the data anomaly assessment apparatus in an embodiment of this application;
Fig. 3 is a schematic structural block diagram of the computer device in an embodiment of this application.
The realization of the purpose, the functional characteristics, and the advantages of this application are further described below with reference to the accompanying drawings and embodiments.
Detailed description of embodiments
To make the purpose, technical solution, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
Referring to Fig. 1, this application provides a data anomaly assessment method, comprising the steps of:
S1, obtaining the normal data in a preset quantity of historical test data;
S2, performing feature screening on the normal data to obtain all necessary features of the normal data and the multiple first feature data corresponding to each necessary feature;
S3, performing feature restoration on the multiple first feature data to obtain multiple historical restored data;
S4, calculating the first difference values between the multiple first feature data and the multiple historical restored data;
S5, substituting the multiple first difference values into a sigmoid function to map them to (0, 1), amplifying the results by a preset multiple to obtain multiple risk scores of the normal data, and taking their maximum as the model data score S_model;
S6, obtaining the test data to be evaluated;
S7, performing feature screening on the test data according to the necessary features of the normal data to obtain all necessary features of the test data and the second feature data corresponding to each necessary feature;
S8, calculating the second difference value between the second feature data and the historical restored data;
S9, substituting the second difference value into the sigmoid function to map it to (0, 1), amplifying the result by the preset multiple to obtain multiple risk scores of the test data, and taking their maximum as the test data score S_test;
S10, comparing S_model and S_test by preset rules to obtain a risk result.
As described in step S1 above, the normal data in the historical test data refers to the normal data of historical settled insurance claims cases, that is, the data that remains after fraudulent and abnormal cases are removed from the historical settled cases. Once modeled, the normal data of the historical settled cases reflects the behavior profile of normal data.
As described in step S2 above, feature screening means selecting a feature subset according to business needs. The features include metric features and non-metric features; screening identifies which is which and retains only the metric features, which yields the required feature data. Feature data refers to the specific values of the attribute features in the claims data. For example, when the claims data is a personal insurance claim, its attribute features and specific data are [length of this hospital stay (10), this stay as a percentage of the maximum stay for similar diseases (67%), amount of this claim (50000), number of hospitals in this claim (1), patient age (45), gender, ...]. Here the attribute features are [length of this hospital stay, this stay as a percentage of the maximum stay for similar diseases, amount of this claim, number of hospitals in this claim, patient age, ...], and the feature data are [(10), (67%), (50000), (1), (45), ...].
As described in step S3 above, a self-built algorithm (see the next embodiment for the detailed process) performs feature restoration on the multiple first feature data to obtain multiple historical restored data.
As described in step S4 above, the first difference value diff represents the difference between the historical restored data and the first feature data after the PCA transformation. The formula is:
diff = sum(diff1, diff2, ..., diffm), where
diff1 = (X1 - X1') / mean(X1)
diff2 = (X2 - X2') / mean(X2)
...
diffm = (Xm - Xm') / mean(Xm)
and mean() denotes taking the average.
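The difference formula can be sketched in a few lines of NumPy. This is a minimal illustration rather than the application's implementation: it assumes that each row is one record and each column one feature, that Xi and Xi' are the original and restored values of feature i, and that mean(Xi) is the average of feature i over the data. The function name and sample values are hypothetical.

```python
import numpy as np

def difference_values(X, X_restored, feature_means):
    """Return one diff per record: sum over features of (Xi - Xi') / mean(Xi)."""
    X = np.asarray(X, dtype=float)
    X_restored = np.asarray(X_restored, dtype=float)
    feature_means = np.asarray(feature_means, dtype=float)
    # Signed relative deviation per feature, as written in the formula
    # (the source formula takes no absolute value).
    per_feature = (X - X_restored) / feature_means
    return per_feature.sum(axis=1)

# Hypothetical data: two records, two features.
X = np.array([[10.0, 50000.0], [12.0, 40000.0]])
X_restored = np.array([[9.0, 45000.0], [12.0, 40000.0]])
diffs = difference_values(X, X_restored, X.mean(axis=0))
```

A record that is restored exactly gets a difference value of 0; the further the restored values drift from the originals, the further diff moves from 0.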
As described in step S5 above, the scoring formula is y = n / (1 + e^(-a*diff + b)), where n is the preset multiple and a and b are two adjustment factors. S_model is the maximum score obtained by applying the scoring formula to all normal training data.
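The scoring formula translates directly into code. The sketch below is illustrative only: the source does not give values for n, a, or b, so the defaults used here (n = 100, a = 1, b = 0) are assumptions, and the function names are hypothetical.

```python
import math

def risk_score(diff, n=100.0, a=1.0, b=0.0):
    """Scaled sigmoid from the scoring formula: y = n / (1 + e^(-a*diff + b))."""
    return n / (1.0 + math.exp(-a * diff + b))

def max_score(diffs, n=100.0, a=1.0, b=0.0):
    """Largest risk score over a collection of difference values."""
    return max(risk_score(d, n, a, b) for d in diffs)

# Hypothetical difference values for three normal training records.
s_model = max_score([0.0, 0.2, -0.1])
```

Because the sigmoid is increasing in diff when a is positive, the maximum score always comes from the record with the largest difference value.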
As described in step S6 above, the test data to be evaluated refers to the claims data that will undergo insurance anti-fraud detection. It is obtained through the data anomaly assessment system or model, which provides a data import interface: the claims data (the test data to be evaluated) can be supplied by dragging a data file into a window, by entering the data directly, or by similar means.
As described in step S7 above, performing feature screening on the test data according to the necessary features of the normal data means taking all the metric features obtained when the normal data was screened, locating the corresponding metric features in the claims data, and removing the remaining features.
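The screening of a test record against the features kept for the normal data can be sketched as follows. The feature names and the record are hypothetical, and the handling of a record that lacks a necessary feature is not covered by the source; raising an error is an assumption made for this sketch.

```python
def screen_test_record(record, necessary_features):
    """Keep only the necessary features of one claims record, in training order."""
    missing = [name for name in necessary_features if name not in record]
    if missing:
        # Assumption: the source does not say what to do with incomplete
        # records, so this sketch simply rejects them.
        raise KeyError(f"record lacks necessary features: {missing}")
    return [record[name] for name in necessary_features]

# Hypothetical necessary features found when the normal data was screened.
necessary = ["length_of_stay", "stay_percentage", "claim_amount"]
claim = {
    "length_of_stay": 10,
    "stay_percentage": 0.67,
    "claim_amount": 50000,
    "gender": "F",  # non-metric feature, dropped by the screening
}
second_feature_data = screen_test_record(claim, necessary)
```

Keeping the training order matters because the later difference calculation compares features position by position.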
As described in step S8 above, the second difference value Diff represents the difference between the historical restored data and the second feature data after the PCA transformation. It is calculated with the same formula as the first difference value diff, so the details are not repeated.
As described in step S9 above, the scoring formula is y = n / (1 + e^(-a*Diff + b)), where n is the preset multiple and a and b are two adjustment factors. S_test is the maximum score obtained by applying the scoring formula to the test data.
As described in step S10 above, comparing S_model and S_test yields their relative size and the gap between them; the preset rules map that relative size and gap to the corresponding risk result.
In one embodiment, the step of performing feature restoration on the first feature data to obtain the historical restored data comprises:
S10, normalizing the first feature data to obtain a normalized model of the historical data;
S20, converting the normalized model of the historical data into a first feature matrix;
S30, performing feature restoration on the first feature matrix by the PCA inverse transform to obtain the historical restored data.
In this embodiment, PCA (Principal Components Analysis) refers to the PCA algorithm, an unsupervised learning algorithm that this application uses mainly for feature extraction and dimensionality reduction.
As described in step S10 above, one set of claims data contains multiple features and feature data. An item of feature data is also called an element, and all the feature data of one feature form the feature subset of that feature. Normalizing the first feature data to obtain the normalized model of the claims data means that every element is normalized; once all feature data have been normalized, the normalized model is obtained.
As described in step S20 above, the normalized model obtained in step S10 is converted into a feature matrix based on the PCA algorithm. Specifically, assume X is an m*n matrix holding the m-feature representations of n objects, so that each column represents an object and each row represents a feature. The goal is to reduce the features to d dimensions, with d much smaller than m. The output Y is then a d*n matrix. The algorithm is as follows:
(1) Let X = [x1, x2, ..., xn] and compute the mean of the object points: x0 = (x1 + x2 + ... + xn) / n.
(2) Form the centered result X - x0 and apply singular value decomposition (SVD) to it: X - x0 = U Λ V^T.
(3) x0 is then the origin of the new coordinate system, and the first d columns of U, denoted W, form the new coordinate system after centering. The representation of all points in the new coordinate system is Y = W^T * (X - x0). Likewise, restoring a projected point y to the original coordinate system (that is, the PCA inverse transform) gives x0 + W * y.
As described in step S30 above, PCA is trained on the training data x* while retaining 95% of the information, which yields W and Y; Y is then passed through the PCA inverse transform x0 + W*y to obtain the historical restored data.
In one embodiment, the step of performing feature screening on the normal data to obtain all necessary features of the normal data and the first feature data corresponding to each necessary feature comprises:
identifying all features of the normal data;
if the number of feature values of a feature is less than or equal to 3, determining the feature to be an unnecessary feature;
if the number of feature values of a feature is greater than 3, determining the feature to be a necessary feature;
removing the unnecessary features to obtain all necessary features of the normal data and the multiple first feature data corresponding to each necessary feature.
In this embodiment, the unnecessary features are the non-metric features, which have no practical influence on the behavior profile of the normal data. The necessary features are the metric features, which do shape that behavior profile. Removing the unnecessary features and keeping the necessary ones therefore makes the data anomaly assessment method more accurate while reducing the amount of computation, lowering the error rate, and improving assessment efficiency.
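The screening rule can be sketched as a simple filter. The sketch reads "number of feature values" as the number of distinct values a feature takes across the normal data, which is an interpretation rather than something the text states outright; the function name and the sample columns are hypothetical.

```python
def select_necessary_features(columns, threshold=3):
    """Keep features that take more than `threshold` distinct values.

    `columns` maps a feature name to the list of its values in the normal data.
    """
    return {
        name: values
        for name, values in columns.items()
        if len(set(values)) > threshold
    }

# Hypothetical normal data with two metric features and one non-metric feature.
normal_data = {
    "length_of_stay": [10, 3, 7, 12, 5],
    "claim_amount": [50000, 12000, 8000, 30000, 9500],
    "gender": ["F", "M", "F", "F", "M"],
}
necessary = select_necessary_features(normal_data)
```

Here gender takes only two distinct values, so it is treated as an unnecessary (non-metric) feature and removed.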
In one embodiment, the step of comparing S_model and S_test by preset rules to obtain the risk result comprises:
if S_test > S_model, determining that risk exists;
if S_model * 90% < S_test < S_model, determining that risk may exist;
if S_test < S_model * 90%, determining that no risk exists.
In this embodiment, the model data score in effect reflects the behavior profile of the normal data of historical settled insurance claims cases. If the claims data score is greater than the model data score, the behavior profile of the claims data differs substantially from that of the normal data, which indicates risk. If the claims data score is less than the model data score but greater than 90% of it, the behavior profile of the claims data tends to deviate from that of the normal data, which indicates possible risk. If the claims data score is less than 90% of the model data score, the behavior profile of the claims data is consistent with that of the normal data, which indicates no risk.
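The three comparison rules can be written as one small function. The rules in the text use strict inequalities and leave the boundary cases open (S_test exactly equal to S_model, or to 90% of it); this sketch assigns them to the lower-risk branch, which is an assumption. The function name and the result labels are hypothetical.

```python
def classify_risk(s_test, s_model, ratio=0.9):
    """Map the test score and the model score to a risk result."""
    if s_test > s_model:
        return "risk"
    if s_test > ratio * s_model:
        return "possible risk"
    # Assumption: scores at or below 90% of S_model count as no risk.
    return "no risk"

results = [classify_risk(s, 90.0) for s in (95.0, 85.0, 80.0)]
```

With S_model = 90, the 90% threshold is 81, so the three sample scores fall into the three different branches.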
In one embodiment, the step of normalizing the first feature data to obtain the normalized model of the historical data comprises:
obtaining the maximum and minimum values of a given feature and calculating the difference between them;
subtracting the minimum value from each data item of that feature in turn and dividing the result by the difference to obtain the normalized feature value;
obtaining the normalized feature values for all data of all features, which gives the normalized model.
In this embodiment, normalization confines the data to be processed to a certain range after processing by some algorithm. Normalization first makes subsequent data processing more convenient and, second, speeds up convergence when the program runs. Its specific effect is to unify the statistical distribution of the samples: normalizing to the range 0 to 1 yields a statistical probability distribution, while normalizing to some other interval yields a statistical coordinate distribution. Normalization carries the sense of making things the same, unified, and consistent. In this embodiment, the data is normalized by max-min linear normalization, using the formula x* = (X - Xmin) / (Xmax - Xmin).
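Max-min linear normalization of one feature can be sketched as follows. The text does not say what happens when a feature is constant (Xmax equals Xmin); returning zeros in that case is an assumption made for this sketch, and the function name is hypothetical.

```python
def min_max_normalize(values):
    """Apply x* = (X - Xmin) / (Xmax - Xmin) to every value of one feature."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # Assumption: a constant feature carries no spread, so map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lowest) / span for v in values]

normalized = min_max_normalize([10, 20, 30])
```

Applying the function to every necessary feature in turn gives the normalized model described in the steps above.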
In one embodiment, the step of converting the normalized model of the historical data into the first feature matrix comprises:
extracting the principal components whose contribution rate in the normalized model of the historical data exceeds 95% to obtain the first feature matrix composed of eigenvectors.
In this embodiment, the eigenvalues [s1, s2, ..., sm] of the equation (X - x0)(X - x0)^T u = s u are arranged in descending order; the larger the eigenvalue si, the more information the corresponding eigenvector ui carries. The principal components whose contribution rate in the normalized model exceeds 95% are extracted, giving the first feature matrix composed of eigenvectors of the normalized model of the claims data.
In one embodiment, the contribution rate is given by a formula (not reproduced here) in which contrib is the contribution rate and si is an eigenvalue.
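Since the formula itself is missing from this text, the sketch below falls back on the usual cumulative contribution rate of PCA, the sum of the d largest eigenvalues divided by the sum of all eigenvalues, and picks the smallest d whose rate exceeds the threshold. Treating this as the application's definition is an assumption; the function name and sample eigenvalues are hypothetical.

```python
import numpy as np

def count_components(eigenvalues, threshold=0.95):
    """Smallest d whose cumulative contribution rate exceeds the threshold."""
    s = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # descending order
    cumulative = np.cumsum(s) / s.sum()
    # side="right" finds the first position whose rate is strictly greater.
    d = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    return min(d, len(s)), cumulative

d, cumulative = count_components([0.2, 6.0, 0.8, 3.0])
```

For these sample eigenvalues the cumulative rates are roughly 0.6, 0.9, 0.98, and 1.0, so three components are kept.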
The data anomaly assessment method provided by this application obtains the claims data to be evaluated and imports the claims data into a detection system; extracts the feature data of the claims data, processes it with the scoring model of the detection system, and calculates a claims data score; and compares the claims data score with the model data score to obtain a risk value and a risk result. By using PCA, an unsupervised learning algorithm, the application learns the overall distribution profile of normal data. Because it follows the anomaly detection approach, it does not need to consider the distribution or variation of fraud data, and its accuracy is high.
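The whole method can be strung together in one short sketch. Several choices here are assumptions rather than details given in the text: rows are records and columns are features (the transpose of the layout used in the PCA description), differences are computed on normalized values, mean(Xi) is the feature mean of the normalized training data, a test record is restored with the PCA basis learned from the normal data, the cumulative contribution rate is the usual eigenvalue ratio, and n = 100, a = 1, b = 0. Constant features are assumed to have been removed by the screening step. All names and data are hypothetical.

```python
import numpy as np

def fit_model(X_normal, threshold=0.95, n=100.0, a=1.0, b=0.0):
    """Learn the normal-data profile and S_model (rows: records, columns: features)."""
    X = np.asarray(X_normal, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo                     # assumes no constant feature
    Z = (X - lo) / span                           # max-min normalization
    z0 = Z.mean(axis=0)
    _, S, Vt = np.linalg.svd(Z - z0, full_matrices=False)
    contrib = np.cumsum(S**2) / np.sum(S**2)      # cumulative contribution rate
    d = min(int(np.searchsorted(contrib, threshold, side="right")) + 1, len(S))
    model = {"lo": lo, "span": span, "z0": z0, "W": Vt[:d].T, "n": n, "a": a, "b": b}
    model["s_model"] = float(score(model, X).max())
    return model

def score(model, X):
    """Risk score of each record: restore with PCA, take diff, apply scaled sigmoid."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Z = (X - model["lo"]) / model["span"]
    Z_restored = model["z0"] + (Z - model["z0"]) @ model["W"] @ model["W"].T
    diff = ((Z - Z_restored) / model["z0"]).sum(axis=1)
    return model["n"] / (1.0 + np.exp(-model["a"] * diff + model["b"]))

# Hypothetical normal data with two underlying factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.random((200, 2))
X_normal = latent @ rng.random((2, 5)) + 0.01 * rng.random((200, 5))
model = fit_model(X_normal)

# Score one hypothetical test record and apply the comparison rules.
s_test = float(score(model, X_normal[0] * 1.5).max())
if s_test > model["s_model"]:
    result = "risk"
elif s_test > 0.9 * model["s_model"]:
    result = "possible risk"
else:
    result = "no risk"
```

The model keeps only what is learned from normal data (normalization bounds, mean, basis, and S_model), which matches the point that no fraud examples are needed for training.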
Referring to Fig. 2, an embodiment of this application also provides a data anomaly assessment apparatus, comprising:
a first acquisition unit 10, configured to obtain the normal data in a preset quantity of historical test data;
a first screening unit 20, configured to perform feature screening on the normal data to obtain all necessary features of the normal data and the multiple first feature data corresponding to each necessary feature;
a restoration unit 30, configured to perform feature restoration on the multiple first feature data to obtain multiple historical restored data;
a first calculation unit 40, configured to calculate the first difference values between the multiple first feature data and the multiple historical restored data, substitute the multiple first difference values into a sigmoid function to map them to (0, 1), amplify the results by a preset multiple to obtain multiple risk scores of the normal data, and take their maximum as the model data score S_model;
a second acquisition unit 50, configured to obtain the test data to be evaluated;
a second screening unit 60, configured to perform feature screening on the test data according to the necessary features of the normal data to obtain all necessary features of the test data and the second feature data corresponding to each necessary feature;
a second calculation unit 70, configured to calculate the second difference value between the second feature data and the historical restored data, substitute the second difference value into the sigmoid function to map it to (0, 1), amplify the result by the preset multiple to obtain multiple risk scores of the test data, and take their maximum as the test data score S_test;
a judging unit 80, configured to compare S_model and S_test by preset rules to obtain a risk result.
In this embodiment, PCA (Principal Components Analysis) refers to the PCA algorithm, an unsupervised learning algorithm that this application uses mainly for feature extraction and dimensionality reduction.
In the first acquisition unit 10, the normal data of historical settled insurance claims cases is the data that remains after fraudulent and abnormal cases are removed from the historical settled cases. Once modeled, the normal data of the historical settled cases reflects the behavior profile of normal data.
In the first screening unit 20, feature screening means selecting a feature subset according to business needs. The features include metric features and non-metric features; screening identifies which is which and retains only the metric features, which yields the required feature data. Feature data refers to the specific values of the attribute features in the claims data. For example, when the claims data is a personal insurance claim, its attribute features and specific data are [length of this hospital stay (10), this stay as a percentage of the maximum stay for similar diseases (67%), amount of this claim (50000), number of hospitals in this claim (1), patient age (45), gender, ...]. Here the attribute features are [length of this hospital stay, this stay as a percentage of the maximum stay for similar diseases, amount of this claim, number of hospitals in this claim, patient age, ...], and the feature data are [(10), (67%), (50000), (1), (45), ...].
In the restoration unit 30, a self-built algorithm (see the next embodiment for the detailed process) performs feature restoration on the multiple first feature data to obtain multiple historical restored data.
In the first calculation unit 40, the first difference value diff represents the difference between the historical restored data and the first feature data after the PCA transformation. The formula is:
diff = sum(diff1, diff2, ..., diffm), where
diff1 = (X1 - X1') / mean(X1)
diff2 = (X2 - X2') / mean(X2)
...
diffm = (Xm - Xm') / mean(Xm)
and mean() denotes taking the average.
The scoring formula is y = n / (1 + e^(-a*diff + b)), where n is the preset multiple and a and b are two adjustment factors. S_model is the maximum score obtained by applying the scoring formula to all normal training data.
In the second acquisition unit 50, the claims data to be evaluated refers to the claims data that will undergo insurance anti-fraud detection. It is obtained through the data anomaly assessment system or model, which provides a data import interface: the claims data can be supplied by dragging a data file into a window, by entering the data directly, or by similar means.
In the second screening unit 60, performing feature screening on the claims data according to the necessary features of the normal data means taking all the metric features obtained when the normal data was screened, locating the corresponding metric features in the claims data, and removing the remaining features.
In the second calculation unit 70, the second difference value Diff represents the difference between the historical restored data and the second feature data after the PCA transformation. It is calculated with the same formula as the first difference value diff, so the details are not repeated.
The scoring formula is y = n / (1 + e^(-a*Diff + b)), where n is the preset multiple and a and b are two adjustment factors. S_test is the maximum score obtained by applying the scoring formula to the test data.
In the judging unit 80, comparing S_model and S_test yields their relative size and the gap between them; the preset rules map that relative size and gap to the corresponding risk result.
Referring to Fig. 3, an embodiment of this application also provides a computer device, which may be a server and whose internal structure may be as shown in Fig. 3. The computer device comprises a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as historical insurance claims cases. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements a data anomaly assessment method.
The processor executes the following steps of the above method:
Obtaining normal data from a preset quantity of historical test data;
Performing feature screening on the normal data to obtain all essential features of the normal data and the multiple first feature data corresponding to each essential feature;
Performing feature restoration on the multiple first feature data to obtain multiple historical restored data;
Calculating first difference values between the multiple first feature data and the multiple historical restored data;
Substituting the multiple first difference values into a sigmoid function to map them into (0, 1), then scaling the results by a preset multiple to obtain multiple risk scores of the normal data, and taking their maximum as the model data score S_model;
Obtaining the test data to be evaluated;
Performing feature screening on the test data according to the essential features of the normal data to obtain all essential features of the test data and the second feature data corresponding to each essential feature;
Calculating second difference values between the second feature data and the historical restored data;
Substituting the second difference values into the sigmoid function to map them into (0, 1), then scaling the results by the preset multiple to obtain multiple risk scores of the test data, and taking their maximum as the test data score S_test;
Comparing S_model and S_test according to a preset rule to obtain a risk result.
Further, the step of performing feature restoration on the first feature data to obtain the historical restored data comprises:
Normalizing the first feature data to obtain a normalization model of the historical data;
Converting the normalization model of the historical data into a first feature matrix;
Performing feature restoration on the first feature matrix by PCA inverse transformation to obtain the historical restored data.
Further, the step of performing feature screening on the normal data to obtain all essential features of the normal data and the first feature data corresponding to each essential feature comprises:
Identifying all features of the normal data;
If the number of feature values of a feature is less than or equal to 3, determining that the feature is a non-essential feature;
If the number of feature values of a feature is greater than 3, determining that the feature is an essential feature;
Removing the non-essential features to obtain all essential features of the normal data and the multiple first feature data corresponding to each essential feature.
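A minimal Python sketch of this screening rule follows. It assumes that "number of feature values" means the number of distinct values a feature takes across the normal data, which is an interpretation rather than something the text states explicitly; the record layout and feature names are hypothetical.

```python
def screen_features(records, max_values=3):
    """Split feature names into essential and non-essential ones.

    A feature with more than max_values distinct values is treated as essential;
    otherwise it is treated as non-essential (assumed reading of the rule).
    """
    essential, non_essential = [], []
    for name in records[0].keys():
        distinct = {rec[name] for rec in records}
        if len(distinct) > max_values:
            essential.append(name)
        else:
            non_essential.append(name)
    return essential, non_essential


# Hypothetical normal data: "amount" takes four values, "gender" only two.
normal_data = [
    {"amount": 100, "gender": "F"},
    {"amount": 250, "gender": "M"},
    {"amount": 400, "gender": "F"},
    {"amount": 900, "gender": "M"},
]
essential, non_essential = screen_features(normal_data)
```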
Further, the step of comparing S_model and S_test according to the preset rule to obtain the risk result comprises:
If S_test > S_model, determining that a risk exists;
If S_model * 90% < S_test < S_model, determining that a risk may exist;
If S_test < S_model * 90%, determining that no risk exists.
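The three rules translate directly into code. In the sketch below, the function name and the returned labels are hypothetical. The rules as written leave the boundary cases (S_test equal to S_model, or equal to 90% of S_model) unassigned, so the sketch reports them as "unspecified" rather than guessing.

```python
def classify_risk(s_test, s_model, ratio=0.9):
    """Apply the preset comparison rule to the two scores."""
    if s_test > s_model:
        return "risk"
    if ratio * s_model < s_test < s_model:
        return "possible risk"
    if s_test < ratio * s_model:
        return "no risk"
    # Boundary values are not covered by the rules as stated.
    return "unspecified"


results = [classify_risk(s, 90.0) for s in (95.0, 85.0, 50.0, 90.0)]
```

A production system would need to assign the boundary cases to one of the three outcomes; which one is a policy decision that the text does not settle.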
Further, the step of normalizing the first feature data to obtain the normalization model of the historical data comprises:
Obtaining the maximum and minimum values of each feature, and calculating the difference between the maximum and the minimum;
Subtracting the minimum from each data item of the feature in turn and dividing the result by the difference to obtain the normalized feature value;
Computing the normalized feature values for all data of all features to obtain the normalization model.
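The normalization described above is ordinary min-max scaling, sketched below in Python. The column-oriented input layout and the feature name are hypothetical. The text does not say what happens when a feature is constant (maximum equal to minimum); the sketch raises an error in that case as an assumption.

```python
def min_max_normalize(columns):
    """Scale each feature to [0, 1] using (value - min) / (max - min).

    columns: dict mapping a feature name to its list of numeric values.
    """
    normalized = {}
    for name, values in columns.items():
        lowest, highest = min(values), max(values)
        span = highest - lowest
        if span == 0:
            # Not covered by the source text; treated as an error here.
            raise ValueError(f"feature {name!r} is constant")
        normalized[name] = [(v - lowest) / span for v in values]
    return normalized


normalized = min_max_normalize({"amount": [100, 200, 300]})
```

Since screening keeps only features with more than three feature values, a constant feature should not normally reach this step.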
Further, the step of converting the normalization model of the historical data into the first feature matrix comprises:
Extracting the principal components whose contribution rate in the normalization model of the historical data exceeds 95%, to obtain the first feature matrix composed of eigenvectors.
Further, the contribution rate is given by: [formula image not reproduced], where contrib is the contribution rate and s_i is an eigenvalue.
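Because the contribution-rate formula itself is not reproduced in this text, the following Python sketch rests on an assumption: it uses the common definition in which the contribution rate of the first k components is the sum of the k largest eigenvalues divided by the sum of all eigenvalues. The function names are hypothetical, and the application's own formula may differ.

```python
def contribution_rate(eigenvalues, k):
    """Assumed definition: share of the total held by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def components_needed(eigenvalues, threshold=0.95):
    """Smallest k whose contribution rate exceeds the threshold."""
    for k in range(1, len(eigenvalues) + 1):
        if contribution_rate(eigenvalues, k) > threshold:
            return k
    return len(eigenvalues)


# Shares accumulate as 0.70, 0.90, 0.99, so three components are retained.
k = components_needed([7.0, 2.0, 0.9, 0.1])
```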
An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements a data anomaly assessment method comprising the steps of:
Obtaining normal data from a preset quantity of historical test data;
Performing feature screening on the normal data to obtain all essential features of the normal data and the multiple first feature data corresponding to each essential feature;
Performing feature restoration on the multiple first feature data to obtain multiple historical restored data;
Calculating first difference values between the multiple first feature data and the multiple historical restored data;
Substituting the multiple first difference values into a sigmoid function to map them into (0, 1), then scaling the results by a preset multiple to obtain multiple risk scores of the normal data, and taking their maximum as the model data score S_model;
Obtaining the test data to be evaluated;
Performing feature screening on the test data according to the essential features of the normal data to obtain all essential features of the test data and the second feature data corresponding to each essential feature;
Calculating second difference values between the second feature data and the historical restored data;
Substituting the second difference values into the sigmoid function to map them into (0, 1), then scaling the results by the preset multiple to obtain multiple risk scores of the test data, and taking their maximum as the test data score S_test;
Comparing S_model and S_test according to a preset rule to obtain a risk result.
In one embodiment, the step of performing feature restoration on the first feature data to obtain the historical restored data comprises:
Normalizing the first feature data to obtain a normalization model of the historical data;
Converting the normalization model of the historical data into a first feature matrix;
Performing feature restoration on the first feature matrix by PCA inverse transformation to obtain the historical restored data.
In one embodiment, the step of performing feature screening on the normal data to obtain all essential features of the normal data and the first feature data corresponding to each essential feature comprises:
Identifying all features of the normal data;
If the number of feature values of a feature is less than or equal to 3, determining that the feature is a non-essential feature;
If the number of feature values of a feature is greater than 3, determining that the feature is an essential feature;
Removing the non-essential features to obtain all essential features of the normal data and the multiple first feature data corresponding to each essential feature.
In one embodiment, the step of comparing S_model and S_test according to the preset rule to obtain the risk result comprises:
If S_test > S_model, determining that a risk exists;
If S_model * 90% < S_test < S_model, determining that a risk may exist;
If S_test < S_model * 90%, determining that no risk exists.
In one embodiment, the step of normalizing the first feature data to obtain the normalization model of the historical data comprises:
Obtaining the maximum and minimum values of each feature, and calculating the difference between the maximum and the minimum;
Subtracting the minimum from each data item of the feature in turn and dividing the result by the difference to obtain the normalized feature value;
Computing the normalized feature values for all data of all features to obtain the normalization model.
In one embodiment, the step of converting the normalization model of the historical data into the first feature matrix comprises:
Extracting the principal components whose contribution rate in the normalization model of the historical data exceeds 95%, to obtain the first feature matrix composed of eigenvectors.
In one embodiment, the contribution rate is given by: [formula image not reproduced], where contrib is the contribution rate and s_i is an eigenvalue.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, device, article, or method that includes the element.
The foregoing describes only preferred embodiments of the present application and does not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.
Claims (10)
1. A data anomaly assessment method, characterized by comprising the steps of:
Obtaining normal data from a preset quantity of historical test data;
Performing feature screening on the normal data to obtain all essential features of the normal data and the multiple first feature data corresponding to each essential feature;
Performing feature restoration on the multiple first feature data to obtain multiple historical restored data;
Calculating first difference values between the multiple first feature data and the multiple historical restored data;
Substituting the multiple first difference values into a sigmoid function to map them into (0, 1), then scaling the results by a preset multiple to obtain multiple risk scores of the normal data, and taking their maximum as the model data score S_model;
Obtaining the test data to be evaluated;
Performing feature screening on the test data according to the essential features of the normal data to obtain all essential features of the test data and the second feature data corresponding to each essential feature;
Calculating second difference values between the second feature data and the historical restored data;
Substituting the second difference values into the sigmoid function to map them into (0, 1), then scaling the results by the preset multiple to obtain multiple risk scores of the test data, and taking their maximum as the test data score S_test;
Comparing S_model and S_test according to a preset rule to obtain a risk result.
2. The data anomaly assessment method according to claim 1, characterized in that the step of performing feature restoration on the first feature data to obtain the historical restored data comprises:
Normalizing the first feature data to obtain a normalization model of the historical data;
Converting the normalization model of the historical data into a first feature matrix;
Performing feature restoration on the first feature matrix by PCA inverse transformation to obtain the historical restored data.
3. The data anomaly assessment method according to claim 1, characterized in that the step of performing feature screening on the normal data to obtain all essential features of the normal data and the first feature data corresponding to each essential feature comprises:
Identifying all features of the normal data;
If the number of feature values of a feature is less than or equal to 3, determining that the feature is a non-essential feature;
If the number of feature values of a feature is greater than 3, determining that the feature is an essential feature;
Removing the non-essential features to obtain all essential features of the normal data and the multiple first feature data corresponding to each essential feature.
4. The data anomaly assessment method according to claim 1, characterized in that the step of comparing S_model and S_test according to the preset rule to obtain the risk result comprises:
If S_test > S_model, determining that a risk exists;
If S_model * 90% < S_test < S_model, determining that a risk may exist;
If S_test < S_model * 90%, determining that no risk exists.
5. The data anomaly assessment method according to claim 2, characterized in that the step of normalizing the first feature data to obtain the normalization model of the historical data comprises:
Obtaining the maximum and minimum values of each feature, and calculating the difference between the maximum and the minimum;
Subtracting the minimum from each data item of the feature in turn and dividing the result by the difference to obtain the normalized feature value;
Computing the normalized feature values for all data of all features to obtain the normalization model.
6. The data anomaly assessment method according to claim 2, characterized in that the step of converting the normalization model of the historical data into the first feature matrix comprises:
Extracting the principal components whose contribution rate in the normalization model of the historical data exceeds 95%, to obtain the first feature matrix composed of eigenvectors.
7. The data anomaly assessment method according to claim 6, characterized in that the contribution rate is given by: [formula image not reproduced], where contrib is the contribution rate and s_i is an eigenvalue.
8. A data anomaly assessment apparatus, characterized by comprising:
A first acquisition unit, configured to obtain normal data from a preset quantity of historical test data;
A first screening unit, configured to perform feature screening on the normal data to obtain all essential features of the normal data and the multiple first feature data corresponding to each essential feature;
A restoration unit, configured to perform feature restoration on the multiple first feature data to obtain multiple historical restored data;
A first computing unit, configured to calculate first difference values between the multiple first feature data and the multiple historical restored data, substitute the multiple first difference values into a sigmoid function to map them into (0, 1), then scale the results by a preset multiple to obtain multiple risk scores of the normal data, and take their maximum as the model data score S_model;
A second acquisition unit, configured to obtain the test data to be evaluated;
A second screening unit, configured to perform feature screening on the test data according to the essential features of the normal data to obtain all essential features of the test data and the second feature data corresponding to each essential feature;
A second computing unit, configured to calculate second difference values between the second feature data and the historical restored data, substitute the second difference values into the sigmoid function to map them into (0, 1), then scale the results by the preset multiple to obtain multiple risk scores of the test data, and take their maximum as the test data score S_test;
A judging unit, configured to compare S_model and S_test according to a preset rule to obtain a risk result.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the method of any one of claims 1 to 7 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910463901.7A CN110322357A (en) | 2019-05-30 | 2019-05-30 | Anomaly assessment method, apparatus, computer equipment and the medium of data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110322357A true CN110322357A (en) | 2019-10-11 |
Family
ID=68119104
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910463901.7A Pending CN110322357A (en) | 2019-05-30 | 2019-05-30 | Anomaly assessment method, apparatus, computer equipment and the medium of data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110322357A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||