CN110322357A - Data anomaly assessment method, apparatus, computer device and medium - Google Patents


Publication number
CN110322357A
Authority
CN
China
Prior art keywords
data
feature
value
model
first feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910463901.7A
Other languages
Chinese (zh)
Inventor
李金乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd
Priority to CN201910463901.7A
Publication of CN110322357A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08: Insurance

Abstract

This application provides a data anomaly assessment method, apparatus, computer device, and medium. In the method, test data to be evaluated is obtained and imported into a detection system; feature data is extracted from the test data and processed by the scoring model of the detection system to calculate a test data score; and the test data score is compared with the model data score to obtain a risk value and a risk result. The application uses PCA, an unsupervised learning algorithm, to learn the overall distribution profile of normal data. Because the approach is based on anomaly detection, it does not need to consider the distribution or variation of historical abnormal data, and its accuracy is high.

Description

Data anomaly assessment method, apparatus, computer device and medium
Technical field
This application relates to the field of insurance anti-fraud, and in particular to a data anomaly assessment method, apparatus, computer device, and medium.
Background
Current insurance anti-fraud risk-control scoring models have the following pain point: most insurers' historical claims data contains very few fraud records, so the ratio of abundant normal data to scarce abnormal data is extremely imbalanced. As a result, many supervised machine-learning risk-control models either cannot be used or learn only a narrow set of patterns and perform poorly. A method is therefore needed that can identify fraudulent data by reference to a large amount of normal data.
Summary of the invention
The main purpose of this application is to provide a data anomaly assessment method, apparatus, computer device, and medium that solve the above problem.
To achieve the above purpose, this application provides a data anomaly assessment method, comprising the steps of:
Obtain the normal data in a preset quantity of historical test data;
Perform feature screening on the normal data to obtain all essential features of the normal data and the multiple first feature data corresponding to each essential feature;
Perform feature restoration on the multiple first feature data to obtain multiple historical restored data;
Calculate the first difference values between the multiple first feature data and the multiple historical restored data;
Substitute the multiple first difference values into a sigmoid function to map them to (0, 1), amplify the results by a preset multiple to obtain multiple risk score values of the normal data, and take their maximum as the model data score S_model;
Obtain the test data to be evaluated;
Perform feature screening on the test data according to the essential features of the normal data to obtain all essential features of the test data and the second feature data corresponding to each essential feature;
Calculate the second difference values between the second feature data and the historical restored data;
Substitute the second difference values into the sigmoid function to map them to (0, 1), amplify the results by the preset multiple to obtain multiple risk score values of the test data, and take their maximum as the test data score S_test;
Compare S_model and S_test according to preset rules to obtain a risk result.
Further, the step of performing feature restoration on the first feature data to obtain historical restored data comprises:
Normalize the first feature data to obtain a normalized model of the historical data;
Convert the normalized model of the historical data into a first feature matrix;
Perform feature restoration on the first feature matrix by PCA inverse transformation to obtain the historical restored data.
Further, the step of performing feature screening on the normal data to obtain all essential features of the normal data and the first feature data corresponding to each essential feature comprises:
Identify all features of the normal data;
If the number of feature values in a feature's data is less than or equal to 3, determine that feature to be a non-essential feature;
If the number of feature values in a feature's data is greater than 3, determine that feature to be an essential feature;
Remove the non-essential features to obtain all essential features of the normal data and the multiple first feature data corresponding to each essential feature.
Further, the step of comparing S_model and S_test according to preset rules to obtain a risk result comprises:
If S_test > S_model, determine that risk exists;
If S_model * 90% < S_test < S_model, determine that risk may exist;
If S_test < S_model * 90%, determine that no risk exists.
Further, the step of normalizing the first feature data to obtain a normalized model of the historical data comprises:
Obtain the maximum and minimum values of a given feature, and calculate the difference between them;
For each data item of the feature in turn, subtract the minimum value and divide the result by the difference to obtain the normalized feature value;
Compute the normalized feature value for all data of all features to obtain the normalized model.
Further, the step of converting the normalized model of the historical data into a first feature matrix comprises:
Extract the principal components whose contribution rate in the normalized model of the historical data exceeds 95%, obtaining a first feature matrix composed of eigenvectors.
Further, the contribution rate is: [formula not reproduced], where contrib is the contribution rate and si is an eigenvalue.
This application also provides a data anomaly assessment apparatus, comprising:
A first acquisition unit, for obtaining the normal data in a preset quantity of historical test data;
A first screening unit, for performing feature screening on the normal data to obtain all essential features of the normal data and the multiple first feature data corresponding to each essential feature;
A restoration unit, for performing feature restoration on the multiple first feature data to obtain multiple historical restored data;
A first computing unit, for calculating the first difference values between the multiple first feature data and the multiple historical restored data, substituting the multiple first difference values into a sigmoid function to map them to (0, 1), amplifying the results by a preset multiple to obtain multiple risk score values of the normal data, and taking their maximum as the model data score S_model;
A second acquisition unit, for obtaining the test data to be evaluated;
A second screening unit, for performing feature screening on the test data according to the essential features of the normal data to obtain all essential features of the test data and the second feature data corresponding to each essential feature;
A second computing unit, for calculating the second difference values between the second feature data and the historical restored data, substituting the second difference values into the sigmoid function to map them to (0, 1), amplifying the results by the preset multiple to obtain multiple risk score values of the test data, and taking their maximum as the test data score S_test;
A judging unit, for comparing S_model and S_test according to preset rules to obtain a risk result.
This application also provides a computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of any of the methods described above when executing the computer program.
This application also provides a computer-readable storage medium storing a computer program, wherein the computer program implements the steps of any of the methods described above when executed by a processor.
In the data anomaly assessment method, apparatus, computer device, and medium provided by this application, the method obtains the claims data to be evaluated and imports it into a detection system; extracts the feature data of the claims data and processes it with the scoring model of the detection system to calculate a claims data score; and compares the claims data score with the model data score to obtain a risk value and a risk result. The application uses PCA, an unsupervised learning algorithm, to learn the overall distribution profile of normal data. Because the approach is based on anomaly detection, it does not need to consider the distribution or variation of historical fraud data, and its accuracy is high.
Brief description of the drawings
Fig. 1 is a schematic diagram of the steps of the data anomaly assessment method in one embodiment of this application;
Fig. 2 is a schematic diagram of the data anomaly assessment apparatus in one embodiment of this application;
Fig. 3 is a schematic structural block diagram of the computer device in one embodiment of this application.
The realization of the purpose, functional characteristics, and advantages of this application will be further described with reference to the accompanying drawings and embodiments.
Detailed description
To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the application, not to limit it.
Referring to Fig. 1, this application proposes a data anomaly assessment method, comprising the steps of:
S1, obtain the normal data in a preset quantity of historical test data;
S2, perform feature screening on the normal data to obtain all essential features of the normal data and the multiple first feature data corresponding to each essential feature;
S3, perform feature restoration on the multiple first feature data to obtain multiple historical restored data;
S4, calculate the first difference values between the multiple first feature data and the multiple historical restored data;
S5, substitute the multiple first difference values into a sigmoid function to map them to (0, 1), amplify the results by a preset multiple to obtain multiple risk score values of the normal data, and take their maximum as the model data score S_model;
S6, obtain the test data to be evaluated;
S7, perform feature screening on the test data according to the essential features of the normal data to obtain all essential features of the test data and the second feature data corresponding to each essential feature;
S8, calculate the second difference values between the second feature data and the historical restored data;
S9, substitute the second difference values into the sigmoid function to map them to (0, 1), amplify the results by the preset multiple to obtain multiple risk score values of the test data, and take their maximum as the test data score S_test;
S10, compare S_model and S_test according to preset rules to obtain a risk result.
As described in step S1, the normal data in the historical test data refers to the normal data of historical paid insurance claim cases, that is, the data that remains after fraudulent (abnormal) cases are removed from the historical paid cases. Once this normal data has been modeled, the model reflects the behavior profile of normal data.
As described in step S2, feature screening means selecting a feature subset according to business needs. The features include metric features and non-metric features; screening identifies which is which and retains only the metric features, yielding the required feature data. Feature data refers to the specific values of the attribute features in the claims data. For example, when the claims data is a personal insurance claim, its attribute features and specific data are [length of this hospital stay (10), this stay as a percentage of the maximum stay for similar diseases (67%), amount of this claim (50000), number of hospitals in this claim (1), patient age (45), gender, ...]. The attribute features are [length of this hospital stay, this stay as a percentage of the maximum stay for similar diseases, amount of this claim, number of hospitals in this claim, patient age, ...], and the feature data are [(10), (67%), (50000), (1), (45), ...].
As described in step S3, a self-built algorithm (see the next embodiment for the detailed procedure) performs feature restoration on the multiple first feature data to obtain multiple historical restored data.
As described in step S4, the first difference value diff represents the difference between the first feature data and the historical restored data obtained from it through the PCA transformation. The formula is:
diff = sum(diff1, diff2, ..., diffm), where
diff1 = (X1 - X1') / mean(X1)
diff2 = (X2 - X2') / mean(X2)
...
diffm = (Xm - Xm') / mean(Xm)
and mean() denotes the average.
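The difference formula above can be sketched as follows. This is only an illustration under stated assumptions: each record is taken to be a list of m normalized feature values, Xi' is taken to be the restored value of feature i for that record, and mean(Xi) is taken to be the average of feature i over all original records. The function name `difference_values` and the list-of-lists layout are hypothetical and do not come from the application.

```python
def difference_values(original, restored):
    """Return one diff value per record.

    original, restored: lists of records, each record a list of m feature
    values (hypothetical layout). mean(Xi) is assumed to be the average of
    feature i over all original records.
    """
    n = len(original)
    m = len(original[0])
    means = [sum(rec[i] for rec in original) / n for i in range(m)]
    if any(mu == 0 for mu in means):
        raise ValueError("a feature with zero mean cannot be used as a divisor")
    # diff = sum over features of (Xi - Xi') / mean(Xi)
    return [
        sum((x[i] - xr[i]) / means[i] for i in range(m))
        for x, xr in zip(original, restored)
    ]
```

Note that the formula as written sums signed differences, so positive and negative deviations in different features can cancel; the sketch keeps that behavior rather than substituting absolute values.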
As described in step S5, the scoring formula is y = n / (1 + e^(-a*diff + b)), where n is the preset multiple and a and b are two adjustment factors. S_model is the maximum score obtained by applying the scoring formula to all normal training data.
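A minimal sketch of the scoring formula follows. The application does not give values for n, a, or b, so the defaults below (n = 100, a = 1, b = 0) are placeholders chosen for illustration, and the function names are hypothetical.

```python
import math


def risk_score(diff, n=100.0, a=1.0, b=0.0):
    """Scaled sigmoid score y = n / (1 + e^(-a*diff + b)).

    n, a and b are illustrative defaults, not values from the application.
    """
    return n / (1.0 + math.exp(-a * diff + b))


def max_score(diffs, n=100.0, a=1.0, b=0.0):
    """Maximum score over a collection of difference values (e.g. S_model)."""
    return max(risk_score(d, n, a, b) for d in diffs)
```

Because the sigmoid is monotonically increasing in diff when a is positive, the maximum score corresponds to the record with the largest difference value. For very large negative arguments `math.exp` can overflow, so a production version would need to guard that case.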
As described in step S6, the test data to be evaluated is the claims data on which insurance anti-fraud detection is to be performed. It is obtained through the data anomaly assessment system or model, which provides a data import interface: the claims data (the test data to be evaluated) can be supplied by dragging a data file into a window, by entering the data directly, or by similar means.
As described in step S7, performing feature screening on the test data according to the essential features of the normal data means taking all the metric features obtained when the normal data was screened, locating the corresponding metric features in the claims data, and removing the remaining features.
As described in step S8, the second difference value Diff represents the difference between the second feature data and the historical restored data after the PCA transformation. The formula is the same as that used for the first difference value diff and is not repeated here.
As described in step S9, the scoring formula is y = n / (1 + e^(-a*Diff + b)), where n is the preset multiple and a and b are two adjustment factors. S_test is the maximum score obtained by applying the scoring formula to the test data.
As described in step S10, comparing S_model and S_test yields their relative magnitude and the size of the gap between them; the preset rules map that magnitude and gap to the corresponding risk result.
In one embodiment, the step of performing feature restoration on the first feature data to obtain historical restored data comprises:
S10, normalize the first feature data to obtain a normalized model of the historical data;
S20, convert the normalized model of the historical data into a first feature matrix;
S30, perform feature restoration on the first feature matrix by PCA inverse transformation to obtain the historical restored data.
In this embodiment, PCA (Principal Component Analysis) is an unsupervised learning algorithm; in this application it is used mainly for feature extraction and dimensionality reduction.
As described in step S10, a single claims record contains multiple features and their feature data. One item of feature data is also called an element, and all the feature data of one feature form that feature's subset. Normalizing the first feature data to obtain the normalized model means that every element undergoes the normalization calculation; once all the feature data have been normalized, the normalized model is obtained.
As described in step S20, the normalized model obtained in step S10 is converted into a feature matrix based on the PCA algorithm. Specifically, let X be an m*n matrix holding m feature values for each of n objects, so that each column represents an object and each row represents a feature. The goal is to reduce the features to d dimensions, with d much smaller than m. The output Y is then a d*n matrix. The algorithm is as follows:
(1) Let $X = [x_1, x_2, \dots, x_n]$ and compute the mean of the object points: $x_0 = \frac{1}{n}\sum_{i=1}^{n} x_i$.
(2) Denote the centered result by $X - x_0$ (with $x_0$ subtracted from every column) and apply singular value decomposition (SVD) to it: $X - x_0 = U \Lambda V^{T}$.
(3) Then $x_0$ is the origin of the new coordinate system, and the first d columns of $U$ form the new coordinate system after centering; denote them by $W$. The representation of all points in the new coordinate system is $Y = W^{T}(X - x_0)$. Likewise, a projected point $y$ is restored to the original coordinate system (the PCA inverse transformation) as $x_0 + W y$.
As described in step S30, for the normalized training data $x^{*}$, PCA training that retains 95% of the information yields $W$ and $Y$; $Y$ is then converted by the PCA inverse transformation $x_0 + W y$ into the historical restored data $X'$.
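The projection and inverse transformation described in steps (1) to (3) can be sketched with NumPy. This is an illustrative sketch, not the application's implementation: the function name `pca_reconstruct` is hypothetical, the number of components d is passed in directly rather than derived from the 95% criterion, and X is assumed to be an m-by-n array whose columns are objects, as in the text.

```python
import numpy as np


def pca_reconstruct(X, d):
    """Project X onto d principal directions and restore it.

    X: (m, n) array, m features (rows) by n objects (columns).
    Returns W (m, d), Y (d, n) and the restored data X_restored (m, n).
    """
    x0 = X.mean(axis=1, keepdims=True)          # mean object point, shape (m, 1)
    U, S, Vt = np.linalg.svd(X - x0, full_matrices=False)
    W = U[:, :d]                                # first d columns of U
    Y = W.T @ (X - x0)                          # coordinates in the new system
    X_restored = x0 + W @ Y                     # PCA inverse transformation
    return W, Y, X_restored
```

When d is smaller than the rank of the centered data, `X_restored` differs from `X`, and that residual is what the difference value in step S4 measures.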
In one embodiment, the step of performing feature screening on the normal data to obtain all essential features of the normal data and the first feature data corresponding to each essential feature comprises:
Identify all features of the normal data;
If the number of feature values in a feature's data is less than or equal to 3, determine that feature to be a non-essential feature;
If the number of feature values in a feature's data is greater than 3, determine that feature to be an essential feature;
Remove the non-essential features to obtain all essential features of the normal data and the multiple first feature data corresponding to each essential feature.
In this embodiment, the non-essential features are the non-metric features, which have no practical effect on the behavior profile of normal data; the essential features are the metric features, which do shape that profile. Removing the non-essential features and keeping the essential ones therefore makes the data anomaly assessment method more accurate while reducing computation, lowering the error rate, and improving assessment efficiency.
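The screening rule can be sketched as below. Two assumptions are made that the text does not confirm: "number of feature values" is read as the number of distinct values a feature takes across the records, and each record is represented as a dict from feature name to value. The function names are hypothetical.

```python
def select_essential_features(records, threshold=3):
    """Return the names of features with more than `threshold` distinct values.

    records: list of dicts mapping feature name -> value (hypothetical layout).
    Counting distinct values is an interpretation of the screening rule.
    """
    if not records:
        return []
    essential = []
    for name in records[0]:
        distinct = {rec[name] for rec in records}
        if len(distinct) > threshold:
            essential.append(name)
    return essential


def keep_features(records, names):
    """Drop every feature not listed in `names`."""
    return [{name: rec[name] for name in names} for rec in records]
```

The same list of names returned for the normal data can then be passed to `keep_features` for the test data, which matches the requirement that test data be screened according to the essential features of the normal data.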
In one embodiment, the step of comparing S_model and S_test according to preset rules to obtain a risk result comprises:
If S_test > S_model, determine that risk exists;
If S_model * 90% < S_test < S_model, determine that risk may exist;
If S_test < S_model * 90%, determine that no risk exists.
In this embodiment, the model data score reflects the behavior profile of the normal data of historical paid insurance claim cases. If the claims data score is greater than the model data score, the behavior profile of the claims data differs substantially from that of normal data, indicating that risk exists. If the claims data score is less than the model data score but greater than 90% of it, the behavior profile of the claims data tends to deviate from that of normal data, indicating that risk may exist. If the claims data score is less than 90% of the model data score, the behavior profile of the claims data is consistent with that of normal data, indicating that no risk exists.
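The three-way rule can be written as a small function. The rule as stated uses strict inequalities and does not say what happens when S_test equals S_model or equals 90% of S_model; the sketch below assigns the first case to "possible risk" and the second to "no risk", which is an assumption. The function name and the returned labels are hypothetical.

```python
def risk_result(s_test, s_model, ratio=0.9):
    """Map the test score and model score to a risk label.

    Boundary handling (equality cases) is an assumption; the source rule
    only defines the strict inequalities.
    """
    if s_test > s_model:
        return "risk"
    if s_test > s_model * ratio:
        return "possible risk"
    return "no risk"
```

For example, with S_model = 90, a test score of 95 is flagged as risk, 85 as possible risk, and 50 as no risk.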
In one embodiment, the step of normalizing the first feature data to obtain a normalized model of the historical data comprises:
Obtain the maximum and minimum values of a given feature, and calculate the difference between them;
For each data item of the feature in turn, subtract the minimum value and divide the result by the difference to obtain the normalized feature value;
Compute the normalized feature value for all data of all features to obtain the normalized model.
In this embodiment, normalization means processing the data (by some algorithm) so that it is confined to a certain range. Normalization first makes subsequent data processing more convenient and second helps the program converge faster at run time. Its specific effect is to bring the statistical distribution of the samples onto a common scale: normalization to the range 0 to 1 gives a statistical probability distribution, while normalization to some other interval gives a statistical coordinate distribution. This embodiment normalizes the data by min-max linear normalization, using the formula x* = (X - Xmin) / (Xmax - Xmin).
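A minimal sketch of the min-max formula for a single feature follows. The function name is hypothetical, and raising an error for a constant feature is a design choice of this sketch; the text does not address that case, although the earlier screening step would normally have removed such features.

```python
def min_max_normalize(values):
    """Apply x* = (X - Xmin) / (Xmax - Xmin) to one feature's values."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("constant feature: Xmax - Xmin is zero")
    return [(v - lo) / span for v in values]
```

Applying the function to each essential feature in turn yields values in the range 0 to 1 for every feature.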
In one embodiment, the step of converting the normalized model of the historical data into a first feature matrix comprises:
Extract the principal components whose contribution rate in the normalized model of the historical data exceeds 95%, obtaining a first feature matrix composed of eigenvectors.
In this embodiment, the eigenvalues [s1, s2, ..., sm] of the matrix (X-x0)(X-x0)^T, which satisfy (X-x0)(X-x0)^T u = s u, are sorted in descending order. The larger an eigenvalue si, the more data information its corresponding eigenvector ui contains. The principal components of the normalized model whose contribution rate exceeds 95% are extracted, and their eigenvectors form the first feature matrix of the normalized model of the claims data.
In one embodiment, the contribution rate is: [formula not reproduced], where contrib is the contribution rate and si is an eigenvalue.
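The contribution-rate formula itself is missing from this text. A common definition, assumed here rather than taken from the application, is the cumulative contribution rate contrib = (s1 + ... + sd) / (s1 + ... + sm) over eigenvalues sorted in descending order. Under that assumption, the number of components to keep can be sketched as follows; the function name is hypothetical.

```python
def components_for_contribution(eigenvalues, threshold=0.95):
    """Smallest d whose cumulative contribution rate exceeds `threshold`.

    Assumes contrib = sum of the d largest eigenvalues / sum of all
    eigenvalues; this definition is an assumption, not the source formula.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for d, s in enumerate(ordered, start=1):
        running += s
        if running / total > threshold:
            return d
    return len(ordered)
```

With eigenvalues 6.0, 3.0, 0.9, and 0.1, the cumulative rates are 0.6, 0.9, and 0.99, so three components are kept.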
The data anomaly assessment method provided by this application obtains the claims data to be evaluated and imports it into a detection system; extracts the feature data of the claims data and processes it with the scoring model of the detection system to calculate a claims data score; and compares the claims data score with the model data score to obtain a risk value and a risk result. The application uses PCA, an unsupervised learning algorithm, to learn the overall distribution profile of normal data. Because the approach is based on anomaly detection, it does not need to consider the distribution or variation of fraud data, and its accuracy is high.
Referring to Fig. 2, an embodiment of this application also provides a data anomaly assessment apparatus, comprising:
A first acquisition unit 10, for obtaining the normal data in a preset quantity of historical test data;
A first screening unit 20, for performing feature screening on the normal data to obtain all essential features of the normal data and the multiple first feature data corresponding to each essential feature;
A restoration unit 30, for performing feature restoration on the multiple first feature data to obtain multiple historical restored data;
A first computing unit 40, for calculating the first difference values between the multiple first feature data and the multiple historical restored data, substituting the multiple first difference values into a sigmoid function to map them to (0, 1), amplifying the results by a preset multiple to obtain multiple risk score values of the normal data, and taking their maximum as the model data score S_model;
A second acquisition unit 50, for obtaining the test data to be evaluated;
A second screening unit 60, for performing feature screening on the test data according to the essential features of the normal data to obtain all essential features of the test data and the second feature data corresponding to each essential feature;
A second computing unit 70, for calculating the second difference values between the second feature data and the historical restored data, substituting the second difference values into the sigmoid function to map them to (0, 1), amplifying the results by the preset multiple to obtain multiple risk score values of the test data, and taking their maximum as the test data score S_test;
A judging unit 80, for comparing S_model and S_test according to preset rules to obtain a risk result.
In this embodiment, PCA (Principal Component Analysis) is an unsupervised learning algorithm; in this application it is used mainly for feature extraction and dimensionality reduction.
In the first acquisition unit 10, the normal data of historical paid insurance claim cases is the data that remains after fraudulent (abnormal) cases are removed from the historical paid cases. Once this normal data has been modeled, the model reflects the behavior profile of normal data.
In the first screening unit 20, feature screening means selecting a feature subset according to business needs. The features include metric features and non-metric features; screening identifies which is which and retains only the metric features, yielding the required feature data. Feature data refers to the specific values of the attribute features in the claims data. For example, when the claims data is a personal insurance claim, its attribute features and specific data are [length of this hospital stay (10), this stay as a percentage of the maximum stay for similar diseases (67%), amount of this claim (50000), number of hospitals in this claim (1), patient age (45), gender, ...]. The attribute features are [length of this hospital stay, this stay as a percentage of the maximum stay for similar diseases, amount of this claim, number of hospitals in this claim, patient age, ...], and the feature data are [(10), (67%), (50000), (1), (45), ...].
In the restoration unit 30, a self-built algorithm (see the next embodiment for the detailed procedure) performs feature restoration on the multiple first feature data to obtain multiple historical restored data.
In the first computing unit 40, the first difference value diff represents the difference between the first feature data and the historical restored data obtained from it through the PCA transformation. The formula is:
diff = sum(diff1, diff2, ..., diffm), where
diff1 = (X1 - X1') / mean(X1)
diff2 = (X2 - X2') / mean(X2)
...
diffm = (Xm - Xm') / mean(Xm)
and mean() denotes the average.
The scoring formula is y = n / (1 + e^(-a*diff + b)), where n is the preset multiple and a and b are two adjustment factors. S_model is the maximum score obtained by applying the scoring formula to all normal training data.
In the second acquisition unit 50, the claims data to be evaluated is the claims data on which insurance anti-fraud detection is to be performed. It is obtained through the data anomaly assessment system or model, which provides a data import interface: the claims data can be supplied by dragging a data file into a window, by entering the data directly, or by similar means.
In the second screening unit 60, performing feature screening on the claims data according to the essential features of the normal data means taking all the metric features obtained when the normal data was screened, locating the corresponding metric features in the claims data, and removing the remaining features.
In the second computing unit 70, the second difference value Diff represents the difference between the second feature data and the historical restored data after the PCA transformation. The formula is the same as that used for the first difference value diff and is not repeated here.
The scoring formula is y = n / (1 + e^(-a*Diff + b)), where n is the preset multiple and a and b are two adjustment factors. S_test is the maximum score obtained by applying the scoring formula to the test data.
In the judging unit 80, comparing S_model and S_test yields their relative magnitude and the size of the gap between them; the preset rules map that magnitude and gap to the corresponding risk result.
Referring to Fig. 3, a kind of computer equipment is also proposed in the embodiment of the present application, which can be server, Its internal structure can be as shown in Figure 3.The computer equipment includes processor, the memory, network connected by system bus Interface and database.Wherein, the processor of the Computer Design is for providing calculating and control ability.The computer equipment is deposited Reservoir includes non-volatile memory medium, built-in storage.The non-volatile memory medium is stored with operating system, computer program And database.The built-in storage provides environment for the operation of operating system and computer program in non-volatile memory medium. The database of the computer equipment is for storing history settlement of insurance claim case data etc..The network interface of the computer equipment is used for It is communicated with external terminal by network connection.It is commented when the computer program is executed by processor with the exception for realizing a kind of data Estimate method.
The processor executes the following steps of the above method:
Obtaining the normal data in a preset quantity of historical test data;
Performing feature screening on the normal data to obtain all essential features of the normal data and a plurality of first feature data corresponding to each essential feature;
Performing feature restoration on the plurality of first feature data to obtain a plurality of historical restored data;
Calculating first difference values between the plurality of first feature data and the plurality of historical restored data;
Substituting the plurality of first difference values into a sigmoid function to map them into (0,1), then amplifying the results by a preset multiple to obtain a plurality of risk score values of the normal data, and taking their maximum as the model data score value S_model;
Obtaining the test data to be evaluated;
Performing feature screening on the test data according to the essential features of the normal data to obtain all essential features of the test data and second feature data corresponding to each essential feature;
Calculating a second difference value between the second feature data and the historical restored data;
Substituting the second difference value into the sigmoid function to map it into (0,1), then amplifying the result by the preset multiple to obtain a plurality of risk score values of the test data, and taking their maximum as the score value S_test;
Comparing S_model and S_test according to a preset rule to obtain a risk result.
Further, the step of performing feature restoration on the first feature data to obtain the historical restored data includes:
Normalizing the first feature data to obtain a normalized model of the historical data;
Converting the normalized model of the historical data into a first feature matrix;
Performing feature restoration on the first feature matrix by PCA inverse transformation to obtain the historical restored data.
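The three sub-steps above (normalize, project onto principal components, invert the projection) can be sketched with NumPy as follows. This is an illustrative sketch, not the patented implementation: it assumes at least two features, that components are chosen by cumulative contribution rate with a 0.95 threshold, and that the restored data is compared in the normalized space. The function name `pca_restore` is hypothetical.

```python
import numpy as np


def pca_restore(X, threshold=0.95):
    """Min-max normalize X, project onto the leading principal components,
    then map back with the PCA inverse transform.

    Returns (normalized data, restored data, number of components kept).
    """
    X = np.asarray(X, dtype=float)
    # 1. Min-max normalization per feature column.
    x_min = X.min(axis=0)
    span = X.max(axis=0) - x_min
    span[span == 0] = 1.0  # constant columns: avoid division by zero
    Z = (X - x_min) / span
    # 2. Eigen-decomposition of the covariance matrix of the centered data.
    mu = Z.mean(axis=0)
    Zc = Z - mu
    eigvals, eigvecs = np.linalg.eigh(np.cov(Zc, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # sort components by descending eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Assumption: keep the fewest components whose cumulative share exceeds the threshold.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, threshold, side="right")) + 1, len(eigvals))
    W = eigvecs[:, :k]  # feature matrix composed of eigenvectors
    # 3. Forward projection followed by the inverse transform.
    Z_restored = Zc @ W @ W.T + mu
    return Z, Z_restored, k


# Hypothetical data lying exactly on a line: one component restores it fully.
X = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
Z, Z_restored, k = pca_restore(X)
```

Records that follow the dominant structure of the normal data are restored almost exactly, while records that deviate from it leave a larger gap between the original and restored values, which is what the difference value measures.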
Further, the step of performing feature screening on the normal data to obtain all essential features of the normal data and the first feature data corresponding to each essential feature includes:
Identifying all features of the normal data;
If the number of feature values of a feature is less than or equal to 3, determining it to be a non-essential feature;
If the number of feature values of a feature is greater than 3, determining it to be an essential feature;
Removing the non-essential features to obtain all essential features of the normal data and the plurality of first feature data corresponding to each essential feature.
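The screening rule above can be sketched as follows. The sketch assumes that "number of feature values" means the number of distinct values a feature takes across the records, and that each record is a dictionary keyed by feature name; both are assumptions, as are the function name `screen_features` and the sample records.

```python
def screen_features(records, max_values=3):
    """Split features into essential and non-essential ones.

    Assumption: a feature is essential when it takes more than `max_values`
    distinct values across the records (values must be hashable).
    """
    features = list(records[0].keys())
    essential = [f for f in features if len({r[f] for r in records}) > max_values]
    # First feature data: the column of values for each essential feature.
    first_feature_data = {f: [r[f] for r in records] for f in essential}
    return essential, first_feature_data


# Hypothetical claim records.
records = [
    {"gender": "M", "region": "A", "amount": 120.0},
    {"gender": "F", "region": "B", "amount": 80.0},
    {"gender": "M", "region": "C", "amount": 95.5},
    {"gender": "F", "region": "A", "amount": 300.0},
]
essential, first_feature_data = screen_features(records)
```

Here `gender` (2 values) and `region` (3 values) are dropped, while `amount` (4 values) is kept.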
Further, the step of comparing S_model and S_test according to the preset rule to obtain the risk result includes:
If S_test > S_model, determining that a risk exists;
If S_model * 90% < S_test < S_model, determining that a risk may exist;
If S_test < S_model * 90%, determining that no risk exists.
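The three comparison rules above translate directly into code. In this sketch the model score and test score are passed as `s_model` and `s_test`; the function name `risk_result` and the returned labels are hypothetical. The rules as stated do not cover the exact boundary cases (the test score equal to the model score, or equal to 90% of it), so the sketch returns a separate label there instead of guessing.

```python
def risk_result(s_test, s_model, ratio=0.9):
    """Apply the three preset comparison rules to the two scores."""
    if s_test > s_model:
        return "risk"
    if s_model * ratio < s_test < s_model:
        return "possible risk"
    if s_test < s_model * ratio:
        return "no risk"
    # Boundary values are not covered by the stated rules.
    return "undetermined"


# Hypothetical scores with a model score of 80.
labels = [risk_result(s, 80.0) for s in (90.0, 75.0, 50.0, 80.0)]
```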
Further, the step of normalizing the first feature data to obtain the normalized model of the historical data includes:
Obtaining the maximum and minimum values of the same feature, and calculating the difference between the maximum and the minimum;
For each data item of the feature in turn, subtracting the minimum value and dividing the result by the difference to obtain a feature normalized value;
Computing the feature normalized values for all data of all features to obtain the normalized model.
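The normalization described above is ordinary min-max scaling. A minimal sketch, assuming the feature data is held as a dictionary from feature name to a list of numbers (the function name `min_max_normalize` and the handling of constant features are assumptions):

```python
def min_max_normalize(columns):
    """Scale every feature column to [0, 1] using (value - min) / (max - min)."""
    normalized = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo  # difference between the maximum and the minimum
        if span == 0:
            # Assumption: a constant feature maps to 0.0 to avoid dividing by zero.
            normalized[name] = [0.0 for _ in values]
        else:
            normalized[name] = [(v - lo) / span for v in values]
    return normalized


# Hypothetical feature column.
normalized = min_max_normalize({"amount": [10, 20, 30]})
```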
Further, the step of converting the normalized model of the historical data into the first feature matrix includes:
Extracting the principal components whose contribution rate in the normalized model of the historical data exceeds 95%, to obtain the first feature matrix composed of eigenvectors.
Further, the contribution rate is: (formula image not reproduced), where contrib is the contribution rate and si is an eigenvalue.
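The contribution-rate formula itself is not reproduced in this text. The sketch below therefore uses a common definition as an explicit assumption: the cumulative contribution rate of the first k components is the sum of the k largest eigenvalues divided by the sum of all eigenvalues. It may differ from the formula intended here; the function name `components_for_contribution` and the sample eigenvalues are hypothetical.

```python
def components_for_contribution(eigenvalues, threshold=0.95):
    """Return (k, contrib): the smallest number of leading components whose
    cumulative contribution rate exceeds the threshold, and that rate.

    Assumed definition: contrib = (sum of the k largest s_i) / (sum of all s_i).
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, s in enumerate(ordered, start=1):
        running += s
        if running / total > threshold:
            return k, running / total
    return len(ordered), 1.0


# Hypothetical eigenvalues: cumulative shares are 0.60, 0.90, 0.99, 1.00.
k, contrib = components_for_contribution([6.0, 3.0, 0.9, 0.1])
```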
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements a data anomaly assessment method comprising the steps of:
Obtaining the normal data in a preset quantity of historical test data;
Performing feature screening on the normal data to obtain all essential features of the normal data and a plurality of first feature data corresponding to each essential feature;
Performing feature restoration on the plurality of first feature data to obtain a plurality of historical restored data;
Calculating first difference values between the plurality of first feature data and the plurality of historical restored data;
Substituting the plurality of first difference values into a sigmoid function to map them into (0,1), then amplifying the results by a preset multiple to obtain a plurality of risk score values of the normal data, and taking their maximum as the model data score value S_model;
Obtaining the test data to be evaluated;
Performing feature screening on the test data according to the essential features of the normal data to obtain all essential features of the test data and second feature data corresponding to each essential feature;
Calculating a second difference value between the second feature data and the historical restored data;
Substituting the second difference value into the sigmoid function to map it into (0,1), then amplifying the result by the preset multiple to obtain a plurality of risk score values of the test data, and taking their maximum as the score value S_test;
Comparing S_model and S_test according to a preset rule to obtain a risk result.
In one embodiment, the step of performing feature restoration on the first feature data to obtain the historical restored data includes:
Normalizing the first feature data to obtain a normalized model of the historical data;
Converting the normalized model of the historical data into a first feature matrix;
Performing feature restoration on the first feature matrix by PCA inverse transformation to obtain the historical restored data.
In one embodiment, the step of performing feature screening on the normal data to obtain all essential features of the normal data and the first feature data corresponding to each essential feature includes:
Identifying all features of the normal data;
If the number of feature values of a feature is less than or equal to 3, determining it to be a non-essential feature;
If the number of feature values of a feature is greater than 3, determining it to be an essential feature;
Removing the non-essential features to obtain all essential features of the normal data and the plurality of first feature data corresponding to each essential feature.
Further, the step of comparing S_model and S_test according to the preset rule to obtain the risk result includes:
If S_test > S_model, determining that a risk exists;
If S_model * 90% < S_test < S_model, determining that a risk may exist;
If S_test < S_model * 90%, determining that no risk exists.
In one embodiment, the step of normalizing the first feature data to obtain the normalized model of the historical data includes:
Obtaining the maximum and minimum values of the same feature, and calculating the difference between the maximum and the minimum;
For each data item of the feature in turn, subtracting the minimum value and dividing the result by the difference to obtain a feature normalized value;
Computing the feature normalized values for all data of all features to obtain the normalized model.
In one embodiment, the step of converting the normalized model of the historical data into the first feature matrix includes:
Extracting the principal components whose contribution rate in the normalized model of the historical data exceeds 95%, to obtain the first feature matrix composed of eigenvectors.
In one embodiment, the contribution rate is: (formula image not reproduced), where contrib is the contribution rate and si is an eigenvalue.
Those of ordinary skill in the art will understand that all or part of the processes in the above embodiment methods can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, device, article or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, device, article or method that includes the element.
The above are only preferred embodiments of the present application and do not limit the patent scope of the application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the application.

Claims (10)

1. A data anomaly assessment method, characterized by comprising the steps of:
Obtaining the normal data in a preset quantity of historical test data;
Performing feature screening on the normal data to obtain all essential features of the normal data and a plurality of first feature data corresponding to each essential feature;
Performing feature restoration on the plurality of first feature data to obtain a plurality of historical restored data;
Calculating first difference values between the plurality of first feature data and the plurality of historical restored data;
Substituting the plurality of first difference values into a sigmoid function to map them into (0,1), then amplifying the results by a preset multiple to obtain a plurality of risk score values of the normal data, and taking their maximum as the model data score value S_model;
Obtaining the test data to be evaluated;
Performing feature screening on the test data according to the essential features of the normal data to obtain all essential features of the test data and second feature data corresponding to each essential feature;
Calculating a second difference value between the second feature data and the historical restored data;
Substituting the second difference value into the sigmoid function to map it into (0,1), then amplifying the result by the preset multiple to obtain a plurality of risk score values of the test data, and taking their maximum as the score value S_test;
Comparing S_model and S_test according to a preset rule to obtain a risk result.
2. The data anomaly assessment method according to claim 1, characterized in that the step of performing feature restoration on the first feature data to obtain the historical restored data comprises:
Normalizing the first feature data to obtain a normalized model of the historical data;
Converting the normalized model of the historical data into a first feature matrix;
Performing feature restoration on the first feature matrix by PCA inverse transformation to obtain the historical restored data.
3. The data anomaly assessment method according to claim 1, characterized in that the step of performing feature screening on the normal data to obtain all essential features of the normal data and the first feature data corresponding to each essential feature comprises:
Identifying all features of the normal data;
If the number of feature values of a feature is less than or equal to 3, determining it to be a non-essential feature;
If the number of feature values of a feature is greater than 3, determining it to be an essential feature;
Removing the non-essential features to obtain all essential features of the normal data and the plurality of first feature data corresponding to each essential feature.
4. The data anomaly assessment method according to claim 1, characterized in that the step of comparing S_model and S_test according to the preset rule to obtain the risk result comprises:
If S_test > S_model, determining that a risk exists;
If S_model * 90% < S_test < S_model, determining that a risk may exist;
If S_test < S_model * 90%, determining that no risk exists.
5. The data anomaly assessment method according to claim 2, characterized in that the step of normalizing the first feature data to obtain the normalized model of the historical data comprises:
Obtaining the maximum and minimum values of the same feature, and calculating the difference between the maximum and the minimum;
For each data item of the feature in turn, subtracting the minimum value and dividing the result by the difference to obtain a feature normalized value;
Computing the feature normalized values for all data of all features to obtain the normalized model.
6. The data anomaly assessment method according to claim 2, characterized in that the step of converting the normalized model of the historical data into the first feature matrix comprises:
Extracting the principal components whose contribution rate in the normalized model of the historical data exceeds 95%, to obtain the first feature matrix composed of eigenvectors.
7. The data anomaly assessment method according to claim 6, characterized in that the contribution rate is: (formula image not reproduced), where contrib is the contribution rate and si is an eigenvalue.
8. A data anomaly assessment device, characterized by comprising:
A first acquisition unit, configured to obtain the normal data in a preset quantity of historical test data;
A first screening unit, configured to perform feature screening on the normal data to obtain all essential features of the normal data and a plurality of first feature data corresponding to each essential feature;
A restoration unit, configured to perform feature restoration on the plurality of first feature data to obtain a plurality of historical restored data;
A first computing unit, configured to calculate first difference values between the plurality of first feature data and the plurality of historical restored data; and to substitute the plurality of first difference values into a sigmoid function to map them into (0,1), then amplify the results by a preset multiple to obtain a plurality of risk score values of the normal data, and take their maximum as the model data score value S_model;
A second acquisition unit, configured to obtain the test data to be evaluated;
A second screening unit, configured to perform feature screening on the test data according to the essential features of the normal data to obtain all essential features of the test data and second feature data corresponding to each essential feature;
A second computing unit, configured to calculate a second difference value between the second feature data and the historical restored data; and to substitute the second difference value into the sigmoid function to map it into (0,1), then amplify the result by the preset multiple to obtain a plurality of risk score values of the test data, and take their maximum as the score value S_test;
A judging unit, configured to compare S_model and S_test according to a preset rule to obtain a risk result.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the method according to any one of claims 1 to 7 when executed by a processor.
CN201910463901.7A 2019-05-30 2019-05-30 Anomaly assessment method, apparatus, computer equipment and the medium of data Pending CN110322357A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910463901.7A CN110322357A (en) 2019-05-30 2019-05-30 Anomaly assessment method, apparatus, computer equipment and the medium of data

Publications (1)

Publication Number Publication Date
CN110322357A true CN110322357A (en) 2019-10-11

Family

ID=68119104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910463901.7A Pending CN110322357A (en) 2019-05-30 2019-05-30 Anomaly assessment method, apparatus, computer equipment and the medium of data

Country Status (1)

Country Link
CN (1) CN110322357A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination