CN109992578A - Anti-fraud method, apparatus, computer device and storage medium based on unsupervised learning - Google Patents
- Publication number
- CN109992578A (application number CN201910011758.8A)
- Authority
- CN
- China
- Prior art keywords
- point
- fraud
- data
- cluster
- determined
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
- G06Q30/0185—Product, service or business identity fraud
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
Abstract
The embodiments of the present application provide an anti-fraud method, apparatus, computer device and storage medium based on unsupervised learning. The method includes: screening out data with a high fraud risk from business data according to a preset rule engine; constructing multidimensional features from the data with a high fraud risk; visualizing the constructed multidimensional features in a low-dimensional space using a multidimensional scaling model to obtain multiple data points; determining normal points and potential fraud points among the visualized data points; clustering the normal points and the potential fraud points according to a clustering algorithm to obtain clusters; calculating the proportion of potential fraud points in each cluster; and determining as fraud data the business data corresponding to every point in each cluster whose proportion of potential fraud points is higher than a preset ratio. By judging fraud data with a method based on unsupervised learning, the embodiments of the present application make the judgment more precise and improve the accuracy of identifying fraud data.
Description
Technical field
This application relates to the technical field of data processing, and in particular to an anti-fraud method, apparatus, computer device and storage medium based on unsupervised learning.
Background
In the era of big data, data are widely used in many fields. Determining more accurately, from a large amount of data, which records are normal and which involve fraud has become increasingly important. In the insurance field, for example, some insurance claim cases involve fraud. The algorithms commonly used in the industry to identify abnormal data, that is, anti-fraud algorithms, are binary classification methods. As data volume grows, the processing capability of binary classification algorithms declines, and it becomes difficult to reliably identify abnormal data among large numbers of records, for example to identify claim cases that involve fraudulent behavior. If the recognition capability of an anti-fraud algorithm is too poor, more fraud cases will go undetected and cause direct losses to the enterprise.
Summary of the invention
The embodiments of the present application provide an anti-fraud method, apparatus, computer device and storage medium based on unsupervised learning, which can improve the accuracy of identifying fraud data.
In a first aspect, an embodiment of the present application provides an anti-fraud method based on unsupervised learning, the method including:
screening out data with a high fraud risk from business data according to a preset rule engine; constructing multidimensional features from the data with a high fraud risk and from the historical behavior data of the users corresponding to that data; according to the data with a high fraud risk, using a multidimensional scaling model to visualize the constructed multidimensional features in a low-dimensional space to obtain multiple data points; determining normal points and abnormal points among the visualized data points, and determining the abnormal points as potential fraud points; clustering the normal points and the potential fraud points according to a density clustering algorithm to obtain clusters; calculating the proportion of potential fraud points in each cluster; taking each cluster whose proportion of potential fraud points is higher than a preset ratio as a target cluster; and determining the business data corresponding to every point in the target cluster as fraud data.
In a second aspect, an embodiment of the present invention provides an anti-fraud apparatus based on unsupervised learning, which includes units for executing the method described in the first aspect.
In a third aspect, an embodiment of the present invention provides a computer device that includes a memory and a processor connected to the memory. The memory is used to store a computer program, and the processor is used to run the computer program stored in the memory so as to execute the method described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium that stores a computer program. When the computer program is executed by a processor, the method described in the first aspect is implemented.
The embodiments of the present application screen out data with a high fraud risk from business data according to a preset rule engine, extract multidimensional features from that data, and process the multidimensional features so that they are visualized in a low-dimensional space. Normal points and potential fraud points are then determined among the visualized data points, the normal points and potential fraud points are clustered, and fraud data are determined from the clustering result. Because fraud data are judged by a method based on unsupervised learning, the data no longer need to be labeled, which prevents labeling errors from affecting model learning. Specifically, judging fraud data by combining the preset rule engine with the unsupervised learning method makes the judgment more precise and improves the accuracy of identifying fraud data.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the anti-fraud method based on unsupervised learning provided by an embodiment of the present application;
Fig. 2 is a schematic sub-flowchart of the anti-fraud method based on unsupervised learning provided by an embodiment of the present application;
Fig. 3 is a schematic sub-flowchart of Fig. 2 provided by an embodiment of the present application;
Fig. 4 is a schematic sub-flowchart of the anti-fraud method based on unsupervised learning provided by an embodiment of the present application;
Fig. 5 is a schematic sub-flowchart of the anti-fraud method based on unsupervised learning provided by an embodiment of the present application;
Fig. 6 is a schematic block diagram of the anti-fraud apparatus based on unsupervised learning provided by an embodiment of the present application;
Fig. 7 is a schematic block diagram of the low-dimensional visualization unit provided by an embodiment of the present application;
Fig. 8 is a schematic block diagram of the point determination unit provided by an embodiment of the present application;
Fig. 9 is a schematic block diagram of the clustering unit provided by an embodiment of the present application;
Fig. 10 is a schematic block diagram of the computer device provided by an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the protection scope of this application.
For ease of understanding, the data in the following method steps are illustrated with claim form data from the insurance field. It should be understood that the data in the embodiments of the present application are not limited to claim form data from the insurance field and may also be other data from other fields.
Fig. 1 is a schematic flowchart of the anti-fraud method based on unsupervised learning provided by an embodiment of the present application. As shown in Fig. 1, the method includes steps S101 to S108.
S101: screen out data with a high fraud risk from business data according to a preset rule engine.
In the insurance field, for example, the business data may be the claim form data involved in certain types of insurance, such as the claim form data for medical insurance or critical illness insurance. The business data are stored in a database, such as a Hive database.
Specifically, step S101 includes: determining different screening rules for different business data; and using the preset rule engine to screen out data with a high fraud risk from the corresponding business data according to the different screening rules. Different screening rules are formulated for different business data so that data with a high fraud risk can be screened out of each kind of business data. For example, different types of insurance have different screening rules, and the high-fraud-risk data screened out differ accordingly. The screening rules determined for the different business data are implemented in SQL script code, and the same SQL script code screens the data with a high fraud risk out of the different business data, runs and updates automatically on a schedule, and periodically outputs the data with a high fraud risk. The preset rule engine implements the functions carried out by this SQL script code.
For example, the screening rules for high-fraud-risk data relating to chronic diseases include:
(1) the claimed disease is a chronic disease (a corresponding chronic disease table lists the disease type, disease name and disease code of 681 chronic diseases in total);
(2) the insurance product being settled is a medical insurance product;
(3) the reason for the claim is medical treatment due to disease;
(4) the time from the latest effective date of the medical insurance product being settled to the date of the first insured event is 30 to 180 days; and so on.
It should be understood that, according to these screening rules for chronic-disease data with a high fraud risk, data screening and cleaning are automated in the Hive database with SQL script code, which runs and updates automatically on a schedule and periodically outputs the target cases. In this way the chronic-disease claim form data with a high fraud risk are screened out of the medical insurance claim form data.
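The four example rules can be sketched as a simple record filter. The following is a minimal illustration in Python rather than the SQL scripts the text describes; the field names (`disease_code`, `product_type`, `claim_reason`, `effective_date`, `accident_date`), the category strings and the two-entry chronic disease table are all hypothetical stand-ins, not taken from the patent.

```python
from datetime import date

# Hypothetical stand-in for the chronic disease table (the text mentions 681 entries).
CHRONIC_DISEASE_CODES = {"E11", "I10"}


def is_high_risk(claim: dict) -> bool:
    """Return True when a claim record matches all four example rules."""
    days_to_event = (claim["accident_date"] - claim["effective_date"]).days
    return (
        claim["disease_code"] in CHRONIC_DISEASE_CODES       # rule (1)
        and claim["product_type"] == "medical"                # rule (2)
        and claim["claim_reason"] == "disease_treatment"      # rule (3)
        and 30 <= days_to_event <= 180                        # rule (4)
    )


def screen(claims):
    """Keep only the claims flagged as high fraud risk."""
    return [c for c in claims if is_high_risk(c)]


claims = [
    {"id": 1, "disease_code": "E11", "product_type": "medical",
     "claim_reason": "disease_treatment",
     "effective_date": date(2018, 1, 1), "accident_date": date(2018, 3, 1)},
    {"id": 2, "disease_code": "E11", "product_type": "medical",
     "claim_reason": "disease_treatment",
     "effective_date": date(2018, 1, 1), "accident_date": date(2019, 1, 1)},
    {"id": 3, "disease_code": "J00", "product_type": "medical",
     "claim_reason": "disease_treatment",
     "effective_date": date(2018, 1, 1), "accident_date": date(2018, 3, 1)},
]
high_risk = screen(claims)
```

Only claim 1 passes: claim 2 falls outside the 30 to 180 day window and claim 3 is not in the chronic disease table.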
S102: construct multidimensional features from the data with a high fraud risk and from the historical behavior data of the users corresponding to that data.
First, the data with a high fraud risk and the historical behavior data of the corresponding users are obtained. Obtaining the data with a high fraud risk includes obtaining the main information of each high-fraud-risk claim form together with the policyholder information and insured-person information in the claim form. This information includes the static attributes (features) of the main and rider policies of the claim form and the basic attributes (features) of the corresponding policyholder and insured person. Obtaining the historical behavior data of the corresponding users includes obtaining the data on the policyholder's historical insurance application behavior and the data on their claim behavior.
Then, multidimensional feature data are constructed from the obtained high-fraud-risk data and the historical behavior data of the corresponding users.
For example, the constructed multidimensional features include:
(1) insured customer information, including customer ID, name, birthday, gender, identity document, employer, marital status and so on, 37 dimensions in total;
(2) policy information, including branch code, policy number, main and rider policy number, insurance product type, insurance product code, effective date and so on, 84 dimensions in total;
(3) customer physical examination information, including customer ID, policy number, examining doctor, examination item, examination type, examination result, medical history and so on, 186 dimensions in total;
(4) insurance product attribute information, including insurance product code, insurance product attribute and insurance product series, 3 fields in total;
(5) underwriting result information, including policy number, control number, underwriting number, main and rider policy number, customer ID, insured amount, tier, underwriting reason and so on, 68 dimensions in total;
(6) claim case information, including case number, processing type, case category, case status, report number, date of the insured event and so on, 101 fields in total;
(7) claim bill information, including case number, policy number, admission date, discharge date, medical expenses incurred, remaining amount, insurance benefit payable and so on, 69 dimensions in total;
(8) disease claim information, including case number, disease serial number, disease code, diagnosis result, surgery code, recovery status and so on, 8 dimensions in total;
(9) insurance application and claim behavior information, including number of applications, number of policies applied for, application type, insured amount, number of claims, claim amount, claimed disease, interval between claims and other dimensions.
Static features are features whose values do not change from one insurance application or claim to the next. The static features in the policy information, for example, include policy number, main and rider policy number, number of main and rider policies, insurance product type and insurance product code. The static features of the policyholder and insured person include date of birth, gender, occupation, marital status and so on. Application and claim behavior features are features that change with each insurance application or claim. Application behavior features include number of applications, number of policies applied for, application type, insured amount and so on; claim behavior features include number of claims, claim amount, claimed disease, interval between claims and so on.
S103: according to the data with a high fraud risk, use a multidimensional scaling model to visualize the constructed multidimensional features in a low-dimensional space to obtain multiple data points.
The multidimensional scaling model refers to multidimensional scaling (MDS), a visualization method for displaying high-dimensional multivariate data in a low-dimensional space. The basic goal of multidimensional scaling is to "fit" the original data into a low-dimensional coordinate system so that the distortion caused by dimensionality reduction is minimized.
Specifically, as shown in Fig. 2, step S103 includes steps S201 and S202.
S201: convert the non-metric variable features in the multidimensional features into metric variable features by dummy variable conversion.
Multidimensional scaling can be divided into metric multidimensional scaling (metric MDS) and non-metric multidimensional scaling (non-metric MDS). The embodiments of the present application use the metric method. A dummy variable, also called an indicator variable or nominal variable, is a qualitative independent variable that has been quantified, and it usually takes the value 0 or 1. Introducing dummy variables makes the description of a problem more concise and closer to reality. For example, BMI is divided by clinical criteria into the categories underweight, normal weight, overweight and obese, which might naively be assigned the values 1, 2, 3 and 4. Numerically, the values 1, 2, 3 and 4 carry an ordering from small to large, but no such size relation actually exists between the four weight categories; they should be treated as independent categories of equal standing. Feeding the values 1, 2, 3 and 4 into a model is therefore unreasonable, and the categories need to be converted into dummy variables. For instance, a dummy variable can be set up with "normal weight" as the reference: "normal weight" takes the value 1, and "underweight", "overweight" and "obese" are treated as "non-normal weight" and take the value 0. This amounts to comparing the other weight categories against normal weight, which has a clearer practical meaning. The constructed multidimensional features include features such as occupation and whether the person suffers from a certain disease. These are non-metric variables and can be converted into 0-1 metric variables by dummy variable conversion.
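The conversion described in S201 can be illustrated with a short sketch. The helper names `to_dummy` and `one_hot` are illustrative, not from the patent; `to_dummy` follows the text's "normal weight versus the rest" example, while `one_hot` shows the common variant, assumed here as an alternative, that creates one 0-1 column per category.

```python
def to_dummy(values, reference):
    """Map a categorical column to a 0-1 column: 1 for the reference category, else 0."""
    return [1 if v == reference else 0 for v in values]


def one_hot(values):
    """Expand a categorical column into one 0-1 column per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


weight_class = ["normal", "obese", "underweight", "normal", "overweight"]
normal_flag = to_dummy(weight_class, "normal")   # [1, 0, 0, 1, 0]
columns = one_hot(weight_class)
```

Either form removes the artificial ordering that an encoding such as 1, 2, 3, 4 would impose on the categories.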
S202: according to the data with a high fraud risk and the converted multidimensional metric variable features, apply multidimensional scaling so that the features are visualized in a low-dimensional space, obtaining multiple data points.
The multidimensional scaling may be the classical multidimensional scaling (classical MDS) method within metric multidimensional scaling (metric MDS), in which the distance measure is the Euclidean distance. The low-dimensional space may be, for example, a three-dimensional space.
In one embodiment, as shown in Fig. 3, step S202 includes the following steps S301 to S308.
S301: obtain the number of records with a high fraud risk, denoted n, and the number of dimensions of the multidimensional metric variable features, denoted q; take the n records of q-dimensional feature data as the sample data to obtain matrix X.
S302: compute the Euclidean distance matrix D from matrix X.
S303: construct matrix A from the Euclidean distance matrix D.
S304: compute the inner product matrix B from matrix A. The computation uses the mean of all values in row i of matrix A, the mean of all values in column j of matrix A, and the mean of all values in matrix A.
S305: compute the eigenvalues and eigenvectors of the inner product matrix B, with the eigenvalues sorted in descending order, for example λ1 ≥ λ2 ≥ λ3 ≥ ...
S306: determine the dimension k of the visualization space. For example, k = 3 means the visualization is carried out in three-dimensional space, and k = 4 means it is carried out in four-dimensional space.
S307: reconstruct the low-dimensional coordinate matrix from E_k and Λ_k, where E_k is the matrix formed by the first k eigenvectors of the inner product matrix B and Λ_k is the diagonal matrix formed by the first k eigenvalues of B. The k eigenvalues are taken in descending order of the computed eigenvalues, and the k eigenvectors are those corresponding to these k eigenvalues.
S308: take the reconstructed values as the points in the k-dimensional space.
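For reference, the textbook formulation of classical MDS, which steps S302 to S307 appear to follow, can be written as below. The symbols $d_{ij}$, $a_{ij}$, $b_{ij}$, the bar notation for means and $\hat{X}$ are assumptions introduced here for illustration and are not the patent's own notation.

```latex
\begin{aligned}
d_{ij} &= \lVert x_i - x_j \rVert_2
  && \text{(S302: entries of } D \text{, with } x_i \text{ the } i\text{-th row of } X\text{)} \\
a_{ij} &= -\tfrac{1}{2}\, d_{ij}^{2}
  && \text{(S303: entries of } A\text{)} \\
b_{ij} &= a_{ij} - \bar{a}_{i\cdot} - \bar{a}_{\cdot j} + \bar{a}_{\cdot\cdot}
  && \text{(S304: row mean, column mean and overall mean of } A\text{)} \\
B &= E \Lambda E^{\top}, \quad \lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \dots
  && \text{(S305)} \\
\hat{X} &= E_k \Lambda_k^{1/2}
  && \text{(S307: an } n \times k \text{ matrix whose rows are the points of S308)}
\end{aligned}
```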
Note that when the visualization is carried out in three-dimensional space, the coordinate values of each reconstructed point are not three of the original multidimensional features. Each coordinate value of each point is instead derived from several of the original features.
In this way, multidimensional scaling "fits" the original data into a low-dimensional coordinate system so that the distortion caused by dimensionality reduction is minimized, and the points in the low-dimensional coordinates represent the original data as faithfully as possible.
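Steps S301 to S308 can be sketched with NumPy as follows. This is a minimal illustration that assumes the textbook classical MDS formulas (A as minus one half of the squared distances, B by double centering, coordinates as eigenvectors scaled by the square roots of the eigenvalues); the function name `classical_mds` and the random sample data are hypothetical.

```python
import numpy as np


def classical_mds(X, k=3):
    """Embed the n rows of X (shape n x q) into k dimensions with classical MDS."""
    X = np.asarray(X, dtype=float)
    # S302: Euclidean distance matrix D (n x n).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # S303: A = -1/2 * D^2 element-wise (textbook choice, assumed here).
    A = -0.5 * D ** 2
    # S304: inner product matrix B from the row, column and overall means of A.
    B = A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()
    # S305: eigen-decomposition of the symmetric matrix B, eigenvalues in descending order.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]
    # S306-S308: keep the k largest eigenpairs and scale eigenvectors by sqrt(eigenvalue).
    lam = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative round-off
    return eigvecs[:, order] * np.sqrt(lam)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))        # n = 20 records, q = 6 metric features
points = classical_mds(X, k=3)      # 20 points in three-dimensional space
```

Each column of `points` mixes all six input features, which matches the note above that a coordinate is not simply one of the original features.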
S104: determine normal points and abnormal points among the visualized data points, and determine the abnormal points as potential fraud points.
That is, normal points and potential fraud points are determined among the visualized data points.
In one embodiment, as shown in Fig. 4, step S104 includes the following steps S401 to S404.
S401: obtain the x coordinate value, the y coordinate value, ..., and the k-th coordinate value of every visualized point, where k is the dimension of the visualization space.
For example, if k = 3, the x, y and z coordinate values of every visualized point are obtained.
S402: determine an x value range from the x coordinate values of all points so that the share of points whose x coordinate falls within the range reaches a first preset ratio; determine a y value range from the y coordinate values of all points so that the share of points whose y coordinate falls within the range reaches a second preset ratio; ...; determine a k-th value range from the k-th coordinate values of all points so that the share of points whose k-th coordinate falls within the range reaches a k-th preset ratio.
The first preset ratio, the second preset ratio, ..., and the k-th preset ratio may all be the same value, such as 90%, or they may differ, either all differing from one another or only some of them differing.
S403: determine a space from the determined x value range, y value range, ..., and k-th value range.
The dimension of this space depends on k. For example, if k = 3, a three-dimensional space is determined from the x, y and z value ranges.
S404: determine the points that fall inside the space as normal points and the points that fall outside it as abnormal points, and determine the abnormal points as potential fraud points.
This embodiment takes a new angle: the normal points and potential fraud points are determined by the space defined by the value range of each coordinate dimension.
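Steps S401 to S404 can be sketched as follows. The text does not say how each value range is chosen, so this sketch assumes a central interval that trims an equal number of points from both tails; the helper names `central_range` and `split_points` and the sample points are hypothetical.

```python
def central_range(values, ratio):
    """Return (low, high) covering at least the central `ratio` share of the values."""
    ordered = sorted(values)
    n = len(ordered)
    # Number of points trimmed from each tail; the small epsilon absorbs float round-off.
    drop = int(n * (1.0 - ratio) / 2 + 1e-9)
    return ordered[drop], ordered[n - 1 - drop]


def split_points(points, ratios):
    """S402-S404: build one range per coordinate, then split points by box membership."""
    dims = len(points[0])
    ranges = [central_range([p[d] for p in points], ratios[d]) for d in range(dims)]
    normal, abnormal = [], []
    for p in points:
        inside = all(lo <= p[d] <= hi for d, (lo, hi) in enumerate(ranges))
        (normal if inside else abnormal).append(p)
    return normal, abnormal, ranges


# 18 tightly grouped points plus two far-away points (k = 3).
points = [(float(i % 5), float(i % 3), float(i % 4)) for i in range(18)]
points += [(50.0, 0.0, 0.0), (-50.0, 1.0, 1.0)]
normal, abnormal, ranges = split_points(points, ratios=[0.9, 0.9, 0.9])
```

With a 90% ratio per axis, the two distant points fall outside the x range and are the ones treated as potential fraud points.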
In another embodiment, step S104 includes: obtaining all visualized points and fitting a threshold function to them; and determining normal points and abnormal points according to the threshold function, the abnormal points being determined as potential fraud points. For example, visualized points whose distance from the threshold function is greater than a preset distance are determined as abnormal points, and visualized points whose distance from the threshold function is not greater than the preset distance are determined as normal points. Different business data yield different fitted threshold functions, and the preset distance may differ accordingly. In this embodiment the points in the low-dimensional space are fitted to a threshold function, and normal points and potential fraud points are determined according to that function.
S105: cluster the normal points and the potential fraud points according to a clustering algorithm to obtain clusters.
The clustering algorithm may be, for example, a density clustering algorithm, and the density clustering algorithm may be the DBSCAN algorithm. Density clustering algorithms of this kind generally assume that categories can be determined by how tightly the samples are distributed. Samples of the same category are closely connected to one another; that is, around any sample of a category there must be other samples of the same category nearby. Grouping closely connected samples into one class yields a cluster category, and grouping every set of closely connected samples into its own class yields the final clustering result. The clustering algorithm uses the Euclidean distance between samples.
The clustering algorithm divides the normal points and potential fraud points into three classes:
Core point: a point that has more than MinPts points within radius r. Boundary point: a point that has fewer than MinPts points within radius r but falls within the neighborhood of a core point. Noise point: a point that is neither a core point nor a boundary point.
In one embodiment, as shown in Fig. 5, step S105 includes the following steps S501 to S503.
S501: set a radius value and a distance value. The radius value is r and the distance value is MinPts. The distance may be, for example, the Euclidean distance. For example, r = 3 and MinPts = 3.
S502: according to the set radius value and distance value, label the normal points and the potential fraud points as core points, boundary points or noise points, and delete the noise points. Specifically, for each point, compute the set of points within its neighborhood of radius r; take the points whose set contains more than MinPts points as core points; check whether each remaining point lies in the neighborhood of a core point; if it does, it is determined to be a boundary point, and otherwise it is determined to be a noise point. Once the noise points have been determined, they are deleted. For example, with r = 3 and MinPts = 3, the set of points within the r = 3 neighborhood of each point is computed, the points whose set contains more than MinPts = 3 points are taken as core points, and a remaining point that lies in the neighborhood of a core point is determined to be a boundary point.
S503: connect the points whose distance from one another does not exceed the set distance value to form a cluster. The cluster contains some of the core points and the boundary points within the preset-distance neighborhood of those core points. Multiple clusters are obtained in this way and are taken as the clusters after clustering.
For example, points whose distance from one another does not exceed MinPts = 3 are connected to form a cluster, and the points in the neighborhood of a core point are also added to the cluster, so that the cluster contains some of the core points and the boundary points within the preset-distance neighborhood of those core points. Multiple clusters are obtained in this way and are taken as the clusters after clustering.
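Steps S501 to S503 can be sketched in plain Python along the lines of standard DBSCAN. This sketch makes two assumptions that go beyond the text: MinPts is treated as a neighbor-count threshold (as in S502), and points are connected when they lie within radius r of a core point, which is the usual DBSCAN rule. The function name `dbscan` and the sample points are hypothetical; `math.dist` requires Python 3.8 or later.

```python
from math import dist


def dbscan(points, r, min_pts):
    """Return one label per point: a cluster index, or None for a noise point."""
    n = len(points)
    # Neighborhood of each point within radius r (the point itself is included).
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= r]
                 for i in range(n)]
    # Core point: more than min_pts points inside its neighborhood.
    core = [len(nb) > min_pts for nb in neighbors]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] is not None:
            continue
        # Grow a new cluster from an unlabelled core point.
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] is None:
                    labels[q] = cluster_id      # core or boundary point joins the cluster
                    if core[q]:
                        stack.append(q)         # only core points keep expanding it
        cluster_id += 1
    return labels                               # points still None are noise


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (50, 50)]
labels = dbscan(points, r=1.5, min_pts=3)
```

Here the two groups of five points form two clusters, and the isolated point at (50, 50) stays unlabelled, which corresponds to a noise point that would be deleted in S502.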
S106: calculate the proportion of potential fraud points in each cluster.
The proportion of potential fraud points in a cluster is the number of potential fraud points in the cluster divided by the total number of points in the cluster. For example, if a cluster contains 10 points in total and 4 of them are potential fraud points, the proportion of potential fraud points is 4/10 × 100% = 40%.
S107: take each cluster whose proportion of potential fraud points is higher than a preset ratio as a target cluster.
The preset ratio may be set to 80%, for example, or to another value.
S108: determine the business data corresponding to every point in the target cluster as fraud data.
It should be understood that the density clustering algorithm already takes the positions of the points into account when it forms clusters. Therefore, when more than the preset ratio of the points in a cluster are potential fraud points, the other points in that cluster are also very likely to be potential fraud points, and the data corresponding to all points in the cluster can be determined as fraud data.
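Steps S106 to S108 reduce to counting per cluster. A minimal sketch, assuming one cluster label per point (with `None` for deleted noise points) and one boolean flag per point marking potential fraud points; the function name `fraud_clusters` and the sample data are hypothetical.

```python
from collections import defaultdict


def fraud_clusters(labels, is_potential_fraud, preset_ratio=0.8):
    """Return (ratio per cluster, set of target clusters above the preset ratio)."""
    totals, frauds = defaultdict(int), defaultdict(int)
    for label, flag in zip(labels, is_potential_fraud):
        if label is None:            # noise points were deleted before clustering
            continue
        totals[label] += 1
        frauds[label] += int(flag)
    ratios = {c: frauds[c] / totals[c] for c in totals}                   # S106
    targets = {c for c, ratio in ratios.items() if ratio > preset_ratio}  # S107
    return ratios, targets


labels = [0] * 10 + [1] * 5
flags = [True] * 4 + [False] * 6 + [True] * 5
ratios, targets = fraud_clusters(labels, flags, preset_ratio=0.8)
# S108: every record in a target cluster is treated as fraud data.
fraud_indices = [i for i, label in enumerate(labels) if label in targets]
```

Cluster 0 reproduces the 4/10 = 40% example from the text and is not a target cluster, while cluster 1 exceeds the 80% preset ratio and all of its records are flagged.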
The above method embodiment uses a method based on unsupervised learning and does not need labeled data, which prevents data labeling errors from affecting model learning. The method based on unsupervised learning makes the judgment of fraud data more precise and improves the accuracy of identifying fraud data. If the scheme of the above method embodiment is applied in the insurance field, intelligent claim verification can be achieved.
Fig. 6 is a schematic block diagram of the anti-fraud apparatus based on unsupervised learning provided by an embodiment of the present application. As shown in Fig. 6, the apparatus includes units for executing the above anti-fraud method based on unsupervised learning. Specifically, as shown in Fig. 6, the apparatus 60 includes a screening unit 601, a feature construction unit 602, a visualization unit 603, a point determination unit 604, a clustering unit 605, a ratio calculation unit 606, a cluster determination unit 607 and a fraud data determination unit 608.
The screening unit 601 is configured to screen out data with high fraud risk from business data according to a preset rule engine. The screening unit 601 includes a condition determination unit and a data screening unit. The condition determination unit is configured to determine different screening conditions for different business data. The data screening unit is configured to use the preset rule engine to screen out the data with high fraud risk from the corresponding business data according to the different screening conditions.
The feature construction unit 602 is configured to construct multidimensional features according to the data with high fraud risk and the historical behavior data of the users corresponding to the data with high fraud risk.
The visualization unit 603 is configured to use a multidimensional scaling model to visualize the constructed multidimensional features in a low-dimensional space according to the data with high fraud risk, so as to obtain multiple data points.
In one embodiment, the visualization unit 603 includes a variable conversion unit and a low-dimensional visualization unit. The variable conversion unit is configured to convert the non-metric variable features among the multidimensional features into metric variable features by dummy-variable conversion. The low-dimensional visualization unit is configured to process the data with high fraud risk and the converted multidimensional metric variable features using multidimensional scaling, so as to visualize them in a low-dimensional space and obtain multiple data points.
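The dummy-variable conversion performed by the variable conversion unit can be sketched as follows. This is only an illustration: the helper `to_dummy_variables`, the record layout, and the field names `amount` and `channel` are assumptions, not part of the application.

```python
def to_dummy_variables(records, categorical_keys):
    """Replace each categorical (non-metric) field with 0/1 indicator fields.

    records is a list of dicts; categorical_keys names the non-metric fields.
    Every other field is assumed to be numeric already. Hypothetical helper.
    """
    # Collect the distinct levels of every categorical field.
    levels = {
        key: sorted({rec[key] for rec in records}) for key in categorical_keys
    }
    encoded = []
    for rec in records:
        row = {}
        for key, value in rec.items():
            if key in levels:
                # One indicator column per level of the categorical field.
                for level in levels[key]:
                    row[f"{key}={level}"] = 1.0 if value == level else 0.0
            else:
                row[key] = float(value)
        encoded.append(row)
    return encoded


records = [
    {"amount": 100, "channel": "app"},
    {"amount": 250, "channel": "web"},
]
encoded = to_dummy_variables(records, ["channel"])
```

After conversion every field is numeric, so Euclidean distances between records are well defined, which is what the multidimensional scaling step requires.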
In one embodiment, as shown in Fig. 7, the low-dimensional visualization unit 70 includes a sample data acquisition unit 701, a distance matrix calculation unit 702, a matrix construction unit 703, an inner product matrix calculation unit 704, an eigenvalue calculation unit 705, a space dimension determination unit 706, a reconstruction unit 707, and a low-dimensional space point determination unit 708. The sample data acquisition unit 701 is configured to obtain the number n of data records with high fraud risk and the dimension q of the multidimensional metric variable features, and to take the n pieces of q-dimensional feature data as sample data to obtain a matrix X. The distance matrix calculation unit 702 is configured to calculate a Euclidean distance matrix D from the matrix X, where (formula omitted). The matrix construction unit 703 is configured to construct a matrix A from the Euclidean distance matrix, where (formula omitted). The inner product matrix calculation unit 704 is configured to calculate an inner product matrix B from the matrix A, where (formula omitted). The eigenvalue calculation unit 705 is configured to calculate the eigenvalues and eigenvectors of the inner product matrix B, with the eigenvalues sorted in descending order. The space dimension determination unit 706 is configured to determine the dimension k of the visualization space. The reconstruction unit 707 is configured to perform the reconstruction (formula omitted), where E_k is the matrix composed of the first k eigenvectors of the inner product matrix B and Λ_k is the diagonal matrix composed of the first k eigenvalues. The low-dimensional space point determination unit 708 is configured to take the reconstructed values as points in the k-dimensional space.
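The sequence of units 701 to 708 follows the usual outline of classical multidimensional scaling. The sketch below uses the textbook formulas (squared distances scaled by -1/2, double centering, top-k eigenpairs); because the formulas themselves are missing from this text, treating them as the ones used in the application is an assumption. The function name `classical_mds` is hypothetical and NumPy is assumed to be available.

```python
import numpy as np


def classical_mds(X, k):
    """Embed the n rows of X (n x q) into k dimensions with classical MDS.

    Uses the textbook formulation; that the application uses exactly these
    formulas is an assumption.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Euclidean distance matrix D between all pairs of rows.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Auxiliary matrix A with a_ij = -1/2 * d_ij^2.
    A = -0.5 * D ** 2
    # Inner product matrix B = H A H with centering matrix H.
    H = np.eye(n) - np.ones((n, n)) / n
    B = H @ A @ H
    # Eigenvalues and eigenvectors of B, sorted from largest to smallest.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    top_vecs = eigvecs[:, order]
    # Reconstruction: E_k times the square root of Lambda_k.
    return top_vecs * np.sqrt(top_vals)


# Four corners of a 3 x 4 rectangle; a 2-D embedding preserves all distances.
X = [[0, 0], [3, 0], [0, 4], [3, 4]]
Y = classical_mds(X, 2)
```

Each row of `Y` is one point in the k-dimensional visualization space. The embedding is determined only up to rotation and reflection, so the coordinates need not match the input even when all pairwise distances do.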
The point determination unit 604 is configured to determine normal points and abnormal points from the multiple data points after visualization, and to determine the abnormal points as potential fraud points.
In one embodiment, as shown in Fig. 8, the point determination unit 604 includes a coordinate value acquisition unit 801, a range determination unit 802, a space determination unit 803, and a first determination unit 804. The coordinate value acquisition unit 801 is configured to obtain the x coordinate value, the y coordinate value, ..., and the k-th coordinate value of every point after visualization, where k denotes the dimension of the visualization space. The range determination unit 802 is configured to determine an x value range from the x coordinate values of all points, such that the proportion of points whose x coordinate value falls within the x value range reaches a first preset ratio; to determine a y value range from the y coordinate values of all points, such that the proportion of points whose y coordinate value falls within the y value range reaches a second preset ratio; ...; and to determine a k-th value range from the k-th coordinate values of all points, such that the proportion of points whose k-th coordinate value falls within the k-th value range reaches a k-th preset ratio. The space determination unit 803 is configured to determine a space from the determined x value range, y value range, ..., and k-th value range. The first determination unit 804 is configured to determine the points falling within the space as normal points and the points not falling within the space as abnormal points, and to determine the abnormal points as potential fraud points.
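The per-coordinate range approach can be sketched as below. The text does not say how each value range is chosen, only that it must cover the preset share of points; this sketch assumes a central interval obtained by trimming the tails of the sorted coordinate values. The names `central_range` and `flag_potential_fraud` are hypothetical.

```python
import math


def central_range(values, ratio):
    """Interval of sorted values that covers at least `ratio` of them.

    Assumption: tails are trimmed as evenly as possible; when the number of
    trimmed values is odd, the extra one is taken from the upper tail.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Small epsilon guards against floating-point noise in ratio * n.
    keep = max(1, math.ceil(ratio * n - 1e-9))
    low_index = (n - keep) // 2
    return ordered[low_index], ordered[low_index + keep - 1]


def flag_potential_fraud(points, ratios):
    """Return (value range per coordinate, abnormal flag per point)."""
    dims = len(points[0])
    ranges = [
        central_range([p[d] for p in points], ratios[d]) for d in range(dims)
    ]
    # A point is normal only if every coordinate lies inside its range,
    # i.e. the point falls inside the box spanned by the ranges.
    flags = [
        not all(lo <= p[d] <= hi for d, (lo, hi) in enumerate(ranges))
        for p in points
    ]
    return ranges, flags


points = [(1, 1), (2, 2), (3, 3), (4, 4), (50, 2)]
ranges, flags = flag_potential_fraud(points, ratios=(0.8, 1.0))
```

Here the x range keeps 80% of the points and the y range keeps all of them, so only the point with x = 50 falls outside the box and is marked as a potential fraud point.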
In one embodiment, the point determination unit 604 includes a fitting unit and a second determination unit. The fitting unit is configured to obtain all points after visualization and to fit a threshold function from them. The second determination unit is configured to determine normal points and abnormal points according to the threshold function, and to determine the abnormal points as potential fraud points.
The clustering unit 605 is configured to cluster the normal points and the potential fraud points according to a clustering algorithm, so as to obtain the clusters after clustering.
In one embodiment, as shown in Fig. 9, the clustering unit 605 includes a setting unit 901, a point marking unit 902, and a cluster forming unit 903. The setting unit 901 is configured to set a radius value and a distance value. The point marking unit 902 is configured to mark the normal points and the potential fraud points as core points, boundary points, or noise points according to the set radius value and distance value, and to delete the noise points. The cluster forming unit 903 is configured to connect points whose distance does not exceed the set distance value to form a cluster, the cluster including some of the core points and the boundary points within the preset-distance neighborhood of those core points; multiple clusters are obtained in this way, and the obtained clusters serve as the clusters after clustering.
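The marking and connecting steps resemble the well-known DBSCAN density clustering procedure. The sketch below is a simplified, assumption-laden illustration: a single `radius` serves both as the neighborhood radius and as the connection distance, and the density threshold `min_pts` is a hypothetical parameter that this passage does not name. The function name `density_cluster` is also hypothetical.

```python
import math
from collections import deque


def density_cluster(points, radius, min_pts):
    """DBSCAN-style clustering; returns (labels, is_core).

    labels[i] is a cluster number, or None for noise points, which are
    effectively dropped. All parameter names are illustrative assumptions.
    """
    n = len(points)
    # Neighborhood of each point (the point itself included).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= radius]
        for i in range(n)
    ]
    # Core points have enough neighbors within the radius.
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [None] * n
    cluster = 0
    for start in range(n):
        if not is_core[start] or labels[start] is not None:
            continue
        # Grow one cluster by connecting core points that lie within range.
        labels[start] = cluster
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for other in neighbors[current]:
                if labels[other] is None:
                    labels[other] = cluster  # core or boundary point joins
                    if is_core[other]:
                        queue.append(other)  # only core points keep expanding
        cluster += 1
    return labels, is_core


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels, is_core = density_cluster(points, radius=1.5, min_pts=3)
```

Points that are neither core points nor within range of a core point keep the label `None`, which corresponds to deleting the noise points before the clusters are used.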
The ratio calculation unit 606 is configured to calculate the proportion of potential fraud points in each cluster.
The cluster determination unit 607 is configured to take each cluster whose proportion of potential fraud points is higher than a preset ratio as a target cluster.
The fraud data determination unit 608 is configured to determine the business data corresponding to each point in the target cluster as fraud data.
It should be noted that, as those skilled in the art will clearly understand, the specific implementation of the above apparatus and of each unit can be found in the corresponding description of the foregoing method embodiment; for convenience and brevity, it is not repeated here.
The above apparatus may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in Fig. 10.
Fig. 10 is a schematic block diagram of a computer device provided by an embodiment of the present application. The device is a terminal or similar device, such as a mobile terminal, a PC terminal, or an iPad. The device 100 includes a processor 102, a memory, and a network interface 103 connected through a system bus 101, where the memory may include a non-volatile storage medium 104 and an internal memory 105.
The non-volatile storage medium 104 can store an operating system 1041 and a computer program 1042. When the computer program 1042 stored in the non-volatile storage medium is executed by the processor 102, the anti-fraud method based on unsupervised learning described above can be implemented. The processor 102 provides computing and control capabilities and supports the operation of the whole device. The internal memory 105 provides an environment for running the computer program stored in the non-volatile storage medium; when that computer program is executed by the processor 102, it causes the processor 102 to execute the anti-fraud method based on unsupervised learning described above. The network interface 103 is used for network communication. Those skilled in the art will understand that the structure shown in Fig. 10 is only a block diagram of the part of the structure relevant to the scheme of the present application and does not limit the device to which the scheme is applied; a specific device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 102 is configured to run the computer program stored in the memory to implement the following steps:
screening out data with high fraud risk from business data according to a preset rule engine; constructing multidimensional features according to the data with high fraud risk and the historical behavior data of the users corresponding to the data with high fraud risk; using a multidimensional scaling model to visualize the constructed multidimensional features in a low-dimensional space according to the data with high fraud risk, so as to obtain multiple data points; determining normal points and abnormal points from the multiple data points after visualization, and determining the abnormal points as potential fraud points; clustering the normal points and the potential fraud points according to a clustering algorithm to obtain the clusters after clustering; calculating the proportion of potential fraud points in each cluster; taking each cluster whose proportion of potential fraud points is higher than a preset ratio as a target cluster; and determining the business data corresponding to each point in the target cluster as fraud data.
In one embodiment, when executing the step of determining normal points and abnormal points from the multiple data points after visualization and determining the abnormal points as potential fraud points, the processor 102 specifically implements the following steps:
obtaining the x coordinate value, the y coordinate value, ..., and the k-th coordinate value of every point after visualization, where k denotes the dimension of the visualization space; determining an x value range from the x coordinate values of all points, such that the proportion of points whose x coordinate value falls within the x value range reaches a first preset ratio; determining a y value range from the y coordinate values of all points, such that the proportion of points whose y coordinate value falls within the y value range reaches a second preset ratio; ...; determining a k-th value range from the k-th coordinate values of all points, such that the proportion of points whose k-th coordinate value falls within the k-th value range reaches a k-th preset ratio; determining a space from the determined x value range, y value range, ..., and k-th value range; and determining the points falling within the space as normal points and the points not falling within the space as abnormal points, and determining the abnormal points as potential fraud points.
In one embodiment, when executing the step of determining normal points and abnormal points from the multiple data points after visualization and determining the abnormal points as potential fraud points, the processor 102 specifically implements the following steps:
obtaining all points after visualization and fitting a threshold function from them; and determining normal points and abnormal points according to the threshold function, and determining the abnormal points as potential fraud points.
In one embodiment, when executing the step of using a multidimensional scaling model to visualize the constructed multidimensional features in a low-dimensional space according to the data with high fraud risk so as to obtain multiple data points, the processor 102 specifically implements the following steps:
converting the non-metric variable features among the multidimensional features into metric variable features by dummy-variable conversion; and processing the data with high fraud risk and the converted multidimensional metric variable features using multidimensional scaling, so as to visualize them in a low-dimensional space and obtain multiple data points.
In one embodiment, when executing the step of processing the data with high fraud risk and the converted multidimensional metric variable features using multidimensional scaling so as to visualize them in a low-dimensional space and obtain multiple data points, the processor 102 specifically implements the following steps:
obtaining the number n of data records with high fraud risk and the dimension q of the multidimensional metric variable features, and taking the n pieces of q-dimensional feature data as sample data to obtain a matrix X; calculating a Euclidean distance matrix D from the matrix X, where (formula omitted); constructing a matrix A from the Euclidean distance matrix, where (formula omitted); calculating an inner product matrix B from the matrix A, where (formula omitted); calculating the eigenvalues and eigenvectors of the inner product matrix B, with the eigenvalues sorted in descending order; determining the dimension k of the visualization space; performing the reconstruction (formula omitted), where E_k is the matrix composed of the first k eigenvectors of the inner product matrix B and Λ_k is the diagonal matrix composed of the first k eigenvalues; and taking the reconstructed values as points in the k-dimensional space.
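For orientation, the steps listed above (distance matrix, auxiliary matrix, inner product matrix, eigendecomposition, reconstruction) match the textbook form of classical multidimensional scaling, which is usually written as shown below. Here x_i denotes the i-th row of X, H is the centering matrix, and 1 is the all-ones vector of length n; H, 1, and the hat notation are introduced for illustration. That the application uses exactly these expressions is an assumption, since the formulas are not present in this text.

```latex
\begin{aligned}
D &= (d_{ij}), & d_{ij} &= \lVert x_i - x_j \rVert_2, \quad i, j = 1, \dots, n, \\
A &= (a_{ij}), & a_{ij} &= -\tfrac{1}{2}\, d_{ij}^{2}, \\
B &= H A H, & H &= I_n - \tfrac{1}{n}\, \mathbf{1}\mathbf{1}^{\top}, \\
\hat{X} &= E_k \Lambda_k^{1/2}.
\end{aligned}
```

Under this formulation B equals the Gram matrix of the centered samples, which is why its leading eigenpairs give coordinates whose pairwise distances approximate the entries of D.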
In one embodiment, when executing the step of clustering the normal points and the potential fraud points according to a clustering algorithm to obtain the clusters after clustering, the processor 102 specifically implements the following steps:
setting a radius value and a distance value; marking the normal points and the potential fraud points as core points, boundary points, or noise points according to the set radius value and distance value, and deleting the noise points; and connecting points whose distance does not exceed the set distance value to form a cluster, the cluster including some of the core points and the boundary points within the preset-distance neighborhood of those core points, so that multiple clusters are obtained, the obtained clusters serving as the clusters after clustering.
In one embodiment, when executing the step of screening out data with high fraud risk from business data according to a preset rule engine, the processor 102 specifically implements the following steps:
determining different screening rules for different business data; and using the preset rule engine to screen out the data with high fraud risk from the corresponding business data according to the different screening rules.
It should be understood that, in the embodiments of the present application, the processor 102 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a storage medium, which may be a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the process steps of the above method embodiments.
Therefore, the present application also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program which, when executed by a processor, implements the following steps:
screening out data with high fraud risk from business data according to a preset rule engine; constructing multidimensional features according to the data with high fraud risk and the historical behavior data of the users corresponding to the data with high fraud risk; using a multidimensional scaling model to visualize the constructed multidimensional features in a low-dimensional space according to the data with high fraud risk, so as to obtain multiple data points; determining normal points and abnormal points from the multiple data points after visualization, and determining the abnormal points as potential fraud points; clustering the normal points and the potential fraud points according to a clustering algorithm to obtain the clusters after clustering; calculating the proportion of potential fraud points in each cluster; taking each cluster whose proportion of potential fraud points is higher than a preset ratio as a target cluster; and determining the business data corresponding to each point in the target cluster as fraud data.
In one embodiment, when executing the step of determining normal points and abnormal points from the multiple data points after visualization and determining the abnormal points as potential fraud points, the processor specifically implements the following steps:
obtaining the x coordinate value, the y coordinate value, ..., and the k-th coordinate value of every point after visualization, where k denotes the dimension of the visualization space; determining an x value range from the x coordinate values of all points, such that the proportion of points whose x coordinate value falls within the x value range reaches a first preset ratio; determining a y value range from the y coordinate values of all points, such that the proportion of points whose y coordinate value falls within the y value range reaches a second preset ratio; ...; determining a k-th value range from the k-th coordinate values of all points, such that the proportion of points whose k-th coordinate value falls within the k-th value range reaches a k-th preset ratio; determining a space from the determined x value range, y value range, ..., and k-th value range; and determining the points falling within the space as normal points and the points not falling within the space as abnormal points, and determining the abnormal points as potential fraud points.
In one embodiment, when executing the step of determining normal points and abnormal points from the multiple data points after visualization and determining the abnormal points as potential fraud points, the processor specifically implements the following steps:
obtaining all points after visualization and fitting a threshold function from them; and determining normal points and abnormal points according to the threshold function, and determining the abnormal points as potential fraud points.
In one embodiment, when executing the step of using a multidimensional scaling model to visualize the constructed multidimensional features in a low-dimensional space according to the data with high fraud risk so as to obtain multiple data points, the processor specifically implements the following steps:
converting the non-metric variable features among the multidimensional features into metric variable features by dummy-variable conversion; and processing the data with high fraud risk and the converted multidimensional metric variable features using multidimensional scaling, so as to visualize them in a low-dimensional space and obtain multiple data points.
In one embodiment, when executing the step of processing the data with high fraud risk and the converted multidimensional metric variable features using multidimensional scaling so as to visualize them in a low-dimensional space and obtain multiple data points, the processor specifically implements the following steps:
obtaining the number n of data records with high fraud risk and the dimension q of the multidimensional metric variable features, and taking the n pieces of q-dimensional feature data as sample data to obtain a matrix X; calculating a Euclidean distance matrix D from the matrix X, where (formula omitted); constructing a matrix A from the Euclidean distance matrix, where (formula omitted); calculating an inner product matrix B from the matrix A, where (formula omitted); calculating the eigenvalues and eigenvectors of the inner product matrix B, with the eigenvalues sorted in descending order; determining the dimension k of the visualization space; performing the reconstruction (formula omitted), where E_k is the matrix composed of the first k eigenvectors of the inner product matrix B and Λ_k is the diagonal matrix composed of the first k eigenvalues; and taking the reconstructed values as points in the k-dimensional space.
In one embodiment, when executing the step of clustering the normal points and the potential fraud points according to a clustering algorithm to obtain the clusters after clustering, the processor specifically implements the following steps:
setting a radius value and a distance value; marking the normal points and the potential fraud points as core points, boundary points, or noise points according to the set radius value and distance value, and deleting the noise points; and connecting points whose distance does not exceed the set distance value to form a cluster, the cluster including some of the core points and the boundary points within the preset-distance neighborhood of those core points, so that multiple clusters are obtained, the obtained clusters serving as the clusters after clustering.
In one embodiment, when executing the step of screening out data with high fraud risk from business data according to a preset rule engine, the processor specifically implements the following steps:
determining different screening rules for different business data; and using the preset rule engine to screen out the data with high fraud risk from the corresponding business data according to the different screening rules.
The storage medium may be any computer-readable storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, or an optical disc.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation. Those skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the apparatus, device, and units described above can be found in the corresponding processes of the foregoing method embodiment and are not repeated here.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and these modifications or substitutions shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. An anti-fraud method based on unsupervised learning, characterized in that the method comprises:
screening out data with high fraud risk from business data according to a preset rule engine;
constructing multidimensional features according to the data with high fraud risk and the historical behavior data of the users corresponding to the data with high fraud risk;
using a multidimensional scaling model to visualize the constructed multidimensional features in a low-dimensional space according to the data with high fraud risk, so as to obtain multiple data points;
determining normal points and abnormal points from the multiple data points after visualization, and determining the abnormal points as potential fraud points;
clustering the normal points and the potential fraud points according to a clustering algorithm to obtain the clusters after clustering;
calculating the proportion of potential fraud points in each cluster;
taking each cluster whose proportion of potential fraud points is higher than a preset ratio as a target cluster;
determining the business data corresponding to each point in the target cluster as fraud data.
2. The method according to claim 1, characterized in that determining normal points and abnormal points from the multiple data points after visualization, and determining the abnormal points as potential fraud points, comprises:
obtaining the x coordinate value, the y coordinate value, ..., and the k-th coordinate value of every point after visualization, where k denotes the dimension of the visualization space;
determining an x value range from the x coordinate values of all points, such that the proportion of points whose x coordinate value falls within the x value range reaches a first preset ratio; determining a y value range from the y coordinate values of all points, such that the proportion of points whose y coordinate value falls within the y value range reaches a second preset ratio; ...; determining a k-th value range from the k-th coordinate values of all points, such that the proportion of points whose k-th coordinate value falls within the k-th value range reaches a k-th preset ratio;
determining a space from the determined x value range, y value range, ..., and k-th value range;
determining the points falling within the space as normal points and the points not falling within the space as abnormal points, and determining the abnormal points as potential fraud points.
3. The method according to claim 1, characterized in that determining normal points and abnormal points from the multiple data points after visualization, and determining the abnormal points as potential fraud points, comprises:
obtaining all points after visualization, and fitting a threshold function from all points after visualization;
determining normal points and abnormal points according to the threshold function, and determining the abnormal points as potential fraud points.
4. The method according to claim 1, characterized in that using a multidimensional scaling model to visualize the constructed multidimensional features in a low-dimensional space according to the data with high fraud risk, so as to obtain multiple data points, comprises:
converting the non-metric variable features among the multidimensional features into metric variable features by dummy-variable conversion;
processing the data with high fraud risk and the converted multidimensional metric variable features using multidimensional scaling, so as to visualize them in a low-dimensional space and obtain multiple data points.
5. The method according to claim 4, characterized in that processing the data with high fraud risk and the converted multidimensional metric variable features using multidimensional scaling, so as to visualize them in a low-dimensional space and obtain multiple data points, comprises:
obtaining the number n of data records with high fraud risk and the dimension q of the multidimensional metric variable features, and taking the n pieces of q-dimensional feature data as sample data to obtain a matrix X;
calculating a Euclidean distance matrix D from the matrix X, where (formula omitted);
constructing a matrix A from the Euclidean distance matrix, where (formula omitted);
calculating an inner product matrix B from the matrix A, where (formula omitted);
calculating the eigenvalues and eigenvectors of the inner product matrix B, with the eigenvalues sorted in descending order;
determining the dimension k of the visualization space;
performing the reconstruction (formula omitted), where E_k is the matrix composed of the first k eigenvectors of the inner product matrix B and Λ_k is the diagonal matrix composed of the first k eigenvalues;
taking the reconstructed values as points in the k-dimensional space.
6. The method according to claim 1, characterized in that clustering the normal points and the potential fraud points according to a clustering algorithm to obtain the clusters after clustering comprises:
setting a radius value and a distance value;
marking the normal points and the potential fraud points as core points, boundary points, or noise points according to the set radius value and distance value, and deleting the noise points;
connecting points whose distance does not exceed the set distance value to form a cluster, the cluster including some of the core points and the boundary points within the preset-distance neighborhood of those core points, so that multiple clusters are obtained, the obtained clusters serving as the clusters after clustering.
7. The method according to claim 1, characterized in that screening out data with high fraud risk from business data according to a preset rule engine comprises:
determining different screening rules for different business data;
using the preset rule engine to screen out the data with high fraud risk from the corresponding business data according to the different screening rules.
8. An anti-fraud apparatus based on unsupervised learning, wherein the anti-fraud apparatus based on unsupervised learning comprises:
A screening unit, configured to screen out data with a high fraud risk from business data according to a preset rule engine;
A feature construction unit, configured to construct multidimensional features from the data with a high fraud risk and the historical behavior data of the users corresponding to the data with a high fraud risk;
A visualization unit, configured to visualize, using a multidimensional scaling model, the multidimensional features constructed from the data with a high fraud risk in a low-dimensional space to obtain multiple data points;
A point determination unit, configured to determine normal points and abnormal points among the visualized data points, and to determine the abnormal points as potential fraud points;
A clustering unit, configured to cluster the normal points and the potential fraud points according to a clustering algorithm to obtain the clusters after clustering;
A ratio calculation unit, configured to calculate the proportion of potential fraud points in each cluster;
A cluster determination unit, configured to take each cluster whose proportion of potential fraud points is higher than a preset ratio as a target cluster;
A fraud data determination unit, configured to determine the business data corresponding to each point in the target cluster as fraud data.
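The last three units (ratio calculation, cluster determination, fraud data determination) can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function name `find_fraud_data` and its parameters are hypothetical, and treating the label -1 as "belongs to no cluster" is an assumption.

```python
from collections import defaultdict


def find_fraud_data(cluster_labels, is_potential_fraud, records, preset_ratio):
    """Return the records of every point in a target cluster.

    A target cluster is one whose share of potential fraud points
    is strictly higher than preset_ratio.
    """
    totals = defaultdict(int)      # points per cluster
    suspicious = defaultdict(int)  # potential fraud points per cluster
    for label, flagged in zip(cluster_labels, is_potential_fraud):
        if label == -1:  # assumption: -1 marks points outside every cluster
            continue
        totals[label] += 1
        suspicious[label] += int(flagged)

    target_clusters = {
        c for c in totals if suspicious[c] / totals[c] > preset_ratio
    }
    # Every point in a target cluster counts, including points that were
    # initially judged normal.
    return [
        rec for rec, label in zip(records, cluster_labels)
        if label in target_clusters
    ]
```

Note that a normal point that falls in a target cluster is also reported, which matches the wording "each point in the target cluster".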
9. A computer device, wherein the computer device comprises a memory and a processor connected to the memory;
The memory is configured to store a computer program; the processor is configured to run the computer program stored in the memory to execute the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910011758.8A CN109992578B (en) | 2019-01-07 | 2019-01-07 | Anti-fraud method and device based on unsupervised learning, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910011758.8A CN109992578B (en) | 2019-01-07 | 2019-01-07 | Anti-fraud method and device based on unsupervised learning, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109992578A (en) | 2019-07-09 |
CN109992578B (en) | 2023-08-08 |
Family
ID=67129914
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201910011758.8A | Active | CN109992578B (en) | 2019-01-07 | 2019-01-07 | Anti-fraud method and device based on unsupervised learning, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109992578B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104376078A (en) * | 2014-11-14 | 2015-02-25 | 南京大学 | Abnormal data detection method based on knowledge entropy |
CN106874658A (en) * | 2017-01-18 | 2017-06-20 | 天津艾登科技有限公司 | A kind of medical insurance fraud recognition methods based on Principal Component Analysis Algorithm |
CN107657536A (en) * | 2017-02-20 | 2018-02-02 | 平安科技(深圳)有限公司 | The recognition methods of social security fraud and device |
US20180240273A1 (en) * | 2017-02-23 | 2018-08-23 | International Business Machines Corporation | Displaying data lineage using three dimensional virtual reality model |
CN107391569A (en) * | 2017-06-16 | 2017-11-24 | 阿里巴巴集团控股有限公司 | Identification, model training, Risk Identification Method, device and the equipment of data type |
WO2018228428A1 (en) * | 2017-06-16 | 2018-12-20 | 阿里巴巴集团控股有限公司 | Data type identification, model training, and risk identification method and apparatus, and device |
CN108564460A (en) * | 2018-01-12 | 2018-09-21 | 阳光财产保险股份有限公司 | Real-time fraud detection method under internet credit scene and device |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110659997A (en) * | 2019-08-15 | 2020-01-07 | 中国平安财产保险股份有限公司 | Data cluster identification method and device, computer system and readable storage medium |
CN110659997B (en) * | 2019-08-15 | 2023-06-27 | 中国平安财产保险股份有限公司 | Data cluster recognition method, device, computer system and readable storage medium |
CN111145911A (en) * | 2019-12-20 | 2020-05-12 | 平安医疗健康管理股份有限公司 | Abnormal data identification processing method and device, computer equipment and storage medium |
WO2022001140A1 (en) * | 2020-06-29 | 2022-01-06 | 苏州浪潮智能科技有限公司 | Operation behavior monitoring method and apparatus, electronic device, and storage medium |
US11693957B1 (en) | 2020-06-29 | 2023-07-04 | Inspur Suzhou Intelligent Technology Co., Ltd. | Operation behavior monitoring method and apparatus, electronic device, and storage medium |
CN111861699B (en) * | 2020-07-02 | 2021-06-22 | 北京睿知图远科技有限公司 | Anti-fraud index generation method based on operator data |
CN111861767A (en) * | 2020-07-29 | 2020-10-30 | 贵州力创科技发展有限公司 | System and method for monitoring vehicle insurance fraud behaviors |
Also Published As
Publication number | Publication date |
---|---|
CN109992578B (en) | 2023-08-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||