CN109801073A - Risk user identification method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN109801073A CN201811527385.1A
- Authority
- CN
- China
- Prior art keywords
- node
- node vector
- risk user
- cluster
- sampling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a risk user identification method, a device, computer equipment and a storage medium. The method comprises: obtaining the nodes corresponding to claim settlement data, and converting the nodes into node vectors by graph embedding; clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple cluster groups; obtaining the probability distribution of each cluster group by sampling, and, if the probability value obtained from the probability distribution of a cluster group is greater than a preset probability threshold, marking the nodes included in that cluster group as risk users. The method uses a machine learning model to make parameter settings more reasonable, and improves the accuracy of risk customer prediction.
Description
Technical field
The present invention relates to the technical field of fraud identification, and in particular to a risk user identification method, a device, computer equipment and a storage medium.
Background
Currently, in anti-fraud analysis based on social networks, the traditional user fraud scoring method scores users with rules, where both the rules and the score assigned to each rule are summarized from business experience. After the user data are woven into a network, the network triggers a certain number of rules, the total score of the triggered rules is calculated, and the total score is used to assess the user's insurance fraud risk. However, the existing method has the following disadvantages:
1) the rule features are relatively simple and cannot cover hidden fraud patterns;
2) the rule scores are given from experience and are inaccurate;
3) the traditional statistical method does not learn its parameters from historical data, so the error is large.
Summary of the invention
The embodiments of the present invention provide a risk user identification method, a device, computer equipment and a storage medium, intended to solve the problem that the user fraud scoring method in the prior art scores users with rules whose settings and scores are summarized from business experience, so that hidden fraud patterns cannot be covered and the accuracy of the calculated result is low.
In a first aspect, an embodiment of the present invention provides a risk user identification method, comprising:
obtaining the nodes corresponding to claim settlement data, and converting the nodes into node vectors by graph embedding;
clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple cluster groups;
obtaining the probability distribution of each cluster group by sampling, and, if the probability value obtained from the probability distribution of a cluster group is greater than a preset probability threshold, marking the nodes included in that cluster group as risk users.
In a second aspect, an embodiment of the present invention provides a risk user identification device, comprising:
a node vector acquiring unit, configured to obtain the nodes corresponding to claim settlement data and convert the nodes into node vectors by graph embedding;
a clustering unit, configured to cluster the node vectors with a Bayesian nonparametric mixture model to obtain multiple cluster groups;
a risk identification unit, configured to obtain the probability distribution of each cluster group by sampling, and, if the probability value obtained from the probability distribution of a cluster group is greater than a preset probability threshold, mark the nodes included in that cluster group as risk users.
In a third aspect, an embodiment of the present invention further provides computer equipment comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the risk user identification method of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the risk user identification method of the first aspect.
The embodiments of the present invention provide a risk user identification method, a device, computer equipment and a storage medium. The method includes obtaining the nodes corresponding to claim settlement data, and converting the nodes into node vectors by graph embedding; clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple cluster groups; obtaining the probability distribution of each cluster group by sampling, and, if the probability value obtained from the probability distribution of a cluster group is greater than a preset probability threshold, marking the nodes included in that cluster group as risk users. The method uses a machine learning model to make parameter settings more reasonable, and improves the accuracy of risk customer prediction.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the risk user identification method provided by an embodiment of the present invention;
Fig. 2 is a sub-flow diagram of the risk user identification method provided by an embodiment of the present invention;
Fig. 3 is another sub-flow diagram of the risk user identification method provided by an embodiment of the present invention;
Fig. 4 is another sub-flow diagram of the risk user identification method provided by an embodiment of the present invention;
Fig. 5 is a schematic block diagram of the risk user identification device provided by an embodiment of the present invention;
Fig. 6 is a schematic block diagram of a subunit of the risk user identification device provided by an embodiment of the present invention;
Fig. 7 is a schematic block diagram of another subunit of the risk user identification device provided by an embodiment of the present invention;
Fig. 8 is a schematic block diagram of another subunit of the risk user identification device provided by an embodiment of the present invention;
Fig. 9 is a schematic block diagram of the computer equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be understood that, when used in this specification and the appended claims, the terms "includes" and "comprises" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terms used in this specification are for the purpose of describing specific embodiments only and are not intended to limit the present invention. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
Referring to Fig. 1, which is a flow diagram of the risk user identification method provided by an embodiment of the present invention, the risk user identification method is applied in an intelligent terminal and is executed by application software installed in the intelligent terminal.
As shown in Fig. 1, the method includes steps S110 to S130.
S110: obtain the nodes corresponding to claim settlement data, and convert the nodes into node vectors by graph embedding.
In this embodiment, when an enterprise has obtained massive case data (case data can be understood as claim settlement data; for example, the case data in a vehicle insurance claim scenario include the driver, the reporter, the beneficiary, the injured party, the repair shop, mobile phone numbers, the repair location, GPS information and similar data), the claim settlement data are converted into corresponding nodes so that risk users can be identified more easily, and data mining and behavior analysis are carried out in the form of a graph.
Because it is not feasible to convert every data item in the case data into a node, a portion of the data can be selected as primary data, for which nodes are generated, while the remaining data serve as attribute data of the primary data in the generated nodes. For example, the reporter is treated as primary data, and the reporter's telephone number and ID card number are treated as its attribute data.
The nodes are then converted into node vectors by the graph embedding method, which first samples the graph and then builds a model from the sampled sequences.
In one embodiment, as shown in Fig. 2, step S110 includes:
S111: weave the nodes of the claim settlement data into a network to obtain an initial graph;
S112: sample the initial graph by weighted sampling to obtain multiple sample sequences, and take each sample sequence as a node vector.
In this embodiment, weaving the nodes of the claim settlement data into a network yields an initial graph whose nodes are connected by weighted edges. Weighted sampling (for example, a weighted walk) is then applied to obtain multiple sample sequences, and each sample sequence is taken as a node vector. Weighted sampling can effectively uncover the potential relationships between indirectly connected nodes in the initial graph.
In one embodiment, as shown in Fig. 3, step S111 includes:
S1111: take the nodes of the claim settlement data as the initial nodes of the graph;
S1112: if a data association exists between initial nodes, connect the corresponding nodes with an edge, and derive the weight of the edge from the weight of the data association, to obtain an initial graph with weighted edges.
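Steps S1111 and S1112 can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not say how the weight of a data association is computed, so the sketch assumes it is the number of attribute values two nodes share. The record names, attribute keys and the helper `build_initial_graph` are all hypothetical.

```python
from itertools import combinations

# Hypothetical claim records: each primary entity (a node) maps to its attribute data.
records = {
    "reporter_1": {"phone": "138-0000-0001", "repair_shop": "shop_A"},
    "driver_1": {"phone": "138-0000-0001", "repair_shop": "shop_A"},
    "beneficiary_1": {"phone": "138-0000-0002", "repair_shop": "shop_A"},
    "driver_2": {"phone": "138-0000-0003", "repair_shop": "shop_B"},
}


def build_initial_graph(records):
    """Return {node: {neighbor: weight}} with an edge between associated nodes."""
    # S1111: every primary entity starts as an isolated initial node.
    graph = {node: {} for node in records}
    for a, b in combinations(records, 2):
        # Assumed association weight: number of attribute values the two nodes share.
        shared = sum(
            1 for key, value in records[a].items() if records[b].get(key) == value
        )
        if shared:
            # S1112: connect associated nodes with a weighted, undirected edge.
            graph[a][b] = shared
            graph[b][a] = shared
    return graph


graph = build_initial_graph(records)
```

In this toy data set `driver_2` shares nothing with the other entities and therefore stays isolated, while `reporter_1` and `driver_1` share both a phone number and a repair shop and receive the heaviest edge.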
The edge between two nodes indicates that the nodes are associated, and the weight of the edge indicates the strength of that association. When the weighted sampling method is used, the walk tends to move toward popular nodes. For example, suppose a graph has four nodes A, B, C and D, where the edge between A and B has weight 0.1, the edge between A and C has weight 0.7, the edge between B and C has weight 0.4, and the edge between C and D has weight 0.8. Assume a walk of 2 steps starting from node A. When the next neighbor node is chosen, an unweighted random walk algorithm would move to B or C with equal probability, whereas the weighted walk moves to node C with probability 7/8 (that is, 0.7/(0.1+0.7)) and then to node D with probability 8/12 (that is, 0.8/(0.4+0.8), with B and D as the candidate next nodes). The sequence (A, C, D) is therefore produced with high probability. In the original graph, node A and node D are not directly connected, but weighted sampling effectively uncovers the relationship between node A and node D.
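The four-node example can be reproduced with a short sketch. The 8/12 figure only results if the walk does not step straight back to the node it just left, so the code assumes a non-backtracking weighted walk; that assumption, and the helper names `neighbors`, `step_probabilities` and `weighted_walk`, are illustrative and not taken from the text.

```python
import random

# Edge weights from the four-node example (undirected graph).
WEIGHTS = {("A", "B"): 0.1, ("A", "C"): 0.7, ("B", "C"): 0.4, ("C", "D"): 0.8}


def neighbors(node):
    """Return {neighbor: weight} for every edge touching `node`."""
    result = {}
    for (u, v), w in WEIGHTS.items():
        if u == node:
            result[v] = w
        elif v == node:
            result[u] = w
    return result


def step_probabilities(node, previous=None):
    """Next-step probabilities proportional to edge weight, excluding `previous`."""
    candidates = {n: w for n, w in neighbors(node).items() if n != previous}
    total = sum(candidates.values())
    return {n: w / total for n, w in candidates.items()}


def weighted_walk(start, steps, rng):
    """Sample one node sequence of at most `steps` moves starting at `start`."""
    path, previous = [start], None
    for _ in range(steps):
        probs = step_probabilities(path[-1], previous)
        if not probs:  # dead end: no neighbor other than the previous node
            break
        nodes = list(probs)
        chosen = rng.choices(nodes, weights=[probs[n] for n in nodes])[0]
        previous = path[-1]
        path.append(chosen)
    return path


first_step = step_probabilities("A")                  # A -> C has probability 7/8
second_step = step_probabilities("C", previous="A")   # C -> D has probability 8/12
walk = weighted_walk("A", 2, random.Random(0))
```

Multiplying the two step probabilities gives 7/8 × 8/12 = 7/12 for the sequence (A, C, D), which is why it dominates the sampled sequences.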
S120: cluster the node vectors with a Bayesian nonparametric mixture model to obtain multiple cluster groups.
In this embodiment, a Bayesian nonparametric mixture model is a Bayesian model defined on an infinite-dimensional parameter space. The size of the model adapts as the data in the model increase or decrease, so the number of parameters of the model is determined by the amount of data. A commonly used Bayesian nonparametric mixture model is the Dirichlet process mixture model.
Specifically, the Dirichlet process mixture model is defined as follows:
Consider a set of data $y_1, y_2, \ldots, y_n$ that are mutually independent and drawn from some unknown distribution. Each $y_i$ may be multivariate, and its values may be real-valued or categorical. Assume the observations $y_i$ come from a mixture distribution of the form $F(\theta)$, where the parameter $\theta$ is drawn from $G$ and $G$ follows a Dirichlet process (DP); that is, the prior on the parameter of the distribution is a Dirichlet process with concentration parameter $\alpha$ and base distribution $G_0$. The Dirichlet process mixture model is then:
$$y_i \mid \theta_i \sim F(\theta_i)$$
$$\theta_i \mid G \sim G$$
$$G \sim \mathrm{DP}(G_0, \alpha)$$
With a Dirichlet process, the number of categories does not need to be specified in advance and can be inferred from the data. The classic construction is the Chinese restaurant process (CRP). Assume a Chinese restaurant with an unlimited number of tables. The first customer sits at the first table on arrival. The second customer may choose to sit at the first table or at a new table. Suppose that when the (n+1)-th customer arrives, $k$ tables are already occupied, with $n_1, n_2, \ldots, n_k$ customers respectively. The (n+1)-th customer then sits at the $i$-th table with probability $n_i/(\alpha+n)$, where $n_i$ is the number of customers at the $i$-th table, and chooses a new table with probability $\alpha/(\alpha+n)$. After $n$ customers have been seated, the CRP has divided them into $K$ groups, that is, $K$ clusters. It can be shown that the CRP is a representation of the Dirichlet process.
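The seating rule above can be simulated directly. The sketch below is illustrative only; the function names `crp_seating_probabilities` and `simulate_crp`, the value of α and the number of customers are assumptions chosen for the example.

```python
import random


def crp_seating_probabilities(table_counts, alpha):
    """Probabilities for the next customer: one per occupied table, then a new table."""
    n = sum(table_counts)
    probs = [count / (alpha + n) for count in table_counts]  # n_i / (alpha + n)
    probs.append(alpha / (alpha + n))                        # alpha / (alpha + n)
    return probs


def simulate_crp(num_customers, alpha, rng):
    """Seat customers one by one and return the number of customers per table."""
    table_counts = []
    for _ in range(num_customers):
        probs = crp_seating_probabilities(table_counts, alpha)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        if choice == len(table_counts):
            table_counts.append(1)      # open a new table (a new cluster)
        else:
            table_counts[choice] += 1   # join an existing table
    return table_counts


example_probs = crp_seating_probabilities([3, 1], alpha=1.0)
tables = simulate_crp(100, alpha=1.0, rng=random.Random(0))
```

With tables holding 3 and 1 customers and α = 1, the next customer joins them with probabilities 3/5 and 1/5 and opens a new table with probability 1/5. The number of occupied tables in `tables` is not fixed in advance, which is the property the clustering step relies on.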
In one embodiment, as shown in Fig. 4, step S120 includes:
S121: obtain the Gaussian mixture distribution formed by the node vectors;
S122: obtain a finite partition of the Gaussian mixture distribution that satisfies the Dirichlet process, to obtain multiple cluster groups corresponding to the node vectors.
In this embodiment, obtaining the Gaussian mixture distribution formed by the node vectors means obtaining $F(\theta)$. After this prior distribution $F(\theta)$ (that is, the Gaussian mixture distribution) has been obtained, if a countable partition can be found (denoted $T_1, T_2, \ldots, T_k$; the countable partition can also be understood as a finite partition) such that:
$$\big(G(T_1), \ldots, G(T_k)\big) \sim \mathrm{Dir}\big(\alpha G_0(T_1), \ldots, \alpha G_0(T_k)\big)$$
then the finite partition $T_1, T_2, \ldots, T_k$ can be regarded as the multiple cluster groups corresponding to the node vectors. The Dirichlet process mixture model can quickly cluster multiple node vectors into groups, which makes it easier to mine the connections between node vectors efficiently.
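One common way to fit a Dirichlet process mixture of Gaussians is collapsed Gibbs sampling over the cluster assignments. The sketch below uses that technique as an illustration and is not claimed to be the procedure of this embodiment. It assumes one-dimensional data, a known within-cluster variance `sigma2`, and a Normal(`mu0`, `tau2`) base distribution over cluster means; all names and default values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def dp_gaussian_gibbs(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=100.0,
                      sweeps=20, rng=None):
    """Collapsed Gibbs sampling for a 1-D Dirichlet process mixture of Gaussians.

    Returns one cluster label per data point; the number of clusters is inferred.
    """
    rng = rng or random.Random()
    labels = list(range(len(data)))  # start with every point in its own cluster
    for _ in range(sweeps):
        for i, y in enumerate(data):
            # Summarise every cluster with point i removed: (count, sum of values).
            stats = {}
            for j, label in enumerate(labels):
                if j != i:
                    count, total = stats.get(label, (0, 0.0))
                    stats[label] = (count + 1, total + data[j])
            options, weights = [], []
            for label, (count, total) in stats.items():
                # Posterior over the cluster mean given its current members.
                precision = 1.0 / tau2 + count / sigma2
                post_mean = (mu0 / tau2 + total / sigma2) / precision
                # CRP weight (count) times the posterior predictive density of y.
                options.append(label)
                weights.append(count * normal_pdf(y, post_mean, 1.0 / precision + sigma2))
            # Opening a new cluster: alpha times the prior predictive density of y.
            options.append(max(labels) + 1)
            weights.append(alpha * normal_pdf(y, mu0, tau2 + sigma2))
            labels[i] = rng.choices(options, weights=weights)[0]
    return labels


# Two well-separated groups of hypothetical 1-D embedding values.
values = [0.1, -0.2, 0.3, 30.0, 30.2, 29.9]
labels = dp_gaussian_gibbs(values, rng=random.Random(0))
```

The existing-cluster weights follow the Chinese restaurant rule (proportional to cluster size) multiplied by how well the point fits the cluster, so points from the two separated groups do not end up sharing a label. Real node vectors are multidimensional, which would require a multivariate likelihood in place of `normal_pdf`.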
S130: obtain the probability distribution of each cluster group by sampling, and, if the probability value obtained from the probability distribution of a cluster group is greater than a preset probability threshold, mark the nodes included in that cluster group as risk users.
In this embodiment, a corresponding probability distribution (for example, a Poisson distribution) can be obtained for each cluster group by sampling. After the probability distribution of each cluster group has been obtained, the probability value of each probability distribution can be calculated. If the probability value corresponding to the probability distribution of one or more cluster groups is greater than the probability threshold, the probability that risk users exist in that cluster group is high, and the nodes included in that cluster group need to be marked as risk users.
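The thresholding part of step S130 can be sketched as below. The text does not specify how the probability value is derived from each cluster group's distribution, so the sketch assumes the per-cluster values are already available; the cluster contents, the values and the threshold are hypothetical.

```python
def flag_risk_users(cluster_members, cluster_probability, threshold):
    """Return the nodes of every cluster whose probability value exceeds the threshold."""
    flagged = set()
    for cluster_id, probability in cluster_probability.items():
        if probability > threshold:
            flagged.update(cluster_members[cluster_id])
    return flagged


# Hypothetical cluster groups and probability values (assumed to come from sampling).
cluster_members = {0: ["reporter_1", "driver_1"], 1: ["driver_2"]}
cluster_probability = {0: 0.82, 1: 0.15}
risk_users = flag_risk_users(cluster_members, cluster_probability, threshold=0.6)
```

Here only cluster 0 exceeds the threshold, so both of its nodes are marked and `driver_2` is left unmarked.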
In one embodiment, obtaining the probability distribution of each cluster group by sampling in step S130 includes:
obtaining the probability distribution of each cluster group by Gibbs sampling.
In statistics and statistical physics, Gibbs sampling is a Markov chain Monte Carlo (MCMC) algorithm used to obtain a sequence of observation samples that approximately follow a specified multivariate probability distribution (for example, the joint probability distribution of two or more random variables). A Markov chain is a set of events in which the events occur one after another and the occurrence of the next event is determined only by the event that is currently occurring. Gibbs sampling can quickly obtain the probability distribution of each cluster group, so that the probability value of the cluster group can be calculated and risk customers can be identified accurately.
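The mechanics of Gibbs sampling are easiest to see on a small textbook case. The sketch below samples from a standard bivariate normal distribution with correlation `rho` by alternately drawing each variable from its conditional distribution given the other; this target distribution is chosen for illustration and is not the distribution used in the embodiment.

```python
import math
import random


def gibbs_bivariate_normal(rho, num_samples, burn_in=500, rng=None):
    """Gibbs sampling from a standard bivariate normal with correlation rho."""
    rng = rng or random.Random()
    x = y = 0.0
    cond_std = math.sqrt(1.0 - rho * rho)  # std of x | y and of y | x
    samples = []
    for step in range(burn_in + num_samples):
        # Each draw depends only on the current state (the Markov property).
        x = rng.gauss(rho * y, cond_std)   # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_std)   # y | x ~ N(rho * x, 1 - rho^2)
        if step >= burn_in:
            samples.append((x, y))
    return samples


samples = gibbs_bivariate_normal(0.5, 5000, rng=random.Random(1))
```

After the burn-in period the retained pairs approximate draws from the joint distribution, so their sample correlation comes out close to `rho`.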
This method uses a machine learning model to make parameter settings more reasonable, and improves the accuracy of risk customer prediction.
An embodiment of the present invention also provides a risk user identification device for executing any embodiment of the aforementioned risk user identification method. Specifically, referring to Fig. 5, Fig. 5 is a schematic block diagram of the risk user identification device provided by an embodiment of the present invention. The risk user identification device 100 can be configured in an intelligent terminal.
As shown in Fig. 5, the risk user identification device 100 includes a node vector acquiring unit 110, a clustering unit 120 and a risk identification unit 130.
The node vector acquiring unit 110 is configured to obtain the nodes corresponding to claim settlement data and convert the nodes into node vectors by graph embedding.
In this embodiment, when an enterprise has obtained massive case data (case data can be understood as claim settlement data; for example, the case data in a vehicle insurance claim scenario include the driver, the reporter, the beneficiary, the injured party, the repair shop, mobile phone numbers, the repair location, GPS information and similar data), the claim settlement data are converted into corresponding nodes so that risk users can be identified more easily, and data mining and behavior analysis are carried out in the form of a graph.
Because it is not feasible to convert every data item in the case data into a node, a portion of the data can be selected as primary data, for which nodes are generated, while the remaining data serve as attribute data of the primary data in the generated nodes. For example, the reporter is treated as primary data, and the reporter's telephone number and ID card number are treated as its attribute data.
The nodes are then converted into node vectors by the graph embedding method, which first samples the graph and then builds a model from the sampled sequences.
In one embodiment, as shown in Fig. 6, the node vector acquiring unit 110 includes:
an initial network weaving unit 111, configured to weave the nodes of the claim settlement data into a network to obtain an initial graph;
a weighted sampling unit 112, configured to sample the initial graph by weighted sampling to obtain multiple sample sequences, and take each sample sequence as a node vector.
In this embodiment, weaving the nodes of the claim settlement data into a network yields an initial graph whose nodes are connected by weighted edges. Weighted sampling (for example, a weighted walk) is then applied to obtain multiple sample sequences, and each sample sequence is taken as a node vector. Weighted sampling can effectively uncover the potential relationships between indirectly connected nodes in the initial graph.
In one embodiment, as shown in Fig. 7, the initial network weaving unit 111 includes:
a node building unit 1111, configured to take the nodes of the claim settlement data as the initial nodes of the graph;
a node connection unit 1112, configured to, if a data association exists between initial nodes, connect the corresponding nodes with an edge, and derive the weight of the edge from the weight of the data association, to obtain an initial graph with weighted edges.
The edge between two nodes indicates that the nodes are associated, and the weight of the edge indicates the strength of that association. When the weighted sampling method is used, the walk tends to move toward popular nodes. For example, suppose a graph has four nodes A, B, C and D, where the edge between A and B has weight 0.1, the edge between A and C has weight 0.7, the edge between B and C has weight 0.4, and the edge between C and D has weight 0.8. Assume a walk of 2 steps starting from node A. When the next neighbor node is chosen, an unweighted random walk algorithm would move to B or C with equal probability, whereas the weighted walk moves to node C with probability 7/8 (that is, 0.7/(0.1+0.7)) and then to node D with probability 8/12 (that is, 0.8/(0.4+0.8), with B and D as the candidate next nodes). The sequence (A, C, D) is therefore produced with high probability. In the original graph, node A and node D are not directly connected, but weighted sampling effectively uncovers the relationship between node A and node D.
The clustering unit 120 is configured to cluster the node vectors with a Bayesian nonparametric mixture model to obtain multiple cluster groups.
In this embodiment, a Bayesian nonparametric mixture model is a Bayesian model defined on an infinite-dimensional parameter space. The size of the model adapts as the data in the model increase or decrease, so the number of parameters of the model is determined by the amount of data. A commonly used Bayesian nonparametric mixture model is the Dirichlet process mixture model.
Specifically, the Dirichlet process mixture model is defined as follows:
Consider a set of data $y_1, y_2, \ldots, y_n$ that are mutually independent and drawn from some unknown distribution. Each $y_i$ may be multivariate, and its values may be real-valued or categorical. Assume the observations $y_i$ come from a mixture distribution of the form $F(\theta)$, where the parameter $\theta$ is drawn from $G$ and $G$ follows a Dirichlet process (DP); that is, the prior on the parameter of the distribution is a Dirichlet process with concentration parameter $\alpha$ and base distribution $G_0$. The Dirichlet process mixture model is then:
$$y_i \mid \theta_i \sim F(\theta_i)$$
$$\theta_i \mid G \sim G$$
$$G \sim \mathrm{DP}(G_0, \alpha)$$
With a Dirichlet process, the number of categories does not need to be specified in advance and can be inferred from the data. The classic construction is the Chinese restaurant process (CRP).
In one embodiment, as shown in Fig. 8, the clustering unit 120 includes:
a prior distribution acquiring unit 121, configured to obtain the Gaussian mixture distribution formed by the node vectors;
a finite partition unit 122, configured to obtain a finite partition of the Gaussian mixture distribution that satisfies the Dirichlet process, to obtain multiple cluster groups corresponding to the node vectors.
In this embodiment, obtaining the Gaussian mixture distribution formed by the node vectors means obtaining $F(\theta)$. After this prior distribution $F(\theta)$ (that is, the Gaussian mixture distribution) has been obtained, if a countable partition can be found (denoted $T_1, T_2, \ldots, T_k$; the countable partition can also be understood as a finite partition) such that:
$$\big(G(T_1), \ldots, G(T_k)\big) \sim \mathrm{Dir}\big(\alpha G_0(T_1), \ldots, \alpha G_0(T_k)\big)$$
then the finite partition $T_1, T_2, \ldots, T_k$ can be regarded as the multiple cluster groups corresponding to the node vectors. The Dirichlet process mixture model can quickly cluster multiple node vectors into groups, which makes it easier to mine the connections between node vectors efficiently.
The risk identification unit 130 is configured to obtain the probability distribution of each cluster group by sampling, and, if the probability value obtained from the probability distribution of a cluster group is greater than a preset probability threshold, mark the nodes included in that cluster group as risk users.
In this embodiment, a corresponding probability distribution (for example, a Poisson distribution) can be obtained for each cluster group by sampling. After the probability distribution of each cluster group has been obtained, the probability value of each probability distribution can be calculated. If the probability value corresponding to the probability distribution of one or more cluster groups is greater than the probability threshold, the probability that risk users exist in that cluster group is high, and the nodes included in that cluster group need to be marked as risk users.
In one embodiment, the risk identification unit 130 is further configured to:
obtain the probability distribution of each cluster group by Gibbs sampling.
In statistics and statistical physics, Gibbs sampling is a Markov chain Monte Carlo (MCMC) algorithm used to obtain a sequence of observation samples that approximately follow a specified multivariate probability distribution (for example, the joint probability distribution of two or more random variables). A Markov chain is a set of events in which the events occur one after another and the occurrence of the next event is determined only by the event that is currently occurring. Gibbs sampling can quickly obtain the probability distribution of each cluster group, so that the probability value of the cluster group can be calculated and risk customers can be identified accurately.
The device uses a machine learning model to make parameter settings more reasonable, and improves the accuracy of risk customer prediction.
The above risk user identification device can be implemented in the form of a computer program, and the computer program can run on the computer equipment shown in Fig. 9.
Referring to Fig. 9, Fig. 9 is a schematic block diagram of the computer equipment provided by an embodiment of the present invention.
As shown in Fig. 9, the computer equipment 500 includes a processor 502, a memory and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 can cause the processor 502 to perform the risk user identification method.
The processor 502 provides computing and control capabilities and supports the operation of the entire computer equipment 500.
The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, it can cause the processor 502 to perform the risk user identification method.
The network interface 505 is used for network communication, such as transmitting data. A person skilled in the art will understand that the structure shown in Fig. 9 is only a block diagram of the part of the structure relevant to the solution of the present invention and does not limit the computer equipment 500 to which the solution is applied; a specific computer equipment 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the following functions: obtaining the nodes corresponding to claims settlement data, and embedding the nodes as node vectors through graph embedding; clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters; and obtaining the probability distribution of each cluster by sampling and, if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, marking the nodes included in the corresponding cluster as risk users.
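The final marking step can be sketched as follows. This is a minimal illustration only: it assumes the clusters and one probability value per cluster have already been produced by the earlier steps, and the function name `mark_risk_users`, the data layout, and the threshold value 0.8 are hypothetical choices that do not come from the specification.

```python
from typing import Dict, List, Set


def mark_risk_users(
    cluster_members: Dict[int, List[str]],
    cluster_probability: Dict[int, float],
    threshold: float,
) -> Set[str]:
    """Return the nodes of every cluster whose probability value exceeds the threshold."""
    risky: Set[str] = set()
    for cluster_id, members in cluster_members.items():
        # A cluster with no recorded probability value is treated as not risky (assumption).
        if cluster_probability.get(cluster_id, 0.0) > threshold:
            risky.update(members)
    return risky


# Hypothetical clusters and probability values, for illustration only.
clusters = {0: ["u1", "u2"], 1: ["u3"], 2: ["u4", "u5"]}
probabilities = {0: 0.91, 1: 0.40, 2: 0.85}
flagged = mark_risk_users(clusters, probabilities, threshold=0.8)
```

Here every node in a cluster inherits the mark of its cluster, which mirrors the wording that the nodes included in the corresponding cluster are marked as risk users.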
In one embodiment, when performing the step of embedding the nodes as node vectors through graph embedding, the processor 502 performs the following operations: constructing a network from the nodes of the claims settlement data to obtain an initial graph; and sampling the initial graph by weighted sampling to obtain multiple sample sequences, each sample sequence serving as one node vector.
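One possible reading of the weighted sampling step is a weighted random walk, in which the next node is chosen among the neighbours of the current node with probability proportional to the edge weight. The specification does not name the sampling scheme, so the walk itself, the adjacency-dictionary layout, and the names `weighted_walk` and `sample_sequences` are assumptions made for this sketch.

```python
import random
from typing import Dict, List

# Adjacency dictionary: node -> {neighbour: edge weight} (assumed layout).
Graph = Dict[str, Dict[str, float]]


def weighted_walk(graph: Graph, start: str, length: int, rng: random.Random) -> List[str]:
    """Walk from `start`, picking each next node in proportion to its edge weight."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph[walk[-1]]
        if not neighbours:
            break  # an isolated node ends the walk early
        nodes = list(neighbours)
        weights = [neighbours[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def sample_sequences(graph: Graph, walks_per_node: int, length: int, seed: int = 0) -> List[List[str]]:
    """Start several weighted walks at every node and collect the sample sequences."""
    rng = random.Random(seed)
    return [
        weighted_walk(graph, node, length, rng)
        for node in graph
        for _ in range(walks_per_node)
    ]


# Hypothetical weighted graph, for illustration only.
graph = {
    "u1": {"u2": 3.0, "u3": 1.0},
    "u2": {"u1": 3.0},
    "u3": {"u1": 1.0},
    "u4": {},  # a node without any data association
}
sequences = sample_sequences(graph, walks_per_node=2, length=4, seed=42)
```

Heavier edges are followed more often, so strongly associated nodes tend to appear together in the same sample sequence.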
In one embodiment, when performing the step of constructing a network from the nodes of the claims settlement data to obtain the initial graph, the processor 502 performs the following operations: taking the nodes of the claims settlement data as the initial nodes of the graph; and, if a data association exists between initial nodes, connecting the corresponding nodes with an edge and deriving the weight of the edge from the weight of the data association, so as to obtain an initial graph that includes weighted edges.
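The network construction described above can be sketched as follows. The sketch assumes that each data association is given as a triple of two node identifiers and a weight, and that repeated associations between the same pair of nodes add up; both the input format and the accumulation rule are assumptions, as is the name `build_initial_graph`.

```python
from typing import Dict, Iterable, Tuple

# Adjacency dictionary: node -> {neighbour: edge weight} (assumed layout).
Graph = Dict[str, Dict[str, float]]


def build_initial_graph(
    nodes: Iterable[str],
    associations: Iterable[Tuple[str, str, float]],
) -> Graph:
    """Create one initial node per claims node and one weighted edge per association."""
    graph: Graph = {node: {} for node in nodes}
    for a, b, weight in associations:
        if a not in graph or b not in graph or a == b:
            continue  # ignore associations with unknown nodes and self-loops
        # Undirected edge; weights of repeated associations are summed (assumption).
        graph[a][b] = graph[a].get(b, 0.0) + weight
        graph[b][a] = graph[b].get(a, 0.0) + weight
    return graph


# Hypothetical nodes and associations, for illustration only.
nodes = ["u1", "u2", "u3", "u4"]
associations = [("u1", "u2", 2.0), ("u1", "u2", 1.0), ("u2", "u3", 0.5), ("u3", "u9", 4.0)]
initial_graph = build_initial_graph(nodes, associations)
```

A node without any association, such as `u4` here, stays in the graph as an isolated initial node.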
In one embodiment, when performing the step of clustering the node vectors with the Bayesian nonparametric mixture model to obtain multiple clusters, the processor 502 performs the following operations: obtaining a Gaussian mixture distribution formed by the node vectors; and obtaining a finite partition of the Gaussian mixture distribution that satisfies a Dirichlet process, thereby obtaining multiple clusters corresponding to the node vectors.
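A Dirichlet process prior over cluster assignments is commonly realised through the Chinese restaurant process, in which each item joins an existing cluster with probability proportional to the cluster size or opens a new cluster with probability proportional to a concentration parameter. The specification does not name this construction, so the sketch below is only one way to illustrate how a partition with an unfixed number of clusters can arise; the name `crp_partition` and the parameter `alpha` are assumptions.

```python
import random
from typing import List


def crp_partition(n_items: int, alpha: float, rng: random.Random) -> List[int]:
    """Draw one partition of n_items from a Chinese restaurant process."""
    assignments: List[int] = []
    counts: List[int] = []  # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


# Hypothetical setting: 20 node vectors, concentration parameter 1.0.
assignments = crp_partition(20, alpha=1.0, rng=random.Random(7))
n_clusters = max(assignments) + 1
```

The number of clusters is not set in advance; it grows with the data, which is the property that makes the mixture model nonparametric. This draw uses only the prior; fitting the Gaussian mixture to the node vectors additionally requires the likelihood of each vector under each cluster.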
In one embodiment, when performing the step of obtaining the probability distribution of each cluster by sampling, the processor 502 performs the following operation: obtaining the probability distribution of each cluster by Gibbs sampling.
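The Gibbs sampling step can be illustrated with a collapsed Gibbs sampler for a Dirichlet process mixture of one-dimensional Gaussians. The sketch is deliberately simplified and rests on several assumptions that are not in the specification: the data are scalars rather than node vectors, the cluster variance `sigma2` is known, cluster means have a normal prior with mean `mu0` and variance `tau2`, and the "probability distribution of each cluster" is read as the share of points assigned to each cluster after the last sweep.

```python
import math
import random
from typing import Dict, List, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gibbs_dp_mixture(
    data: List[float], alpha: float = 1.0, sigma2: float = 1.0,
    mu0: float = 0.0, tau2: float = 25.0, sweeps: int = 50, seed: int = 0,
) -> Tuple[List[int], Dict[int, float]]:
    """Resample every cluster assignment in turn, conditioned on all the others."""
    rng = random.Random(seed)
    z = [0] * len(data)  # start with all points in one cluster
    counts: Dict[int, int] = {0: len(data)}
    sums: Dict[int, float] = {0: sum(data)}
    next_label = 1
    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster.
            old = z[i]
            counts[old] -= 1
            sums[old] -= x
            if counts[old] == 0:
                del counts[old], sums[old]
            # Weight of each existing cluster: size times posterior predictive density.
            labels = list(counts)
            weights = []
            for k in labels:
                precision = 1.0 / tau2 + counts[k] / sigma2
                post_mean = (mu0 / tau2 + sums[k] / sigma2) / precision
                weights.append(counts[k] * normal_pdf(x, post_mean, 1.0 / precision + sigma2))
            # Weight of a new cluster: alpha times prior predictive density.
            labels.append(next_label)
            weights.append(alpha * normal_pdf(x, mu0, tau2 + sigma2))
            new = rng.choices(labels, weights=weights, k=1)[0]
            if new == next_label:
                counts[new], sums[new] = 0, 0.0
                next_label += 1
            counts[new] += 1
            sums[new] += x
            z[i] = new
    proportions = {k: n / len(data) for k, n in counts.items()}
    return z, proportions


# Hypothetical scalar data with two well-separated groups.
low = [-0.5, 0.1, 0.3, -0.2, 0.4]
high = [9.6, 10.2, 10.1, 9.9, 10.4]
z, proportions = gibbs_dp_mixture(low + high)
```

Each update depends only on the current assignments of the other points, so successive states form a Markov chain, which matches the description of a sequence in which the next event is determined only by the current one.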
Those skilled in the art will understand that the embodiment of the computer device shown in Fig. 9 does not limit the specific composition of the computer device. In other embodiments, the computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor; in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in Fig. 9 and are not described again here.
It should be understood that, in the embodiments of the present invention, the processor 502 may be a central processing unit (Central Processing Unit, CPU), or it may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
Another embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the following steps: obtaining the nodes corresponding to claims settlement data, and embedding the nodes as node vectors through graph embedding; clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters; and obtaining the probability distribution of each cluster by sampling and, if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, marking the nodes included in the corresponding cluster as risk users.
In one embodiment, embedding the nodes as node vectors through graph embedding comprises: constructing a network from the nodes of the claims settlement data to obtain an initial graph; and sampling the initial graph by weighted sampling to obtain multiple sample sequences, each sample sequence serving as one node vector.
In one embodiment, constructing a network from the nodes of the claims settlement data to obtain the initial graph comprises: taking the nodes of the claims settlement data as the initial nodes of the graph; and, if a data association exists between initial nodes, connecting the corresponding nodes with an edge and deriving the weight of the edge from the weight of the data association, so as to obtain an initial graph that includes weighted edges.
In one embodiment, clustering the node vectors with the Bayesian nonparametric mixture model to obtain multiple clusters comprises: obtaining a Gaussian mixture distribution formed by the node vectors; and obtaining a finite partition of the Gaussian mixture distribution that satisfies a Dirichlet process, thereby obtaining multiple clusters corresponding to the node vectors.
In one embodiment, obtaining the probability distribution of each cluster by sampling comprises: obtaining the probability distribution of each cluster by Gibbs sampling.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the device, apparatus and units described above may refer to the corresponding processes in the foregoing method embodiments and are not described again here. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled professionals may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed device, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division of the units is only a division by logical function, and other divisions are possible in actual implementation; units with the same function may be combined into one unit, multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses or units, and may also be an electrical, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present invention.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a magnetic disk, or an optical disc.
The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (10)
1. A risk user identification method, characterized by comprising:
obtaining the nodes corresponding to claims settlement data, and embedding the nodes as node vectors through graph embedding;
clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters;
obtaining the probability distribution of each cluster by sampling, and, if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, marking the nodes included in the corresponding cluster as risk users.
2. The risk user identification method according to claim 1, characterized in that embedding the nodes as node vectors through graph embedding comprises:
constructing a network from the nodes of the claims settlement data to obtain an initial graph;
sampling the initial graph by weighted sampling to obtain multiple sample sequences, each sample sequence serving as one node vector.
3. The risk user identification method according to claim 2, characterized in that constructing a network from the nodes of the claims settlement data to obtain the initial graph comprises:
taking the nodes of the claims settlement data as the initial nodes of the graph;
if a data association exists between initial nodes, connecting the corresponding nodes with an edge, and deriving the weight of the edge from the weight of the data association, so as to obtain an initial graph that includes weighted edges.
4. The risk user identification method according to claim 1, characterized in that clustering the node vectors with the Bayesian nonparametric mixture model to obtain multiple clusters comprises:
obtaining a Gaussian mixture distribution formed by the node vectors;
obtaining a finite partition of the Gaussian mixture distribution that satisfies a Dirichlet process, thereby obtaining multiple clusters corresponding to the node vectors.
5. The risk user identification method according to claim 1, characterized in that obtaining the probability distribution of each cluster by sampling comprises:
obtaining the probability distribution of each cluster by Gibbs sampling.
6. A risk user identification apparatus, characterized by comprising:
a node vector acquiring unit, configured to obtain the nodes corresponding to claims settlement data and embed the nodes as node vectors through graph embedding;
a clustering unit, configured to cluster the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters;
a risk identification unit, configured to obtain the probability distribution of each cluster by sampling and, if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, mark the nodes included in the corresponding cluster as risk users.
7. The risk user identification apparatus according to claim 6, characterized in that the node vector acquiring unit comprises:
an initial network construction unit, configured to construct a network from the nodes of the claims settlement data to obtain an initial graph;
a weighted sampling unit, configured to sample the initial graph by weighted sampling to obtain multiple sample sequences, each sample sequence serving as one node vector.
8. The risk user identification apparatus according to claim 6, characterized in that the clustering unit comprises:
a prior distribution acquiring unit, configured to obtain a Gaussian mixture distribution formed by the node vectors;
a finite partition unit, configured to obtain a finite partition of the Gaussian mixture distribution that satisfies a Dirichlet process, thereby obtaining multiple clusters corresponding to the node vectors.
9. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the risk user identification method according to any one of claims 1 to 5.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the risk user identification method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811527385.1A CN109801073A (en) | 2018-12-13 | 2018-12-13 | Risk subscribers recognition methods, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811527385.1A CN109801073A (en) | 2018-12-13 | 2018-12-13 | Risk subscribers recognition methods, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109801073A (en) | 2019-05-24 |
Family
ID=66556635
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811527385.1A Pending CN109801073A (en) | 2018-12-13 | 2018-12-13 | Risk subscribers recognition methods, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109801073A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100121792A1 (en) * | 2007-01-05 | 2010-05-13 | Qiong Yang | Directed Graph Embedding |
KR20110116563A (en) * | 2010-04-19 | 2011-10-26 | 목포대학교산학협력단 | Fuzzy clustering method using principal component analysis and markov chain monte carlos |
US20150012255A1 (en) * | 2013-07-03 | 2015-01-08 | International Business Machines Corporation | Clustering based continuous performance prediction and monitoring for semiconductor manufacturing processes using nonparametric bayesian models |
CN105426911A (en) * | 2015-11-13 | 2016-03-23 | 浙江大学 | Dirichlet process mixture model based TAC clustering method |
CN106355405A (en) * | 2015-07-14 | 2017-01-25 | 阿里巴巴集团控股有限公司 | Method and device for identifying risks and system for preventing and controlling same |
CN107273517A (en) * | 2017-06-21 | 2017-10-20 | 复旦大学 | Picture and text cross-module state search method based on the embedded study of figure |
CN108334897A (en) * | 2018-01-22 | 2018-07-27 | 上海海事大学 | A kind of floating marine object trajectory predictions method based on adaptive GMM |
CN108446273A (en) * | 2018-03-15 | 2018-08-24 | 哈工大机器人(合肥)国际创新研究院 | Kalman filtering term vector learning method based on Di's formula process |
CN108734479A (en) * | 2018-04-12 | 2018-11-02 | 阿里巴巴集团控股有限公司 | Data processing method, device, equipment and the server of Insurance Fraud identification |
Non-Patent Citations (4)
Title |
---|
刘正铭;马宏;刘树新;杨奕卓;李星;: "一种融合节点文本属性信息的网络表示学习算法", 计算机工程, no. 11, 25 September 2018 (2018-09-25), pages 165 - 171 * |
周建英;王飞跃;曾大军;: "分层Dirichlet过程及其应用综述", 自动化学报, no. 04, pages 389 - 407 * |
张媛媛;: "一种基于非参数贝叶斯模型的聚类算法", 宁波大学学报(理工版), no. 04, pages 24 - 28 * |
高悦;王文贤;杨淑贤;: "一种基于狄利克雷过程混合模型的文本聚类算法", 信息网络安全, no. 11, pages 60 - 65 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110428337A (en) * | 2019-06-14 | 2019-11-08 | 南京泛函智能技术研究院有限公司 | Vehicle insurance cheats recognition methods and the device of clique |
CN110428337B (en) * | 2019-06-14 | 2023-01-20 | 南京极谷人工智能有限公司 | Vehicle insurance fraud group partner identification method and device |
CN111694969A (en) * | 2020-06-18 | 2020-09-22 | 拉卡拉支付股份有限公司 | User identity identification method and device |
CN114066173A (en) * | 2021-10-26 | 2022-02-18 | 福建正孚软件有限公司 | Capital flow behavior analysis method and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||