CN109801073A - Risk user identification method, apparatus, computer device and storage medium - Google Patents

Risk user identification method, apparatus, computer device and storage medium

Info

Publication number
CN109801073A
CN109801073A (application CN201811527385.1A)
Authority
CN
China
Prior art keywords
node
node vector
risk users
cluster
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811527385.1A
Other languages
Chinese (zh)
Inventor
刘波
唐文
林瑜
侯明远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd filed Critical Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN201811527385.1A priority Critical patent/CN109801073A/en
Publication of CN109801073A publication Critical patent/CN109801073A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a risk user identification method, apparatus, computer device and storage medium. The method comprises: obtaining nodes corresponding to claim settlement data, and converting the nodes into node vectors through graph embedding; clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters; and obtaining the probability distribution of each cluster by sampling, and if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, marking the nodes included in that cluster as risk users. By using a machine learning model, the method makes parameter settings more reasonable and improves the accuracy of risk customer prediction.

Description

Risk user identification method, apparatus, computer device and storage medium
Technical field
The present invention relates to the technical field of fraud identification, and more particularly to a risk user identification method, apparatus, computer device and storage medium.
Background technique
Currently, in anti-fraud analysis based on social networks, traditional user fraud scoring methods score by rules, where both the rule definitions and the rule scores are summarized from business experience. After the user data are built into a network, the network triggers a certain number of rules and the total score of the triggered rules is computed; that total score is used to assess the user's insurance fraud risk. However, existing methods have the following disadvantages:
1) the rule features are relatively simple and cannot cover hidden fraud patterns;
2) the rule scores are assigned from experience and are inaccurate;
3) traditional statistical methods do not learn parameters from historical data, so the error is large.
Summary of the invention
The embodiments of the present invention provide a risk user identification method, apparatus, computer device and storage medium, aiming to solve the problems in the prior art that user fraud scoring is performed with rules whose definitions and scores are summarized from business experience, that hidden fraud patterns cannot be covered, and that the accuracy of the calculated results is low.
In a first aspect, an embodiment of the present invention provides a risk user identification method, comprising:
obtaining nodes corresponding to claim settlement data, and converting the nodes into node vectors through graph embedding;
clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters;
obtaining the probability distribution of each cluster by sampling, and if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, marking the nodes included in that cluster as risk users.
In a second aspect, an embodiment of the present invention provides a risk user identification apparatus, comprising:
a node vector obtaining unit, configured to obtain nodes corresponding to claim settlement data and convert the nodes into node vectors through graph embedding;
a clustering unit, configured to cluster the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters;
a risk identification unit, configured to obtain the probability distribution of each cluster by sampling, and if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, mark the nodes included in that cluster as risk users.
In a third aspect, an embodiment of the present invention further provides a computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the risk user identification method described in the first aspect above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the risk user identification method described in the first aspect above.
The embodiments of the present invention provide a risk user identification method, apparatus, computer device and storage medium. The method comprises obtaining nodes corresponding to claim settlement data and converting the nodes into node vectors through graph embedding; clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters; and obtaining the probability distribution of each cluster by sampling, and, if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, marking the nodes included in that cluster as risk users. By using a machine learning model, the method makes parameter settings more reasonable and improves the accuracy of risk customer prediction.
Brief description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a risk user identification method provided by an embodiment of the present invention;
Fig. 2 is a schematic sub-flowchart of the risk user identification method provided by an embodiment of the present invention;
Fig. 3 is another schematic sub-flowchart of the risk user identification method provided by an embodiment of the present invention;
Fig. 4 is another schematic sub-flowchart of the risk user identification method provided by an embodiment of the present invention;
Fig. 5 is a schematic block diagram of a risk user identification apparatus provided by an embodiment of the present invention;
Fig. 6 is a schematic block diagram of a subunit of the risk user identification apparatus provided by an embodiment of the present invention;
Fig. 7 is another schematic block diagram of a subunit of the risk user identification apparatus provided by an embodiment of the present invention;
Fig. 8 is another schematic block diagram of a subunit of the risk user identification apparatus provided by an embodiment of the present invention;
Fig. 9 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
It should be understood that, when used in this specification and the appended claims, the terms "comprise" and "include" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.
It should also be understood that the terminology used in this description of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the description of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in the description of the invention and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a risk user identification method provided by an embodiment of the present invention. The risk user identification method is applied to an intelligent terminal and is executed by application software installed in the intelligent terminal.
As shown in Fig. 1, the method comprises steps S110 to S130.
S110: obtain nodes corresponding to claim settlement data, and convert the nodes into node vectors through graph embedding.
In this embodiment, when an enterprise obtains massive case data (case data can be understood as claim settlement data; for example, case data in a vehicle insurance claim scenario include the driver, the reporter, the beneficiary, the injured party, the repair shop, phone numbers, the repair location, GPS information and other data), the claim settlement data are correspondingly converted into nodes so that risk users can be identified more conveniently, and data mining and behavior analysis are performed by means of a graph.
Since it is impossible to convert every piece of case data into a node, part of the data can be selectively chosen as master data and correspondingly turned into nodes, while the remaining data serve as attribute data of the master data on the generated nodes. For example, the reporter is master data, and the reporter's telephone number and ID card number are its attribute data.
The nodes are then converted into node vectors by a graph embedding method: the graph is first sampled, and a model is then built from the sequences drawn by sampling.
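Purely as an illustration (not part of the patent text), a DeepWalk-style embedding of the sampled sequences can be built with gensim's Word2Vec; the function name, the node identifiers and all parameter values below are assumptions chosen for the example, and the walk sequences themselves would come from the weighted sampling described in steps S111 and S112 below (see the weighted-walk sketch after that example).

```python
# Minimal DeepWalk-style sketch (assumption: gensim is available and the
# walk sequences have already been sampled from the claim graph).
from gensim.models import Word2Vec

def embed_nodes(walks, dim=64):
    """Train a skip-gram model on node-id sequences and return node vectors.

    walks : list of walks, each walk being a list of node ids (strings).
    """
    model = Word2Vec(
        sentences=walks,   # each walk is treated as a "sentence" of node ids
        vector_size=dim,   # embedding dimension
        window=5,          # context window over the walk
        min_count=0,       # keep every node, even if it appears only once
        sg=1,              # skip-gram, as in DeepWalk
        workers=4,
    )
    return {node: model.wv[node] for node in model.wv.index_to_key}

# Example (hypothetical node ids):
# walks = [["reporter_1", "phone_138xxx", "repair_shop_A"], ...]
# vectors = embed_nodes(walks)
```

Because each walk is treated as a sentence, nodes that frequently co-occur on walks end up with similar vectors, which is what makes the later clustering meaningful.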
In one embodiment, as shown in Fig. 2, step S110 includes:
S111: build a network from the nodes of the claim settlement data to obtain an initial graph;
S112: sample the initial graph by weighted sampling to obtain multiple sample sequences, and take each sample sequence as a node vector.
In this embodiment, a network is built from the nodes of the claim settlement data, and an initial graph connected by weighted edges is obtained. Multiple sample sequences are then obtained by weighted sampling (specifically, for example, a weighted walk), and each sample sequence is taken as a node vector. Weighted sampling can effectively uncover the potential relationships between indirectly connected nodes in the initial graph.
In one embodiment, as shown in Fig. 3, step S111 comprises:
S1111: take the nodes of the claim settlement data as the starting nodes of the graph;
S1112: if there is a data association between starting nodes, connect the corresponding nodes by a connecting edge, and derive the weight of the connecting edge from the weight of the data association, so as to obtain an initial graph containing weighted connecting edges.
A connecting edge between two nodes indicates that a relationship exists between them, and the weight of the connecting edge indicates the strength of the association between the nodes.
When the weighted sampling method is used, the walk is biased toward popular nodes as far as possible. For example, suppose a graph has four nodes A, B, C and D, the weight of the edge between A and B is 0.1, the weight of the edge between A and C is 0.7, the weight of the edge between B and C is 0.4, and the weight of the edge between C and D is 0.8. Assume a walk of 2 steps starting from node A. When the next neighbor node is taken, a plain random walk algorithm would move to B or C with equal probability, whereas the weighted walk moves to node C with probability 7/8 and then, without stepping back to the previous node, moves from C to node D with probability 8/12, so with high probability it finally produces the sequence (A, C, D). In the original graph, node A and node D are not directly associated, but weighted sampling can effectively uncover the relationship between node A and node D.
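A minimal sketch of such a weighted walk, assuming the initial graph is stored as a plain adjacency dictionary of edge weights (the node names and the rule of not stepping straight back to the previous node are taken from the A/B/C/D example above; everything else is an illustrative assumption):

```python
import random

# Hypothetical adjacency map matching the A/B/C/D example above:
# graph[u][v] is the weight of the connecting edge between u and v.
graph = {
    "A": {"B": 0.1, "C": 0.7},
    "B": {"A": 0.1, "C": 0.4},
    "C": {"A": 0.7, "B": 0.4, "D": 0.8},
    "D": {"C": 0.8},
}

def weighted_walk(graph, start, length, rng=random):
    """Sample one weighted walk; heavier edges are chosen more often."""
    walk = [start]
    prev = None
    for _ in range(length):
        # Candidate neighbors, excluding a step straight back to the previous
        # node (assumption implied by the 8/12 probability in the example).
        neighbors = {v: w for v, w in graph[walk[-1]].items() if v != prev}
        if not neighbors:
            break
        nodes, weights = zip(*neighbors.items())
        prev = walk[-1]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk

# From A, this produces (A, C, D) with probability 7/8 * 8/12 ≈ 0.58.
walks = [weighted_walk(graph, "A", 2) for _ in range(10)]
```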
S120: cluster the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters.
In this embodiment, a Bayesian nonparametric mixture model is a Bayesian model defined on an infinite-dimensional parameter space. The size of a Bayesian nonparametric mixture model adapts as the data increase or decrease, i.e., the model is determined by selecting the number of parameters according to the amount of data. A common Bayesian nonparametric mixture model is the Dirichlet process mixture model.
Specifically, the Dirichlet process mixture model is defined as follows:
There is a set of data y_1, y_2, ..., y_n, which are mutually independent and come from some unknown distribution. Each y_i can be multivariate, and its values can be real-valued or categorical. Assume that these observations y_i come from a mixture distribution of the form F(θ), where the parameter θ comes from a Dirichlet process (DP) G; that is, the prior of the distribution's parameter is a Dirichlet process with concentration parameter α and base distribution G_0. The Dirichlet process mixture model is then:
y_i | θ_i ~ F(θ_i)
θ_i | G ~ G
G ~ DP(G_0, α)
With a Dirichlet process, the number of categories does not need to be specified; it can be inferred from the data. A classical construction is the Chinese restaurant process (CRP). Assume a Chinese restaurant with infinitely many tables. The first customer sits at the first table after arriving. The second customer can choose to sit at the first table or at a new table. Suppose that when the (n+1)-th customer arrives, k tables are already occupied by n_1, n_2, ..., n_k customers respectively. Then the (n+1)-th customer sits at the i-th table with probability n_i/(α+n), where n_i is the number of customers at the i-th table, and chooses a new table with probability α/(α+n). After n customers have been seated, the CRP has clearly divided these n customers into K groups, i.e., K clusters, and it can be shown that the CRP is exactly a Dirichlet process.
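Purely as an illustration of the CRP seating rule described above (the function name and the parameter values are assumptions, not part of the patent), a small simulation could look like this:

```python
import random

def chinese_restaurant_process(n_customers, alpha, rng=random):
    """Simulate CRP seating; returns a table (cluster) index for each customer."""
    tables = []        # tables[i] = number of customers currently at table i
    assignment = []
    for n in range(n_customers):
        # Table i is chosen with weight n_i, a new table with weight alpha,
        # i.e. probabilities n_i/(alpha+n) and alpha/(alpha+n).
        weights = tables + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if choice == len(tables):
            tables.append(1)       # open a new table (new cluster)
        else:
            tables[choice] += 1
        assignment.append(choice)
    return assignment

# A larger alpha tends to produce more clusters.
print(chinese_restaurant_process(20, alpha=1.0))
```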
In one embodiment, as shown in Fig. 4, step S120 comprises:
S121: obtain a Gaussian mixture distribution composed of the node vectors;
S122: obtain a finite partition that satisfies the Dirichlet process in the Gaussian mixture distribution, and obtain multiple clusters corresponding to the node vectors.
In this embodiment, obtaining the Gaussian mixture distribution composed of the node vectors means obtaining F(θ). After this prior distribution F(θ) (i.e., the Gaussian mixture distribution) is obtained, if any countable partition can be found (the countable partition is denoted T_1, T_2, ..., T_k; a countable partition can also be understood as a finite partition) such that:
(G(T_1), ..., G(T_k)) ~ Dir(αG_0(T_1), ..., αG_0(T_k));
then the finite partition T_1, T_2, ..., T_k can be regarded as the multiple clusters corresponding to the node vectors. Through the Dirichlet process mixture model, multiple node vectors can be clustered quickly to realize grouping, which makes it easier to mine the connections between node vectors efficiently.
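The patent does not prescribe a particular implementation of the Dirichlet process mixture; one practical sketch, assuming scikit-learn is available, uses BayesianGaussianMixture with a Dirichlet-process weight prior, where max_components is only a truncation level and the data decide how many components are actually used:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def cluster_node_vectors(vectors, max_components=20, alpha=1.0):
    """Cluster node vectors with a (truncated) Dirichlet process Gaussian mixture.

    vectors : array-like of shape (n_nodes, dim), e.g. embeddings from embed_nodes.
    Returns one cluster label per node vector.
    """
    X = np.asarray(vectors)
    dpgmm = BayesianGaussianMixture(
        n_components=max_components,                      # truncation level
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,                 # concentration parameter α
        covariance_type="full",
        max_iter=500,
        random_state=0,
    )
    return dpgmm.fit_predict(X)

# labels = cluster_node_vectors(list(vectors.values()))
```

The labels returned by fit_predict are the grouping that step S130 then scores against the probability threshold.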
S130: obtain the probability distribution of each cluster by sampling, and if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, mark the nodes included in that cluster as risk users.
In this embodiment, each cluster yields a corresponding probability distribution (for example a Poisson distribution) by sampling; after the probability distribution of each cluster is obtained, the probability value of each probability distribution can be calculated correspondingly. If the probability value corresponding to the probability distribution of one or more clusters is greater than the probability threshold, it indicates that those clusters are most likely to contain risk users, and the nodes included in those clusters need to be marked as risk users.
In one embodiment, obtaining the probability distribution of each cluster by sampling in step S130 comprises:
obtaining the probability distribution of each cluster by Gibbs sampling.
In statistics and statistical physics, Gibbs sampling is a Markov chain Monte Carlo (MCMC) algorithm used to obtain a sequence of observation samples approximately drawn from a specified multidimensional probability distribution (such as the joint probability distribution of two or more random variables). A Markov chain is a set of events in which the events occur one after another and the occurrence of the next event is determined only by the current event. The probability distribution of each cluster can be obtained through Gibbs sampling, so that the probability value of each cluster can be calculated quickly for accurate identification of risk customers.
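The patent leaves the exact form of the per-cluster probability value open; assuming those values have already been estimated (for example from Gibbs/MCMC samples), the final thresholding and marking step of S130 could be sketched as follows, where every name and the default threshold are illustrative assumptions:

```python
def mark_risk_users(labels, nodes, cluster_probabilities, threshold=0.8):
    """Mark every node whose cluster's probability value exceeds the threshold.

    labels                : cluster label per node (e.g. from cluster_node_vectors).
    nodes                 : node identifiers, aligned with labels.
    cluster_probabilities : dict mapping cluster label -> estimated probability value
                            (assumption: obtained beforehand, e.g. via Gibbs sampling).
    """
    risky_clusters = {c for c, p in cluster_probabilities.items() if p > threshold}
    return {node for node, label in zip(nodes, labels) if label in risky_clusters}

# risk_users = mark_risk_users(labels, list(vectors.keys()), cluster_probabilities)
```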
By means of a machine learning model, this method makes parameter settings more reasonable and improves the accuracy of risk customer prediction.
An embodiment of the present invention further provides a risk user identification apparatus, which is configured to execute any embodiment of the foregoing risk user identification method. Specifically, referring to Fig. 5, Fig. 5 is a schematic block diagram of a risk user identification apparatus provided by an embodiment of the present invention. The risk user identification apparatus 100 may be configured in an intelligent terminal.
As shown in Fig. 5, the risk user identification apparatus 100 comprises a node vector obtaining unit 110, a clustering unit 120 and a risk identification unit 130.
The node vector obtaining unit 110 is configured to obtain nodes corresponding to claim settlement data and convert the nodes into node vectors through graph embedding.
In this embodiment, when an enterprise obtains massive case data (case data can be understood as claim settlement data; for example, case data in a vehicle insurance claim scenario include the driver, the reporter, the beneficiary, the injured party, the repair shop, phone numbers, the repair location, GPS information and other data), the claim settlement data are correspondingly converted into nodes so that risk users can be identified more conveniently, and data mining and behavior analysis are performed by means of a graph.
Since it is impossible to convert every piece of case data into a node, part of the data can be selectively chosen as master data and correspondingly turned into nodes, while the remaining data serve as attribute data of the master data on the generated nodes. For example, the reporter is master data, and the reporter's telephone number and ID card number are its attribute data.
The nodes are then converted into node vectors by a graph embedding method: the graph is first sampled, and a model is then built from the sequences drawn by sampling.
In one embodiment, as shown in Fig. 6, the node vector obtaining unit 110 comprises:
an initial network-building unit 111, configured to build a network from the nodes of the claim settlement data to obtain an initial graph;
a weighted sampling unit 112, configured to sample the initial graph by weighted sampling to obtain multiple sample sequences, and take each sample sequence as a node vector.
In this embodiment, a network is built from the nodes of the claim settlement data, and an initial graph connected by weighted edges is obtained. Multiple sample sequences are then obtained by weighted sampling (specifically, for example, a weighted walk), and each sample sequence is taken as a node vector. Weighted sampling can effectively uncover the potential relationships between indirectly connected nodes in the initial graph.
In one embodiment, as shown in Fig. 7, the initial network-building unit 111 comprises:
a node building unit 1111, configured to take the nodes of the claim settlement data as the starting nodes of the graph;
a node connection unit 1112, configured to, if there is a data association between starting nodes, connect the corresponding nodes by a connecting edge and derive the weight of the connecting edge from the weight of the data association, so as to obtain an initial graph containing weighted connecting edges.
A connecting edge between two nodes indicates that a relationship exists between them, and the weight of the connecting edge indicates the strength of the association between the nodes.
When the weighted sampling method is used, the walk is biased toward popular nodes as far as possible. For example, suppose a graph has four nodes A, B, C and D, the weight of the edge between A and B is 0.1, the weight of the edge between A and C is 0.7, the weight of the edge between B and C is 0.4, and the weight of the edge between C and D is 0.8. Assume a walk of 2 steps starting from node A. When the next neighbor node is taken, a plain random walk algorithm would move to B or C with equal probability, whereas the weighted walk moves to node C with probability 7/8 and then, without stepping back to the previous node, moves from C to node D with probability 8/12, so with high probability it finally produces the sequence (A, C, D). In the original graph, node A and node D are not directly associated, but weighted sampling can effectively uncover the relationship between node A and node D.
The clustering unit 120 is configured to cluster the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters.
In this embodiment, a Bayesian nonparametric mixture model is a Bayesian model defined on an infinite-dimensional parameter space. The size of a Bayesian nonparametric mixture model adapts as the data increase or decrease, i.e., the model is determined by selecting the number of parameters according to the amount of data. A common Bayesian nonparametric mixture model is the Dirichlet process mixture model.
Specifically, the Dirichlet process mixture model is defined as follows:
There is a set of data y_1, y_2, ..., y_n, which are mutually independent and come from some unknown distribution. Each y_i can be multivariate, and its values can be real-valued or categorical. Assume that these observations y_i come from a mixture distribution of the form F(θ), where the parameter θ comes from a Dirichlet process (DP) G; that is, the prior of the distribution's parameter is a Dirichlet process with concentration parameter α and base distribution G_0. The Dirichlet process mixture model is then:
y_i | θ_i ~ F(θ_i)
θ_i | G ~ G
G ~ DP(G_0, α)
With a Dirichlet process, the number of categories does not need to be specified; it can be inferred from the data. A classical construction is the Chinese restaurant process (CRP).
In one embodiment, as shown in Fig. 8, the clustering unit 120 comprises:
a prior distribution obtaining unit 121, configured to obtain a Gaussian mixture distribution composed of the node vectors;
a finite partition unit 122, configured to obtain a finite partition that satisfies the Dirichlet process in the Gaussian mixture distribution, and obtain multiple clusters corresponding to the node vectors.
In this embodiment, obtaining the Gaussian mixture distribution composed of the node vectors means obtaining F(θ). After this prior distribution F(θ) (i.e., the Gaussian mixture distribution) is obtained, if any countable partition can be found (the countable partition is denoted T_1, T_2, ..., T_k; a countable partition can also be understood as a finite partition) such that:
(G(T_1), ..., G(T_k)) ~ Dir(αG_0(T_1), ..., αG_0(T_k));
then the finite partition T_1, T_2, ..., T_k can be regarded as the multiple clusters corresponding to the node vectors. Through the Dirichlet process mixture model, multiple node vectors can be clustered quickly to realize grouping, which makes it easier to mine the connections between node vectors efficiently.
The risk identification unit 130 is configured to obtain the probability distribution of each cluster by sampling, and if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, mark the nodes included in that cluster as risk users.
In this embodiment, each cluster yields a corresponding probability distribution (for example a Poisson distribution) by sampling; after the probability distribution of each cluster is obtained, the probability value of each probability distribution can be calculated correspondingly. If the probability value corresponding to the probability distribution of one or more clusters is greater than the probability threshold, it indicates that those clusters are most likely to contain risk users, and the nodes included in those clusters need to be marked as risk users.
In one embodiment, the risk identification unit 130 is further configured to:
obtain the probability distribution of each cluster by Gibbs sampling.
In statistics and statistical physics, Gibbs sampling is a Markov chain Monte Carlo (MCMC) algorithm used to obtain a sequence of observation samples approximately drawn from a specified multidimensional probability distribution (such as the joint probability distribution of two or more random variables). A Markov chain is a set of events in which the events occur one after another and the occurrence of the next event is determined only by the current event. The probability distribution of each cluster can be obtained through Gibbs sampling, so that the probability value of each cluster can be calculated quickly for accurate identification of risk customers.
By means of a machine learning model, the apparatus makes parameter settings more reasonable and improves the accuracy of risk customer prediction.
The above risk user identification apparatus can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in Fig. 9.
Referring to Fig. 9, Fig. 9 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Referring to Fig. 9, the computer device 500 comprises a processor 502, a memory and a network interface 505 connected through a system bus 501, where the memory may comprise a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When the computer program 5032 is executed, the processor 502 can be caused to perform the risk user identification method.
The processor 502 is configured to provide computing and control capabilities and supports the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, the processor 502 can be caused to perform the risk user identification method.
The network interface 505 is used for network communication, such as providing transmission of data information. Those skilled in the art can understand that the structure shown in Fig. 9 is only a block diagram of part of the structure related to the solution of the present invention and does not constitute a limitation on the computer device 500 to which the solution of the present invention is applied; a specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the following functions: obtaining nodes corresponding to claim settlement data, and converting the nodes into node vectors through graph embedding; clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters; obtaining the probability distribution of each cluster by sampling, and if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, marking the nodes included in that cluster as risk users.
In one embodiment, when executing the step of converting the nodes into node vectors through graph embedding, the processor 502 performs the following operations: building a network from the nodes of the claim settlement data to obtain an initial graph; sampling the initial graph by weighted sampling to obtain multiple sample sequences, and taking each sample sequence as a node vector.
In one embodiment, when executing the step of building a network from the nodes of the claim settlement data to obtain an initial graph, the processor 502 performs the following operations: taking the nodes of the claim settlement data as the starting nodes of the graph; if there is a data association between starting nodes, connecting the corresponding nodes by a connecting edge and deriving the weight of the connecting edge from the weight of the data association, so as to obtain an initial graph containing weighted connecting edges.
In one embodiment, when executing the step of clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters, the processor 502 performs the following operations: obtaining a Gaussian mixture distribution composed of the node vectors; obtaining a finite partition that satisfies the Dirichlet process in the Gaussian mixture distribution, and obtaining multiple clusters corresponding to the node vectors.
In one embodiment, when executing the step of obtaining the probability distribution of each cluster by sampling, the processor 502 performs the following operations: obtaining the probability distribution of each cluster by Gibbs sampling.
Those skilled in the art can understand that the embodiment of the computer device shown in Fig. 9 does not constitute a limitation on the specific composition of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components. For example, in some embodiments the computer device may include only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are consistent with the embodiment shown in Fig. 9 and are not repeated here.
It should be understood that, in the embodiments of the present invention, the processor 502 may be a central processing unit (CPU), and the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
Another embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the following steps are implemented: obtaining nodes corresponding to claim settlement data, and converting the nodes into node vectors through graph embedding; clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters; obtaining the probability distribution of each cluster by sampling, and if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, marking the nodes included in that cluster as risk users.
In one embodiment, converting the nodes into node vectors through graph embedding comprises: building a network from the nodes of the claim settlement data to obtain an initial graph; sampling the initial graph by weighted sampling to obtain multiple sample sequences, and taking each sample sequence as a node vector.
In one embodiment, building a network from the nodes of the claim settlement data to obtain an initial graph comprises: taking the nodes of the claim settlement data as the starting nodes of the graph; if there is a data association between starting nodes, connecting the corresponding nodes by a connecting edge and deriving the weight of the connecting edge from the weight of the data association, so as to obtain an initial graph containing weighted connecting edges.
In one embodiment, clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters comprises: obtaining a Gaussian mixture distribution composed of the node vectors; obtaining a finite partition that satisfies the Dirichlet process in the Gaussian mixture distribution, and obtaining multiple clusters corresponding to the node vectors.
In one embodiment, obtaining the probability distribution of each cluster by sampling comprises: obtaining the probability distribution of each cluster by Gibbs sampling.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the devices, apparatuses and units described above can refer to the corresponding processes in the foregoing method embodiments and are not repeated here. Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. In order to clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally according to their functions. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed units and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division of the units is only a logical functional division; in actual implementation there may be other division manners: units with the same function may be combined into one unit, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses or units, and may also be electrical, mechanical or other forms of connection.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, i.e., they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk or an optical disc.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the technical field can easily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and such modifications or replacements shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A risk user identification method, characterized by comprising:
obtaining nodes corresponding to claim settlement data, and converting the nodes into node vectors through graph embedding;
clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters;
obtaining the probability distribution of each cluster by sampling, and if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, marking the nodes included in that cluster as risk users.
2. The risk user identification method according to claim 1, characterized in that converting the nodes into node vectors through graph embedding comprises:
building a network from the nodes of the claim settlement data to obtain an initial graph;
sampling the initial graph by weighted sampling to obtain multiple sample sequences, and taking each sample sequence as a node vector.
3. The risk user identification method according to claim 2, characterized in that building a network from the nodes of the claim settlement data to obtain an initial graph comprises:
taking the nodes of the claim settlement data as the starting nodes of the graph;
if there is a data association between starting nodes, connecting the corresponding nodes by a connecting edge, and deriving the weight of the connecting edge from the weight of the data association, so as to obtain an initial graph containing weighted connecting edges.
4. The risk user identification method according to claim 1, characterized in that clustering the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters comprises:
obtaining a Gaussian mixture distribution composed of the node vectors;
obtaining a finite partition that satisfies the Dirichlet process in the Gaussian mixture distribution, and obtaining multiple clusters corresponding to the node vectors.
5. The risk user identification method according to claim 1, characterized in that obtaining the probability distribution of each cluster by sampling comprises:
obtaining the probability distribution of each cluster by Gibbs sampling.
6. A risk user identification apparatus, characterized by comprising:
a node vector obtaining unit, configured to obtain nodes corresponding to claim settlement data and convert the nodes into node vectors through graph embedding;
a clustering unit, configured to cluster the node vectors with a Bayesian nonparametric mixture model to obtain multiple clusters;
a risk identification unit, configured to obtain the probability distribution of each cluster by sampling, and if the probability value obtained from the probability distribution of a cluster is greater than a preset probability threshold, mark the nodes included in that cluster as risk users.
7. The risk user identification apparatus according to claim 6, characterized in that the node vector obtaining unit comprises:
an initial network-building unit, configured to build a network from the nodes of the claim settlement data to obtain an initial graph;
a weighted sampling unit, configured to sample the initial graph by weighted sampling to obtain multiple sample sequences, and take each sample sequence as a node vector.
8. The risk user identification apparatus according to claim 6, characterized in that the clustering unit comprises:
a prior distribution obtaining unit, configured to obtain a Gaussian mixture distribution composed of the node vectors;
a finite partition unit, configured to obtain a finite partition that satisfies the Dirichlet process in the Gaussian mixture distribution, and obtain multiple clusters corresponding to the node vectors.
9. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the risk user identification method according to any one of claims 1 to 5.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to perform the risk user identification method according to any one of claims 1 to 5.
CN201811527385.1A 2018-12-13 2018-12-13 Risk user identification method, apparatus, computer device and storage medium Pending CN109801073A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811527385.1A CN109801073A (en) 2018-12-13 2018-12-13 Risk user identification method, apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811527385.1A CN109801073A (en) 2018-12-13 2018-12-13 Risk user identification method, apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
CN109801073A true CN109801073A (en) 2019-05-24

Family

ID=66556635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811527385.1A Pending CN109801073A (en) 2018-12-13 2018-12-13 Risk subscribers recognition methods, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109801073A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100121792A1 (en) * 2007-01-05 2010-05-13 Qiong Yang Directed Graph Embedding
KR20110116563A (en) * 2010-04-19 2011-10-26 목포대학교산학협력단 Fuzzy clustering method using principal component analysis and markov chain monte carlos
US20150012255A1 (en) * 2013-07-03 2015-01-08 International Business Machines Corporation Clustering based continuous performance prediction and monitoring for semiconductor manufacturing processes using nonparametric bayesian models
CN106355405A (en) * 2015-07-14 2017-01-25 阿里巴巴集团控股有限公司 Method and device for identifying risks and system for preventing and controlling same
CN105426911A (en) * 2015-11-13 2016-03-23 浙江大学 Dirichlet process mixture model based TAC clustering method
CN107273517A (en) * 2017-06-21 2017-10-20 复旦大学 Picture and text cross-module state search method based on the embedded study of figure
CN108334897A (en) * 2018-01-22 2018-07-27 上海海事大学 A kind of floating marine object trajectory predictions method based on adaptive GMM
CN108446273A (en) * 2018-03-15 2018-08-24 哈工大机器人(合肥)国际创新研究院 Kalman filtering term vector learning method based on Di's formula process
CN108734479A (en) * 2018-04-12 2018-11-02 阿里巴巴集团控股有限公司 Data processing method, device, equipment and the server of Insurance Fraud identification

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
刘正铭;马宏;刘树新;杨奕卓;李星;: "一种融合节点文本属性信息的网络表示学习算法", 计算机工程, no. 11, 25 September 2018 (2018-09-25), pages 165 - 171 *
周建英;王飞跃;曾大军;: "分层Dirichlet过程及其应用综述", 自动化学报, no. 04, pages 389 - 407 *
张媛媛;: "一种基于非参数贝叶斯模型的聚类算法", 宁波大学学报(理工版), no. 04, pages 24 - 28 *
高悦;王文贤;杨淑贤;: "一种基于狄利克雷过程混合模型的文本聚类算法", 信息网络安全, no. 11, pages 60 - 65 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428337A (en) * 2019-06-14 2019-11-08 南京泛函智能技术研究院有限公司 Vehicle insurance cheats recognition methods and the device of clique
CN110428337B (en) * 2019-06-14 2023-01-20 南京极谷人工智能有限公司 Vehicle insurance fraud group partner identification method and device
CN111694969A (en) * 2020-06-18 2020-09-22 拉卡拉支付股份有限公司 User identity identification method and device
CN114066173A (en) * 2021-10-26 2022-02-18 福建正孚软件有限公司 Capital flow behavior analysis method and storage medium

Similar Documents

Publication Publication Date Title
Castro et al. Likelihood based hierarchical clustering
Kimura et al. Extracting influential nodes on a social network for information diffusion
CN109859054A (en) Network community method for digging, device, computer equipment and storage medium
Ritter et al. Pitfalls of normal-gamma stochastic frontier models
CN109598509A (en) The recognition methods of risk clique and device
CN109801073A (en) Risk user identification method, apparatus, computer device and storage medium
CN106469413B (en) Data processing method and device for virtual resources
James Bayesian Poisson calculus for latent feature modeling via generalized Indian buffet process priors
CN109857893A (en) Picture retrieval method, device, computer equipment and storage medium
CN102135983A (en) Group dividing method and device based on network user behavior
CN110298687B (en) Regional attraction assessment method and device
CN108875761A (en) A kind of method and device for expanding potential user
CN109272402A (en) Modeling method, device, computer equipment and the storage medium of scorecard
Li et al. Finding most popular indoor semantic locations using uncertain mobility data
CN108959516A (en) Conversation message treating method and apparatus
CN110909222A (en) User portrait establishing method, device, medium and electronic equipment based on clustering
CN109033148A (en) One kind is towards polytypic unbalanced data preprocess method, device and equipment
CN110414569A (en) Cluster realizing method and device
CN111984544B (en) Device performance test method and device, electronic device and storage medium
CN112348079A (en) Data dimension reduction processing method and device, computer equipment and storage medium
US10444062B2 (en) Measuring and diagnosing noise in an urban environment
CN111612641A (en) Method for identifying influential user in social network
Rauh et al. A fast weighted median algorithm based on quickselect
CN110348717A (en) Base station value methods of marking and device based on grid granularity
CN108536695A (en) A kind of polymerization and device of geographical location information point

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination