CN109784636A

CN109784636A - Fraudulent user identification method, device, computer equipment and storage medium

Info

Publication number: CN109784636A
Application number: CN201811527398.9A
Authority: CN
Inventors: 唐文; 张密; 卢宁; 马建明
Original assignee: Ping An Property and Casualty Insurance Company of China Ltd
Current assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority date: 2018-12-13
Filing date: 2018-12-13
Publication date: 2019-05-21

Abstract

The invention discloses a fraudulent user identification method, device, computer equipment and storage medium. The method comprises: performing data cleaning on the acquired nodes corresponding to claims data to obtain cleaned nodes; partitioning the cleaned nodes in parallel into multiple subgraphs by spectral clustering; clustering the multiple subgraphs respectively to obtain network communities comprising multiple clusters; according to the initially set node labels in the network communities, obtaining by label propagation the target node corresponding to the high-risk user label in the network communities; and, if the feature vectors of the network communities include a feature vector identical to the target feature vector corresponding to the target node, obtaining the corresponding network community and marking it as a fraud community. By cutting the network with clustering algorithms, the method reduces the network scale, optimizes the network structure, improves the accuracy of risk identification, and precisely locates fraudulent users and communities.

Description

Fraudulent user identification method, device, computer equipment and storage medium

Technical field

The present invention relates to the technical field of fraudulent user identification, and more particularly to a fraudulent user identification method, device, computer equipment and storage medium.

Background art

At present, social network analysis is beginning to be applied to the business scenario of user risk analysis in the claims settlement process of the insurance industry. In this scenario the data are mostly entered manually, abnormal data exist, and the business logic is rather complex, so analysis with a single algorithm has the following obvious deficiencies:

1) large networks exist, and because a large network matches a large number of rules, its risk score is extremely high;

2) a risk-user energy propagation model is lacking, so risk aggregation analysis cannot be covered;

3) global consideration is lacking, and manually defined features do not include network structure information.

Summary of the invention

The embodiment of the invention provides a kind of fraudulent user recognition methods, device, computer equipment and storage mediums, it is intended to Social analysis in the prior art is solved to be applied in the business scenario of consumer's risk analysis using single parser, it is big because existing It is inaccurate that scale network leads to risk evaluation result, and due to a lack of risk subscribers energy transmission, cannot cover risk co-agulation analy The problem of.

In a first aspect, an embodiment of the present invention provides a fraudulent user identification method, comprising:

performing data cleaning on the acquired nodes corresponding to claims data to obtain cleaned nodes;

partitioning the cleaned nodes in parallel into multiple subgraphs by spectral clustering;

clustering the multiple subgraphs respectively to obtain network communities comprising multiple clusters;

according to the initially set node labels in the network communities, obtaining by label propagation the target node corresponding to the high-risk user label in the network communities, wherein the initially set node labels in the network communities include at least one high-risk user label; and

if the feature vectors of the network communities include a feature vector identical to the target feature vector corresponding to the target node, obtaining the corresponding network community and marking it as a fraud community.

In a second aspect, an embodiment of the present invention provides a fraudulent user identification device, comprising:

a node cleaning unit, configured to perform data cleaning on the acquired nodes corresponding to claims data to obtain cleaned nodes;

a subgraph division unit, configured to partition the cleaned nodes in parallel into multiple subgraphs by spectral clustering;

a clustering unit, configured to cluster the multiple subgraphs respectively to obtain network communities comprising multiple clusters;

a label propagation unit, configured to obtain by label propagation, according to the initially set node labels in the network communities, the target node corresponding to the high-risk user label in the network communities, wherein the initially set node labels in the network communities include at least one high-risk user label; and

a fraud community identification unit, configured to, if the feature vectors of the network communities include a feature vector identical to the target feature vector corresponding to the target node, obtain the corresponding network community and mark it as a fraud community.

In a third aspect, an embodiment of the present invention further provides computer equipment comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the fraudulent user identification method described in the first aspect.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the fraudulent user identification method described in the first aspect.

The embodiments of the present invention provide a fraudulent user identification method, device, computer equipment and storage medium. The method divides and clusters the nodes of the claims data to obtain network communities comprising multiple clusters; according to the initially set node labels in the network communities, obtains by label propagation the target node corresponding to the high-risk user label in the network communities; and, if the feature vectors of the network communities include a feature vector identical to the target feature vector corresponding to the target node, obtains the corresponding network community and marks it as a fraud community. By cutting the network with clustering algorithms, the method reduces the network scale, optimizes the network structure, improves the accuracy of risk identification, and precisely locates fraudulent users and communities.

Brief description of the drawings

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of the fraudulent user identification method provided by an embodiment of the present invention;

Fig. 2 is a schematic sub-flowchart of the fraudulent user identification method provided by an embodiment of the present invention;

Fig. 3 is another schematic sub-flowchart of the fraudulent user identification method provided by an embodiment of the present invention;

Fig. 4 is another schematic sub-flowchart of the fraudulent user identification method provided by an embodiment of the present invention;

Fig. 5 is another schematic flowchart of the fraudulent user identification method provided by an embodiment of the present invention;

Fig. 6 is a schematic block diagram of the fraudulent user identification device provided by an embodiment of the present invention;

Fig. 7 is a schematic block diagram of a subunit of the fraudulent user identification device provided by an embodiment of the present invention;

Fig. 8 is a schematic block diagram of another subunit of the fraudulent user identification device provided by an embodiment of the present invention;

Fig. 9 is a schematic block diagram of another subunit of the fraudulent user identification device provided by an embodiment of the present invention;

Fig. 10 is another schematic block diagram of the fraudulent user identification device provided by an embodiment of the present invention;

Fig. 11 is a schematic block diagram of the computer equipment provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

It should be understood that, when used in this specification and the appended claims, the terms "includes" and "comprises" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or sets thereof.

It should also be understood that the terms used in this specification are for the purpose of describing specific embodiments only and are not intended to limit the present invention. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.

Referring to Fig. 1, Fig. 1 is a schematic flowchart of the fraudulent user identification method provided by an embodiment of the present invention. The fraudulent user identification method is applied in a management server and is executed by application software installed in the management server.

As shown in Fig. 1, the method comprises steps S110 to S150.

S110, performing data cleaning on the acquired nodes corresponding to claims data to obtain cleaned nodes.

In this embodiment, fraudulent user identification is performed on claims data in the management server. When the management server receives massive case data (for example, case data in a vehicle insurance claims scenario include the driver, the reporter, the beneficiary and the injured party, as well as data such as the repair shop, mobile phone number, repair location and GPS information), abnormal or erroneous entry of claims data causes high-frequency nodes to appear, and the claims data may also contain case data that exceed the time limit, that is, data for which the interval between the reporting time and the current system time is too long. All of these data affect subsequent data analysis and relationship mining, so the nodes corresponding to the claims data must first be cleaned.

Since it is impossible to convert every item of data in the case data into a node, a portion of the data can be selected as master data from which nodes are generated, while the remaining data serve as attribute data of those nodes. For example, the reporter serves as master data, and the reporter's telephone number and ID card number serve as its attribute data.

In one embodiment, as shown in Fig. 2, step S110 includes:

S111, judging whether the nodes corresponding to the claims data include high-frequency nodes whose frequency exceeds a preset frequency threshold, and if the nodes corresponding to the claims data include high-frequency nodes whose frequency exceeds the frequency threshold, deleting the high-frequency nodes to obtain the nodes after high-frequency cleaning;

S112, judging whether the nodes after high-frequency cleaning include nodes whose data generation time falls outside a preset time interval, and if the nodes after high-frequency cleaning include nodes whose data generation time falls outside the time interval, deleting those nodes to obtain the cleaned nodes.

In this embodiment, abnormal or erroneous entry of claims data causes high-frequency nodes to appear. To address this, high-frequency node detection is performed: it is judged whether the nodes include high-frequency nodes whose frequency exceeds a preset number of times, and if so, the high-frequency nodes are removed before the next data processing step. If the claims data contain case data that exceed the time limit, that is, the interval between the reporting time and the current system time is too long (more specifically, the node's data generation time falls outside a preset time interval), the nodes whose data generation time is too far in the past are usually deleted in order to reduce the complexity of the network, which also ensures the timeliness of the data.
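The two cleaning passes of steps S111 and S112 can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout (dicts with hypothetical keys `id` and `created`), the function name `clean_nodes`, and the parameter names are all assumptions introduced here.

```python
from collections import Counter
from datetime import datetime, timedelta


def clean_nodes(nodes, freq_threshold, max_age, now):
    """Drop high-frequency nodes first, then nodes outside the time window.

    `nodes` is a list of dicts with hypothetical keys "id" (the node value,
    for example a phone number) and "created" (the data generation time).
    """
    # Count how often each node value occurs across the claims data.
    counts = Counter(node["id"] for node in nodes)
    # S111: delete nodes whose frequency exceeds the preset threshold.
    after_high_freq = [n for n in nodes if counts[n["id"]] <= freq_threshold]
    # S112: delete nodes whose generation time is outside the allowed window.
    return [n for n in after_high_freq if now - n["created"] <= max_age]
```

For example, with a threshold of 2 occurrences and a one-year window, a phone number that appears three times is dropped in the first pass and a three-year-old record is dropped in the second.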

S120, partitioning the cleaned nodes in parallel into multiple subgraphs by spectral clustering.

In this embodiment, the massive set of nodes is divided into regions by spectral clustering, so that the connection weights between nodes in different subgraphs (a subgraph can be regarded as a region containing multiple nodes) are small (not exceeding a preset connection weight threshold), while the connection weights between nodes in the same subgraph are large (exceeding the preset connection weight threshold). Spectral clustering can quickly partition the nodes corresponding to the claims data into multiple subgraphs in parallel.

In one embodiment, as shown in figure 3, step S120 includes:

S121, obtaining the input similarity matrix and the target cluster number;

S122, constructing, according to the similarity matrix, the similar matrix corresponding to the nodes that correspond to the claims data;

S123, constructing an adjacency matrix and a diagonal matrix according to the similar matrix, and obtaining the Laplacian matrix as the difference of the diagonal matrix and the adjacency matrix;

S124, obtaining the eigenvectors corresponding to those eigenvalues of the Laplacian matrix that rank before a preset rank threshold, to obtain a target eigenvector set;

S125, transposing each eigenvector in the target eigenvector set into a column vector and combining the column vectors in order to obtain a target vector matrix;

S126, clustering each row vector of the target vector matrix by the k-means algorithm to obtain sub-groups equal in number to the target cluster number.

In this embodiment, spectral clustering is a clustering method based on graph theory that clusters the eigenvectors of the Laplacian matrix of the sample data in order to cluster the sample data. Spectral clustering can be understood as mapping data from a high-dimensional space to a low-dimensional space and then clustering in the low-dimensional space with another clustering algorithm (such as k-means).

To map the claims data from the high-dimensional space to the low-dimensional space, the similar matrix of the nodes corresponding to the claims data must first be constructed according to formula (1):

where n is the number of nodes corresponding to the claims data, x_i and x_j each denote any one node, σ denotes the standard deviation of the nodes, and the values s_ij constitute the similar matrix.

The similar matrix corresponding to the nodes that correspond to the claims data can be constructed from the input similarity matrix by the ε-neighborhood method, the K-nearest-neighbor method, or the fully connected method. For example, the fully connected method is calculated as in formula (1).
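The body of formula (1) is not reproduced in this text. Assuming the fully connected method uses the usual Gaussian kernel, which is an assumption consistent with the symbols x_i, x_j, σ and s_ij defined above rather than a form confirmed by the source, the formula would read:

```latex
s_{ij} = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right), \qquad i, j = 1, \dots, n
```

Under this form, s_ij approaches 1 for nodes that are close together and decays toward 0 as their distance grows, with σ controlling how quickly the similarity falls off.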

The diagonal matrix is then calculated according to formula (2), which is as follows:

$$d_i = \sum_{j=1}^{n} w_{ij}$$

where w_ij denotes the element in row i and column j of the similar matrix, and d_i denotes the sum of the elements of row i of the similar matrix; the values d_i form the diagonal matrix.

After the Laplacian matrix is obtained as the difference of the diagonal matrix and the adjacency matrix, each of the corresponding eigenvectors of the Laplacian matrix can be transposed into a column vector to form the target vector matrix. Finally, each row vector of the target vector matrix is clustered by the k-means algorithm to obtain sub-groups equal in number to the target cluster number. Spectral clustering thus enables rapid community discovery over the full data set composed of claims data, and enables the network to be woven in real time.
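The matrix construction of steps S122 and S123 can be sketched in plain Python as below. The Gaussian kernel, the function names `gaussian_similarity` and `laplacian`, and the representation of nodes as numeric coordinate tuples are assumptions made for illustration; the eigendecomposition and k-means of steps S124 to S126 are left out and would normally be delegated to a numerical library.

```python
import math


def gaussian_similarity(points, sigma):
    """Fully connected similarity matrix, assuming a Gaussian kernel."""
    n = len(points)
    return [
        [
            math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


def laplacian(w):
    """Unnormalized Laplacian L = D - W, where D holds the row sums of W."""
    n = len(w)
    degrees = [sum(row) for row in w]  # d_i = sum over j of w_ij
    return [
        [(degrees[i] if i == j else 0.0) - w[i][j] for j in range(n)]
        for i in range(n)
    ]
```

Each row of the resulting Laplacian sums to zero, and the matrix is symmetric whenever the similarity matrix is, which is what makes its eigenvectors usable as a low-dimensional embedding of the nodes.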

S130, clustering the multiple subgraphs respectively to obtain network communities comprising multiple clusters.

In this embodiment, the initial claims social network topology graph corresponding to the multiple subgraphs is clustered by community detection to obtain the network communities.

After spectral clustering divides the initial nodes into multiple regions to form multiple subgraphs, that is, multiple smaller graphs, each subgraph needs to be woven into a network, yielding the initial claims social network topology graph. The community detection algorithm can then cluster the initial claims social network topology graph to obtain the network communities.

Community detection seeks to find the community structure in a graph (comprising vertices and edges, such as the initial claims social network topology graph in step 1), that is, to cluster the nodes of the graph into individual communities. There is currently no exact definition of a community; it is generally considered that the connections between points inside a community are relatively dense, while the connections between points of different communities are relatively sparse.

For example, after the initial claims social network topology graph is input and processed by the community detection algorithm, a community division is output, namely the network after graph cutting. The modularity of the network after graph cutting is a measure for assessing the quality of a community division; it is the difference between the number of edges connecting nodes within communities and the number of such edges expected in the random case, and its value range is [-1/2, 1). In the community detection algorithm, the modularity algorithm mainly assesses how compactly the nodes are concentrated, which helps focus the analysis more quickly.
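The modularity measure described above can be sketched for an unweighted, undirected graph as follows. This uses the standard Newman formulation Q = Σ_c [L_c/m − (d_c/2m)²], which is assumed here because the source does not spell out its formula; the function name `modularity` and the edge-list input are illustrative choices.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph.

    `edges` is a list of (u, v) pairs without self-loops or duplicates;
    `community_of` maps each node to its community id.
    """
    m = len(edges)
    internal_edges = {}  # L_c: edges with both endpoints in community c
    total_degree = {}    # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        # Each edge contributes one unit of degree to each endpoint.
        total_degree[cu] = total_degree.get(cu, 0) + 1
        total_degree[cv] = total_degree.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1
    return sum(
        internal_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in total_degree.items()
    )
```

For two triangles joined by a single edge and split into one community per triangle, the value is 5/14, while placing all six nodes in one community gives 0, so the measure rewards the split that matches the dense regions.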

S140, according to the initially set node labels in the network communities, obtaining by label propagation the target node corresponding to the high-risk user label in the network communities, wherein the initially set node labels in the network communities include at least one high-risk user label.

In this embodiment, the basic idea of the label propagation algorithm is to take the most numerous label among the labels of a node's neighbor nodes as the label of the node itself. A label is added to each node to represent the community to which it belongs, and a "community" structure sharing the same label is formed through the "propagation" of labels.

The label of a node depends on the labels of its neighbor nodes: suppose the neighbor nodes of node z are z_1 to z_k; then z belongs to whichever community contains the most of z's neighbors (in other words, whichever community's label is most frequent among z's neighbors). The advantages are that the convergence period is short, no prior parameters are required (the number and size of communities need not be specified in advance), and no community index needs to be calculated during execution of the algorithm.

In one embodiment, as shown in figure 4, step S140 includes:

S141, propagating the node label of each node in the network communities to the receiving nodes connected to that node by an edge;

S142, iteratively executing the step of assigning the receiving node a new label according to the node labels it has received, the new label being the node label with the highest frequency among the received node labels, until a preset label propagation termination condition is met;

S143, obtaining the target node corresponding to the high-risk user label in the network communities.

In this embodiment, the flow of the label propagation algorithm is as follows:

1) initially, each node is given a unique label;

2) each node updates its own label to the most frequent label among the labels of its neighbor nodes;

3) step 2) is executed repeatedly until the label of every node no longer changes.

Within one iteration, node labels can be updated synchronously or asynchronously. With synchronous updating, the label of node z at iteration t depends on the labels its neighbor nodes obtained at iteration t-1. With asynchronous updating, the label of node z at iteration t depends on the iteration-t labels of the neighbors that have already been updated in iteration t and on the iteration-(t-1) labels of the neighbors that have not yet been updated in iteration t. A threshold can be set on the number of iterations to prevent excessive computation.

Since an initial label can be set for each node in the network communities, the label of each node is then updated continuously according to the label propagation algorithm until the labels no longer change, and the nodes whose label is the high-risk user label are then obtained. The community in which a node corresponding to the high-risk user label is located may be a fraud community, which requires further judgment.
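The asynchronous variant described above can be sketched as follows. The function name `propagate_labels`, the adjacency-dict input, the fixed (sorted) update order and the tie-breaking rule (keep the current label if it is among the most frequent, otherwise take the alphabetically first) are assumptions added to make the sketch deterministic; the source does not specify them.

```python
from collections import Counter


def propagate_labels(neighbors, labels, max_iterations=100):
    """Asynchronous label propagation with an iteration cap.

    `neighbors` maps each node to a list of adjacent nodes and `labels`
    maps each node to its initial label; a new dict is returned.
    """
    labels = dict(labels)
    for _ in range(max_iterations):
        changed = False
        for node in sorted(neighbors):
            # Labels updated earlier in this pass are already visible here.
            counts = Counter(labels[n] for n in neighbors[node])
            if not counts:
                continue  # isolated node keeps its label
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # termination condition: no label changed in a full pass
    return labels
```

On two triangles a-b-c and d-e-f joined by the edge c-d, with a and b initially labeled as high risk and the rest as normal, the high-risk label spreads to c but not across the bridge, so the first triangle is the candidate community for the further judgment in step S150.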

S150, if the feature vectors of the network communities include a feature vector identical to the target feature vector corresponding to the target node, obtaining the corresponding network community and marking it as a fraud community.

In this embodiment, the relationship between the target node and its neighbor nodes can be analyzed within the network community in which the target node corresponding to the high-risk user label is located, in order to determine effectively whether a fraud community has formed. The target feature vector corresponding to the network community in which the target node is located can first be obtained, and each network community is then examined to find whether any network community has a feature vector identical to the target feature vector, so that fraud communities are quickly screened out and marked as fraud communities.
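The screening step can be sketched as a direct comparison of community feature vectors against the target. The function name `flag_fraud_communities` and the dict-of-lists input are hypothetical, and exact equality is used only because the text speaks of an identical feature vector; a practical system might use a similarity threshold instead.

```python
def flag_fraud_communities(community_vectors, target_community):
    """Return ids of communities whose feature vector equals the target's.

    `community_vectors` maps a community id to its feature vector; the
    target community itself is included in the result.
    """
    target = tuple(community_vectors[target_community])
    return sorted(
        community
        for community, vector in community_vectors.items()
        if tuple(vector) == target
    )
```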

In one embodiment, as shown in Fig. 5, before step S150 the method further includes:

S1501, sampling the community corresponding to the target node by weighted sampling to obtain the target feature vector corresponding to the target node;

S1502, sampling each community in the network communities by weighted sampling to obtain the feature vector corresponding to each community.

When the weighted sampling method is used (for example, a weighted walk), the sampling is steered as far as possible toward popular nodes. For example, suppose a graph has four nodes A, B, C and D, where the edge between A and B has weight 0.1, the edge between A and C has weight 0.7, the edge between B and C has weight 0.4, and the edge between C and D has weight 0.8. Suppose the walk takes 2 steps starting from node A. When the next neighbor node is chosen at random, an ordinary random walk algorithm would move to B or C with equal probability. The weighted walk instead moves to node C with probability 7/8 (0.7 / (0.1 + 0.7)) and then to node D with probability 8/12 (0.8 / (0.4 + 0.8), not counting a return to A), so it very probably produces the sequence (A, C, D). In the original graph, node A and node D are not directly associated, but weighted sampling can effectively uncover the relationship between node A and node D.
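The probabilities in this example can be checked with a short sketch. The helper names `build_graph` and `transition_probabilities` are illustrative, and excluding the previous node from the next step (a non-backtracking walk) is an assumption inferred from the 8/12 figure rather than something the text states.

```python
from fractions import Fraction


def build_graph(weighted_edges):
    """Undirected weighted graph as nested dicts with exact weights."""
    graph = {}
    for u, v, weight in weighted_edges:
        graph.setdefault(u, {})[v] = Fraction(weight)
        graph.setdefault(v, {})[u] = Fraction(weight)
    return graph


def transition_probabilities(graph, current, previous=None):
    """Next-step probabilities proportional to edge weight.

    The previous node is excluded unless it is the only neighbor.
    """
    options = {n: w for n, w in graph[current].items() if n != previous}
    if not options:  # dead end: allow stepping back
        options = dict(graph[current])
    total = sum(options.values())
    return {n: w / total for n, w in options.items()}


graph = build_graph([
    ("A", "B", "0.1"), ("A", "C", "0.7"), ("B", "C", "0.4"), ("C", "D", "0.8"),
])
first_step = transition_probabilities(graph, "A")                 # A -> C: 7/8
second_step = transition_probabilities(graph, "C", previous="A")  # C -> D: 2/3
```

The sequence (A, C, D) therefore occurs with probability 7/8 × 8/12 = 7/12, compared with 1/2 × 1/2 = 1/4 for an unweighted walk under the same no-return rule.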

By cutting the network with clustering algorithms, the method reduces the network scale, optimizes the network structure, improves the accuracy of risk identification, and precisely locates fraudulent users and communities.

An embodiment of the present invention further provides a fraudulent user identification device, which is configured to execute any embodiment of the foregoing fraudulent user identification method. Specifically, referring to Fig. 6, Fig. 6 is a schematic block diagram of the fraudulent user identification device provided by an embodiment of the present invention. The fraudulent user identification device 100 can be configured in a management server.

As shown in Fig. 6, the fraudulent user identification device 100 includes a node cleaning unit 110, a subgraph division unit 120, a clustering unit 130, a label propagation unit 140 and a fraud community identification unit 150.

The node cleaning unit 110 is configured to perform data cleaning on the acquired nodes corresponding to claims data to obtain cleaned nodes.

In one embodiment, as shown in Fig. 7, the node cleaning unit 110 includes:

a high-frequency node cleaning unit 111, configured to judge whether the nodes corresponding to the claims data include high-frequency nodes whose frequency exceeds a preset frequency threshold, and if the nodes corresponding to the claims data include high-frequency nodes whose frequency exceeds the frequency threshold, to delete the high-frequency nodes to obtain the nodes after high-frequency cleaning;

an overtime node cleaning unit 112, configured to judge whether the nodes after high-frequency cleaning include nodes whose data generation time falls outside a preset time interval, and if the nodes after high-frequency cleaning include nodes whose data generation time falls outside the time interval, to delete those nodes to obtain the cleaned nodes.

The subgraph division unit 120 is configured to partition the cleaned nodes in parallel into multiple subgraphs by spectral clustering.

In one embodiment, as shown in Fig. 8, the subgraph division unit 120 includes:

an initial input unit 121, configured to obtain the input similarity matrix and the target cluster number;

a similar matrix construction unit 122, configured to construct, according to the similarity matrix, the similar matrix corresponding to the nodes that correspond to the claims data;

a Laplacian matrix construction unit 123, configured to construct an adjacency matrix and a diagonal matrix according to the similar matrix, and to obtain the Laplacian matrix as the difference of the diagonal matrix and the adjacency matrix;

a target eigenvector set acquiring unit 124, configured to obtain the eigenvectors corresponding to those eigenvalues of the Laplacian matrix that rank before a preset rank threshold, to obtain a target eigenvector set;

a target vector matrix acquiring unit 125, configured to transpose each eigenvector in the target eigenvector set into a column vector and combine the column vectors in order to obtain a target vector matrix;

a matrix clustering unit 126, configured to cluster each row vector of the target vector matrix by the k-means algorithm to obtain sub-groups equal in number to the target cluster number.

The clustering unit 130 is configured to cluster the multiple subgraphs respectively to obtain network communities comprising multiple clusters.

The label propagation unit 140 is configured to obtain by label propagation, according to the initially set node labels in the network communities, the target node corresponding to the high-risk user label in the network communities, wherein the initially set node labels in the network communities include at least one high-risk user label.

In one embodiment, as shown in Fig. 9, the label propagation unit 140 includes:

a label transmission unit 141, configured to propagate the node label of each node in the network communities to the receiving nodes connected to that node by an edge;

an iteration execution unit 142, configured to iteratively execute the step of assigning the receiving node a new label according to the node labels it has received, the new label being the node label with the highest frequency among the received node labels, until a preset label propagation termination condition is met;

a target node acquiring unit 143, configured to obtain the target node corresponding to the high-risk user label in the network communities.

The fraud community identification unit 150 is configured to, if the feature vectors of the network communities include a feature vector identical to the target feature vector corresponding to the target node, obtain the corresponding network community and mark it as a fraud community.

In one embodiment, as shown in Fig. 10, the fraudulent user identification device 100 further includes:

a target feature vector acquiring unit 1501, configured to sample the community corresponding to the target node by weighted sampling to obtain the target feature vector corresponding to the target node;

a community feature vector acquiring unit 1502, configured to sample each community in the network communities by weighted sampling to obtain the feature vector corresponding to each community.

By cutting the network with clustering algorithms, the device reduces the network scale, optimizes the network structure, improves the accuracy of risk identification, and precisely locates fraudulent users and communities.

Above-mentioned fraudulent user identification device can be implemented as the form of computer program, which can such as scheme It is run in computer equipment shown in 11.

Figure 11 is please referred to, Figure 11 is the schematic block diagram of computer equipment provided in an embodiment of the present invention.The computer is set Standby 500 management is server, and management is that server can be independent server, is also possible to the service of multiple server compositions Device cluster.

Referring to Figure 11, the computer equipment 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.

The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 can cause the processor 502 to perform the fraudulent user identification method.

The processor 502 provides computing and control capabilities and supports the operation of the entire computer equipment 500.

The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, it can cause the processor 502 to perform the fraudulent user identification method.

The network interface 505 is used for network communication, such as providing transmission of data information. Those skilled in the art will understand that the structure shown in Figure 11 is only a block diagram of the part of the structure relevant to the solution of the present invention and does not limit the computer equipment 500 to which the solution is applied; a specific computer equipment 500 may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.

The processor 502 is configured to run the computer program 5032 stored in the memory to implement the following functions: performing data cleaning on the acquired nodes corresponding to claims settlement data to obtain cleaned nodes; partitioning the cleaned nodes in parallel into multiple subgraphs by spectral clustering; clustering the multiple subgraphs separately to obtain a network community that includes multiple clusters; obtaining, by label propagation and according to the node labels initially set in the network community, the target nodes corresponding to the high-risk user label in the network community, wherein the node labels initially set in the network community include at least one high-risk user label; and, if a feature vector identical to the target feature vector corresponding to the target node exists among the feature vectors of the network community, obtaining the corresponding network community and marking it as a fraud community.

In one embodiment, when executing the step of performing data cleaning on the acquired nodes corresponding to claims settlement data to obtain cleaned nodes, the processor 502 performs the following operations: judging whether the nodes corresponding to the claims settlement data include high-frequency nodes whose frequency exceeds a preset frequency threshold and, if so, deleting the high-frequency nodes to obtain the nodes after high-frequency cleaning; and judging whether the nodes after high-frequency cleaning include nodes whose data generation time falls outside a preset time period and, if so, deleting those nodes to obtain the cleaned nodes.
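The two cleaning passes described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the record layout `(node_id, generated_on)`, the function name `clean_nodes`, and the reading of "frequency" as the number of times a node occurs in the claims records are all hypothetical.

```python
from collections import Counter
from datetime import date


def clean_nodes(records, freq_threshold, period_start, period_end):
    """Return the records that survive both cleaning passes.

    records: list of (node_id, generated_on) tuples (hypothetical layout).
    """
    # Pass 1: count how often each node occurs and drop high-frequency nodes,
    # i.e. nodes whose count exceeds the preset frequency threshold.
    counts = Counter(node_id for node_id, _ in records)
    after_high_freq = [r for r in records if counts[r[0]] <= freq_threshold]
    # Pass 2: drop records generated outside the preset time period.
    return [r for r in after_high_freq if period_start <= r[1] <= period_end]


records = [
    ("phone:A", date(2018, 3, 1)),
    ("phone:A", date(2018, 4, 1)),
    ("phone:A", date(2018, 5, 1)),
    ("phone:B", date(2018, 6, 1)),
    ("phone:C", date(2015, 1, 1)),
]
cleaned = clean_nodes(
    records,
    freq_threshold=2,
    period_start=date(2018, 1, 1),
    period_end=date(2018, 12, 31),
)
```

In this toy input, `phone:A` occurs three times and is removed by the first pass, and `phone:C` falls outside the period and is removed by the second pass.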

In one embodiment, when executing the step of partitioning the cleaned nodes in parallel into multiple subgraphs by spectral clustering, the processor 502 performs the following operations: obtaining the input similarity matrix and the target number of clusters; constructing, according to the similarity matrix, the similar matrix corresponding to the nodes corresponding to the claims settlement data; constructing an adjacency matrix and a diagonal matrix according to the similar matrix, and obtaining the Laplacian matrix as the difference between the diagonal matrix and the adjacency matrix; obtaining the eigenvectors corresponding to those eigenvalues of the Laplacian matrix that rank before a preset rank threshold, to obtain a target eigenvector set; transposing each eigenvector in the target eigenvector set into a column vector and combining them in order to obtain a target vector matrix; and clustering the row vectors of the target vector matrix with the k-means algorithm to obtain as many subgroups as the target number of clusters.
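The matrix operations in this step can be written out as below. The notation is an assumption introduced for illustration: W denotes the adjacency matrix, D the diagonal matrix (taken here to be the degree matrix, as in standard unnormalized spectral clustering), k the rank threshold, and c the target number of clusters. The text does not state the ranking direction; the sketch assumes eigenvalues are ranked in ascending order, which is the usual choice.

```latex
% W: adjacency matrix of the n cleaned nodes; D: diagonal (degree) matrix
D_{ii} = \sum_{j=1}^{n} W_{ij}, \qquad L = D - W

% Eigen-decomposition of the Laplacian, eigenvalues in ascending order (assumed)
L u_i = \lambda_i u_i, \qquad \lambda_1 \le \lambda_2 \le \dots \le \lambda_n

% Stack the first k eigenvectors as columns; row y_i represents node i
U = [\, u_1 \;\; u_2 \;\; \cdots \;\; u_k \,] \in \mathbb{R}^{n \times k},
\qquad y_i = U_{i,:}

% k-means on the rows y_1, ..., y_n with c clusters and centroids \mu_m
\min_{C_1, \dots, C_c} \; \sum_{m=1}^{c} \sum_{y_i \in C_m} \lVert y_i - \mu_m \rVert^2
```

Each resulting cluster of rows corresponds to one subgroup of nodes, and therefore to one subgraph of the partition.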

In one embodiment, when executing the step of clustering the multiple subgraphs separately to obtain a network community that includes multiple clusters, the processor 502 performs the following operation: clustering, by community detection, the initial claims social network topology graphs corresponding to the multiple subgraphs to obtain the network community.

In one embodiment, when executing the step of obtaining, by label propagation, the target nodes corresponding to the high-risk user label in the network community, the processor 502 performs the following operations: propagating the node label of each node in the network community to the receiving nodes connected to that node by an edge; iteratively executing the step of assigning a new label to the receiving node according to the node labels it has received, the new label being the node label that occurs most frequently among the received node labels, until a preset label propagation termination condition is met; and obtaining the target nodes corresponding to the high-risk user label in the network community.
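The label propagation step can be illustrated with the following Python sketch. Several details are assumptions that the text does not specify: the label name `HIGH_RISK`, the function name `propagate_labels`, the in-place update order (sorted node ids), the tie-breaking rule (keep the current label if it is among the most frequent, otherwise take the smallest label), and the termination condition (no label changed in a round, or `max_rounds` reached).

```python
from collections import Counter

HIGH_RISK = "high_risk"  # hypothetical name for the high-risk user label


def propagate_labels(edges, seed_labels, max_rounds=20):
    """Propagate labels over an undirected graph and return node -> label."""
    # Build an undirected adjacency list from the edge list.
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    # Every node starts with a unique label (its own id) unless it is seeded.
    labels = {node: seed_labels.get(node, node) for node in neighbours}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):
            # Labels this node receives from the nodes it shares an edge with.
            received = Counter(labels[m] for m in neighbours[node])
            best = max(received.values())
            candidates = sorted(l for l, c in received.items() if c == best)
            # Assumed tie-break: keep the current label when it is among the
            # most frequent ones, otherwise take the smallest candidate.
            if labels[node] not in candidates:
                labels[node] = candidates[0]
                changed = True
        if not changed:  # assumed termination condition
            break
    return labels


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("e", "f")]
labels = propagate_labels(edges, {"a": HIGH_RISK, "b": HIGH_RISK})
target_nodes = sorted(n for n, l in labels.items() if l == HIGH_RISK)
```

In this toy graph the high-risk label spreads from the seeded nodes `a` and `b` to `c` and `d`, while the disconnected pair `e`, `f` converges on a label of its own.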

In one embodiment, before executing the step of, if a feature vector identical to the target feature vector corresponding to the target node exists among the feature vectors of the network community, obtaining the corresponding network community and marking it as a fraud community, the processor 502 further performs the following operations: sampling the community corresponding to the target node by weighted sampling to obtain the target feature vector corresponding to the target node; and sampling each community in the network community by weighted sampling to obtain the feature vector corresponding to each community.
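The text does not define how a feature vector is composed from the weighted sample, so the following Python sketch is only one possible reading. It assumes each community member carries a numeric feature tuple and a weight, draws `k` members with probability proportional to weight, and uses the element-wise mean of the drawn features as the community's feature vector. The names `weighted_sample_vector` and `flag_fraud_communities`, the fixed random seed, and the rounding are all hypothetical.

```python
import random


def weighted_sample_vector(members, k=5, seed=0):
    """members: list of (feature_tuple, weight); returns a mean feature tuple."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    features = [f for f, _ in members]
    weights = [w for _, w in members]
    # Draw k members with probability proportional to their weight.
    draws = rng.choices(features, weights=weights, k=k)
    dim = len(features[0])
    # Assumed feature vector: element-wise mean of the drawn features.
    return tuple(round(sum(d[i] for d in draws) / k, 3) for i in range(dim))


def flag_fraud_communities(communities, target_vector):
    """Return ids of communities whose feature vector equals the target's."""
    return sorted(
        cid
        for cid, members in communities.items()
        if weighted_sample_vector(members) == target_vector
    )


communities = {
    "c1": [((1.0, 0.0), 3.0), ((1.0, 0.0), 1.0)],  # community of the target node
    "c2": [((0.0, 1.0), 2.0), ((0.0, 1.0), 2.0)],
    "c3": [((1.0, 0.0), 1.0)],
}
target_vector = weighted_sample_vector(communities["c1"])
flagged = flag_fraud_communities(communities, target_vector)
```

Here `c3` is flagged alongside the target node's own community `c1` because its sampled feature vector is identical. With real-valued features an exact match would be rare, so a practical variant might compare vectors within a tolerance; that would be a departure from the "identical" wording of the text.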

Those skilled in the art will understand that the embodiment of the computer equipment shown in Figure 11 does not limit the specific composition of the computer equipment. In other embodiments, the computer equipment may include more or fewer components than illustrated, combine certain components, or have a different component arrangement. For example, in some embodiments the computer equipment may include only a memory and a processor; in such embodiments the structure and function of the memory and the processor are consistent with the embodiment shown in Figure 11 and are not repeated here.

It should be understood that, in the embodiments of the present invention, the processor 502 may be a central processing unit (Central Processing Unit, CPU), and may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.

Another embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the following steps: performing data cleaning on the acquired nodes corresponding to claims settlement data to obtain cleaned nodes; partitioning the cleaned nodes in parallel into multiple subgraphs by spectral clustering; clustering the multiple subgraphs separately to obtain a network community that includes multiple clusters; obtaining, by label propagation and according to the node labels initially set in the network community, the target nodes corresponding to the high-risk user label in the network community, wherein the node labels initially set in the network community include at least one high-risk user label; and, if a feature vector identical to the target feature vector corresponding to the target node exists among the feature vectors of the network community, obtaining the corresponding network community and marking it as a fraud community.

In one embodiment, performing data cleaning on the acquired nodes corresponding to claims settlement data to obtain cleaned nodes includes: judging whether the nodes corresponding to the claims settlement data include high-frequency nodes whose frequency exceeds a preset frequency threshold and, if so, deleting the high-frequency nodes to obtain the nodes after high-frequency cleaning; and judging whether the nodes after high-frequency cleaning include nodes whose data generation time falls outside a preset time period and, if so, deleting those nodes to obtain the cleaned nodes.

In one embodiment, partitioning the cleaned nodes in parallel into multiple subgraphs by spectral clustering includes: obtaining the input similarity matrix and the target number of clusters; constructing, according to the similarity matrix, the similar matrix corresponding to the nodes corresponding to the claims settlement data; constructing an adjacency matrix and a diagonal matrix according to the similar matrix, and obtaining the Laplacian matrix as the difference between the diagonal matrix and the adjacency matrix; obtaining the eigenvectors corresponding to those eigenvalues of the Laplacian matrix that rank before a preset rank threshold, to obtain a target eigenvector set; transposing each eigenvector in the target eigenvector set into a column vector and combining them in order to obtain a target vector matrix; and clustering the row vectors of the target vector matrix with the k-means algorithm to obtain as many subgroups as the target number of clusters.

In one embodiment, clustering the multiple subgraphs separately to obtain a network community that includes multiple clusters includes: clustering, by community detection, the initial claims social network topology graphs corresponding to the multiple subgraphs to obtain the network community.

In one embodiment, obtaining, by label propagation, the target nodes corresponding to the high-risk user label in the network community includes: propagating the node label of each node in the network community to the receiving nodes connected to that node by an edge; iteratively executing the step of assigning a new label to the receiving node according to the node labels it has received, the new label being the node label that occurs most frequently among the received node labels, until a preset label propagation termination condition is met; and obtaining the target nodes corresponding to the high-risk user label in the network community.

In one embodiment, before the step of, if a feature vector identical to the target feature vector corresponding to the target node exists among the feature vectors of the network community, obtaining the corresponding network community and marking it as a fraud community, the method further includes: sampling the community corresponding to the target node by weighted sampling to obtain the target feature vector corresponding to the target node; and sampling each community in the network community by weighted sampling to obtain the feature vector corresponding to each community.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the equipment, device and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.

In the several embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division of the units is only a division by logical function, and other divisions are possible in actual implementation: units with the same function may be gathered into one unit, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present invention.

In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer equipment (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a magnetic disk, or an optical disc.

The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A fraudulent user identification method, characterized by comprising:

obtaining, by label propagation and according to the node labels initially set in the network community, the target nodes corresponding to the high-risk user label in the network community, wherein the node labels initially set in the network community include at least one high-risk user label; and

if a feature vector identical to the target feature vector corresponding to the target node exists among the feature vectors of the network community, obtaining the corresponding network community and marking it as a fraud community.

2. The fraudulent user identification method according to claim 1, characterized in that performing data cleaning on the acquired nodes corresponding to claims settlement data to obtain cleaned nodes comprises:

judging whether the nodes corresponding to the claims settlement data include high-frequency nodes whose frequency exceeds a preset frequency threshold and, if so, deleting the high-frequency nodes to obtain the nodes after high-frequency cleaning;

judging whether the nodes after high-frequency cleaning include nodes whose data generation time falls outside a preset time period and, if so, deleting those nodes to obtain the cleaned nodes.

3. The fraudulent user identification method according to claim 1, characterized in that partitioning the cleaned nodes in parallel into multiple subgraphs by spectral clustering comprises:

obtaining the input similarity matrix and the target number of clusters;

constructing, according to the similarity matrix, the similar matrix corresponding to the nodes corresponding to the claims settlement data;

constructing an adjacency matrix and a diagonal matrix according to the similar matrix, and obtaining the Laplacian matrix as the difference between the diagonal matrix and the adjacency matrix;

obtaining the eigenvectors corresponding to those eigenvalues of the Laplacian matrix that rank before a preset rank threshold, to obtain a target eigenvector set;

transposing each eigenvector in the target eigenvector set into a column vector and combining them in order to obtain a target vector matrix;

clustering the row vectors of the target vector matrix with the k-means algorithm to obtain as many subgroups as the target number of clusters.

4. The fraudulent user identification method according to claim 1, characterized in that clustering the multiple subgraphs separately to obtain a network community that includes multiple clusters comprises:

clustering, by community detection, the initial claims social network topology graphs corresponding to the multiple subgraphs to obtain the network community.

5. The fraudulent user identification method according to claim 1, characterized in that obtaining, by label propagation, the target nodes corresponding to the high-risk user label in the network community comprises:

propagating the node label of each node in the network community to the receiving nodes connected to that node by an edge;

iteratively executing the step of assigning a new label to the receiving node according to the node labels it has received, the new label being the node label that occurs most frequently among the received node labels, until a preset label propagation termination condition is met;

obtaining the target nodes corresponding to the high-risk user label in the network community.

6. The fraudulent user identification method according to claim 1, characterized in that, before the step of, if a feature vector identical to the target feature vector corresponding to the target node exists among the feature vectors of the network community, obtaining the corresponding network community and marking it as a fraud community, the method further comprises:

sampling the community corresponding to the target node by weighted sampling to obtain the target feature vector corresponding to the target node;

sampling each community in the network community by weighted sampling to obtain the feature vector corresponding to each community.

7. A fraudulent user identification device, characterized by comprising:

a node cleaning unit, configured to perform data cleaning on the acquired nodes corresponding to claims settlement data to obtain cleaned nodes;

a label propagation unit, configured to obtain, by label propagation and according to the node labels initially set in the network community, the target nodes corresponding to the high-risk user label in the network community, wherein the node labels initially set in the network community include at least one high-risk user label; and

a fraud community recognition unit, configured to, if a feature vector identical to the target feature vector corresponding to the target node exists among the feature vectors of the network community, obtain the corresponding network community and mark it as a fraud community.

8. The fraudulent user identification device according to claim 7, characterized in that the label propagation unit comprises:

a label transmission unit, configured to propagate the node label of each node in the network community to the receiving nodes connected to that node by an edge;

an iteration execution unit, configured to iteratively execute the step of assigning a new label to the receiving node according to the node labels it has received, the new label being the node label that occurs most frequently among the received node labels, until a preset label propagation termination condition is met;

a target node acquiring unit, configured to obtain the target nodes corresponding to the high-risk user label in the network community.

9. A computer equipment, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the fraudulent user identification method according to any one of claims 1 to 6.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the fraudulent user identification method according to any one of claims 1 to 6.