CN110472050A - Clique clustering method and device - Google Patents

Clique clustering method and device

Info

Publication number
CN110472050A
Authority
CN
China
Prior art keywords
client
behavior
behavior sequence
clique
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910671479.4A
Other languages
Chinese (zh)
Inventor
李怀松
潘健民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910671479.4A
Publication of CN110472050A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Technology Law (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a clique clustering method and device. The method includes: obtaining a behavior sequence set according to associated behavior information between clients, where each behavior sequence in the set represents a behavioral association among multiple clients; training a text model on the behavior sequence set to obtain a customer vector for each client; and clustering the customer vectors to identify cliques, where a clique includes multiple clients with similar behavior.

Description

Clique clustering method and device
Technical field
This application relates to data processing technology, and in particular to a clique clustering method and device.
Background
Formulating and enforcing anti-money laundering measures is of great significance to the healthy and orderly development of China's economy and society. Discovering and monitoring money laundering in time, and tracing and confiscating the proceeds of crime, safeguards economic security and social stability and eliminates the potential financial and legal risks that money laundering brings to financial institutions. Money laundering is currently carried out in the form of cliques, for example in pyramid schemes, illegal fund-raising and gambling, so discovering money laundering cliques is a top priority of anti-money laundering work.
The prior art mainly selects client features according to manual experience to form customer vectors, and then clusters the customer vectors with a clustering algorithm to identify cliques. The clustering effect of this approach, however, leaves room for improvement.
Summary of the invention
In view of this, the application provides a clique clustering method and device to improve the clustering effect of clique discovery.
Specifically, the application is implemented through the following technical solutions:
In a first aspect, a clique clustering method is provided, the method comprising:
obtaining a behavior sequence set according to associated behavior information between clients, where each behavior sequence in the behavior sequence set represents a behavioral association among multiple clients;
training a text model based on the behavior sequence set to obtain a customer vector corresponding to each client;
clustering the customer vectors corresponding to the clients to identify cliques, where a clique includes multiple clients with similar behavior.
In a second aspect, a clique clustering device is provided, the device comprising:
a sequence generating module, configured to obtain a behavior sequence set according to associated behavior information between clients, where each behavior sequence in the behavior sequence set represents a behavioral association among multiple clients;
a vector training module, configured to train a text model based on the behavior sequence set to obtain a customer vector corresponding to each client;
a clustering processing module, configured to cluster the customer vectors corresponding to the clients to identify cliques, where a clique includes multiple clients with similar behavior.
In a third aspect, a data processing device is provided. The device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the steps of the clique clustering method described in any embodiment of the disclosure.
The clique clustering method and device provided by the application train customer vectors from the behavior sequences of clients, so that the customer vectors express the behavioral relationships between clients well; clustering on these customer vectors discovers cliques more accurately.
Brief description of the drawings
Fig. 1 is a flow chart of a clique clustering method according to an exemplary embodiment of the application;
Fig. 2 is a flow chart of a clique clustering method according to an exemplary embodiment of the application;
Fig. 3 is a schematic structural diagram of a clique clustering device according to an exemplary embodiment of the application.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the application. Rather, they are merely examples of devices and methods consistent with some aspects of the application as detailed in the appended claims.
The clique clustering method provided by the disclosure can be used to discover cliques whose members behave similarly.
For example, the members of a money laundering clique tend to show similar behavior: funds are collected from a batch of clients and then transferred to the same person or group of people (gambling bankers), who pass the funds on to finance personnel for distribution. That is, the transfer behavior of the members of a money laundering clique is similar.
Based on this similarity in the behavior of the clients in a clique, the clique clustering method provided by the disclosure uses text model training to obtain customer vectors that express the behavioral relationships between clients, and then clusters these customer vectors to achieve a better clique discovery effect.
The clique clustering process is described below using the discovery of money laundering cliques as an example; it should be understood, however, that the method of the disclosure is not limited to that scenario. As shown in Fig. 1, clique clustering includes:
In step 100, a behavior sequence set is generated from the transfer transaction data between multiple clients over a period of time, where each behavior sequence represents transfer behavior between clients.
For example, one month of transfer transaction data is collected for many clients: client Xiao Zhang transfers money to client Xiao Wang, client Xiao Wang transfers money to client Xiao Qi, and client Xiao Qi in turn transfers money to client Xiao Li.
From this transfer transaction data, the behavior sequence "A-B-C-D" can be generated. The behavior sequence is in text form, so it can be called a text behavior sequence and later used to train a text model. In this behavior sequence, A denotes Xiao Zhang, B denotes Xiao Wang, C denotes Xiao Qi and D denotes Xiao Li. The behavior sequence contains the clients involved (A, B, C and D), and their order in the sequence A-B-C-D corresponds to the transfer relationships in the actual transaction data: client A transfers to client B, client B transfers to client C, and so on.
Similarly, a large number of behavior sequences can be obtained from the collected transfer transaction data to form the behavior sequence set. Each behavior sequence contains the clients involved in one chain of transfers, and the sequence expresses the transfer associations between those clients, for example a transfer from A to B followed by a transfer from B to C.
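As an illustration only, step 100 could be sketched in Python as follows. The application does not say how transfer chains are extracted from the transaction data, so the function name build_behavior_sequences, the max_len limit and the rule that a chain starts at a client who never receives money are all assumptions made for this sketch.

```python
from collections import defaultdict


def build_behavior_sequences(transfers, max_len=10):
    """Turn (payer, payee) transfer records into chained behavior sequences."""
    outgoing = defaultdict(list)
    payees = set()
    for payer, payee in transfers:
        outgoing[payer].append(payee)
        payees.add(payee)

    # Assumption: a chain starts at a client that never receives a transfer.
    starts = sorted(c for c in outgoing if c not in payees)
    sequences = []

    def walk(path):
        # Follow outgoing transfers, never revisiting a client already on the path.
        next_clients = [c for c in outgoing.get(path[-1], []) if c not in path]
        if not next_clients or len(path) == max_len:
            if len(path) > 1:
                sequences.append(path)
            return
        for nxt in next_clients:
            walk(path + [nxt])

    for start in starts:
        walk([start])
    return sequences


# Xiao Zhang (A) -> Xiao Wang (B) -> Xiao Qi (C) -> Xiao Li (D)
transfers = [("A", "B"), ("B", "C"), ("C", "D")]
print(["-".join(seq) for seq in build_behavior_sequences(transfers)])
```

Each returned list is one text behavior sequence; joining its entries with "-" gives the "A-B-C-D" form used in the description.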
In step 102, normal clients are filtered out of the behavior sequence set.
Each client in the behavior sequences obtained above would otherwise take part in the subsequent steps that identify which clients cluster into cliques. Some clients, however, can be determined to be normal, and these normal clients need not take part in clique identification. In this step, therefore, clients known to be normal are removed from the behavior sequences.
For example, if client B in the behavior sequence "A-B-C-D" is determined to be a normal client, client B can be removed from the sequence to obtain the behavior sequence "A-C-D".
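A minimal sketch of this filtering step is given below. The function name remove_normal_clients is hypothetical, and discarding sequences that end up with fewer than two clients is an assumption of the sketch rather than something the application states.

```python
def remove_normal_clients(sequences, normal_clients):
    """Drop clients known to be normal from every behavior sequence."""
    filtered = []
    for seq in sequences:
        kept = [client for client in seq if client not in normal_clients]
        # Assumption: a sequence with fewer than two clients expresses
        # no association between clients, so it is discarded.
        if len(kept) >= 2:
            filtered.append(kept)
    return filtered


print(remove_normal_clients([["A", "B", "C", "D"]], {"B"}))
```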
In step 104, for each behavior sequence, one of its clients is selected as the positive-example center word, and negative sampling over the behavior sequence set yields the negative-example center words of the behavior sequence.
For example, the behavior sequence set may include multiple behavior sequences such as "A-C-D", "E-F-G-H" and "J-K-M-N".
This step takes the Word2Vec algorithm as an example. A word in Word2Vec corresponds to a client (customer) in this embodiment, so by analogy the Word2Vec algorithm applied to this scenario can be called the Cust2Vec algorithm. The Cust2Vec algorithm converts each client into a vector.
For each behavior sequence, the positive-example center word and the negative-example center words of the sequence are determined; typically one positive-example center word and neg negative-example center words are used. Taking the behavior sequence "E-F-G-H" as an example, one of its clients, say "F", is selected as the positive-example center word and denoted w0; neg negative-example center words are then drawn from the behavior sequence set by negative sampling and denoted wi, where i = 1, 2, ..., neg.
Once the positive-example and negative-example center words are determined, the positive and negative examples of the behavior sequence for the subsequent text model training are available.
For example, (Context(w0), w0) is a positive example: for the behavior sequence "E-F-G-H", w0 is F and Context(w0) is "E-G-H", which serves as the context of w0. (Context(w0), wi) is a negative example, where wi is one of the sampled negative-example center words and Context(w0) is still "E-G-H".
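The construction of the positive and negative examples might look like the sketch below. The helper name make_training_example is hypothetical; for simplicity the negative-example center words are drawn uniformly from all clients other than the center word, which is an assumption of the sketch and not the frequency-dependent sampling the embodiment describes.

```python
import random


def make_training_example(sequence, center_index, candidates, neg, rng):
    """Return (Context(w0), w0, [w1..wneg]) for one behavior sequence."""
    center = sequence[center_index]
    # Context(w0): every client in the sequence except the center word.
    context = sequence[:center_index] + sequence[center_index + 1:]
    # Assumption: any client other than w0 may serve as a negative example.
    pool = [c for c in candidates if c != center]
    negatives = rng.sample(pool, neg)
    return context, center, negatives


sequences = [["A", "C", "D"], ["E", "F", "G", "H"], ["J", "K", "M", "N"]]
all_clients = sorted({c for seq in sequences for c in seq})
context, center, negatives = make_training_example(
    sequences[1], center_index=1, candidates=all_clients, neg=2, rng=random.Random(0)
)
print(context, center, negatives)
```

Here (context, center) corresponds to the positive example (Context(w0), w0), and pairing the same context with each entry of negatives gives the negative examples (Context(w0), wi).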
In other examples, this embodiment may use the following formula when sampling negative-example center words.
The probability that a client in the behavior sequence set is selected as a negative-example center word is P(c), given by formula (1), which is not reproduced in this text.
Here c denotes a client, counter(c) denotes the number of times the client appears in the behavior sequence set, and D denotes all clients in the behavior sequence set. Under P(c), the more often a client appears in the behavior sequence set, the smaller its probability of being selected as a negative-example center word.
For example, in the money laundering scenario a clique usually has one or two core clients around whom the other clients gather. The negative sampling method of formula (1) reduces the penalty on core clients, which helps discover cliques and improves the clique discovery rate.
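Formula (1) itself does not appear in the text, so the exact form of P(c) is unknown here. The sketch below uses a purely hypothetical weighting, P(c) proportional to 1 / counter(c), chosen only because it has the stated property that clients who appear more often are less likely to be sampled; it should not be read as the formula of the application.

```python
from collections import Counter


def negative_sampling_probabilities(sequences):
    """Hypothetical P(c): inversely proportional to how often c appears."""
    counts = Counter(client for seq in sequences for client in seq)
    # Assumed weighting, not formula (1): weight(c) = 1 / counter(c).
    weights = {client: 1.0 / n for client, n in counts.items()}
    total = sum(weights.values())
    return {client: w / total for client, w in weights.items()}


# Client A is a core client that appears in every sequence.
sequences = [["A", "C", "D"], ["A", "E"], ["A", "F"]]
probs = negative_sampling_probabilities(sequences)
print(probs["A"] < probs["C"])
```

With such probabilities, random.choices(list(probs), weights=list(probs.values()), k=neg) from the Python standard library would draw the negative-example center words.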
In step 106, the customer vectors of the clients in the behavior sequence other than the positive-example center word are used as the input of the text model, and the positive-example and negative-example center words are used as its output. Through a gradient ascent iteration process, gradient updates are applied to the customer vectors input to the text model, yielding the customer vectors of the clients.
In this step, text model training begins in order to obtain the customer vectors of the clients.
Before iteration starts, all parameters of the model can be randomly initialized, and the vector of each client is also initialized. For example, for (Context(w0), w0) where w0 is F and Context(w0) is "E-G-H", each of E, G, H and F is given an initialization vector; the initialization vector of client E (Xiao Chen), for instance, is a six-dimensional vector (s1, s2, s3, ..., s6). E, G and H are the input of the model, and F is its output.
In the gradient ascent iteration process, the following gradient update is applied for each sequence in the behavior sequence set, using the positive example and negative examples of that sequence (Context(w0), w0, w1, w2, ..., wneg). When the gradient converges, the iteration ends and the vector of each client is obtained; otherwise the iteration continues. The gradient ascent iteration is described only briefly below; the specific process is the same as the iterative process of the Word2Vec algorithm and is not detailed again.
a) Set $e = 0$.
b) For $j = 0$ to $neg$ (that is, for the positive and negative examples $w_0, w_1, w_2, \ldots, w_{neg}$), compute:
$f = \sigma(x_{w_0}^{T}\,\theta^{w_j})$
$g = (y_j - f)\,\eta$
$e = e + g\,\theta^{w_j}$
$\theta^{w_j} = \theta^{w_j} + g\,x_{w_0}$
c) Update each customer vector $x_k$ in $Context(w)$ ($2c$ vectors in total):
$x_k = x_k + e$
Here $\sigma$ is the activation function, such as the sigmoid or tanh function; $y_j$ indicates whether the current word $w_j$ is the positive example, taking the value 1 for the positive example and 0 for a negative example; $f$ is the function $f$ in the formula above; and $\eta$ is the learning rate.
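Steps a) to c) can be sketched for a single training example as follows. The sketch assumes, as in Word2Vec's CBOW model with negative sampling, that x_w0 is the sum of the context customer vectors and that f is the sigmoid of the inner product of x_w0 and θ^{wj}; the function names and the dictionaries vectors and thetas are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def context_sum(vectors, context):
    # Assumption: x_w0 is the element-wise sum of the context customer vectors.
    dim = len(vectors[context[0]])
    return [sum(vectors[c][d] for c in context) for d in range(dim)]


def score(vectors, thetas, context, w):
    """Model's estimate that w is the center word of this context."""
    x_w0 = context_sum(vectors, context)
    return sigmoid(sum(x * t for x, t in zip(x_w0, thetas[w])))


def update_example(vectors, thetas, context, center, negatives, eta=0.1):
    """One gradient ascent update for (Context(w0), w0, w1, ..., wneg)."""
    x_w0 = context_sum(vectors, context)
    e = [0.0] * len(x_w0)                                     # a) e = 0
    for j, w in enumerate([center] + negatives):              # b) j = 0..neg
        y = 1.0 if j == 0 else 0.0                            # positive example first
        theta = thetas[w]
        f = sigmoid(sum(x * t for x, t in zip(x_w0, theta)))
        g = (y - f) * eta
        e = [ei + g * ti for ei, ti in zip(e, theta)]         # uses theta before its update
        thetas[w] = [ti + g * xi for ti, xi in zip(theta, x_w0)]
    for c in context:                                         # c) x_k = x_k + e
        vectors[c] = [v + ei for v, ei in zip(vectors[c], e)]


vectors = {"E": [0.1, 0.2], "G": [0.0, 0.1], "H": [0.2, -0.1]}
thetas = {"F": [0.0, 0.0], "A": [0.0, 0.0], "K": [0.0, 0.0]}
for _ in range(50):
    update_example(vectors, thetas, ["E", "G", "H"], "F", ["A", "K"])
print(score(vectors, thetas, ["E", "G", "H"], "F") > 0.5)
```

After training, the entries of vectors are the customer vectors passed on to clustering; thetas holds the output-side parameters θ and is not used further.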
In step 108, the customer vectors of the clients are clustered to identify cliques, where a clique includes multiple clients with similar behavior.
In this step, a clustering algorithm is applied to the vector of each client obtained from training.
For example, the HDBSCAN algorithm can be used for clustering to obtain multiple cliques. It is also possible that the first clustering of the customer vectors identifies only some of the cliques (which may be called first cliques) and leaves some vectors unclassified; the remaining customer vectors can then be clustered again to obtain further cliques (which may be called second cliques). The first cliques may be core cliques, that is, cliques with strong similarity, such as the bankers and finance personnel in gambling. The second cliques may be cliques with relatively weak relationships, such as gamblers, who are more dispersed. Combining the results of the two clusterings gives the classification of each client: the first and second cliques above are all cliques discovered by clustering, and the clients they contain are obtained.
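The two-pass idea can be illustrated with the sketch below. It does not use HDBSCAN: to stay self-contained it substitutes a simple distance-threshold grouping (clients are grouped when linked by a chain of neighbours within a radius), and the function names, radii and min_size are assumptions chosen for the illustration.

```python
import math


def threshold_clusters(points, radius, min_size):
    """Group points linked by neighbours within `radius`; small groups stay unassigned."""
    unvisited = set(points)
    clusters = []
    for name in list(points):
        if name not in unvisited:
            continue
        unvisited.remove(name)
        group, frontier = [name], [name]
        while frontier:
            current = frontier.pop()
            near = [n for n in unvisited
                    if math.dist(points[current], points[n]) <= radius]
            for n in near:
                unvisited.remove(n)
            group.extend(near)
            frontier.extend(near)
        if len(group) >= min_size:
            clusters.append(sorted(group))
    return clusters


def two_pass_clustering(points, tight_radius, loose_radius, min_size=2):
    # First pass: tightly related clients form the first (core) cliques.
    first = threshold_clusters(points, tight_radius, min_size)
    assigned = {n for group in first for n in group}
    # Second pass: cluster only the customer vectors left unclassified.
    rest = {n: v for n, v in points.items() if n not in assigned}
    second = threshold_clusters(rest, loose_radius, min_size)
    return first, second


points = {"A": (0.0, 0.0), "C": (0.1, 0.0), "D": (0.0, 0.1),
          "E": (3.0, 3.0), "F": (4.0, 3.0), "G": (3.0, 4.0), "Z": (10.0, 10.0)}
first, second = two_pass_clustering(points, tight_radius=0.5, loose_radius=1.5)
print(first, second)
```

In practice a density-based algorithm such as HDBSCAN would take the place of threshold_clusters in both passes.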
After clustering yields the cliques, the features shared by the clients in each clique can be analyzed to obtain the key characteristics of the clique. For example, if most clients in a clique are male and most are from Guangdong, these can serve as features of the clique and help characterize it.
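One simple way to summarize a clique is to report each attribute value shared by most of its members, as in the sketch below. The function name dominant_attributes, the attribute dictionary layout and the 60% threshold are assumptions made for illustration.

```python
from collections import Counter


def dominant_attributes(members, attributes, min_share=0.6):
    """Return attribute values held by at least `min_share` of the clique's members."""
    profile = {}
    keys = {key for m in members for key in attributes[m]}
    for key in sorted(keys):
        counts = Counter(attributes[m].get(key) for m in members)
        value, n = counts.most_common(1)[0]
        if value is not None and n / len(members) >= min_share:
            profile[key] = value
    return profile


attributes = {
    "A": {"gender": "male", "province": "Guangdong"},
    "C": {"gender": "male", "province": "Guangdong"},
    "D": {"gender": "male", "province": "Hunan"},
}
print(dominant_attributes(["A", "C", "D"], attributes))
```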
The clique clustering method of this embodiment trains customer vectors from the behavior sequences of clients, so that the customer vectors express the behavioral relationships between clients well; clustering on these customer vectors discovers cliques more accurately.
In addition, the example above applies the Word2Vec algorithm to train customer vectors from behavior sequences, but actual implementations are not limited to this model; other text models, such as FastText, can also be used to train customer vectors and achieve the same effect. Nor is client behavior limited to transfer transactions: in other scenarios, behavior sequences can be generated from other client behavior information.
Furthermore, when the customer vector of each client is initialized, the initialized customer vector may include non-behavior-sequence information about the client, such as the client's age, home address or account registration time, which is not obtained from behavior sequences. When training starts from such initialized customer vectors, the resulting customer vectors contain not only the information obtained from behavior sequences but also the client's own attribute information, such as age and home address. The richer information in the customer vectors characterizes clients better and improves the accuracy of clique identification. For example, the members of some cliques are similar not only in behavior but also in their own attributes; the cliques obtained by clustering then share other features besides similar behavior, so richer client information can improve the clique clustering effect.
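The application does not specify how attribute information enters the initial vector. One possible reading, sketched below, places already-normalized numeric attributes in the leading dimensions and fills the rest with small random values; the function name init_customer_vector, the scaling of the attributes and the layout of the vector are all assumptions.

```python
import random


def init_customer_vector(attribute_features, random_dims, rng):
    """Initial vector = numeric attribute features followed by small random values."""
    # Small random values, scaled by dimension as is common for Word2Vec-style initialization.
    random_part = [rng.uniform(-0.5, 0.5) / random_dims for _ in range(random_dims)]
    return list(attribute_features) + random_part


# Hypothetical client: age scaled to [0, 1] and account age scaled to [0, 1].
vector = init_customer_vector([0.35, 0.8], random_dims=4, rng=random.Random(0))
print(len(vector), vector[:2])
```

The result is a six-dimensional vector whose first two entries carry the attribute information and whose remaining entries are then shaped by training on the behavior sequences.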
As the at least one embodiment above shows, the clique clustering method of the disclosure trains customer vectors from the behavior sequences of clients, so that the customer vectors reflect the behavioral relationships between clients well; clients with similar behavior are therefore clustered together during vector clustering, achieving a good clique discovery effect. As shown in Fig. 2, the clique clustering method includes:
In step 200, a behavior sequence set of multiple clients is obtained according to the behavior information of the clients, where each behavior sequence in the behavior sequence set represents a behavioral association among multiple clients.
In step 202, a text model is trained on the behavior sequence set to obtain the customer vector of each client. Here a text model is a model capable of processing text sequences.
In step 204, the customer vectors of the clients are clustered to identify cliques, where a clique includes multiple clients with similar behavior.
It should be noted that the clique clustering method can be applied to a variety of clique identification scenarios and is not limited to financial transaction scenarios such as money laundering. Whenever cliques exist and are characterized by members with similar behavioral associations, the clique clustering method of the disclosure can be used to identify them.
Fig. 3 shows a clique clustering device according to at least one embodiment of the disclosure, which can be used to perform the clique clustering method of any embodiment of the disclosure. As shown in Fig. 3, the device may include a sequence generating module 31, a vector training module 32 and a clustering processing module 33.
The sequence generating module 31 is configured to obtain a behavior sequence set according to associated behavior information between clients, where each behavior sequence in the behavior sequence set represents a behavioral association among multiple clients.
The vector training module 32 is configured to train a text model based on the behavior sequence set to obtain a customer vector corresponding to each client.
The clustering processing module 33 is configured to cluster the customer vectors corresponding to the clients to identify cliques, where a clique includes multiple clients with similar behavior.
In one example, the vector training module 32 is specifically configured to: for each behavior sequence, select one of its clients as the positive-example center word, and perform negative sampling over the behavior sequence set to obtain the negative-example center words of the behavior sequence; use the customer vectors of the clients in the behavior sequence other than the positive-example center word as the input of the text model, and use the positive-example and negative-example center words as the output of the text model; and apply gradient updates, through a gradient ascent iteration process, to the customer vectors input to the text model to obtain the customer vectors of the clients.
In one example, when performing negative sampling over the behavior sequence set to obtain the negative-example center words of the behavior sequence, the vector training module 32 gives a client that appears more often in the behavior sequence set a smaller probability of being sampled as a negative-example center word.
In one example, the vector training module 32 is specifically configured to: initialize the customer vector of each client before training with the text model, where the initialized customer vector includes non-behavior-sequence information about the client; and obtain the customer vector of each client through text model training based on the initialized customer vectors.
At least one embodiment of this specification further provides a data processing device. The device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the processing steps of the clique clustering method described in any embodiment of this specification.
At least one embodiment of this specification further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the processing steps of the clique clustering method described in any embodiment of this specification.
The steps in the processes shown in the method embodiments above need not be executed in the order shown in the flow charts. In addition, each step may be implemented in software, hardware or a combination of the two. For example, those skilled in the art may implement a step as software code, namely computer-executable instructions that realize the logical function corresponding to the step. When implemented in software, the executable instructions may be stored in a memory and executed by a processor in the device.
The devices or modules described in the embodiments above may be implemented by a computer chip or entity, or by a product with a certain function. A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the device above is described in terms of modules divided by function. When implementing one or more embodiments of this specification, the functions of the modules may of course be realized in one or more pieces of software and/or hardware.
Those skilled in the art should understand that one or more embodiments of this specification may be provided as a method, a system or a computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of a flow chart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of a flow chart and/or one or more blocks of a block diagram.
It should also be noted that the terms "include", "comprise" and any variants are intended to cover non-exclusive inclusion, so that a process, method, commodity or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, commodity or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, commodity or device that includes the element.
One or more embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of this specification may also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the data collection device and data processing device embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The above are merely preferred embodiments of the application and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the application shall fall within its scope of protection.

Claims (12)

1. a kind of clique's clustering method, which comprises
According to the correlation behavior information between each client, behavior sequence collection, the behavior of each of described behavior sequence collection are obtained Sequence indicates that the behavior between multiple clients is associated with;
Based on the behavior sequence collection, the corresponding customer vector of each client is obtained using text model training;
The corresponding customer vector of each client is clustered, identification obtains clique, includes behavior in a clique Similar multiple clients.
2. according to the method described in claim 1, the correlation behavior information according between each client, obtains behavior sequence Collection, each of described behavior sequence collection behavior sequence indicate that the behavior between multiple clients is associated with, comprising:
According to the transfer transaction data between multiple clients in a period of time, behavior sequence collection, each described behavior are generated Sequence indicates the behavior of transferring accounts between multiple clients;It also, include: each client for participating in transferring accounts, institute in the behavior sequence Put in order expression of each client in behavior sequence is stated to transfer accounts sequentially.
3. The method according to claim 1, wherein training the text model on the behavior sequence set to obtain the customer vector corresponding to each client comprises:
for each behavior sequence, selecting one of the clients as a positive-example center word, and performing negative sampling from the behavior sequence set to obtain a negative-example center word of the behavior sequence;
using the customer vectors of the other clients in the behavior sequence, other than the positive-example center word, as the input of the text model, and using the positive-example center word and the negative-example center word as the output of the text model; and
performing gradient update processing on the customer vectors input to the text model through a gradient ascent iteration process, to obtain the customer vectors of the clients.
4. The method according to claim 3, wherein performing negative sampling from the behavior sequence set to obtain the negative-example center word of the behavior sequence comprises:
the more often a client occurs in the behavior sequence set, the smaller the probability that the client is sampled as a negative-example center word.
5. The method according to claim 1, wherein after obtaining the behavior sequence set, the method further comprises: filtering out normal clients from the behavior sequence set.
6. The method according to claim 1, wherein clustering the customer vectors corresponding to the clients to identify cliques comprises:
performing a first clustering on the customer vectors of the clients to identify a first clique; and
continuing to cluster the remaining customer vectors outside the first clique to obtain a second clique.
7. The method according to claim 1, wherein training the text model on the behavior sequence set to obtain the customer vector corresponding to each client comprises:
initializing the customer vector of each client, wherein the initialized customer vector includes non-behavior-sequence information of the client; and
training the text model on the basis of the initialized customer vectors to obtain the customer vector of each client.
8. A clique clustering device, the device comprising:
a sequence generation module, configured to obtain a behavior sequence set according to associated behavior information between clients, wherein each behavior sequence in the behavior sequence set represents a behavior association among multiple clients;
a vector training module, configured to train a text model on the behavior sequence set to obtain a customer vector corresponding to each client; and
a clustering processing module, configured to cluster the customer vectors corresponding to the clients to identify cliques, wherein one clique includes multiple clients with similar behavior.
9. The device according to claim 8, wherein the vector training module is specifically configured to:
for each behavior sequence, select one of the clients as a positive-example center word, and perform negative sampling from the behavior sequence set to obtain a negative-example center word of the behavior sequence;
use the customer vectors of the other clients in the behavior sequence, other than the positive-example center word, as the input of the text model, and use the positive-example center word and the negative-example center word as the output of the text model; and
perform gradient update processing on the customer vectors input to the text model through a gradient ascent iteration process, to obtain the customer vectors of the clients.
10. The device according to claim 9,
wherein, when the vector training module performs negative sampling from the behavior sequence set to obtain the negative-example center word of the behavior sequence: the more often a client occurs in the behavior sequence set, the smaller the probability that the client is sampled as a negative-example center word.
11. The device according to claim 8,
wherein the vector training module is specifically configured to: initialize the customer vector of each client before training the text model, the initialized customer vector including non-behavior-sequence information of the client; and train the text model on the basis of the initialized customer vectors to obtain the customer vector of each client.
12. A data processing device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the clique clustering method according to any one of claims 1 to 7.
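The steps recited in claims 1 to 4 can be pictured with a small sketch. It is an illustration under stated assumptions, not the applicant's implementation: the function names, the rule for chaining transfers into sequences, the `1 / count` sampling weight, the choice to let every client in a sequence take a turn as the center word, and the greedy cosine-threshold grouping (standing in for whatever clustering algorithm is actually used) are all hypothetical choices made here for brevity.

```python
import math
import random
from collections import Counter


def build_sequences(transfers):
    """Chain time-ordered (payer, payee) transfers into behavior sequences.

    Assumed rule: a transfer extends the most recent sequence that ends with
    its payer; otherwise it starts a new sequence.
    """
    sequences = []
    for payer, payee in transfers:
        for seq in reversed(sequences):
            if seq[-1] == payer:
                seq.append(payee)
                break
        else:
            sequences.append([payer, payee])
    return sequences


def negative_weights(sequences):
    """Sampling weight per client; it shrinks as the client occurs more often."""
    counts = Counter(client for seq in sequences for client in seq)
    return {client: 1.0 / n for client, n in counts.items()}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, x))))


def train_vectors(sequences, dim=8, negatives=2, lr=0.05, epochs=50, seed=0):
    """CBOW-style training with negative sampling, updated by gradient ascent."""
    rng = random.Random(seed)
    weights = negative_weights(sequences)
    clients = sorted(weights)
    vec = {c: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for c in clients}
    out = {c: [0.0] * dim for c in clients}  # output-side vectors
    for _ in range(epochs):
        for seq in sequences:
            for i, center in enumerate(seq):  # assumption: every client takes a turn
                context = [c for j, c in enumerate(seq) if j != i]
                pool = [c for c in clients if c != center]
                if not context or not pool:
                    continue
                # Input: mean of the other clients' vectors in the sequence.
                h = [sum(vec[c][k] for c in context) / len(context) for k in range(dim)]
                negs = rng.choices(pool, weights=[weights[c] for c in pool], k=negatives)
                grad_h = [0.0] * dim
                for word, label in [(center, 1.0)] + [(n, 0.0) for n in negs]:
                    score = sigmoid(sum(a * b for a, b in zip(out[word], h)))
                    g = lr * (label - score)  # ascent step on the log-likelihood
                    for k in range(dim):
                        grad_h[k] += g * out[word][k]
                        out[word][k] += g * h[k]
                for c in context:
                    for k in range(dim):
                        vec[c][k] += grad_h[k] / len(context)
    return vec


def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def cluster(vec, threshold=0.8):
    """Greedy cosine-threshold grouping; a stand-in for any clustering algorithm."""
    cliques = []
    for client in sorted(vec):
        for clique in cliques:
            if cosine(vec[client], vec[clique[0]]) >= threshold:
                clique.append(client)
                break
        else:
            cliques.append([client])
    return cliques


transfers = [("A", "B"), ("B", "C"), ("D", "E")]  # made-up, time-ordered
sequences = build_sequences(transfers)
cliques = cluster(train_vectors(sequences))
```

With real data, the second clustering pass of claim 6 could be approximated by calling `cluster` again on only the vectors that were not assigned to a clique of interest in the first pass, possibly with a different threshold.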
CN201910671479.4A 2019-07-24 2019-07-24 A kind of clique's clustering method and device Pending CN110472050A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910671479.4A CN110472050A (en) 2019-07-24 2019-07-24 A kind of clique's clustering method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910671479.4A CN110472050A (en) 2019-07-24 2019-07-24 A kind of clique's clustering method and device

Publications (1)

Publication Number Publication Date
CN110472050A true CN110472050A (en) 2019-11-19

Family

ID=68508824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910671479.4A Pending CN110472050A (en) 2019-07-24 2019-07-24 A kind of clique's clustering method and device

Country Status (1)

Country Link
CN (1) CN110472050A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461225A (en) * 2020-04-01 2020-07-28 支付宝(杭州)信息技术有限公司 Clustering system and method thereof
CN111598714A (en) * 2020-07-24 2020-08-28 北京淇瑀信息科技有限公司 Two-stage unsupervised group partner identification method and device and electronic equipment
CN113506113A (en) * 2021-06-02 2021-10-15 北京顶象技术有限公司 Credit card cash-registering group-partner mining method and system based on associated network
CN113836370A (en) * 2021-11-25 2021-12-24 上海观安信息技术股份有限公司 User group classification method and device, storage medium and computer equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809448A (en) * 2014-12-30 2016-07-27 阿里巴巴集团控股有限公司 Account transaction clustering method and system thereof
CN107835113A (en) * 2017-07-05 2018-03-23 中山大学 Abnormal user detection method in a kind of social networks based on network mapping
CN108280755A (en) * 2018-02-28 2018-07-13 阿里巴巴集团控股有限公司 The recognition methods of suspicious money laundering clique and identification device
CN109858024A (en) * 2019-01-04 2019-06-07 中山大学 A kind of source of houses term vector training method and device based on word2vec
CN109919198A (en) * 2019-02-13 2019-06-21 北京航空航天大学 A kind of new network insertion learning method for restarting formula random walk

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461225A (en) * 2020-04-01 2020-07-28 支付宝(杭州)信息技术有限公司 Clustering system and method thereof
CN111598714A (en) * 2020-07-24 2020-08-28 北京淇瑀信息科技有限公司 Two-stage unsupervised group partner identification method and device and electronic equipment
CN113506113A (en) * 2021-06-02 2021-10-15 北京顶象技术有限公司 Credit card cash-registering group-partner mining method and system based on associated network
CN113506113B (en) * 2021-06-02 2022-02-11 北京顶象技术有限公司 Credit card cash-registering group-partner mining method and system based on associated network
CN113836370A (en) * 2021-11-25 2021-12-24 上海观安信息技术股份有限公司 User group classification method and device, storage medium and computer equipment
CN113836370B (en) * 2021-11-25 2022-03-01 上海观安信息技术股份有限公司 User group classification method and device, storage medium and computer equipment

Similar Documents

Publication Publication Date Title
TWI788529B (en) Credit risk prediction method and device based on LSTM model
CN110472050A (en) A kind of clique's clustering method and device
CN108960833B (en) Abnormal transaction identification method, equipment and storage medium based on heterogeneous financial characteristics
WO2017140222A1 (en) Modelling method and device for machine learning model
CN112600810B (en) Ethereum phishing fraud detection method and device based on graph classification
CN110008984B (en) Target fraud transaction model training method and device based on multitasking samples
CN110232373A (en) Face cluster method, apparatus, equipment and storage medium
CN109472626B (en) Intelligent financial risk control method and system for mobile phone leasing service
CN112214499B (en) Graph data processing method and device, computer equipment and storage medium
EP3655893A1 (en) Machine learning system for various computer applications
Yeh et al. Deep belief networks for predicting corporate defaults
CN111325619A (en) Credit card fraud detection model updating method and device based on joint learning
CN107403311B (en) Account use identification method and device
CN109118053A (en) It is a kind of steal card risk trade recognition methods and device
Ray Fraud detection in e-Commerce using machine learning
CN112116245A (en) Credit risk assessment method, credit risk assessment device, computer equipment and storage medium
Gul et al. A systematic analysis of link prediction in complex network
Hong et al. Selective residual learning for visual question answering
CN107885754B (en) Method and device for extracting credit variable from transaction data based on LDA model
CN109522317A (en) A kind of anti-fraud method for early warning and system
Gavrilov et al. Convolutional neural networks: Estimating relations in the ising model on overfitting
Bonetta et al. Retrieval-augmented Transformer-XL for close-domain dialog generation
CN114638704A (en) Illegal fund transfer identification method and device, electronic equipment and storage medium
Ramos et al. SwiftFace: real-time face detection
Bhowmik et al. Dbnex: Deep belief network and explainable ai based financial fraud detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200924

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200924

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20191119