WO2018103456A1

WO2018103456A1 - Method and apparatus for grouping communities on the basis of feature matching network, and electronic device

Info

Publication number: WO2018103456A1
Application number: PCT/CN2017/105985
Authority: WO
Inventors: 李旭瑞; 邱雪涛; 赵金涛; 钟毅; 胡奕
Original assignee: 中国银联股份有限公司
Priority date: 2016-12-06
Filing date: 2017-10-13
Publication date: 2018-06-14
Also published as: CN106709800A; TWI662421B; CN106709800B; TW201822022A

Abstract

A method and an apparatus for grouping communities on the basis of a feature matching network, and an electronic device. The method comprises: according to preset K hash functions, determining a K-bit hash vector corresponding to the information of each account (S101); sequentially dividing the hash vector corresponding to the information of each account into m=K/k classes of sub hash vectors (S102); for each class, grouping the information of accounts with the same sub hash vector into the same group (S103); calculating the similarities among the information of respective accounts of the same group (S104); if the similarities among the information of respective accounts are greater than a threshold, establishing interconnected edges among the information of respective accounts to form a feature matching network (S105); and according to the feature matching network, performing community grouping of the information of respective accounts (S106). The method can analyze grouped communities to discover an abnormal community.

Description

Community division method, device and electronic device based on feature matching network

This application claims priority to Chinese Patent Application No. 2016-1110731.7, entitled "A Method and Apparatus for Community Classification Based on Feature Matching Network", filed on December 6, 2016, the entire contents of which are hereby incorporated by reference into this application.

Technical field

Embodiments of the present invention relate to the field of data processing, and in particular, to a community partitioning method, apparatus, and electronic device based on a feature matching network.

Background technique

At present, the risk situation facing the domestic credit card market is increasingly severe, and cases of credit card cash-out, counterfeit card fraud, and card impersonation fraud are increasing. Specifically, credit card cash-out refers to a cardholder obtaining cash through fictitious consumer transactions, for example by conspiring with a merchant to swipe the card and then obtaining a refund, or by purchasing easily resold goods and then selling them for cash. Counterfeit card fraud refers to forging a real and valid bank card by magnetic stripe writing, embossing, or printing according to the magnetic stripe information format of the bank card. Card impersonation fraud refers to a fraudster obtaining some or all of the information of the real cardholder and impersonating the real cardholder to change the account information for fraudulent purposes. Credit card crime is becoming increasingly high-tech, organized, and professional. Cases are carried out more covertly and the methods change constantly. This threatens the financial security of banks and cardholders and has become an important factor restricting the long-term healthy development of the credit card industry.

In the face of various fraudulent means, the prior art usually adopts clustering. However, this approach has several defects. On the one hand, if data is later added to the anti-fraud model, the model is difficult to update. On the other hand, although clustering divides the nodes into several classes, the structure within each group and the relationships between structures remain difficult to describe.

In summary, in the prior art the anti-fraud model is difficult to update when data is added later, and after clustering the structure within each group and the relationships between structures remain difficult to describe, so effective measures need to be taken to solve these problems.

Summary of the invention

The embodiments of the present invention provide a community division method, apparatus, and electronic device based on a feature matching network, which are used to solve the problems in the prior art that the anti-fraud model is difficult to update when data is added later, and that after clustering the structure within each group and the relationships between structures remain difficult to describe.

An embodiment of the present invention provides a community division method based on a feature matching network, including:

Determining a K-bit hash vector corresponding to each account information according to K preset hash functions;

Sequentially dividing the hash vector corresponding to each account information into m=K/k classes of sub-hash vectors;

For each class, dividing account information having the same sub-hash vector into the same group;

Calculating the similarity between the account information in the same group;

If the similarity between account information is greater than a threshold, establishing an interconnection edge between the account information to form a feature matching network;

And performing community division on each account information according to the feature matching network.

Optionally, calculating the similarity between the account information in the same group includes:

If the i-th account information and the j-th account information are in the same group in n classes, using n/m as the similarity between the i-th account information and the j-th account information, where the i-th account information and the j-th account information are any two of the account information.

Or, if the i-th account information and the j-th account information are in the same group, counting the number h of bit positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, where the i-th account information and the j-th account information are any two of the account information;

The similarity between the i-th account information and the j-th account information is then s=h/K.

Optionally, determining a K-bit hash vector corresponding to each account information according to the K preset hash functions includes:

Determining the K-bit hash vector corresponding to each account information according to formula (1):

[Formula (1) is not reproduced in this text.]

In formula (1), the prefix 2'b indicates that the hash vector is a binary number; each bit is given by one of the K preset hash functions applied to the feature vector of the account information, whose components c_1, c_2, ..., c_d represent the characteristic attributes of the account information; and the remaining symbol denotes a randomly selected non-zero vector.

Optionally, according to the feature matching network, performing community division on each account information, including:

(1) dividing each account information into different communities in the feature matching network;

(2) calculating the similarity strength of each account information according to the similarity between the account information, thereby generating a node similarity intensity matrix;

(3) for each account information, starting from the row of the node similarity strength matrix in which the account information is located, trying to move the account information into other communities in descending order of similarity strength; if the modularity increment obtained by moving the account information from the p-th community to the q-th community is positive, dividing the account information into the q-th community and ending the attempt;

(4) Repeat until the community structure is no longer changed.

Optionally, calculating the similarity strength of each account information according to the similarity between the account information includes:

Calculating the similarity strength s_{i,j} between the i-th account information and the j-th account information according to formula (2):

[Formula (2) is not reproduced in this text.]

Where Γ(i) represents the neighbor set of the i-th account information, Γ(i)∩Γ(j) represents the common neighbor set of the i-th account information and the j-th account information, and w_{ai,z} is the sum of the weights of the edges between any account information ai and the z-th account information.

An embodiment of the present invention further provides a community division apparatus based on a feature matching network, comprising:

a determining unit, configured to determine a K-bit hash vector corresponding to each account information according to K preset hash functions;

a first dividing unit, configured to sequentially divide the hash vector corresponding to each account information into m=K/k classes of sub-hash vectors;

a second dividing unit, configured to, for each class, divide account information having the same sub-hash vector into the same group;

a calculating unit, configured to calculate the similarity between the account information in the same group;

a network forming unit, configured to, if the similarity between account information is greater than a threshold, establish an interconnection edge between the account information to form a feature matching network;

a third dividing unit, configured to perform community division on each account information according to the feature matching network.

Optionally, the calculating unit is specifically configured to: if the i-th account information and the j-th account information are in the same group in n classes, use n/m as the similarity between the i-th account information and the j-th account information, where the i-th account information and the j-th account information are any two of the account information.

Optionally, the calculating unit is further configured to: if the i-th account information and the j-th account information are in the same group, count the number h of bit positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, where the i-th account information and the j-th account information are any two of the account information; the similarity between the i-th account information and the j-th account information is then s=h/K.

Optionally, the determining unit is configured to determine, according to formula (3), the K-bit hash vector corresponding to each account information:

[Formula (3) is not reproduced in this text.]

In formula (3), the prefix 2'b indicates that the hash vector is a binary number; each bit is given by one of the K preset hash functions applied to the feature vector of the account information; and the remaining symbol denotes a randomly selected non-zero vector.

Optionally, the third dividing unit is specifically configured to: (1) divide each account information into different communities in the feature matching network;

(4) Repeat until the community structure is no longer changed.

Optionally, the calculating unit is further configured to calculate, according to formula (4), the similarity strength s_{i,j} between the i-th account information and the j-th account information.

An embodiment of the present invention further provides an electronic device, including:

At least one processor; and,

a memory communicatively coupled to the at least one processor; wherein

The memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to: determine, according to K preset hash functions, a K-bit hash vector corresponding to each account information; sequentially divide the hash vector corresponding to each account information into m=K/k classes of sub-hash vectors; for each class, divide account information having the same sub-hash vector into the same group; calculate the similarity between the account information in the same group; if the similarity between account information is greater than a threshold, establish an interconnection edge between the account information to form a feature matching network; and perform community division on each account information according to the feature matching network.

The embodiments of the present invention provide a community division method, apparatus, and electronic device based on a feature matching network. A K-bit hash vector corresponding to each account information is determined according to K preset hash functions; the hash vector corresponding to each account information is sequentially divided into m=K/k classes of sub-hash vectors; for each class, account information having the same sub-hash vector is divided into the same group; the similarity between the account information in the same group is calculated; if the similarity between account information is greater than a threshold, an interconnection edge is established between the account information to form a feature matching network; and community division is performed on each account information according to the feature matching network.

In the embodiments of the present invention, the K-bit hash vector corresponding to each account information is first determined according to the K preset hash functions. For the large amount of account information in a network, a single hash function that generates only two hash values is not sufficient, so determining a K-bit hash vector for each account information makes it possible to cope with complex network account information. Then, for each class, account information having the same sub-hash vector is divided into one group, and the similarity is calculated only between account information in the same group. This avoids calculating the similarity between every pair of account information in the entire network and therefore effectively reduces the amount of similarity calculation.

Finally, when the similarity between account information is greater than the threshold, an interconnection edge is established between the account information to form a feature matching network, and community division is performed on the account information according to the feature matching network. This divides the account information into communities more accurately. It not only makes the relationships between communities clear, but also allows the divided communities to be analyzed to find abnormal communities, after which the accounts in the abnormal communities can be checked, so that fraudulent accounts are searched for in a more targeted way and are handled more efficiently. In addition, if account information needs to be added to the divided communities, it is only necessary to repeat the above simple steps for the added account information and update it to the corresponding location, which does not cause update difficulties.

Description of the drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic flowchart of a community partitioning method based on a feature matching network according to an embodiment of the present invention;

FIG. 2 is an overall schematic flowchart according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a community division apparatus based on a feature matching network according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed description

The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It is understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

It should be understood that the technical solutions of the embodiments of the present invention can be applied to various network fraud scenarios of various banks, such as credit card fraud, bank card fraud, card impersonation fraud, counterfeit card fraud, cash-out fraud, and the like. The technical solutions may also be applied to discovering abnormal account information communities, discovering the common characteristics of specific types of fraud, discovering other fraudulent account information from samples of fraudulent account information, and helping to discover unknown fraud types.

FIG. 1 is a schematic flowchart showing a method for community partitioning based on a feature matching network according to an embodiment of the present invention. As shown in FIG. 1 , the method includes the following steps:

Step S101: Determine a K-bit hash vector corresponding to each account information according to the preset K hash functions;

Step S102: Sequentially divide the hash vector corresponding to each account information into m=K/k classes of sub-hash vectors;

Step S103: For each class, divide the account information with the same sub-hash vector into the same group;

Step S104: Calculate the similarity between each account information in the same group;

Step S105: If the similarity between the account information is greater than the threshold, an interconnection edge is established between each account information to form a feature matching network.

Step S106: Perform community division on each account information according to the feature matching network.

In step S101, a K-bit hash vector corresponding to each account information is determined according to the K preset hash functions. Specifically, each preset hash function produces one hash value, so the K preset hash functions together generate a K-bit hash vector, and each account information corresponds to one K-bit hash vector. In a specific implementation, each account information includes multiple feature attributes. Using only one hash function, as in the prior art, is insufficient to express the multiple feature attributes of an account information, and this step avoids that disadvantage. The value of K can be set according to the specific situation of the account information. For example, if K is set to 4, each account information is represented as a 4-bit hash vector.
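The following is a minimal sketch of step S101, not the method of formula (1), which is not available in this text. It assumes that each hash function compares the dot product of the feature vector with a randomly selected non-zero vector against zero (sign random projection). The function names, the Gaussian sampling, and the sample feature values are illustrative assumptions.

```python
import random

def make_hash_functions(K, d, seed=0):
    """Return K random non-zero d-dimensional vectors, one per hash function.

    Assumption: Gaussian components; the source only says the vectors are
    randomly selected and non-zero.
    """
    rng = random.Random(seed)
    vectors = []
    while len(vectors) < K:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if any(x != 0.0 for x in v):
            vectors.append(v)
    return vectors

def hash_vector(features, vectors):
    """Map a feature vector to a K-bit string.

    Assumption: bit i is 1 when the dot product with the i-th random
    vector is non-negative, and 0 otherwise.
    """
    bits = []
    for v in vectors:
        dot = sum(vi * ci for vi, ci in zip(v, features))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)

# Hypothetical account with d = 3 feature attributes, hashed to K = 4 bits.
vectors = make_hash_functions(K=4, d=3)
h = hash_vector([120.0, 3.0, 1.0], vectors)
```

Under this assumption each hash function yields only 0 or 1, which matches the description that a single function generates two hash values and that K of them are combined into one K-bit vector.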

In step S102, the hash vector corresponding to each account information is sequentially divided into m=K/k classes of sub-hash vectors. For example, with K=4 and k=2, the 4-bit hash vector of each account information is divided into two classes of sub-hash vectors. The advantage of this division is that it reduces the computational complexity of the subsequent similarity calculation between accounts. It avoids the disadvantage of the prior art, in which the hash vectors of the account information are not divided and the similarity is calculated directly for every pair of account information.

In step S103, for each class, account information having the same sub-hash vector is divided into the same group. Specifically, after the hash vector of each account information has been divided into classes, the account information having the same sub-hash vector within a class is placed in the same group. For example, with K=4 and k=2, in the first class all account information whose 4-bit hash vectors share the same first two bits form one group. Similarly, in the second class all account information whose 4-bit hash vectors share the same last two bits form one group. The purpose of this division is also to reduce the computational complexity of the similarity calculation: the similarity is calculated only between account information that has the same sub-hash vector in some class.
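Steps S102 and S103 can be sketched as follows. This is a minimal illustration that represents each hash vector as a string of '0' and '1' characters; the function names and the account identifiers are hypothetical.

```python
from collections import defaultdict

def split_into_classes(hash_bits, k):
    """Split a K-bit string into m = K/k consecutive k-bit sub-hash vectors."""
    if len(hash_bits) % k != 0:
        raise ValueError("K must be divisible by k")
    return [hash_bits[i:i + k] for i in range(0, len(hash_bits), k)]

def group_accounts(hashes, k):
    """Return {(class_index, sub_hash): [account ids]}.

    Within each class, accounts with the same sub-hash vector share a group.
    """
    groups = defaultdict(list)
    for account, bits in hashes.items():
        for class_index, sub in enumerate(split_into_classes(bits, k)):
            groups[(class_index, sub)].append(account)
    return dict(groups)

# K = 4, k = 2: class 0 uses the first two bits, class 1 the last two.
hashes = {"acct1": "0010", "acct2": "0011", "acct3": "1110"}
groups = group_accounts(hashes, k=2)
```

Here acct1 and acct2 share a group in class 0 (both start with 00), and acct1 and acct3 share a group in class 1 (both end with 10).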

In step S104, the similarity between the account information in the same group is calculated. In a specific implementation, the similarity may be taken as the ratio of the number of bit positions at which the hash vectors of two account information in the same group hold the same value to the total number of bits. For example, the hash vector of account information 1 is 0010 and the hash vector of account information 2 is 0011. With K=4 and k=2, the two account information are in the same group in the first class, so their similarity is determined. Their hash vectors agree at 3 bit positions out of 4, so the similarity between the two account information is 3/4. The similarity between any two account information in the same group can also be calculated with another similarity formula, for example the Euclidean distance, the cosine distance, or the Jaccard distance.

On the one hand, compared with calculating the similarity of every pair among all account information, calculating the similarity only between account information in the same group greatly reduces the amount of calculation. For example, if N account information samples are taken, then within one class the N samples are divided into 2^k groups, each group contains about N/2^k samples, and the number of pairwise similarity calculations within each group is

$$\binom{N/2^k}{2} = \frac{1}{2}\cdot\frac{N}{2^k}\left(\frac{N}{2^k}-1\right).$$

The number of pairwise similarity calculations for the 2^k groups of one class is

$$2^k\cdot\frac{1}{2}\cdot\frac{N}{2^k}\left(\frac{N}{2^k}-1\right) = \frac{N}{2}\left(\frac{N}{2^k}-1\right).$$

Therefore, the number of similarity calculations needed for all classes is

$$m\cdot\frac{N}{2}\left(\frac{N}{2^k}-1\right),$$

where m=K/k is the number of divided classes, a constant that can be controlled according to the actual situation. The traditional method, which calculates the similarity for every pair among all accounts, needs

$$\binom{N}{2} = \frac{N(N-1)}{2}$$

calculations.

In summary, calculating the similarity only between account information in the same group reduces the amount of calculation by a factor on the order of 2^k compared with the conventional method of calculating the similarity between every pair among all accounts. On the other hand, the account information within each group is relatively similar, so calculating the similarity within the same group also improves the efficiency and accuracy of network establishment.
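The counting argument above can be checked numerically. The sketch below assumes that the N samples spread evenly over the 2^k groups of each class and that N is divisible by 2^k; the function names and the sample values are illustrative.

```python
def grouped_comparisons(N, K, k):
    """Pairwise similarity calculations when only same-group pairs are compared.

    Assumes N samples split evenly into 2**k groups in each of m = K/k classes.
    """
    m = K // k
    groups = 2 ** k
    per_group = N // groups
    pairs_per_group = per_group * (per_group - 1) // 2
    return m * groups * pairs_per_group

def brute_force_comparisons(N):
    """Pairwise similarity calculations when every pair of samples is compared."""
    return N * (N - 1) // 2

# Example: N = 1024 samples, K = 4, k = 2 (m = 2 classes, 4 groups per class).
grouped = grouped_comparisons(1024, 4, 2)
brute = brute_force_comparisons(1024)
```

With these values the grouped approach needs 261120 comparisons against 523776 for the brute-force approach. The saving grows as k increases, because the number of groups per class is 2^k while the number of classes m stays a small constant.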

In step S105, if the similarity between account information is greater than the threshold, an interconnection edge is established between the account information to form a feature matching network. Specifically, if the similarity between any two account information is greater than the threshold, an interconnection edge is established between them, and the weight of the edge is the similarity value between the two account information. The resulting edges form the feature matching network. In a specific implementation, a relatively high threshold can be selected so that the resulting feature matching network is convenient for subsequent calculation. In addition, the threshold can be adjusted according to actual conditions.
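Step S105 can be sketched as follows. This minimal example uses the bit-agreement ratio as the similarity and stores the network as a dictionary of weighted edges; the function names, the grouping, and the threshold of 0.6 are illustrative assumptions.

```python
from itertools import combinations

def bit_similarity(a, b):
    """Fraction of bit positions at which two equal-length hash strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def build_network(hashes, groups, threshold):
    """Return {(u, v): weight} for same-group pairs whose similarity exceeds threshold."""
    edges = {}
    for members in groups.values():
        for u, v in combinations(sorted(members), 2):
            if (u, v) in edges:
                continue  # pair already linked through another class
            s = bit_similarity(hashes[u], hashes[v])
            if s > threshold:
                edges[(u, v)] = s  # edge weight is the similarity value
    return edges

hashes = {"acct1": "0010", "acct2": "0011", "acct3": "1110"}
groups = {
    (0, "00"): ["acct1", "acct2"],
    (0, "11"): ["acct3"],
    (1, "10"): ["acct1", "acct3"],
    (1, "11"): ["acct2"],
}
edges = build_network(hashes, groups, threshold=0.6)
```

Only acct1 and acct2 are linked: their similarity is 3/4, while acct1 and acct3 agree in just 2 of 4 bits and fall below the threshold.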

In step S106, community division is performed on each account information according to the feature matching network. Specifically, based on the calculated similarity values between account information, the closer the similarity values are, the more likely the account information is to be divided into the same community. After the communities are divided, it is easier to check for fraudulent accounts in the network. The proportion of fraudulent account samples in each community can be calculated: the larger the proportion, the more likely the community is an abnormal community, and relevant investigations can be conducted according to business needs. The accounts in an abnormal community can then be evaluated according to certain indicators to find representative accounts, and case investigations can be conducted on these representative accounts. The indicators may be the degree centrality, closeness centrality, eigenvector centrality, and the like of the account information in the community. Alternatively, the features of the account information within the community can be re-analyzed in order to discover behavior characteristics common to the community and carry out targeted fraud prevention. In addition, if newly added account information forms a new community, it can be compared with the abnormal communities detected earlier, which helps to detect and prevent unknown fraud.
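The proportion of fraudulent samples per community mentioned above can be computed with a short helper. This is a minimal sketch; the function name, the community labels, and the sample accounts are hypothetical.

```python
def fraud_ratio_by_community(community_of, fraud_accounts):
    """Return {community: share of its accounts that are known fraud samples}."""
    counts = {}
    for account, community in community_of.items():
        total, fraud = counts.get(community, (0, 0))
        counts[community] = (total + 1, fraud + (account in fraud_accounts))
    return {c: fraud / total for c, (total, fraud) in counts.items()}

# Hypothetical division result: accounts a, b, c in community 0; d, e in community 1.
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
fraud_accounts = {"a", "b", "e"}
ratios = fraud_ratio_by_community(community_of, fraud_accounts)
```

Communities can then be ranked by this ratio, and those with the highest values examined first as candidate abnormal communities.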

Calculating the similarity between each account information in the same group can be calculated in the following two ways:

Method 1: Optionally, calculating the similarity between the account information in the same group includes: if the i-th account information and the j-th account information are in the same group in n classes, using n/m as the similarity between the i-th account information and the j-th account information, where the i-th account information and the j-th account information are any two of the account information. Specifically, take any two of the account information, for example account information 1 and account information 2, and let m be 3, so that the hash vectors of account information 1 and account information 2 are divided into 3 classes, called the first class, the second class, and the third class. If the two account information are in the same group in the first class and in the third class, then their similarity over the three classes is 2/3.

Method 2: Optionally, calculating the similarity between the account information in the same group includes: if the i-th account information and the j-th account information are in the same group, counting the number h of bit positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, where the i-th account information and the j-th account information are any two of the account information; the similarity between the i-th account information and the j-th account information is s=h/K. Specifically, suppose account information 1 and account information 2 are in the same group, the hash vectors of both are 4 bits long, that is, K is 4, and the first three bits of the two hash vectors are identical while the fourth bit differs. Then the similarity s between account information 1 and account information 2 is 3/4.

From the above two methods it can be seen that Method 1 derives the similarity from the number of classes in which the two account information fall into the same group, whereas Method 2 compares the hash vectors of two account information in the same group bit by bit. Compared with Method 2, Method 1 is a rough estimate based on the classes to which the two account information belong, and the similarity calculated by Method 2 is more accurate. However, both methods are a significant improvement over the prior art, which calculates the similarity between every pair of account information in the network using the Euclidean distance formula or the like, and both further accelerate the establishment of the network.
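The two similarity calculations can be sketched side by side. The example assumes hash vectors are stored as strings of '0' and '1'; the function names and the 6-bit sample values are illustrative.

```python
def similarity_method1(bits_i, bits_j, k):
    """Method 1: n/m, the share of the m classes in which both accounts
    have the same k-bit sub-hash vector (and so fall into the same group)."""
    m = len(bits_i) // k
    n = sum(bits_i[c * k:(c + 1) * k] == bits_j[c * k:(c + 1) * k] for c in range(m))
    return n / m

def similarity_method2(bits_i, bits_j):
    """Method 2: h/K, the share of bit positions holding the same value."""
    h = sum(a == b for a, b in zip(bits_i, bits_j))
    return h / len(bits_i)

# K = 6, k = 2, so m = 3 classes; the two vectors match in classes 1 and 3.
s1 = similarity_method1("010110", "011110", k=2)
# K = 4; the first three bits match and the fourth differs.
s2 = similarity_method2("0010", "0011")
```

Here `s1` is 2/3 and `s2` is 3/4, matching the two worked examples. Method 1 only needs group membership, while Method 2 inspects every bit, which is why it is the finer-grained measure.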

Optionally, determining a K-bit hash vector corresponding to each account information according to the K preset hash functions includes: determining the K-bit hash vector corresponding to each account information according to formula (1):

[Formula (1) is not reproduced in this text.]

In formula (1), the prefix 2'b indicates that the hash vector is a binary number; each bit is given by one of the K preset hash functions applied to the feature vector of the account information; and the remaining symbol denotes a randomly selected non-zero vector.

Specifically, each of the K preset hash functions takes the value 0 or 1. That is, a single such hash function can generate only two hash values, which is clearly insufficient for a large amount of account information. Therefore, a K-bit hash vector is determined for each account from K such hash functions. The hash vector is a K-bit binary number; for example, it can be a 6-bit binary number such as 010110. [The expression that follows this example in the original is not reproduced in this text.]

The components c_1, c_2, ..., c_d of the feature vector of the account information represent the characteristic attributes of the account information. Specific characteristic attributes may be the transaction amount, transaction time, transaction place, number of transaction places, transfer place, transfer amount, number of transfers, and the like. In a specific implementation, the features of the account information may be screened to obtain a theoretically optimal set of features. Specifically, samples of fraudulent account information and samples of normal account information are extracted over a certain period of time and combined into one overall account information sample. After data preprocessing, feature screening, and attribute correlation analysis of the overall sample based on business experience, a batch of theoretically optimal features is selected. Determining the K-bit hash vector corresponding to each account information according to the K preset hash functions allows the feature attributes of each account information to be fully extracted and represented by the feature vector, which copes with the huge amount of account information in a complex network.

In addition, two points should be noted. First, the K-bit hash vector corresponding to each account information is actually obtained through a random hash mapping process. [The mapping expression is not reproduced in this text.] The main purpose of using random hash mapping here is to map the feature vector of the account information to a uniform representation of 0s and 1s for subsequent processing, rather than simply to reduce dimensionality. Second, mapping the original feature vectors into the new hash space keeps data with similar feature vectors similar in the new hash space. The corresponding probability is given by an expression that is not reproduced in this text; it is a monotonically increasing mapping from the similarity s to the probability p.

In the above embodiment, a K-bit hash vector is determined for each account information and is sequentially divided into m=K/k classes of sub-hash vectors. This is shown below in table form. Table 1 exemplarily shows the relationship between the account information samples and the classes:

Table 1: Relationship between account information samples and classes

In Table 1, the relationship between the account information samples and the classes can be expressed as a matrix of K rows and N columns, where N represents the number of account information samples taken and c_1 to c_N represent the N account information samples. The N account information samples are divided into m=K/k classes, where each row below the first row represents a class, and within each class the N account information samples are divided into 2^k groups.

Optionally, performing community division on each account information according to the feature matching network includes:

(3) For each account information, starting from the row of the node similarity strength matrix in which the account information is located, try to move the account information into other communities in descending order of similarity strength; if the modularity increment obtained by moving the account information from the p-th community to the q-th community is positive, the account information is divided into the q-th community and the attempt ends;

(4) Repeat until the community structure no longer changes.

Optionally, calculating the similarity strength of each account information according to the similarity between the account information includes:

Calculating the similarity strength s_{i,j} between the i-th account information and the j-th account information according to formula (2);

Where Γ(i) represents the neighbor set of the i-th account information, Γ(i)∩Γ(j) represents the common neighbor set of the i-th account information and the j-th account information, and w_{ai,z} is the sum of the weights of the edges between any account information ai and the z-th account information.
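Formula (2) itself is not reproduced in this text. The sketch below assumes, based on the symbol definitions and the 1/5 worked example in the description, that the similarity strength sums 1/w(z) over the common neighbors z, where w(z) is the total weight of the edges incident to z. Both the function name and the adjacency layout are hypothetical.

```python
def similarity_strength(weights, i, j):
    """Assumed form of formula (2): sum over common neighbors z of 1 / w(z).

    weights: {node: {neighbor: edge weight}} for an undirected weighted graph.
    """
    common = set(weights[i]) & set(weights[j])
    total = 0.0
    for z in common:
        w_z = sum(weights[z].values())  # total weight of edges incident to z
        total += 1.0 / w_z
    return total

# Accounts 1 and 2 share the single neighbor 3, whose incident edge weights sum to 5.
example = {1: {3: 2.0}, 2: {3: 3.0}, 3: {1: 2.0, 2: 3.0}}
strength = similarity_strength(example, 1, 2)
```

In this example `strength` evaluates to 1/5, in line with the worked example given in the description.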

In a specific implementation, in step (1), the feature matching network is initialized and each account information is assigned to a different community; the division in this step may be random. In step (2), the similarity strength of each account information is calculated according to formula (2). For example, if the common neighbor of account information 1 and account information 2 is account information 3, and the weights of the edges connecting account information 1 and account information 2 to account information 3 sum to 5, then the weight w_{ai,3} is 5 and the similarity strength between account information 1 and account information 2 is 1/5. The similarity strengths of the other account information are calculated in the same way. If four account information samples are taken, the calculation yields a 4*4 similarity strength matrix.

Suppose that in this matrix the similarity strength between account information 1 and account information 2 is 0.25, that between account information 1 and account information 3 is 0.7, and that between account information 2 and account information 3 is 0.4. In step (3), starting from the row of the account information in the similarity strength matrix, an attempt is made to move the account information to other communities in descending order of similarity strength. For example, starting from the first row, when account information 1 is to be moved to another community, the community of account information 3, which has the largest similarity strength in the first row, is tried first. If ΔQ<0, an attempt is made to move account information 1 to the community of account information 4 (0.4, the second largest value in the first row). If ΔQ<0 again, an attempt is made to move account information 1 to the community of account information 2. If ΔQ<0 still holds, account information 1 is kept as an independent community, the matrix is not updated, and the calculation proceeds to the second row. If ΔQ>0 is found during these attempts, for example when account information 1 is first moved to the community of account information 3, the attempt succeeds and the calculation for the first row ends. Since the status of account information 1 has now changed, all data in the first row and first column of the matrix is deleted, indicating that subsequent account information is no longer compared with account information 1. A new round of attempts then starts in the same way, that is, community division is attempted for account information 2.

The modularity difference ΔQ is used to verify whether the above attempted division of the account information is correct. In its calculation formula, n represents the sum of all weights in the network, k_i represents the sum of the weights of the edges connected to vertex i, k_{i,in} represents the sum of the weights of the edges of account information i inside the community, Σ_in represents the sum of the edge weights inside the community, and Σ_tot represents the sum of the weights of the edges connected to the account information inside the community, including edges inside the community and edges leading outside it. If ΔQ is positive, the division is accepted; otherwise, the division is abandoned. By calculating the similarity strength matrix of the account information, each account information is preferentially assigned to the community of its most similar neighbor, which greatly reduces the number of division attempts and further improves the speed of the algorithm. Whether each attempted division is reasonable is then verified by the modularity difference formula, which more effectively ensures the rationality and accuracy of the attempted division.
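The modularity difference formula is not reproduced in this text. As a hedged sketch, the function below assumes it is the standard Louvain modularity gain expressed in the symbols defined above, with n as the total edge weight of the network and Σ_in counted as the sum of internal weights seen from both endpoints. The patent's actual formula may differ in normalization.

```python
def modularity_gain(sigma_in, sigma_tot, k_i, k_i_in, n):
    """Assumed standard Louvain gain for moving isolated node i into a community.

    sigma_in : sum of edge weights inside the community (each edge counted twice)
    sigma_tot: sum of weights of edges incident to nodes of the community
    k_i      : sum of weights of edges incident to node i
    k_i_in   : sum of weights of edges from node i into the community
    n        : sum of all edge weights in the network
    """
    two_n = 2.0 * n
    after = (sigma_in + 2.0 * k_i_in) / two_n - ((sigma_tot + k_i) / two_n) ** 2
    before = sigma_in / two_n - (sigma_tot / two_n) ** 2 - (k_i / two_n) ** 2
    return after - before

# Two accounts joined by one edge of weight 1: merging them is accepted (gain > 0).
gain_linked = modularity_gain(0.0, 1.0, 1.0, 1.0, 1.0)
# Same degrees but no edge into the community: the move is rejected (gain < 0).
gain_unlinked = modularity_gain(0.0, 1.0, 1.0, 0.0, 1.0)
```

A positive return value corresponds to the case where the description accepts the attempted division, and a non-positive value to the case where it is abandoned.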

In order to better understand the technical solution of the present invention, FIG. 2 exemplarily shows an overall schematic flowchart of the present invention. As shown in FIG. 2:

Step S201: Map the feature attributes of each account information to a multi-bit hash mapping vector by using a hash mapping method;

Step S202: Divide the hash mapping vector of each account information into classes;

Step S203: For each class, divide account information with the same sub-hash mapping vector into one group;

Step S204: Perform similarity calculation on any two account information in each group;

Step S205: If the similarity of any two account information in a group is greater than a threshold, establish an interconnection edge between the two account information, with the similarity as the weight of the edge, thereby forming a feature matching network, wherein the formed feature matching network is a sparse feature matching network;

Step S206: Perform community division on the feature matching network according to the similarity strength matrix of each account information in the feature matching network.
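Steps S204 and S205 can be sketched as follows, using the s=h/K similarity that the claims define (h is the number of positions at which two hash vectors hold the same value). The function names and the edge dictionary layout are illustrative assumptions.

```python
from itertools import combinations

def bit_similarity(bits_a, bits_b):
    """s = h / K, where h counts positions holding the same bit."""
    h = sum(1 for a, b in zip(bits_a, bits_b) if a == b)
    return h / len(bits_a)

def build_edges(groups, hashes, threshold):
    """Compare only accounts that share a group; keep edges above the threshold.

    groups: iterable of account-id lists; returns {(u, v): similarity weight}.
    """
    edges = {}
    for members in groups:
        for u, v in combinations(sorted(members), 2):
            if (u, v) in edges:
                continue  # pair already linked through another class
            s = bit_similarity(hashes[u], hashes[v])
            if s > threshold:
                edges[(u, v)] = s
    return edges
```

Because pairs are only compared inside groups, accounts that never share a sub-hash vector are never compared, which is what keeps the resulting network sparse.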

Compared with the prior art, the embodiment of the present invention has the following advantages. First, the feature attributes of each account information are mapped into a new hash space by a random hash mapping method to form a hash mapping vector for each account information. By classifying the hash mapping vectors, edges can be established between account information with high similarity, which effectively avoids calculating the similarity between a large number of arbitrary pairs of account information and efficiently establishes a credible weight for each edge, improving the accuracy and speed of subsequent community division. Second, the feature matching network is established according to the similarity of the account information, and the network is then divided into communities according to the similarity strength matrix of the account information in the network. This not only detects abnormal communities effectively so that targeted measures can be taken, but also detects previously unknown fraud types. Because the division is driven by the similarity strength matrix, each account information is preferentially assigned to the community of its most similar neighbor, which greatly reduces the number of division attempts and further improves the speed of the algorithm. Third, once the feature matching network is formed, the similarity between related account information is stored permanently as the weight of the edge. Even if more new account information arrives, the original interconnection edges in the network are not affected; the new account information only needs to be inserted into the original feature matching network. When new account information is added to the original feature matching network, the random hash mapping method is first used to classify the new account information, and its similarity with the account information in the same class is then calculated. If the similarity is greater than the threshold, a new edge is added. Afterwards, only a smaller-scale but more accurate community division algorithm needs to be run. At the same time, the structure of the feature matching network displays the association structure within and between communities more clearly, which traditional clustering methods cannot achieve.
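The incremental insertion described above can be sketched as follows. This is one possible arrangement, assuming a bucket index keyed by class and sub-hash vector is kept alongside the network; `insert_account` and its parameters are hypothetical names, and the similarity used is the s=h/K measure from the claims.

```python
def insert_account(account, bits, k, buckets, hashes, edges, threshold):
    """Add one new account to the network without touching existing edges.

    buckets: {(class_index, sub_hash): set of account ids}
    hashes : {account id: K-bit hash vector}
    edges  : {(u, v): similarity weight}
    """
    candidates = set()
    for class_index in range(len(bits) // k):
        sub = tuple(bits[class_index * k:(class_index + 1) * k])
        bucket = buckets.setdefault((class_index, sub), set())
        candidates |= bucket      # existing accounts sharing this sub-hash
        bucket.add(account)
    hashes[account] = list(bits)
    for other in sorted(candidates):
        h = sum(1 for a, b in zip(bits, hashes[other]) if a == b)
        s = h / len(bits)
        if s > threshold:
            edges[(other, account)] = s
    return edges
```

Only the new account's candidate pairs are evaluated, so the cost of an insertion depends on the size of its groups rather than on the size of the whole network.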

Based on the same concept, a community division device based on a feature matching network is provided in the embodiment of the present invention. As shown in FIG. 3, the device includes a determining unit 301, a first dividing unit 302, a second dividing unit 303, a calculating unit 304, a network forming unit 305, and a third dividing unit 306, wherein:

a determining unit 301, configured to determine a K-bit hash vector corresponding to each account information according to the K preset hash functions;

a first dividing unit 302, configured to sequentially divide the hash vector corresponding to each account information into m=K/k classes of sub-hash vectors;

a second dividing unit 303, configured to divide, for each class, account information with the same sub-hash vector into the same group;

a calculating unit 304, configured to calculate the similarity between the account information in the same group;

a network forming unit 305, configured to establish an interconnection edge between account information to form a feature matching network if the similarity between the account information is greater than a threshold;

a third dividing unit 306, configured to perform community division on each account information according to the feature matching network.

Optionally, the calculating unit 304 is specifically configured to:

If the i-th account information and the j-th account information are in the same group in n classes, use n/m as the similarity between the i-th account information and the j-th account information, the i-th account information and the j-th account information being any of the account information.

Optionally, the calculating unit 304 is further specifically configured to:

If the i-th account information and the j-th account information are in the same group, determine the number h of positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, the i-th account information and the j-th account information being any of the account information; the similarity between the i-th account information and the j-th account information is s=h/K.

Optionally, the determining unit 301 is configured to:

Determine the K-bit hash vector corresponding to each account information according to formula (3),

where 2'b indicates a binary number, the hash function in formula (3) is one of the K preset hash functions and is applied to the feature vector of the account information, and the remaining vector in formula (3) is a randomly selected non-zero vector.

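Optionally, the third dividing unit 306 is specifically configured to:

(1) Divide each account information into a different community in the feature matching network;

(2) Calculate the similarity strength of each account information according to the similarity between the account information, thereby generating a node similarity strength matrix;

(3) For each account information, starting from the row in which the account information is located in the node similarity strength matrix, attempt to move the account information to other communities in descending order of similarity strength; if the modularity difference after moving the account information from the p-th community to the q-th community is positive, assign the account information to the q-th community and end the attempt;

(4) Repeat until the community structure no longer changes.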

Optionally, the calculating unit 304 is further specifically configured to:

Calculating the similarity intensity s _i,j between the i-th account information and the j-th account information according to formula (4);

Referring to FIG. 4, an electronic device according to an embodiment of the present invention is applicable to the above embodiment of the present invention.

The electronic device may include one or more processors 410 and a memory 420; one processor 410 is taken as an example in FIG. 4.

The device for performing the community division method based on the feature matching network may further include an input device 430 and an output device 440.

The processor 410, the memory 420, the input device 430, and the output device 440 may be connected by a bus or by other means; connection by a bus is taken as an example in FIG. 4.

The memory 420 is a non-volatile computer readable storage medium and can be used for storing non-volatile software programs, non-volatile computer executable programs, and modules, such as the program instructions/modules corresponding to the community division method based on the feature matching network in the embodiment of the present application. The processor 410 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 420, that is, it implements the community division method based on the feature matching network of the above method embodiments.

The memory 420 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application required for at least one function, and the data storage area may store data created according to the use of the community division device based on the feature matching network, and the like. Moreover, the memory 420 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 420 can optionally include memory remotely disposed relative to the processor 410, which can be connected to the community division device based on the feature matching network over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 430 can receive the input digital or character information and generate key signal inputs related to user settings and function control of the feature matching network based community partitioning device. Output device 440 can include a display device such as a display screen.

The one or more modules are stored in the memory 420, and when executed by the one or more processors 410, perform a feature matching network based community partitioning method in any of the above method embodiments.

The processor 410 is configured with one or more executable programs, and the one or more executable programs are configured to perform the following process: determining a K-bit hash vector corresponding to each account information according to K preset hash functions; sequentially dividing the hash vector corresponding to each account information into m=K/k classes of sub-hash vectors; for each class, dividing account information with the same sub-hash vector into the same group; calculating the similarity between the account information in the same group; if the similarity between account information is greater than a threshold, establishing an interconnection edge between the account information to form a feature matching network; and performing community division on each account information according to the feature matching network.

Preferably, the processor 410 is specifically configured to:

If the i-th account information and the j-th account information are in the same group, determine the number h of positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, the i-th account information and the j-th account information being any of the account information; the similarity between the i-th account information and the j-th account information is s=h/K.

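Preferably, the processor 410 is specifically configured to:

Determine the K-bit hash vector corresponding to each account information according to the hash formula, where 2'b indicates a binary number, the hash function in the formula is one of the K preset hash functions and is applied to the feature vector of the account information, and the remaining vector in the formula is a randomly selected non-zero vector.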

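Preferably, the processor 410 is specifically configured to:

(1) Divide each account information into a different community in the feature matching network;

(2) Calculate the similarity strength of each account information according to the similarity between the account information, thereby generating a node similarity strength matrix;

(3) For each account information, starting from the row in which the account information is located in the node similarity strength matrix, attempt to move the account information to other communities in descending order of similarity strength; if the modularity difference after moving the account information from the p-th community to the q-th community is positive, assign the account information to the q-th community and end the attempt;

(4) Repeat until the community structure no longer changes.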

Preferably, the processor 410 is specifically configured to:

It can be seen from the above that the embodiment of the present invention provides a community division device based on a feature matching network. A K-bit hash vector corresponding to each account information is determined according to K preset hash functions; the hash vector corresponding to each account information is sequentially divided into m=K/k classes of sub-hash vectors; for each class, account information with the same sub-hash vector is divided into the same group; the similarity between the account information in the same group is calculated; if the similarity between account information is greater than the threshold, an interconnection edge is established between the account information to form a feature matching network; and community division is performed on each account information according to the feature matching network. In the embodiment of the present invention, the K-bit hash vector corresponding to each account information is first determined according to the K preset hash functions. For the large amount of account information in the network, hash values generated by only two hash functions are not enough, so a K-bit hash vector is determined for each account information to cope with complex network account information. Then, for each class, account information with the same sub-hash vector is divided into one group, and only the similarity between account information in the same group is calculated, which avoids calculating the similarity between every pair of account information in the entire network and effectively reduces the amount of similarity calculation. Finally, when the similarity between account information is greater than the threshold, an interconnection edge is established between the account information to form the feature matching network, and community division is performed according to the feature matching network. This divides the account information into communities more accurately, makes the association relationships between communities clear, and allows the divided communities to be analyzed to find abnormal communities. The accounts in an abnormal community can then be checked, so that fraudulent accounts are found in a more targeted way and the efficiency of responding to fraudulent accounts is improved. In addition, if account information needs to be added to the divided communities, the above simple steps only need to be repeated for the added account information and the result updated at the corresponding location, which causes no update difficulty.

Those skilled in the art will appreciate that embodiments of the present invention can be provided as a method, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising instruction means, and the instruction means implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

While the preferred embodiment of the invention has been described, it will be understood that Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and the modifications and

It is apparent that those skilled in the art can make various modifications and variations to the invention without departing from the spirit and scope of the invention. Thus, it is intended that the present invention cover the modifications and modifications of the invention

Claims

A community division method based on a feature matching network, characterized in that it comprises:

Determining a K-bit hash vector corresponding to each account information according to K preset hash functions;

Sequentially dividing the hash vector corresponding to each account information into m=K/k classes of sub-hash vectors;

For each class, dividing account information with the same sub-hash vector into the same group;

Calculating the similarity between the account information in the same group;

If the similarity between account information is greater than a threshold, establishing an interconnection edge between the account information to form a feature matching network;

And performing community division on each account information according to the feature matching network.
The method according to claim 1, wherein calculating the similarity between each account information in the same group comprises:

If the i-th account information and the j-th account information are in the same group in n classes, using n/m as the similarity between the i-th account information and the j-th account information, the i-th account information and the j-th account information being any of the account information.
The method according to claim 1, wherein calculating the similarity between each account information in the same group comprises:

If the i-th account information and the j-th account information are in the same group, determining the number h of positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, the i-th account information and the j-th account information being any of the account information;

The similarity between the i-th account information and the j-th account information is s=h/K.
The method of claim 1, wherein determining the K-bit hash vector corresponding to each account information according to the K preset hash functions comprises:

Determining the K-bit hash vector corresponding to each account information according to formula (1),

where 2'b indicates a binary number, the hash function in formula (1) is one of the K preset hash functions and is applied to the feature vector of the account information, c₁, c₂, ..., c_d represent the characteristic attributes of the account information that make up the feature vector, and the remaining vector in formula (1) is a randomly selected non-zero vector.
The method according to any one of claims 1 to 4, wherein performing community division on each account information according to the feature matching network comprises:

(1) dividing each account information into a different community in the feature matching network;

(2) calculating the similarity strength of each account information according to the similarity between the account information, thereby generating a node similarity strength matrix;

(3) for each account information, starting from the row in which the account information is located in the node similarity strength matrix, attempting to move the account information to other communities in descending order of similarity strength; if the modularity difference after moving the account information from the p-th community to the q-th community is positive, assigning the account information to the q-th community and ending the attempt;

(4) repeating until the community structure no longer changes.
The method according to claim 5, wherein the calculating the similarity strength of each account information according to the similarity between the account information includes:

Calculating the similarity strength s_{i,j} between the i-th account information and the j-th account information according to formula (2);

where w(z)=w_{ai,z} in formula (2);

Γ(i) represents the neighbor set of the i-th account information, Γ(i)∩Γ(j) represents the common neighbor set of the i-th account information and the j-th account information, and w_{ai,z} is the sum of the weights of the edges between any account information ai and the z-th account information.
A community division device based on a feature matching network, comprising:

a determining unit, configured to determine a K-bit hash vector corresponding to each account information according to K preset hash functions;

a first dividing unit, configured to sequentially divide the hash vector corresponding to each account information into m=K/k classes of sub-hash vectors;

a second dividing unit, configured to divide, for each class, account information with the same sub-hash vector into the same group;

a calculating unit, configured to calculate the similarity between the account information in the same group;

a network forming unit, configured to establish an interconnection edge between account information to form a feature matching network if the similarity between the account information is greater than a threshold;

a third dividing unit, configured to perform community division on each account information according to the feature matching network.
The device of claim 7 wherein:

The calculating unit is specifically configured to use n/m as the similarity between the i-th account information and the j-th account information if the i-th account information and the j-th account information are in the same group in n classes, the i-th account information and the j-th account information being any of the account information.
The device of claim 7 wherein:

The calculating unit is further configured to: if the i-th account information and the j-th account information are in the same group, determine the number h of positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, the i-th account information and the j-th account information being any of the account information;

The similarity between the i-th account information and the j-th account information is s=h/K.
The apparatus according to claim 7, wherein said determining unit is configured to determine the K-bit hash vector corresponding to each account information according to formula (3),

where 2'b indicates a binary number, the hash function in formula (3) is one of the K preset hash functions and is applied to the feature vector of the account information, c₁, c₂, ..., c_d represent the characteristic attributes of the account information that make up the feature vector, and the remaining vector in formula (3) is a randomly selected non-zero vector.
A device according to any one of claims 7 to 10, characterized in that

The third dividing unit is specifically configured to: (1) divide each account information into a different community in the feature matching network;

(2) calculate the similarity strength of each account information according to the similarity between the account information, thereby generating a node similarity strength matrix;

(3) for each account information, starting from the row in which the account information is located in the node similarity strength matrix, attempt to move the account information to other communities in descending order of similarity strength; if the modularity difference after moving the account information from the p-th community to the q-th community is positive, assign the account information to the q-th community and end the attempt;

(4) repeat until the community structure no longer changes.
The device of claim 11 wherein:

The calculating unit is further configured to calculate the similarity strength s_{i,j} between the i-th account information and the j-th account information according to formula (4);

where w(z)=w_{ai,z} in formula (4);

Γ(i) represents the neighbor set of the i-th account information, Γ(i)∩Γ(j) represents the common neighbor set of the i-th account information and the j-th account information, and w_{ai,z} is the sum of the weights of the edges between any account information ai and the z-th account information.
An electronic device, comprising:

At least one processor; and,

a memory communicatively coupled to the at least one processor; wherein

The memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 6.
A non-volatile computer storage medium storing computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 6.
A computer program product comprising a computer program stored on a non-volatile computer storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 6.