WO2018103456A1 - Method and apparatus for grouping communities on the basis of feature matching network, and electronic device - Google Patents


Info

Publication number
WO2018103456A1
Authority
WO
WIPO (PCT)
Prior art keywords
account information
similarity
hash
community
matching network
Prior art date
Application number
PCT/CN2017/105985
Other languages
French (fr)
Chinese (zh)
Inventor
李旭瑞
邱雪涛
赵金涛
钟毅
胡奕
Original Assignee
中国银联股份有限公司
Application filed by 中国银联股份有限公司
Publication of WO2018103456A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Definitions

  • Embodiments of the present invention relate to the field of data processing, and in particular, to a community partitioning method, apparatus, and electronic device based on a feature matching network.
  • Credit card cash-out refers to a cardholder obtaining cash through sham consumer transactions, for example by conspiring with a merchant to swipe the card and then refunding the transaction, or by purchasing goods that are easy to resell and selling them for cash.
  • Fake card fraud refers to forging a real and valid bank card by writing the magnetic stripe, embossing, or printing, according to the magnetic stripe information format of the bank card. Another type of fraud refers to a fraudster obtaining some or all of the real cardholder's information and impersonating the cardholder to change the account's information for fraudulent purposes.
  • Credit card crime is becoming increasingly high-tech, organized, and professional. Cases are carried out more covertly and the methods are constantly renewed. This threatens the financial security of banks and cardholders and has become an important factor restricting the long-term healthy development of the credit card industry.
  • In the prior art, clustering is usually adopted to deal with such fraud.
  • However, this approach has several defects. On the one hand, if data is added to the anti-fraud model later, it is difficult for the anti-fraud model to update the data.
  • On the other hand, although the nodes can be divided into several classes, the structure within each group and the relationships between the structures remain difficult to describe.
  • The embodiments of the invention provide a community division method, apparatus, and electronic device based on a feature matching network, to solve the problems in the prior art that the anti-fraud model is difficult to update when data is added to it, and that after clustering the structure within each group and the relationships between the structures remain difficult to describe.
  • An embodiment of the present invention provides a community division method based on a feature matching network, including:
  • for each class, dividing account information having the same sub-hash vector into the same group;
  • calculating the similarity between each account information in the same group, including:
  • if the i-th account information and the j-th account information are in the same group in n classes, using n/m as the similarity between the i-th account information and the j-th account information, where the i-th account information and the j-th account information are each any one of the account information.
  • calculating the similarity between each account information in the same group, including:
  • if the i-th account information and the j-th account information are in the same group, counting the number h of bit positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, where the i-th account information and the j-th account information are each any one of the account information;
  • determining a K-bit hash vector corresponding to each account information according to the preset K hash functions including:
  • where the feature vector of the account information has components $c_1, c_2, \ldots, c_d$, which represent the characteristic attributes of the account information, and the other vector in the formula is a randomly selected non-zero vector,
  • performing community division on each account information including:
  • calculating the similarity strength of each account information according to the similarity between the account information, including:
  • $\sum w_{ai,z}$ is the sum of the weights of the edges between any account information ai and the z-th account information.
  • the embodiment of the invention further provides a community division device based on a feature matching network, comprising:
  • a determining unit configured to determine a K-bit hash vector corresponding to each account information according to K preset hash functions;
  • a second dividing unit configured to divide, for each class, account information having the same sub-hash vector into the same group;
  • a calculating unit configured to calculate a similarity between each account information in the same group;
  • a network forming unit configured to establish, if the similarity between the account information is greater than a threshold, an interconnection edge between the account information to form a feature matching network;
  • a third dividing unit configured to perform community division on the account information according to the feature matching network.
  • the calculating unit is specifically configured to: if the i-th account information and the j-th account information are in the same group in n classes, use n/m as the similarity between the i-th account information and the j-th account information, where the i-th account information and the j-th account information are each any one of the account information.
  • the calculating unit is further configured to: if the i-th account information and the j-th account information are in the same group, count the number h of bit positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, where the i-th account information and the j-th account information are each any one of the account information;
  • the determining unit is configured to determine, according to formula (3), a K-bit hash vector corresponding to each account information.
  • where the feature vector of the account information has components $c_1, c_2, \ldots, c_d$, which represent the characteristic attributes of the account information, and the other vector in the formula is a randomly selected non-zero vector,
  • the third dividing unit is specifically configured to: (1) divide each account information into different communities in the feature matching network;
  • the calculating unit is further configured to calculate, according to formula (4), a similarity strength $s_{i,j}$ between the i-th account information and the j-th account information;
  • $\sum w_{ai,z}$ is the sum of the weights of the edges between any account information ai and the z-th account information.
  • An embodiment of the present invention further provides an electronic device, including:
  • At least one processor and,
  • the account information is divided into the same group; the similarity between the account information in the same group is calculated; if the similarity between the account information is greater than the threshold, an interconnection edge is established between the account information to form a feature matching network; and community division is performed on the account information according to the feature matching network.
  • In the embodiments, a community division method, device, and electronic device based on a feature matching network are provided, in which a K-bit hash vector corresponding to each account information is determined according to K preset hash functions;
  • the similarity is calculated; if the similarity between the account information is greater than the threshold, an interconnection edge is established between the account information to form a feature matching network; and community division is performed on the account information according to the feature matching network.
  • The K-bit hash vector corresponding to each account information is first determined according to the K preset hash functions. For the large amount of account information in a network, a hash function that generates only two hash values is not enough, so determining a K-bit hash vector for each account information makes it possible to cope with complex network account information. Then, for each class, account information having the same sub-hash vector is divided into one group, and the similarity is calculated only between account information in the same group, which avoids calculating the similarity between every pair of account information in the entire network.
  • The technical solution of the present invention therefore effectively reduces the amount of similarity calculation, because only the similarity between account information in the same group is calculated.
  • If the similarity between account information is greater than the threshold, an interconnection edge is established between the account information to form a feature matching network; performing community division according to the feature matching network divides the account information into communities more accurately.
  • This not only makes the relationships between communities clear, but also allows the divided communities to be analyzed to find abnormal communities. Abnormal-account checks can then be performed on the accounts in the abnormal communities, so that fraudulent accounts are found in a more targeted way and the efficiency of responding to fraudulent accounts is improved.
  • If account information needs to be added to the divided communities, it is only necessary to repeat the above simple steps for the added account information and update it to the corresponding location, which does not cause the problem of difficult updates.
  • FIG. 1 is a schematic flowchart of a community partitioning method based on a feature matching network according to an embodiment of the present invention
  • FIG. 2 is an overall schematic flowchart according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a community division apparatus based on a feature matching network according to an embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
  • the technical solution of the embodiments of the present invention can be applied to various network fraud scenarios of various banks, such as fraud of credit card products, fraud of bank card products, fraudulent card fraud, fake card fraud, cash fraud, and the like.
  • the application scenario of the technical solution of the embodiment of the present invention may also be the discovery of the abnormal account information community, the commonality of discovering specific types of fraud, the discovery of other fraudulent account information according to the fraudulent account information sample, and the help of discovering unknown fraud types.
  • FIG. 1 is a schematic flowchart showing a method for community partitioning based on a feature matching network according to an embodiment of the present invention. As shown in FIG. 1 , the method includes the following steps:
  • Step S101 Determine a K-bit hash vector corresponding to each account information according to the preset K hash functions
  • Step S103 For each class, divide the account information with the same sub-hash vector into the same group;
  • Step S104 Calculate the similarity between each account information in the same group
  • Step S105 If the similarity between the account information is greater than the threshold, an interconnection edge is established between each account information to form a feature matching network.
  • Step S106 Perform community division on each account information according to the feature matching network.
  • A K-bit hash vector corresponding to each account information is determined according to K preset hash functions. Specifically, each preset hash function yields one hash value, so the K preset hash functions generate a K-bit hash vector, and each account information corresponds to one K-bit hash vector.
  • Each account information includes multiple feature attributes. Using only one hash function, as in the prior art, is insufficient to express the multiple feature attributes of an account information; this step effectively avoids that disadvantage.
  • The value of K can be set according to the specific situation of the account information in a specific implementation. For example, if K is set to 4, each account information is represented as a 4-bit hash vector.
  • For example, the 4-bit hash vector is divided into two classes of sub-hash vectors. The advantage of this division is that it reduces the computational complexity of the subsequent similarity calculation between accounts, avoiding the disadvantage of the prior art, in which the hash vectors of the account information are not divided and the similarity is calculated directly for every two account information.
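  • The division into classes and groups described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `split_into_classes` and `group_accounts`, the representation of a hash vector as a tuple of 0/1 integers, and the choice of equal-width contiguous classes are all assumptions made for the example.

```python
from collections import defaultdict


def split_into_classes(hash_vec, m):
    """Split a K-bit hash vector into m equal-width sub-hash vectors (assumed layout)."""
    k = len(hash_vec)
    if k % m != 0:
        raise ValueError("this sketch assumes K is divisible by m")
    width = k // m
    return [tuple(hash_vec[c * width:(c + 1) * width]) for c in range(m)]


def group_accounts(hash_vectors, m):
    """Map (class index, sub-hash vector) to the accounts that share it."""
    groups = defaultdict(list)
    for account_id, vec in hash_vectors.items():
        for class_idx, sub in enumerate(split_into_classes(vec, m)):
            groups[(class_idx, sub)].append(account_id)
    return dict(groups)


# Hypothetical 4-bit hash vectors, divided into two classes of 2 bits each.
hash_vectors = {"a": (0, 0, 1, 0), "b": (0, 0, 1, 1), "c": (1, 0, 1, 0)}
groups = group_accounts(hash_vectors, m=2)
```

  • With these hypothetical inputs, accounts "a" and "b" share a group in the first class and accounts "a" and "c" share a group in the second class, so only those pairs would be compared later.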
  • Step S104 Calculate the similarity between the account information in the same group.
  • For example, the similarity may be taken as the ratio of the number of matching bits between the hash vectors of two account information in the same group to the total number of bits. If the hash vector of account information 1 is 0010 and the hash vector of another account information in the same group matches it in 3 bit positions, then with a total of 4 bits the similarity between the two account information is 3/4.
  • Alternatively, the similarity between any two account information in the same group can be calculated according to a similarity formula.
  • The similarity formula can be the Euclidean distance, the cosine distance, the Jaccard distance, and the like.
  • only calculating the similarity between any two account information in the same group can greatly reduce the amount of calculation.
  • Suppose N account information samples are taken and grouped into $2^k$ groups, so that the number of account information samples in each group is $N/2^k$, and the similarity calculation is performed for any two account information in each group.
  • The number of similarity calculations in one group is $\frac{1}{2}\cdot\frac{N}{2^k}\left(\frac{N}{2^k}-1\right)$, and the number of similarity calculations over the $2^k$ groups is $\frac{N}{2}\left(\frac{N}{2^k}-1\right)$. Therefore, the number of similarity calculations needed over all classes is $\frac{mN}{2}\left(\frac{N}{2^k}-1\right)$, where m is the number of divided classes, a constant that can be controlled according to the actual situation, whereas the traditional method, which calculates the similarity for any two account information among all accounts, needs $\frac{N(N-1)}{2}$ calculations.
  • Compared with the conventional method of calculating the similarity between any two account information among all accounts, calculating the similarity only between account information in the same group reduces the amount of calculation by a factor on the order of $2^k$.
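  • The saving can be checked numerically. The sketch below assumes, as the text does, that samples spread evenly over the $2^k$ groups; the function names and the symbol `m_classes` for the number of classes are illustrative choices rather than names taken from the patent.

```python
def pair_count(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def grouped_comparisons(n_accounts, k_bits, m_classes):
    """Pairwise comparisons when each class splits the accounts evenly into 2**k_bits groups."""
    group_size = n_accounts // (2 ** k_bits)  # assumes an even spread over the groups
    return m_classes * (2 ** k_bits) * pair_count(group_size)


n_accounts = 1024
all_pairs = pair_count(n_accounts)                        # compare every pair of accounts
grouped = grouped_comparisons(n_accounts, k_bits=4, m_classes=2)
reduction = all_pairs / grouped                           # how many times fewer comparisons
```

  • For these illustrative values the grouped approach needs roughly one eighth of the comparisons, which matches a $2^k$-fold saving per class divided by the number of classes.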
  • the similarity of the account information in each group is relatively large, so the similarity calculation of the account information in the same group can also improve the efficiency and accuracy of network establishment.
  • Step S105 If the similarity between the account information is greater than the threshold, an interconnection edge is established between the account information to form a feature matching network. Specifically, if the similarity between any two account information is greater than a threshold, an interconnection edge is established between the two account information, the weight of the edge is the similarity value between the two account information, and a feature matching network is finally formed.
  • A relatively high value can be selected for the threshold.
  • This makes the feature matching network easy to use in subsequent calculations.
  • The value of the threshold can be adjusted according to actual conditions.
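  • A minimal sketch of the edge-building step follows. It assumes groups are given as a mapping from a group key to a list of account identifiers and that `similarity` is any callable returning a number for two accounts; `build_network` and the edge representation (a dict keyed by a sorted pair) are hypothetical choices for illustration.

```python
from itertools import combinations


def build_network(groups, similarity, threshold):
    """Create weighted edges only between accounts that share a group."""
    edges = {}
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            if (a, b) in edges:
                continue  # pair already linked through another class
            s = similarity(a, b)
            if s > threshold:  # strictly greater, as in step S105
                edges[(a, b)] = s  # edge weight is the similarity value
    return edges


# Hypothetical data: three accounts with 4-bit hash vectors, grouped per class.
vectors = {"a": (0, 0, 1, 0), "b": (0, 0, 1, 1), "c": (1, 0, 1, 0)}
groups = {
    (0, (0, 0)): ["a", "b"],
    (0, (1, 0)): ["c"],
    (1, (1, 0)): ["a", "c"],
    (1, (1, 1)): ["b"],
}


def bit_similarity(x, y):
    """Fraction of matching bit positions between two accounts' hash vectors."""
    return sum(p == q for p, q in zip(vectors[x], vectors[y])) / len(vectors[x])


edges = build_network(groups, bit_similarity, threshold=0.5)
```

  • Raising the threshold removes edges, which is one way to keep the resulting network sparse.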
  • Step S106 Perform community division on each account information according to the feature matching network. Specifically, based on the calculated similarity values between account information, the closer the similarity values, the more easily the account information is divided into the same community. After the communities are divided, it is easier to check for fraudulent accounts in the network: the proportion of fraudulent account samples in each community can be calculated, and the larger the proportion, the more likely the community is an abnormal community. Relevant investigations can then be conducted based on business needs: the accounts in an abnormal community are evaluated according to certain indicators to find representative accounts, and case investigations are conducted on those representative accounts. Some indicators may be account information in the community.
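  • The proportion of fraudulent samples per community mentioned above can be computed with a few lines. The names `fraud_ratio_by_community`, `community_of`, and `fraud_accounts` are hypothetical, and the data is made up for illustration.

```python
def fraud_ratio_by_community(community_of, fraud_accounts):
    """Return, for each community, the share of its accounts that are known fraud samples."""
    totals = {}
    frauds = {}
    for account, community in community_of.items():
        totals[community] = totals.get(community, 0) + 1
        if account in fraud_accounts:
            frauds[community] = frauds.get(community, 0) + 1
    return {c: frauds.get(c, 0) / totals[c] for c in totals}


# Hypothetical assignment of accounts to communities and known fraud samples.
community_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}
fraud_accounts = {"a", "b", "c"}
ratios = fraud_ratio_by_community(community_of, fraud_accounts)
most_suspicious = max(ratios, key=ratios.get)
```

  • Communities with a high ratio would be candidates for the follow-up investigation described in the text.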
  • Method 1: Optionally, calculating the similarity between each account information in the same group includes: if the i-th account information and the j-th account information are in the same group in n classes, using n/m as the similarity between the i-th account information and the j-th account information, where the i-th account information and the j-th account information are each any one of the account information. Specifically, two account information are selected arbitrarily from all the account information.
  • Method 2: Optionally, calculating the similarity between each account information in the same group includes: if the i-th account information and the j-th account information are in the same group, counting the number h of bit positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, where the i-th account information and the j-th account information are each any one of the account information; the similarity s between the i-th account information and the j-th account information is h/K. Specifically, suppose any two account information among all the account information, account information 1 and account information 2, are in the same group, their hash vectors are both 4 bits long (that is, K is 4), and the first three bits are identical while the fourth bit differs. Then the similarity s of account information 1 and account information 2 is 3/4.
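  • Both methods can be sketched in a few lines. The sketch assumes hash vectors are tuples of 0/1 integers and that the classes are equal-width contiguous slices of the K-bit vector; the function names are illustrative.

```python
def similarity_by_classes(vec_i, vec_j, m):
    """Method 1: n/m, where n counts the classes in which both accounts share a group."""
    width = len(vec_i) // m  # assumes K is divisible by m
    n = sum(
        vec_i[c * width:(c + 1) * width] == vec_j[c * width:(c + 1) * width]
        for c in range(m)
    )
    return n / m


def similarity_by_bits(vec_i, vec_j):
    """Method 2: h/K, where h counts bit positions holding the same value."""
    h = sum(a == b for a, b in zip(vec_i, vec_j))
    return h / len(vec_i)


# Two 4-bit hash vectors that agree on the first three bits, as in the example above.
account_1 = (0, 0, 1, 0)
account_2 = (0, 0, 1, 1)
coarse = similarity_by_classes(account_1, account_2, m=2)
fine = similarity_by_bits(account_1, account_2)
```

  • For this pair, Method 2 gives 3/4 as in the worked example, while Method 1 with two classes gives 1/2 because only the first class matches.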
  • Comparing the two methods above for calculating the similarity between account information in the same group: the first method calculates the similarity at the level of classes, that is, it roughly measures how similar the classes to which the two account information belong are, whereas the second method calculates the similarity between two account information in the same group bit by bit, so the similarity calculated by the second method is more accurate.
  • Both methods are a significant improvement over the prior art, which calculates the similarity between any two account information among all account information in the network using the Euclidean distance formula or the like, and they further accelerate the establishment of the network.
  • Determining a K-bit hash vector corresponding to each account information according to the K preset hash functions includes: determining the K-bit hash vector corresponding to each account information according to formula (1).
  • In formula (1), the feature vector of the account information has components $c_1, c_2, \ldots, c_d$, which represent the characteristic attributes of the account information, and the other vector in the formula is a randomly selected non-zero vector.
  • Each preset hash function is any one of the K preset hash functions, and its value is 0 or 1. That is, a single such hash function can only generate two hash values, which is clearly insufficient for a large amount of account information, so a K-bit hash vector is determined from K such hash functions.
  • The K-bit hash vector of each account is a K-bit binary number, for example a 6-bit binary number such as 010110. In the feature vector of the account information, $c_1, c_2, \ldots, c_d$ represent characteristic attributes of the account information; specific characteristic attributes may be transaction amount, transaction time, transaction place, number of transaction places, transfer place, transfer amount, number of transfers, and the like.
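  • Formula (1) is not reproduced in this text, so the following is only a sketch under an explicit assumption: each hash function is taken to be a random-hyperplane (sign of dot product) hash, which is one common construction that matches the description of a randomly selected non-zero vector and a 0/1 output. The function names, the Gaussian sampling, and the rule "1 if the dot product is non-negative" are assumptions, not the patent's stated formula.

```python
import random


def make_hash_functions(k, d, seed=0):
    """Draw k random direction vectors of dimension d, one per hash function (assumed form)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]


def hash_vector(features, directions):
    """Return a K-bit tuple: bit is 1 if the dot product with the direction is >= 0, else 0."""
    bits = []
    for v in directions:
        dot = sum(c * x for c, x in zip(features, v))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


# Hypothetical feature vector (c_1, c_2, c_3), e.g. scaled amount, count, and time features.
directions = make_hash_functions(k=6, d=3, seed=42)
bits = hash_vector([0.8, 1.5, 0.2], directions)
```

  • Under this assumed construction, feature vectors pointing in similar directions agree on more bits, which is consistent with the later remark that similar feature vectors stay similar in the hash space.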
  • In a specific implementation, the feature vector of each account information may be screened to obtain a theoretically optimal set of feature attributes. Specifically, fraud account information samples and normal account information samples are extracted over a certain period of time and combined into one overall account information sample.
  • From the feature vector, the K-bit hash vector corresponding to each account information is determined according to the K preset hash functions. The feature attributes of each account information can thus be fully extracted and represented by the feature vector, which makes it possible to cope with the huge amount of account information in a complex network.
  • The K-bit hash vector corresponding to each account information is in fact obtained through a random hash mapping process.
  • The purpose of using random hash mapping here is, first, to map the feature vector of the account information to a uniform representation of 0s and 1s for subsequent processing, rather than simply to reduce dimensionality; second, mapping the original feature vectors into the new hash space keeps data with similar feature vectors similar in the new hash space.
  • The relationship from the similarity s to the probability p is a monotonically increasing mapping.
  • Table 1 Relationship between account information samples and classes
  • The relationship between the account information samples and the classes can be expressed as a matrix of K rows and N columns, where N represents the number of account information samples taken and $c_1$ to $c_N$ represent the N account information samples.
  • Performing community division on each account information includes:
  • for each account information, starting from its row in the node similarity strength matrix, attempting in descending order of similarity strength to transfer the account information to other communities; if the modularity difference for transferring the account information from the p-th community to the q-th community is positive, dividing the account information into the q-th community and ending the attempt;
  • calculating the similarity strength of each account information according to the similarity between the accounts includes:
  • $\Gamma(i)$ represents the neighbor set of the i-th account information,
  • $\Gamma(i) \cap \Gamma(j)$ represents the common neighbor set of the i-th account information and the j-th account information,
  • $w_{ai,z}$ is the weight of the edge between any account information ai and the z-th account information.
  • In step (1), the feature matching network is initialized and each account information is divided into a different community; this initial division may be random. In step (2), the similarity strength of each account information is calculated according to formula (2).
  • Specifically, suppose the common neighbor of account information 1 and account information 2 is account information 3, and the weights of the edges joining account information 1 and account information 2 to account information 3 sum to 5, that is, the summed weight of the edges between any account information ai and account information 3 is 5. Then the similarity strength between account information 1 and account information 2 is 1/5.
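  • Formula (2) is not reproduced in this text. The sketch below therefore uses an assumed form, a weighted resource-allocation style index in which each common neighbor z contributes the reciprocal of the summed weight of its edges; this form was chosen only because it reproduces the 1/5 value of the worked example, and it should not be read as the patent's exact formula. The adjacency representation and the function name are likewise illustrative.

```python
def similarity_strength(adjacency, i, j):
    """Sum, over common neighbors z of i and j, of 1 / (total weight of z's edges).

    Assumed form only: it matches the worked example but is not confirmed by the source.
    """
    common = set(adjacency[i]) & set(adjacency[j])
    return sum(1.0 / sum(adjacency[z].values()) for z in common)


# Hypothetical network: accounts 1 and 2 share the neighbor 3,
# and the edges at account 3 have weights 2 and 3 (summing to 5).
adjacency = {
    1: {3: 2.0},
    2: {3: 3.0},
    3: {1: 2.0, 2: 3.0},
}
strength = similarity_strength(adjacency, 1, 2)
```

  • Evaluating this for every pair of accounts would fill the similarity strength matrix used in the next step.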
  • The similarity strength between other account information is calculated in the same way. If four account information samples are taken, the calculation yields a 4×4 matrix.
  • From this matrix it can be seen that the similarity strength between account information 1 and account information 2 is 0.25, between account information 1 and account information 3 is 0.7, and between account information 2 and account information 3 is 0.4. In step (3), starting from the row of an account information in the similarity strength matrix, an attempt is made to transfer the account information to other communities in descending order of similarity strength. For example, starting from the first row of the matrix, when account information 1 is to be divided into another community, the community of account information 3, which has the largest similarity strength in the first row, is tried first. If ΔQ ≤ 0, an attempt is made to divide account information 1 into the community in which account information 4 (0.4, the second largest in the first row) is located.
  • If ΔQ ≤ 0 again, an attempt is made to divide account information 1 into the community in which account information 2 is located. If ΔQ ≤ 0 still holds, account information 1 is kept as an independent community, the matrix is not updated, and the calculation proceeds to the second row. If ΔQ > 0 is found during the above attempts, for example when account information 1 is first divided into the community of account information 3, which has the largest similarity strength in the first row, then the attempt is successful and the calculation for the first row ends.
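  • The ordered attempts of step (3) can be sketched as below. The modularity difference is passed in as a callable so that the sketch does not depend on any particular formula; `try_move`, `strength_row`, and `community_of` are hypothetical names, and the row values follow the worked example.

```python
def try_move(account, strength_row, community_of, gain):
    """Try target communities in descending similarity strength; accept the first positive gain."""
    candidates = sorted(
        (other for other in strength_row if other != account),
        key=lambda other: strength_row[other],
        reverse=True,
    )
    for other in candidates:
        target = community_of[other]
        if target == community_of[account]:
            continue  # already in the same community
        if gain(account, target) > 0:
            community_of[account] = target
            return target  # attempt succeeded, stop processing this row
    return None  # account stays in its own community


# First row of the example matrix: strengths from account 1 to accounts 2, 3 and 4.
row_1 = {2: 0.25, 3: 0.7, 4: 0.4}
community_of = {1: "c1", 2: "c2", 3: "c3", 4: "c4"}
tried = []


def example_gain(account, target):
    """Made-up gain: only the community of account 4 gives a positive value."""
    tried.append(target)
    return 0.05 if target == "c4" else -0.01


result = try_move(1, row_1, community_of, example_gain)
```

  • In this made-up run the community of account 3 is tried first and rejected, then the community of account 4 is accepted, so the community of account 2 is never evaluated.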
  • The modularity difference ΔQ is calculated to verify whether the attempted division of the account information is acceptable, where n represents the sum of all weights in the network, $k_i$ represents the weight of the edges connected to vertex i, and $k_{i,in}$ represents the weight of the edges of account information i within the community.
  • $\Sigma_{in}$ indicates the edge weight inside the community.
  • $\Sigma_{tot}$ indicates the weight of the edges connected to the account information inside the community, including the edges inside the community and the edges outside the community. If ΔQ is a positive number, the division is accepted; if it is not a positive number, the division is abandoned.
  • In the community division, the account information is preferentially divided into the community of the neighbor account information that is most similar to it, which greatly reduces the number of community division attempts and further improves the speed of the algorithm. In addition, whether an attempted division of the account information is reasonable is verified by the modularity difference formula, which more effectively ensures the rationality and accuracy of the attempted division.
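  • The ΔQ formula itself is not reproduced in this text. As a hedged illustration, the sketch below uses the modularity gain in the form commonly given for the Louvain method, which involves the same quantities ($\Sigma_{in}$, $\Sigma_{tot}$, $k_i$, $k_{i,in}$ and the total edge weight); whether the patent normalizes in exactly this way is an assumption. The parameter `total_weight` plays the role of n in the text.

```python
def modularity_gain(sigma_in, sigma_tot, k_i, k_i_in, total_weight):
    """Modularity change when an isolated node i joins a community (Louvain-style form, assumed)."""
    two_m = 2.0 * total_weight
    after = (sigma_in + k_i_in) / two_m - ((sigma_tot + k_i) / two_m) ** 2
    before = sigma_in / two_m - (sigma_tot / two_m) ** 2 - (k_i / two_m) ** 2
    return after - before


def accept_move(gain):
    """The division is accepted only when the modularity difference is positive."""
    return gain > 0


# Made-up quantities for one attempted move.
gain_connected = modularity_gain(sigma_in=2.0, sigma_tot=4.0, k_i=2.0, k_i_in=2.0, total_weight=10.0)
gain_unconnected = modularity_gain(sigma_in=2.0, sigma_tot=4.0, k_i=2.0, k_i_in=0.0, total_weight=10.0)
```

  • In this form the gain simplifies to $k_{i,in}/(2n) - \Sigma_{tot} k_i/(2n^2)$, so a node with no edges into the target community always yields a non-positive gain and the move is abandoned.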
  • FIG. 2 exemplarily shows an overall schematic flowchart of the present invention. As shown in FIG. 2:
  • Step S201 Map the feature attributes of each account information to a multi-bit hash mapping vector by using a hash mapping method;
  • Step S202 Classify the hash mapping vector of each account information;
  • Step S203 For each class, divide account information having the same hash mapping vector into a group;
  • Step S204 Perform similarity calculation on any two account information in each group;
  • Step S205 If the similarity of any two account information in a group is greater than a threshold, establish an interconnection edge between the two account information, with the similarity as the weight of the edge, thereby forming a feature matching network, wherein the formed feature matching network is a sparse feature matching network;
  • Step S206 Perform community division on the feature matching network according to the similarity strength matrix of each account information in the feature matching network.
  • The feature attributes of each account information are mapped into a new hash space by a random hash mapping method to form the hash mapping vector of each account information.
  • The hash mapping vectors of the account information are classified, and edges are established between account information of high similarity, which effectively avoids calculating the similarity between a large number of arbitrary pairs of account information and efficiently establishes a credible weight value for each edge.
  • This improves the accuracy and speed of the subsequent community division.
  • The feature matching network is established according to the similarity of each account information, and the network is then divided into communities according to the similarity strength matrix of the account information in the network. This not only effectively detects abnormal communities so that targeted measures can be taken, but can also detect unknown fraud types.
  • When the feature matching network is divided by means of the similarity strength matrix, the account information is preferentially divided into the community of the neighbor account information that is most similar to it, which greatly reduces the number of community division attempts and further improves the speed of the algorithm.
  • Through the formation of the feature matching network, the similarity between related account information is permanently stored as the weight of an edge. Even if more new account information comes in, it does not affect the original interconnection edges in the network; the new account information only needs to be inserted into the original feature matching network.
  • Specifically, the random hash mapping method is first used to classify each new account information, and the similarity is then calculated with the account information in the same class. If the similarity is greater than the threshold, a new edge is added. Subsequently, only a smaller but more accurate community division algorithm needs to be performed. At the same time, the structure of the feature matching network displays the association structure within and between communities more clearly, which cannot be achieved by the traditional clustering method.
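  • The incremental insertion of new account information described above can be sketched as follows. The function `insert_account`, the bit-matching similarity, the equal-width classes, and the in-memory dictionaries `groups`, `vectors`, and `edges` are all assumptions made for illustration.

```python
def insert_account(account_id, vec, m, groups, vectors, edges, threshold):
    """Add one account to an existing network; existing edges are never modified."""
    width = len(vec) // m  # assumes K is divisible by m
    candidates = set()
    for class_idx in range(m):
        key = (class_idx, tuple(vec[class_idx * width:(class_idx + 1) * width]))
        bucket = groups.setdefault(key, [])
        candidates.update(bucket)  # accounts sharing this sub-hash vector
        bucket.append(account_id)
    vectors[account_id] = tuple(vec)
    for other in candidates:
        matches = sum(a == b for a, b in zip(vec, vectors[other]))
        similarity = matches / len(vec)
        if similarity > threshold:
            edges[frozenset((account_id, other))] = similarity  # new edge only
    return edges


# Start from an empty network and add three hypothetical accounts one by one.
groups, vectors, edges = {}, {}, {}
insert_account("a", (0, 0, 1, 0), 2, groups, vectors, edges, threshold=0.5)
insert_account("b", (0, 0, 1, 1), 2, groups, vectors, edges, threshold=0.5)
insert_account("c", (1, 1, 0, 1), 2, groups, vectors, edges, threshold=0.5)
```

  • Each insertion compares the new account only with accounts that share one of its sub-hash vectors, so the cost of adding data does not grow with the total number of account pairs.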
  • The device includes a determining unit 301, a first dividing unit 302, a second dividing unit 303, a calculating unit 304, a network forming unit 305, and a third dividing unit 306, wherein:
  • a determining unit 301 configured to determine a K-bit hash vector corresponding to each account information according to the preset K hash functions;
  • a second dividing unit 303 configured to divide, for each class, account information with the same sub-hash vector into the same group;
  • the calculating unit 304 is configured to calculate a similarity between each account information in the same group;
  • Forming a network unit 305 configured to establish an interconnection edge between each account information to form a feature matching network if the similarity between the account information is greater than a threshold;
  • the third dividing unit 306 is configured to perform community division on each account information according to the feature matching network.
  • The calculating unit 304 is specifically configured to: if the i-th account information and the j-th account information are in the same group in n classes, use n/m as the similarity between the i-th account information and the j-th account information, where the i-th account information and the j-th account information are any of the pieces of account information.
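  • The calculating unit 304 is further specifically configured to: if the i-th account information and the j-th account information are in the same group, count the number h of positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, where the i-th account information and the j-th account information are any of the pieces of account information; the similarity between the i-th account information and the j-th account information is s=h/K.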
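  • The determining unit 301 is configured to determine the K-bit hash vector corresponding to each piece of account information according to formula (3), in which one symbol denotes the feature vector of the account information, c_1, c_2, ..., c_d denote the characteristic attributes of the account information, and a further symbol denotes a randomly selected non-zero vector.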
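  • The third dividing unit 306 is specifically configured to: place each piece of account information in a different community in the feature matching network; calculate the similarity strength of each piece of account information according to the similarities, generating a node similarity strength matrix; for each piece of account information, attempt to move it into other communities in descending order of similarity strength along its row of the matrix, and move it from the p-th community to the q-th community when the resulting modularity difference is positive; and repeat until the community structure no longer changes.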
  • The calculating unit 304 is further specifically configured to calculate the similarity strength s_{i,j} between the i-th account information and the j-th account information according to formula (4), where Γ(i) denotes the neighbor set of the i-th account information, Γ(i)∩Γ(j) denotes the common neighbor set of the i-th account information and the j-th account information, and w_{ai,z} is the sum of the weights of the edges between any account information ai and the z-th account information.
  • an electronic device according to an embodiment of the present invention is applicable to the above embodiment of the present invention.
  • The electronic device may include one or more processors 410 and a memory 420; one processor 410 is taken as an example in FIG. 4.
  • The apparatus for performing the feature matching network based community division method may further include an input device 430 and an output device 440.
  • The processor 410, the memory 420, the input device 430, and the output device 440 may be connected by a bus or by other means; a bus connection is taken as an example in FIG. 4.
  • The memory 420 is a non-volatile computer-readable storage medium and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the feature matching network based community division method in the embodiments of the present application. The processor 410 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 420, thereby implementing the feature matching network based community division method of the above method embodiments.
  • The memory 420 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application required for at least one function, and the data storage area may store data created through use of the feature matching network based community division device, and the like.
  • memory 420 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.
  • the memory 420 can optionally include memory remotely disposed relative to the processor 410, which can be connected to the feature matching network based community partitioning device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The input device 430 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the feature matching network based community division device.
  • Output device 440 can include a display device such as a display screen.
  • the one or more modules are stored in the memory 420, and when executed by the one or more processors 410, perform a feature matching network based community partitioning method in any of the above method embodiments.
  • The processor 410 is specifically configured to: if the i-th account information and the j-th account information are in the same group in n classes, use n/m as the similarity between the i-th account information and the j-th account information, where the i-th account information and the j-th account information are any of the pieces of account information.
  • The processor 410 is specifically configured to: if the i-th account information and the j-th account information are in the same group, count the number h of positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, where the i-th account information and the j-th account information are any of the pieces of account information.
  • The processor 410 is specifically configured to determine the K-bit hash vector corresponding to each piece of account information according to a formula in which one symbol denotes the feature vector of the account information, c_1, c_2, ..., c_d denote the characteristic attributes of the account information, and a further symbol denotes a randomly selected non-zero vector.
  • The processor 410 is specifically configured to calculate the similarity strength between the i-th account information and the j-th account information, where w_{ai,z} is the sum of the weights of the edges between any account information ai and the z-th account information.
  • In the embodiments of the present invention, a community division device based on a feature matching network is provided. A K-bit hash vector corresponding to each piece of account information is determined according to K preset hash functions; the hash vector corresponding to each piece of account information is sequentially divided into m=K/k classes of sub-hash vectors; for each class, account information with the same sub-hash vector is placed in the same group; the similarity between the pieces of account information in the same group is calculated; if the similarity between pieces of account information is greater than the threshold, an interconnection edge is established between them to form a feature matching network; and the account information is divided into communities according to the feature matching network.
  • The K-bit hash vector corresponding to each piece of account information is first determined according to the K preset hash functions. For the huge amount of account information in the network, a hash function that produces only two hash values is not enough, so determining a K-bit hash vector for each piece of account information makes it possible to cope with complex network account information.
  • Then, for each class, account information with the same sub-hash vector is placed in one group, and the similarity is calculated only between pieces of account information in the same group. This avoids the very large amount of computation that calculating the similarity between every pair of accounts in the entire network would require.
  • Finally, an interconnection edge is established between pieces of account information whose similarity is greater than the threshold, forming a feature matching network, and the account information is divided into communities according to the feature matching network. This divides the account information more accurately, makes the relationships between communities clear, and allows the resulting communities to be analyzed to find abnormal communities. The accounts in an abnormal community can then be checked, so that fraudulent accounts are found in a more targeted way and the efficiency of responding to them is improved.
  • In addition, if account information needs to be added to the divided communities, only the simple steps above need to be repeated for the added account information, which is then updated to the corresponding position; this does not cause update difficulties.
  • embodiments of the present invention can be provided as a method, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Biology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and an apparatus for grouping communities on the basis of a feature matching network, and an electronic device. The method comprises: according to preset K hash functions, determining a K-bit hash vector corresponding to the information of each account (S101); sequentially dividing the hash vector corresponding to the information of each account into m=K/k classes of sub hash vectors (S102); for each class, grouping the information of accounts with the same sub hash vector into the same group (S103); calculating the similarities among the information of respective accounts of the same group (S104); if the similarities among the information of respective accounts are greater than a threshold, establishing interconnected edges among the information of respective accounts to form a feature matching network (S105); and according to the feature matching network, performing community grouping of the information of respective accounts (S106). The method can analyze grouped communities to discover an abnormal community.

Description

Community division method, device and electronic device based on feature matching network
This application claims priority to Chinese Patent Application No. 201611110731.7, entitled "A Method and Apparatus for Community Division Based on a Feature Matching Network", filed with the Chinese Patent Office on December 6, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present invention relate to the field of data processing, and in particular to a community division method, device, and electronic device based on a feature matching network.
Background
At present, the risk situation facing the domestic credit card market is increasingly severe, and cases of credit card cash-out, counterfeit card fraud, and stolen card fraud are increasing. Specifically, credit card cash-out refers to a cardholder obtaining cash through fake consumption transactions or by colluding with a merchant to swipe the card, followed by a refund, or purchasing easily liquidated goods and reselling them for cash. Counterfeit card fraud refers to transacting with a forged copy of a genuine, valid bank card, produced by writing the magnetic stripe in the bank card's magnetic stripe information format and embossing or flat-printing the card. Stolen card fraud refers to a fraudster obtaining some or all of the genuine cardholder's information and impersonating the cardholder to change the account information for fraudulent purposes. Credit card crime is becoming increasingly high-tech, organized, and professional; cases are carried out more covertly and with constantly changing techniques. This threatens the financial security of banks and cardholders and has become an important factor restricting the long-term healthy development of the credit card industry.
Faced with these varied fraud techniques, the prior art usually relies on clustering. This approach has several drawbacks. On the one hand, if data is later added to the anti-fraud model, updating the model's data is difficult. On the other hand, although clustering divides the nodes into several classes, the structure within each group and the associations between structures remain difficult to describe.
In summary, the prior art has the problems that adding data to the anti-fraud model later makes the model's data difficult to update, and that after clustering the structure within groups and the associations between structures remain difficult to describe. Effective measures are therefore needed to solve these problems.
Summary of the Invention
Embodiments of the present invention provide a community division method, device, and electronic device based on a feature matching network, to solve the problems in the prior art that adding data to the anti-fraud model later makes the model's data difficult to update, and that after clustering the structure within groups and the associations between structures remain difficult to describe.
An embodiment of the present invention provides a community division method based on a feature matching network, including:
determining a K-bit hash vector corresponding to each piece of account information according to K preset hash functions;
sequentially dividing the hash vector corresponding to each piece of account information into m=K/k classes of sub-hash vectors;
for each class, placing account information with the same sub-hash vector in the same group;
calculating the similarity between the pieces of account information in the same group;
if the similarity between pieces of account information is greater than a threshold, establishing an interconnection edge between them to form a feature matching network;
performing community division on the account information according to the feature matching network.
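The grouping and edge-building steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation from the application: the function name `build_feature_matching_network`, the dictionary-of-tuples input, and the account ids are hypothetical, and the similarity used is the bit-agreement ratio h/K that the text offers as one option.

```python
from collections import defaultdict
from itertools import combinations


def build_feature_matching_network(hash_vectors, k, threshold):
    """Return {(a, b): similarity} for account pairs that receive an edge.

    hash_vectors maps a hypothetical account id to its K-bit hash vector
    (a sequence of 0/1 values); k is the length of each sub-hash vector.
    """
    K = len(next(iter(hash_vectors.values())))
    if K % k != 0:
        raise ValueError("K must be a multiple of k")
    m = K // k  # number of classes of sub-hash vectors

    edges = {}
    compared = set()  # pairs already scored, so each pair is scored once
    for cls in range(m):
        # Group accounts whose sub-hash vector for this class is identical.
        groups = defaultdict(list)
        for account, bits in hash_vectors.items():
            groups[tuple(bits[cls * k:(cls + 1) * k])].append(account)

        # Score only pairs that share a group, never all pairs in the network.
        for members in groups.values():
            for a, b in combinations(sorted(members), 2):
                if (a, b) in compared:
                    continue
                compared.add((a, b))
                # Assumed similarity: fraction of matching bit positions (h/K).
                h = sum(x == y for x, y in zip(hash_vectors[a], hash_vectors[b]))
                similarity = h / K
                if similarity > threshold:
                    edges[(a, b)] = similarity
    return edges
```

Because only accounts that collide in at least one class are compared, the number of similarity computations depends on group sizes rather than on the square of the number of accounts.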
Optionally, calculating the similarity between the pieces of account information in the same group includes:
if the i-th account information and the j-th account information are in the same group in n classes, using n/m as the similarity between the i-th account information and the j-th account information, where the i-th account information and the j-th account information are any of the pieces of account information.
Optionally, calculating the similarity between the pieces of account information in the same group includes:
if the i-th account information and the j-th account information are in the same group, counting the number h of positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, where the i-th account information and the j-th account information are any of the pieces of account information;
the similarity between the i-th account information and the j-th account information is s=h/K.
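The two optional similarity measures, n/m over classes and h/K over individual bits, can be compared side by side. The sketch below is illustrative only; the function names `class_similarity` and `bit_similarity` are hypothetical and the hash vectors are made-up sample values.

```python
def class_similarity(bits_i, bits_j, k):
    """n/m: share of the m = K/k classes whose sub-hash vectors are identical."""
    K = len(bits_i)
    if len(bits_j) != K or K % k != 0:
        raise ValueError("vectors must have equal length K, a multiple of k")
    m = K // k
    n = sum(
        tuple(bits_i[c * k:(c + 1) * k]) == tuple(bits_j[c * k:(c + 1) * k])
        for c in range(m)
    )
    return n / m


def bit_similarity(bits_i, bits_j):
    """h/K: share of bit positions that hold the same value in both vectors."""
    K = len(bits_i)
    if len(bits_j) != K:
        raise ValueError("vectors must have equal length K")
    h = sum(x == y for x, y in zip(bits_i, bits_j))
    return h / K


# Sample 6-bit vectors that differ in exactly one position.
account_i = (1, 0, 1, 1, 0, 0)
account_j = (1, 0, 0, 1, 0, 0)
coarse = class_similarity(account_i, account_j, k=2)
fine = bit_similarity(account_i, account_j)
```

The class-based measure is coarser, since a single differing bit makes a whole class count as a mismatch, while the bit-based measure penalizes only the differing position.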
Optionally, determining the K-bit hash vector corresponding to each piece of account information according to the K preset hash functions includes:
determining, according to formula (1), the K-bit hash vector [Figure PCTCN2017105985-appb-000001] corresponding to each piece of account information:
Figure PCTCN2017105985-appb-000002
where 2'b indicates that [Figure PCTCN2017105985-appb-000003] is a binary number, and [Figure PCTCN2017105985-appb-000004] is one of the K preset hash functions:
Figure PCTCN2017105985-appb-000005
Here [Figure PCTCN2017105985-appb-000006] denotes the feature vector of the account information, with [Figure PCTCN2017105985-appb-000007], where c1, c2, ..., cd denote the characteristic attributes of the account information; [Figure PCTCN2017105985-appb-000008] denotes a randomly selected non-zero vector, with [Figure PCTCN2017105985-appb-000009].
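The formula images are not available in this text, so the exact form of the hash functions cannot be read off here. One common construction that is consistent with the surrounding definitions (a feature vector of attributes c1, ..., cd and a randomly selected non-zero vector per hash function) is sign random projection, where each bit records which side of a random hyperplane the feature vector falls on. The sketch below assumes that construction; the names `make_random_vectors` and `hash_vector` and the Gaussian sampling are assumptions for illustration, not taken from the application.

```python
import random


def make_random_vectors(K, d, seed=0):
    """Draw K random d-dimensional vectors, one per assumed hash function.

    Gaussian components are an assumption; they are non-zero with
    probability 1, matching the "non-zero vector" requirement.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(K)]


def hash_vector(features, random_vectors):
    """Return a K-bit tuple: bit t is 1 when the dot product of the t-th
    random vector with the feature vector is non-negative, else 0."""
    bits = []
    for r in random_vectors:
        dot = sum(r_t * c_t for r_t, c_t in zip(r, features))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


# Hypothetical account with d = 3 numeric attributes and K = 8 hash functions.
vectors = make_random_vectors(K=8, d=3, seed=1)
bits = hash_vector([1.0, 2.0, 3.0], vectors)
```

Under this assumed construction, accounts whose feature vectors point in similar directions tend to agree on more bits, which is what makes the later grouping by sub-hash vector meaningful.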
Optionally, performing community division on the account information according to the feature matching network includes:
(1) placing each piece of account information in a different community in the feature matching network;
(2) calculating the similarity strength of each piece of account information according to the similarity between the pieces of account information, thereby generating a node similarity strength matrix;
(3) for each piece of account information, starting from the row of the node similarity strength matrix in which the account information is located, attempting to move the account information into other communities in descending order of similarity strength; if the modularity difference after moving the account information from the p-th community to the q-th community is positive, moving the account information into the q-th community and ending the attempts for that account information;
(4) repeating step (3) until the community structure no longer changes.
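Steps (1) to (4) describe a greedy local-move procedure driven by modularity gain. The following is a minimal sketch under stated assumptions: the similarity strength values are passed in precomputed as a `strength` mapping (the formula for them is not reproduced here), modularity is the standard weighted Newman modularity recomputed from scratch for clarity rather than speed, and the function names are hypothetical.

```python
def modularity(edges, community):
    """Weighted modularity: sum over communities of L_c/W - (D_c/(2W))^2,
    where W is the total edge weight, L_c the weight inside community c,
    and D_c the total weighted degree of its nodes."""
    total = sum(edges.values())
    if total == 0:
        return 0.0
    internal, degree = {}, {}
    for (u, v), w in edges.items():
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    return sum(internal.get(c, 0.0) / total - (deg / (2 * total)) ** 2
               for c, deg in degree.items())


def divide_communities(edges, strength):
    """edges: {(u, v): weight}; strength: {(u, v): similarity strength}."""
    nodes = sorted({n for pair in edges for n in pair})
    community = {n: n for n in nodes}  # step (1): one community per node
    neighbours = {n: set() for n in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    def pair_strength(u, v):
        return strength.get((u, v), strength.get((v, u), 0.0))

    changed = True
    while changed:  # step (4): repeat until no node moves
        changed = False
        for node in nodes:  # step (3)
            base = modularity(edges, community)
            # Neighbours in descending similarity strength, ties by name.
            ranked = sorted(neighbours[node],
                            key=lambda x: (-pair_strength(node, x), x))
            for other in ranked:
                target = community[other]
                if target == community[node]:
                    continue
                previous = community[node]
                community[node] = target
                if modularity(edges, community) - base > 1e-12:
                    changed = True  # positive gain: keep the move, stop trying
                    break
                community[node] = previous  # no gain: undo and try the next one
    return community
```

Trying the most similar neighbour's community first means a node usually settles after one or two attempts, which is the saving in division attempts that the text attributes to the similarity strength matrix.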
Optionally, calculating the similarity strength of each piece of account information according to the similarity between the pieces of account information includes:
calculating the similarity strength s_{i,j} between the i-th account information and the j-th account information according to formula (2):
Figure PCTCN2017105985-appb-000010
where Γ(i) denotes the neighbor set of the i-th account information, Γ(i)∩Γ(j) denotes the common neighbor set of the i-th account information and the j-th account information, and w_{ai,z} is the sum of the weights of the edges between any account information ai and the z-th account information.
An embodiment of the present invention further provides a community division device based on a feature matching network, including:
a determining unit, configured to determine a K-bit hash vector corresponding to each piece of account information according to K preset hash functions;
a first dividing unit, configured to sequentially divide the hash vector corresponding to each piece of account information into m=K/k classes of sub-hash vectors;
a second dividing unit, configured to place, for each class, account information with the same sub-hash vector in the same group;
a calculating unit, configured to calculate the similarity between the pieces of account information in the same group;
a network forming unit, configured to establish an interconnection edge between pieces of account information whose similarity is greater than a threshold, forming a feature matching network;
a third dividing unit, configured to perform community division on the account information according to the feature matching network.
Optionally, the calculating unit is specifically configured to: if the i-th account information and the j-th account information are in the same group in n classes, use n/m as the similarity between the i-th account information and the j-th account information, where the i-th account information and the j-th account information are any of the pieces of account information.
Optionally, the calculating unit is further specifically configured to: if the i-th account information and the j-th account information are in the same group, count the number h of positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, where the i-th account information and the j-th account information are any of the pieces of account information;
the similarity between the i-th account information and the j-th account information is s=h/K.
Optionally, the determining unit is configured to determine, according to formula (3), the K-bit hash vector [Figure PCTCN2017105985-appb-000011] corresponding to each piece of account information:
Figure PCTCN2017105985-appb-000012
where 2'b indicates that [Figure PCTCN2017105985-appb-000013] is a binary number, and [Figure PCTCN2017105985-appb-000014] is one of the K preset hash functions:
Figure PCTCN2017105985-appb-000015
Here [Figure PCTCN2017105985-appb-000016] denotes the feature vector of the account information, with [Figure PCTCN2017105985-appb-000017], where c1, c2, ..., cd denote the characteristic attributes of the account information; [Figure PCTCN2017105985-appb-000018] denotes a randomly selected non-zero vector, with [Figure PCTCN2017105985-appb-000019].
Optionally, the third dividing unit is specifically configured to: (1) place each piece of account information in a different community in the feature matching network;
(2) calculate the similarity strength of each piece of account information according to the similarity between the pieces of account information, thereby generating a node similarity strength matrix;
(3) for each piece of account information, starting from the row of the node similarity strength matrix in which the account information is located, attempt to move the account information into other communities in descending order of similarity strength; if the modularity difference after moving the account information from the p-th community to the q-th community is positive, move the account information into the q-th community and end the attempts for that account information;
(4) repeat step (3) until the community structure no longer changes.
Optionally, the calculating unit is further specifically configured to calculate the similarity strength s_{i,j} between the i-th account information and the j-th account information according to formula (4):
Figure PCTCN2017105985-appb-000020
where Γ(i) denotes the neighbor set of the i-th account information, Γ(i)∩Γ(j) denotes the common neighbor set of the i-th account information and the j-th account information, and w_{ai,z} is the sum of the weights of the edges between any account information ai and the z-th account information.
An embodiment of the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to: determine a K-bit hash vector corresponding to each piece of account information according to K preset hash functions; sequentially divide the hash vector corresponding to each piece of account information into m=K/k classes of sub-hash vectors; for each class, place account information with the same sub-hash vector in the same group; calculate the similarity between the pieces of account information in the same group; if the similarity between pieces of account information is greater than a threshold, establish an interconnection edge between them to form a feature matching network; and perform community division on the account information according to the feature matching network.
The embodiments of the present invention provide a community division method, device, and electronic device based on a feature matching network: a K-bit hash vector corresponding to each piece of account information is determined according to K preset hash functions; the hash vector corresponding to each piece of account information is sequentially divided into m=K/k classes of sub-hash vectors; for each class, account information with the same sub-hash vector is placed in the same group; the similarity between the pieces of account information in the same group is calculated; if the similarity between pieces of account information is greater than the threshold, an interconnection edge is established between them to form a feature matching network; and the account information is divided into communities according to the feature matching network. In the embodiments of the present invention, the K-bit hash vector corresponding to each piece of account information is first determined according to the K preset hash functions. For the huge amount of account information in the network, a hash function that produces only two hash values is not enough, so determining a K-bit hash vector for each piece of account information makes it possible to cope with complex network account information.
Then, for each class, account information with the same sub-hash vector is placed in one group, and the similarity is calculated only between pieces of account information in the same group. This avoids the very large amount of computation that calculating the similarity between every pair of accounts in the entire network would require. Finally, an interconnection edge is established between pieces of account information whose similarity is greater than the threshold, forming a feature matching network, and the account information is divided into communities according to the feature matching network. This divides the account information more accurately, makes the relationships between communities clear, and allows the resulting communities to be analyzed to find abnormal communities. The accounts in an abnormal community can then be checked, so that fraudulent accounts are found in a more targeted way and the efficiency of responding to them is improved. In addition, if account information needs to be added to the divided communities, only the simple steps above need to be repeated for the added account information, which is then updated to the corresponding position; this does not cause update difficulties.
Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a community division method based on a feature matching network according to an embodiment of the present invention;
FIG. 2 is a flowchart of the overall approach of the present invention according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a community division apparatus based on a feature matching network according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and beneficial effects of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.
It should be understood that the technical solutions of the embodiments of the present invention can be applied to the various kinds of network fraud encountered by banks, for example credit card product fraud, bank card product fraud, stolen-card fraud, counterfeit-card fraud, and cash-out fraud. The application scenarios may also include discovering communities of abnormal account information, discovering what specific types of fraud have in common, discovering other fraudulent account information from samples of fraudulent account information, and helping to discover unknown types of fraud.
FIG. 1 shows an example flowchart of a community division method based on a feature matching network according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step S101: determine a K-bit hash vector corresponding to each piece of account information according to K preset hash functions;
Step S102: divide the hash vector corresponding to each piece of account information, in order, into m=K/k classes of sub-hash vectors;
Step S103: for each class, place pieces of account information with the same sub-hash vector in the same group;
Step S104: calculate the similarity between the pieces of account information within the same group;
Step S105: if the similarity between pieces of account information is greater than a threshold, establish an interconnecting edge between them, forming a feature matching network;
Step S106: divide the account information into communities according to the feature matching network.
In step S101, a K-bit hash vector corresponding to each piece of account information is determined according to K preset hash functions. Specifically, each preset hash function produces one bit of the hash vector, so the K preset hash functions together produce a K-bit hash vector, and each piece of account information corresponds to one K-bit hash vector. In a specific implementation, each piece of account information contains multiple feature attributes. Representing a piece of account information with only one hash function, as in the prior art, is not sufficient to express its multiple feature attributes; this step avoids that shortcoming. The value of K can be set according to the account information at hand; for example, if K is set to 4, each piece of account information is represented as a 4-bit hash vector.
In step S102, the hash vector corresponding to each piece of account information is divided, in order, into m=K/k classes of sub-hash vectors. For example, with K=4 and k=2, the 4-bit hash vector of each piece of account information is divided into 2 classes of sub-hash vectors. The benefit of this division is that it reduces the amount of computation needed later to calculate the similarity between accounts. In the prior art the hash vectors are not divided, so similarity is calculated directly for every pair among all pieces of account information, which requires a very large amount of computation.
In step S103, for each class, pieces of account information with the same sub-hash vector are placed in the same group. Specifically, after the hash vector of each piece of account information has been divided into classes, the pieces of account information whose sub-hash vectors are identical within a given class form one group. For example, with K=4 and k=2, in class 1 all pieces of account information whose 4-bit hash vectors have the same first two bits form a group, and in class 2 all pieces whose 4-bit hash vectors have the same last two bits form a group. The purpose of this grouping is likewise to reduce the amount of similarity computation: similarity is calculated only between pieces of account information that share a sub-hash vector in some class.
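The splitting and grouping of steps S102 and S103 can be sketched as follows. This is a minimal illustration, not the patented implementation: hash vectors are assumed to be stored as strings of `0` and `1`, and the function names `split_into_classes` and `group_by_class` are hypothetical.

```python
from collections import defaultdict


def split_into_classes(hash_vector: str, k: int) -> list[str]:
    """Split a K-bit hash vector, in order, into m = K/k sub-hash vectors."""
    if len(hash_vector) % k != 0:
        raise ValueError("K must be a multiple of k")
    return [hash_vector[i:i + k] for i in range(0, len(hash_vector), k)]


def group_by_class(hash_vectors: dict[str, str], k: int) -> list[dict[str, list[str]]]:
    """For each class, group the accounts whose sub-hash vectors are identical."""
    K = len(next(iter(hash_vectors.values())))
    classes = [defaultdict(list) for _ in range(K // k)]
    for account, vector in hash_vectors.items():
        for class_index, sub_vector in enumerate(split_into_classes(vector, k)):
            classes[class_index][sub_vector].append(account)
    return [dict(groups) for groups in classes]


# Example with K=4 and k=2, as in the text (account names are made up).
example_vectors = {"a1": "0010", "a2": "0011", "a3": "1011"}
example_classes = group_by_class(example_vectors, k=2)
# Class 1 (first two bits): a1 and a2 share "00".
# Class 2 (last two bits): a2 and a3 share "11".
```

Only accounts that land in the same group in at least one class are compared later, which is what keeps the number of similarity calculations small.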
In step S104, the similarity between the pieces of account information within the same group is calculated. In a specific implementation, the similarity can be taken as the ratio of the number of bit positions at which the hash vectors of two pieces of account information agree to the total number of bits. For example, suppose the hash vector of account information 1 is 0010 and that of account information 2 is 0011. With K=4 and k=2, the two fall in the same group in class 1, so their similarity is calculated: the hash vectors agree in 3 bit positions out of 4, so the similarity between the two is 3/4. Alternatively, the similarity between any two pieces of account information in the same group can be calculated with a similarity formula, such as Euclidean distance, cosine distance, or Jaccard distance.

On the one hand, compared with calculating the similarity of every pair among all pieces of account information, calculating similarity only between pairs within the same group greatly reduces the amount of computation. For example, if N account information samples are taken, the N samples are distributed among $2^k$ groups, and the number of samples in each group is $N/2^k$. The number of pairwise similarity calculations within one group is

$$\binom{N/2^k}{2} = \frac{1}{2}\cdot\frac{N}{2^k}\left(\frac{N}{2^k}-1\right),$$

the number of pairwise similarity calculations for the $2^k$ groups is

$$2^k\cdot\frac{1}{2}\cdot\frac{N}{2^k}\left(\frac{N}{2^k}-1\right) = \frac{N}{2}\left(\frac{N}{2^k}-1\right),$$

and the number of similarity calculations needed for all classes is therefore

$$m\cdot\frac{N}{2}\left(\frac{N}{2^k}-1\right),$$

where $m = K/k$ is the number of classes, a constant that can be controlled according to the actual situation. The conventional method, which calculates the similarity of every pair among all accounts, requires

$$\binom{N}{2} = \frac{N(N-1)}{2}$$

calculations. It follows that calculating similarity only between pairs within the same group, as in the present invention, reduces the amount of computation relative to the conventional method by a factor on the order of $2^k$. On the other hand, the pieces of account information within a group are already fairly similar to one another, so calculating similarity only within groups also improves the efficiency and accuracy of building the network.
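The size of the saving can be checked numerically. The sketch below counts pairwise comparisons under the same simplifying assumption the text makes, namely that the N samples spread evenly over the 2^k groups of each class (it further assumes N is divisible by 2^k); the function names are hypothetical.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def grouped_comparisons(N: int, K: int, k: int) -> int:
    """Comparisons needed when only pairs inside the same group are compared.

    Assumes an even spread: each of the 2**k groups holds N / 2**k samples.
    """
    m = K // k                      # number of classes
    groups = 2 ** k                 # groups per class
    return m * groups * pair_count(N // groups)


N, K, k = 1024, 8, 4
grouped = grouped_comparisons(N, K, k)   # 2 classes * 16 groups * C(64, 2)
exhaustive = pair_count(N)               # C(1024, 2)
speedup = exhaustive / grouped
```

With these illustrative values the grouped scheme needs 64,512 comparisons against 523,776 for the exhaustive scheme, a factor of roughly 8, that is, about 2^k/m.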
In step S105, if the similarity between pieces of account information is greater than a threshold, an interconnecting edge is established between them, forming a feature matching network. Specifically, if the similarity between any two pieces of account information is greater than the threshold, an edge is established between the two, and the weight of the edge is the similarity value between them; the result is the feature matching network. In a specific implementation, a relatively high threshold can be chosen so that the resulting feature matching network is relatively sparse, which simplifies subsequent computation. The threshold can also be adjusted according to the actual situation.
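A minimal sketch of the edge-building rule in step S105 follows. It assumes the groups have already been formed, uses the bit-agreement ratio as the similarity, and stores the network as a dictionary from account pairs to edge weights; all names and values are illustrative.

```python
from itertools import combinations

# Hypothetical 4-bit hash vectors for three accounts.
vectors = {"a1": "0010", "a2": "0011", "a3": "1011"}


def bit_similarity(a: str, b: str) -> float:
    """Fraction of bit positions at which two hash vectors agree."""
    u, v = vectors[a], vectors[b]
    return sum(x == y for x, y in zip(u, v)) / len(u)


def build_edges(groups, similarity, threshold):
    """Return {(a, b): weight} for in-group pairs whose similarity exceeds threshold."""
    edges = {}
    for members in groups:
        for a, b in combinations(sorted(members), 2):
            if (a, b) in edges:
                continue  # pair already handled in another class
            s = similarity(a, b)
            if s > threshold:  # strictly greater, as in the text
                edges[(a, b)] = s
    return edges


groups = [["a1", "a2"], ["a2", "a3"]]
network = build_edges(groups, bit_similarity, threshold=0.7)
```

Raising the threshold drops edges and yields a sparser network; with `threshold=0.75` both example pairs, whose similarity is exactly 0.75, would be excluded.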
In step S106, the account information is divided into communities according to the feature matching network. Specifically, based on the calculated similarity values, pieces of account information with closer similarity values are more likely to be placed in the same community. Once communities have been formed, fraudulent accounts in the network are easier to investigate. The proportion of fraudulent account samples in each community can be calculated; the larger the proportion, the more likely the community is abnormal, and an investigation can be carried out as business needs dictate. The accounts within an abnormal community can then be scored against certain indicators to find representative accounts, which are in turn investigated for related cases. Such indicators may include the degree centrality, closeness centrality, and eigenvector centrality of the account information within the community. Alternatively, the features of the account information within the community can be re-analyzed in order to discover behavioral characteristics that the community shares, so that fraud can be prevented in a targeted way. In addition, if newly added account information forms a new community, it can be compared against the abnormal communities already identified, which is very helpful for detecting and preventing unknown fraud.
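The screening idea described above, ranking communities by the share of known fraudulent samples they contain, can be sketched as follows. The community names, account identifiers, and the function name are hypothetical, and the text does not specify a cut-off for calling a community abnormal.

```python
def fraud_ratio_by_community(communities: dict[str, set[str]],
                             known_fraud: set[str]) -> dict[str, float]:
    """Share of known fraudulent accounts in each community, highest first."""
    ratios = {
        name: len(members & known_fraud) / len(members)
        for name, members in communities.items()
        if members  # skip empty communities
    }
    return dict(sorted(ratios.items(), key=lambda item: item[1], reverse=True))


communities = {
    "A": {"a1", "a2", "a3", "a4"},
    "B": {"b1", "b2"},
}
known_fraud = {"a1", "b1", "b2"}
ranking = fraud_ratio_by_community(communities, known_fraud)
```

Communities at the top of the ranking are the candidates for the follow-up investigation described in the text.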
The similarity between the pieces of account information within the same group can be calculated in either of the following two ways:
Method 1: Optionally, calculating the similarity between the pieces of account information within the same group includes: if the i-th account information and the j-th account information are in the same group in n of the classes, taking n/m as the similarity between the i-th account information and the j-th account information, where the i-th and j-th account information are any two pieces of account information. For example, take any two pieces of account information, called account information 1 and account information 2, and let m be 3, so that their hash vectors are divided into 3 classes, called class 1, class 2, and class 3. If the two are in the same group in class 1 and in class 3, their similarity over the 3 classes is 2/3.
Method 2: Optionally, calculating the similarity between the pieces of account information within the same group includes: if the i-th account information and the j-th account information are in the same group, counting the number h of bit positions at which the hash vector of the i-th account information and the hash vector of the j-th account information have the same value, where the i-th and j-th account information are any two pieces of account information; the similarity between the two is then s=h/K. For example, if account information 1 and account information 2 are in the same group, both have 4-bit hash vectors (K=4), and the first 3 bits of the two vectors are identical while the 4th bit differs, then the similarity s between account information 1 and account information 2 is 3/4.
Comparing the two methods: Method 1 measures the similarity of two pieces of account information across the classes, whereas Method 2 measures the similarity between two pieces of account information that have been placed in the same group. Method 1 is the coarser of the two, since it only counts the classes in which the two pieces of account information coincide, while Method 2 compares the hash vectors bit by bit and is therefore more precise. Both methods, however, require markedly less computation than the prior-art approach of using the Euclidean distance formula or the like to calculate the similarity between every pair of account information in the network, which further speeds up building the network.
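The two similarity measures can be written side by side. This is a minimal sketch that assumes hash vectors are stored as bit strings of equal length; the function names are illustrative.

```python
def similarity_by_classes(u: str, v: str, k: int) -> float:
    """Method 1: n/m, the share of classes in which u and v fall in the same group."""
    if len(u) != len(v) or len(u) % k != 0:
        raise ValueError("vectors must have equal length K, a multiple of k")
    m = len(u) // k
    n = sum(u[i:i + k] == v[i:i + k] for i in range(0, len(u), k))
    return n / m


def similarity_by_bits(u: str, v: str) -> float:
    """Method 2: h/K, the share of bit positions at which u and v agree."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length K")
    h = sum(a == b for a, b in zip(u, v))
    return h / len(u)


# K=6, k=2, m=3: the vectors coincide in classes 1 and 3 only.
coarse = similarity_by_classes("010110", "011010", k=2)   # 2/3
# K=4: the first three bits agree, the fourth differs.
fine = similarity_by_bits("0010", "0011")                 # 3/4
```

Method 1 needs only the group memberships, whereas Method 2 reads the full hash vectors, which is why it distinguishes pairs that Method 1 scores identically.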
Optionally, determining the K-bit hash vector corresponding to each piece of account information according to the K preset hash functions includes: determining the K-bit hash vector corresponding to each piece of account information according to formula (1):

[Formula (1): Figure PCTCN2017105985-appb-000026, Figure PCTCN2017105985-appb-000027]

where the prefix 2'b indicates that [Figure PCTCN2017105985-appb-000028] is a binary number; [Figure PCTCN2017105985-appb-000029] is one of the K preset hash functions, [Figure PCTCN2017105985-appb-000030]; [Figure PCTCN2017105985-appb-000031] denotes the feature vector of the account information, where [Figure PCTCN2017105985-appb-000032] and c1, c2, …, cd denote the feature attributes of the account information; and [Figure PCTCN2017105985-appb-000033] denotes a randomly selected non-zero vector, [Figure PCTCN2017105985-appb-000034].

Specifically, the preset hash function [Figure PCTCN2017105985-appb-000035] is any one of the K preset hash functions, and the value of the hash function [Figure PCTCN2017105985-appb-000036] is either 0 or 1. A single hash function of this kind can produce only two hash values, which is clearly insufficient for a very large volume of account information, so K such hash functions are used to determine the K-bit hash vector of each account. The hash vector [Figure PCTCN2017105985-appb-000037] is a K-bit binary number; for example, it may be a 6-bit binary number such as 010110, in which case [Figure PCTCN2017105985-appb-000038] [Figure PCTCN2017105985-appb-000039]. Here [Figure PCTCN2017105985-appb-000040] denotes the feature vector of the account information, with [Figure PCTCN2017105985-appb-000041], where c1, c2, …, cd denote the feature attributes of the account information. Specific feature attributes may include transaction amount, transaction time, transaction location, number of transaction locations, transfer location, transfer amount, number of transfers, and so on.

In a specific implementation, the feature vectors can be obtained by screening for the set of features that is, in theory, most effective. Specifically, fraudulent account information samples and normal account information samples are extracted over a certain period of time and combined into one overall account information sample; after data preprocessing, feature screening, and attribute correlation analysis have been carried out on the overall sample in light of business experience, the set of feature vectors that is theoretically most effective is selected. Determining a K-bit hash vector for each piece of account information according to the K preset hash functions makes it possible to fully extract the feature attributes of each piece of account information, represent them as a feature vector, and cope with the very large volume of account information in a complex network.

Two further points should be noted. First, the K-bit hash vector [Figure PCTCN2017105985-appb-000042] corresponding to each piece of account information is in fact obtained through a random hash mapping, in which [Figure PCTCN2017105985-appb-000043] is mapped by hashing to [Figure PCTCN2017105985-appb-000044]. The main purpose of the random hash mapping here is to map the feature vector of the account information to a uniform representation in 0s and 1s for subsequent processing; it is not simply dimensionality reduction. Second, when the original feature vector [Figure PCTCN2017105985-appb-000045] is mapped into the new hash space, data that are similar in the original feature space have a high probability of also being similar in the new hash space. This probability is

[Figure PCTCN2017105985-appb-000046]

which is a monotonically increasing mapping from the similarity s to the probability p.
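Formula (1) survives here only as image references, so the exact hash family cannot be read from the text. The description (a randomly selected non-zero vector, a hash value of 0 or 1, and a collision probability that increases with similarity) is consistent with sign random projection, also called random hyperplane hashing, in which each bit is the sign of the dot product between the feature vector and a random vector. The sketch below assumes that family; the Gaussian sampling, the `>= 0` tie rule, and all names are assumptions rather than details taken from the source.

```python
import random


def make_hash_functions(K: int, d: int, seed: int = 0) -> list[list[float]]:
    """Draw K random d-dimensional vectors, one per hash function (assumed Gaussian)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(K)]


def hash_bit(r: list[float], v: list[float]) -> int:
    """One hash bit: 1 if the feature vector lies on the non-negative side of r."""
    return 1 if sum(ri * vi for ri, vi in zip(r, v)) >= 0 else 0


def hash_vector(random_vectors: list[list[float]], v: list[float]) -> str:
    """Concatenate the K hash bits into a K-bit string."""
    return "".join(str(hash_bit(r, v)) for r in random_vectors)


# Hypothetical feature vector: (transaction amount, transfer count, location count).
random_vectors = make_hash_functions(K=6, d=3, seed=42)
features = [120.0, 3.0, 1.0]
code = hash_vector(random_vectors, features)
scaled_code = hash_vector(random_vectors, [240.0, 6.0, 2.0])
```

For this family, two feature vectors separated by an angle θ agree on any single bit with probability 1 − θ/π, so vectors pointing in similar directions tend to share most of their bits. Because only the sign of each projection is kept, rescaling a feature vector by a positive constant leaves its hash vector unchanged, which is why features would normally be preprocessed before hashing.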
The relationship between the K-bit hash vector corresponding to each piece of account information and the division of that hash vector, in order, into m=K/k classes of sub-hash vectors is shown below in tabular form. Table 1 shows an example of the relationship between account information samples and classes:
Table 1: Relationship between account information samples and classes
[Table 1: Figure PCTCN2017105985-appb-000047]
In Table 1, the relationship between account information samples and classes can be expressed as a matrix of K rows and N columns, where N is the number of account information samples taken and c1 to cN denote the N samples. The N samples are divided into m=K/k classes; each row of the table below the first row represents one class, and the N samples are distributed among $2^k$ groups.
Optionally, dividing the account information into communities according to the feature matching network includes:
(1) assigning the pieces of account information to different communities in the feature matching network;
(2) calculating the similarity strength of each piece of account information according to the similarity between the pieces of account information, thereby generating a node similarity strength matrix;
(3) for each piece of account information, taking the row of the node similarity strength matrix that corresponds to that account information and attempting, in descending order of similarity strength, to move the account information into other communities; if the modularity difference after moving the account information from the p-th community to the q-th community is positive, assigning the account information to the q-th community and ending the attempts for that account information;
(4) repeating the above until the community structure no longer changes.
Optionally, calculating the similarity strength of each piece of account information according to the similarity between the accounts includes:
calculating the similarity strength $s_{i,j}$ between the i-th account information and the j-th account information according to formula (2):
[Formula (2): Figure PCTCN2017105985-appb-000048]
where Γ(i) denotes the set of neighbors of the i-th account information, Γ(i)∩Γ(j) denotes the set of common neighbors of the i-th account information and the j-th account information, and $w_{ai,z}$ is the sum of the weights of the edges between any account information ai and the z-th account information.
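Formula (2) is available only as an image reference, so the following is one possible reading rather than the patented formula. It assumes that the similarity strength sums, over each common neighbor z, the reciprocal of the combined weight of the edges joining i and j to z, which reproduces the worked value of 1/5 given in the description for a single common neighbor with combined edge weight 5. The adjacency layout and the function name are illustrative.

```python
def similarity_strength(weights: dict[int, dict[int, float]], i: int, j: int) -> float:
    """Assumed form: sum over common neighbors z of 1 / (w[i][z] + w[j][z])."""
    common = (set(weights.get(i, {})) & set(weights.get(j, {}))) - {i, j}
    return sum(1.0 / (weights[i][z] + weights[j][z]) for z in common)


# Accounts 1 and 2 share neighbor 3; their edges to 3 weigh 2 and 3 (total 5).
weights = {
    1: {3: 2.0},
    2: {3: 3.0},
    3: {1: 2.0, 2: 3.0},
    4: {},
}
strength_12 = similarity_strength(weights, 1, 2)
strength_14 = similarity_strength(weights, 1, 4)   # no common neighbor
```

Evaluating this for every pair of accounts fills the node similarity strength matrix used in step (3). If the definition of the weight term is instead meant to cover all edges incident to z, only the denominator changes.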
In a specific implementation, in step (1) the feature matching network is initialized and each piece of account information is assigned to a different community; this assignment can be random. In step (2), the similarity strength of each piece of account information is calculated according to formula (2). For example, if the common neighbor of account information 1 and account information 2 is account information 3, and the combined weight of the edges from account information 1 and account information 2 to account information 3 is 5, then the weight of the edges connecting any account information ai to account information 3 is 5, and the similarity strength between account information 1 and account information 2 is therefore 1/5. The similarity strength between other pieces of account information is calculated in the same way. Suppose 4 account information samples are taken; the calculation then yields a 4×4 matrix, for example

[Figure PCTCN2017105985-appb-000049]

From this matrix it can be seen that the similarity between account information 1 and account information 2 is 0.25, the similarity between account information 1 and account information 3 is 0.7, and the similarity between account information 2 and account information 3 is 0.4.

In step (3), starting from the row of the similarity strength matrix that corresponds to a given piece of account information, attempts are made in descending order of similarity strength to move that account information into other communities. For example, the first row of the similarity matrix shows that when account information 1 is to be moved into another community, the community containing account information 3, which has the largest similarity (0.6, the largest value in the first row), is tried first. If ΔQ<0, an attempt is made to move account information 1 into the community containing account information 4 (0.4, the second-largest value in the first row). If ΔQ<0 again, an attempt is made to move account information 1 into the community containing account information 2. If ΔQ<0 still, account information 1 is kept as an independent community, the matrix is not updated, and the calculation proceeds to the second row. If at any point during these attempts ΔQ>0 is found, for example if the first attempt, moving account 1 into the community containing account information 3, gives ΔQ>0, the attempt has succeeded and the calculation for the first row ends. Because the state of account 1 has now changed, all data in the first row and first column of the matrix are deleted, meaning that subsequent account information is no longer compared with account information 1; that is,

[Figure PCTCN2017105985-appb-000050]

becomes

[Figure PCTCN2017105985-appb-000051]

A new round of attempts then begins by the same process, that is, account information 2 is assigned to a community. The modularity difference ΔQ, calculated as

[Figure PCTCN2017105985-appb-000052]

is used to verify whether the attempted move of the account information is correct, where n denotes the sum of all weights in the network, $k_i$ denotes the weight of the edges connected to vertex i, $k_{i,in}$ denotes the sum of the weights of account information i inside the community, $\Sigma_{in}$ denotes the sum of the weights of the edges inside the community, and $\Sigma_{tot}$ denotes the sum of the weights of the edges connected to the account information inside the community, including both edges inside the community and edges leading outside it. If ΔQ is positive, the move is accepted; otherwise, the move is abandoned. By calculating the similarity strength matrix of the account information, each piece of account information is preferentially moved into the community of the neighboring account information most similar to it, which greatly reduces the number of attempts needed for community division and further increases the speed of the algorithm. In addition, whether each attempted move is reasonable is verified with the modularity difference formula, which more effectively ensures the reasonableness and accuracy of the attempted moves.
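The move-and-test loop of step (3) can be sketched as follows. The ΔQ formula is available only as an image reference, so this sketch does not reproduce it; instead it assumes that ΔQ equals the change in the standard weighted modularity of the whole partition and computes that change directly, which is simpler but slower than an incremental formula. The graph, the similarity strength rows, and all names are hypothetical.

```python
def modularity(edges: dict[tuple[int, int], float], community: dict[int, int]) -> float:
    """Standard weighted modularity: sum over communities of L_c/W - (D_c/(2W))**2."""
    total = sum(edges.values())          # W: total edge weight
    inside, degree = {}, {}
    for (u, v), w in edges.items():
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            inside[cu] = inside.get(cu, 0.0) + w
    return sum(inside.get(c, 0.0) / total - (d / (2 * total)) ** 2
               for c, d in degree.items())


def try_move(node, strength_row, edges, community) -> bool:
    """Try other communities in descending similarity strength; keep the first gain."""
    base = modularity(edges, community)
    for other in sorted(strength_row, key=strength_row.get, reverse=True):
        target = community[other]
        if target == community[node]:
            continue
        trial = dict(community)
        trial[node] = target
        if modularity(edges, trial) - base > 0:   # delta Q > 0: accept the move
            community[node] = target
            return True
    return False                                   # every attempt had delta Q <= 0


# Two triangles joined by one edge; every node starts in its own community.
edges = {(1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0,
         (4, 5): 1.0, (4, 6): 1.0, (5, 6): 1.0, (3, 4): 1.0}
community = {n: n for n in range(1, 7)}
moved = try_move(1, {3: 0.9, 2: 0.5, 4: 0.1}, edges, community)

# Starting from the natural split, moving node 1 across lowers modularity.
settled = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
rejected = try_move(1, {4: 0.1}, edges, settled)
```

In the first call node 1 joins the community of node 3, its strongest neighbor, because that move raises modularity; in the second call the only candidate move lowers modularity and is abandoned. Repeating `try_move` over all nodes until no call returns `True` corresponds to step (4).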
To aid understanding of the technical solution of the present invention, FIG. 2 shows, by way of example, a flowchart of the overall approach of the present invention. As shown in FIG. 2:
Step S201: map the feature attributes of each account information to a multi-bit hash mapping vector by means of hash mapping;
Step S202: classify the hash mapping vectors of the account information;
Step S203: for each class, divide the account information having the same hash mapping vector into one group;
Step S204: calculate the similarity between any two pieces of account information in each group;
Step S205: if the similarity between any two pieces of account information in a group is greater than a threshold, establish an interconnecting edge between them, with the similarity as the edge weight, thereby forming a feature matching network, where the resulting feature matching network is sparse;
Step S206: perform community division on the feature matching network according to the similarity strength matrix of the account information in the feature matching network.
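The flow of steps S201 to S205 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the hash mapping vectors are assumed to be available already as bit strings, "classifying" is taken to mean cutting each vector into consecutive k-bit segments (as the apparatus description does with m = K/k), and the fraction of matching bits is used as the similarity. The name `build_network` and its parameters are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_network(hashes, k, threshold):
    """Build a sparse weighted graph from K-bit hash strings.

    hashes: dict mapping account id -> bit string of length K (K divisible by k).
    Returns a dict mapping sorted (id_a, id_b) pairs to edge weights.
    """
    K = len(next(iter(hashes.values())))
    groups = defaultdict(list)
    # S202/S203: for each class (segment position), group the accounts
    # whose k-bit segment at that position is identical.
    for account, bits in hashes.items():
        for c in range(K // k):
            groups[(c, bits[c * k:(c + 1) * k])].append(account)
    edges = {}
    # S204/S205: compare only accounts that share at least one group.
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            if (a, b) in edges:
                continue  # edge already created via another class
            same = sum(x == y for x, y in zip(hashes[a], hashes[b]))
            similarity = same / K
            if similarity > threshold:
                edges[(a, b)] = similarity  # edge weight = similarity
    return edges
```

Accounts that never share a group are never compared, which is where the sparsity of the resulting network comes from.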
Compared with the prior art, the embodiments of the present invention have the following advantages. First, the feature attributes of each account information are mapped into a new hash space by random hash mapping to form a hash mapping vector for each account information, and these hash mapping vectors are classified. Edges can thus be established between highly similar account information, which avoids calculating the similarity between a large number of arbitrary pairs of account information, and a credible weight is efficiently established for each edge, improving the accuracy and speed of the subsequent community division. Second, a feature matching network is built from the similarities between the account information, and community division is then performed on it according to the similarity strength matrix of the account information in the network. This not only allows abnormal communities to be found effectively and targeted measures to be taken, but also allows unknown types of fraud to be detected. Moreover, because the division is driven by the similarity strength matrix, each account information is preferentially assigned to the community of its most similar neighbor account information, which greatly reduces the number of division attempts and further improves the speed of the algorithm. Third, once the feature matching network is formed, the similarity between related account information is stored permanently as edge weights. Even when much new account information arrives, the existing interconnecting edges in the network are not affected; the new account information only needs to be inserted into the original feature matching network. When new account information is added to the original feature matching network graph, random hash mapping is applied and the account information is classified as before, and the similarity to the account information in the same class is then calculated; if the similarity is greater than the threshold, a new edge is added. Afterwards, only the community division algorithm, which requires less computation but is more precise, needs to be run. At the same time, the structure of the feature matching network shows the association structure within and between communities more clearly, which traditional clustering methods cannot achieve.
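The incremental update described above might look like the following sketch. It assumes the network keeps an index from (class, sub-hash segment) to the accounts in that group, stores hash vectors as bit strings, and measures similarity as the fraction of matching bits; `add_account` and all of its parameters are hypothetical names introduced for illustration.

```python
def add_account(new_id, new_bits, hashes, buckets, edges, k, threshold):
    """Insert one account into an existing network, leaving old edges untouched.

    hashes:  dict id -> K-bit string for accounts already in the network.
    buckets: dict (class index, k-bit segment) -> set of ids in that group.
    edges:   dict (id_a, id_b) -> weight; only new entries are added.
    """
    K = len(new_bits)
    candidates = set()
    for c in range(K // k):
        key = (c, new_bits[c * k:(c + 1) * k])
        # Existing accounts in the same group are the only comparison candidates.
        candidates |= buckets.setdefault(key, set())
        buckets[key].add(new_id)
    for other in candidates:
        same = sum(x == y for x, y in zip(new_bits, hashes[other]))
        if same / K > threshold:
            edges[(other, new_id)] = same / K
    hashes[new_id] = new_bits
```

Only the new account's own groups are visited, so the cost of an insertion depends on the size of those groups rather than on the size of the whole network.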
Based on the same concept, an embodiment of the present invention provides a community division apparatus based on a feature matching network. As shown in FIG. 3, the apparatus includes a determining unit 301, a first dividing unit 302, a second dividing unit 303, a calculating unit 304, a network forming unit 305, and a third dividing unit 306, where:
Determining unit 301: configured to determine, according to K preset hash functions, a K-bit hash vector corresponding to each account information;
First dividing unit 302: configured to divide the hash vector corresponding to each account information, in order, into m=K/k classes of sub-hash vectors;
Second dividing unit 303: configured to divide, for each class, the account information having the same sub-hash vector into the same group;
Calculating unit 304: configured to calculate the similarity between the account information in the same group;
Network forming unit 305: configured to establish, if the similarity between the account information is greater than a threshold, interconnecting edges between the account information to form a feature matching network;
Third dividing unit 306: configured to perform community division on the account information according to the feature matching network.
Optionally, the calculating unit 304 is specifically configured to:
If the i-th account information and the j-th account information are in the same group in n classes, use n/m as the similarity between the i-th account information and the j-th account information, where the i-th account information and the j-th account information are any of the account information.
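As a small illustration of the n/m similarity, the sketch below assumes the K-bit hash vectors are stored as bit strings and that the classes are consecutive k-bit segments; `band_similarity` is a hypothetical name.

```python
def band_similarity(bits_i, bits_j, k):
    """Fraction of classes (k-bit segments) in which two hash vectors coincide."""
    m = len(bits_i) // k  # number of classes, m = K/k
    # n counts the classes in which both accounts fall into the same group.
    n = sum(bits_i[c * k:(c + 1) * k] == bits_j[c * k:(c + 1) * k]
            for c in range(m))
    return n / m
```

For two 12-bit vectors with k = 4 that agree in the first and third segments only, n = 2 and m = 3, so the similarity is 2/3.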
Optionally, the calculating unit 304 is further specifically configured to:
If the i-th account information and the j-th account information are in the same group, count the number h of positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, where the i-th account information and the j-th account information are any of the account information;
The similarity between the i-th account information and the j-th account information is s=h/K.
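The s = h/K similarity can be computed cheaply when each K-bit hash vector is packed into an integer, which is an assumption made here for illustration: the positions that differ are the set bits of the XOR of the two integers. `bit_similarity` is a hypothetical name.

```python
def bit_similarity(hash_i: int, hash_j: int, K: int) -> float:
    """s = h / K, where h is the number of positions holding equal bits."""
    differing = bin(hash_i ^ hash_j).count("1")  # positions where the bits differ
    h = K - differing
    return h / K
```

For example, `bit_similarity(0b10110100, 0b10110110, 8)` compares two 8-bit vectors that differ in one position, giving h = 7.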
Optionally, the determining unit 301 is configured to:
Determine the K-bit hash vector corresponding to each account information according to formula (3):
Figure PCTCN2017105985-appb-000053
Figure PCTCN2017105985-appb-000054
where 2'b indicates that [Figure PCTCN2017105985-appb-000055] is a binary number; [Figure PCTCN2017105985-appb-000056] is one of the K preset hash functions; [Figure PCTCN2017105985-appb-000057] [Figure PCTCN2017105985-appb-000058] denotes the feature vector of the account information, where [Figure PCTCN2017105985-appb-000059] c_1, c_2, …, c_d denote the feature attributes of the account information; and [Figure PCTCN2017105985-appb-000060] denotes a randomly selected non-zero vector, [Figure PCTCN2017105985-appb-000061].
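The formula itself survives here only as image placeholders, so the following sketch rests on an assumption rather than on the formula: each of the K hash functions is taken to be defined by one randomly selected non-zero vector, and to output 1 when the dot product of that vector with the feature vector is non-negative and 0 otherwise (a sign-of-random-projection scheme). The names `make_hash_functions` and `hash_vector`, the Gaussian sampling, and the sign rule are all hypothetical.

```python
import random

def make_hash_functions(K, d, seed=0):
    """Draw K random non-zero vectors of dimension d, one per hash function."""
    rng = random.Random(seed)
    vectors = []
    while len(vectors) < K:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if any(v):  # reject the (improbable) all-zero vector
            vectors.append(v)
    return vectors

def hash_vector(x, vectors):
    """Map a feature vector x = (c_1, ..., c_d) to a K-bit string."""
    bits = []
    for v in vectors:
        dot = sum(vi * xi for vi, xi in zip(v, x))
        bits.append("1" if dot >= 0 else "0")  # assumed sign rule
    return "".join(bits)
```

Under this assumption, feature vectors that point in similar directions tend to agree in most bit positions, which is the property the later grouping and similarity steps rely on.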
Optionally, the third dividing unit 306 is specifically configured to:
(1) assign each account information to a different community in the feature matching network;
(2) calculate the similarity strength of each account information according to the similarities between the account information, thereby generating a node similarity strength matrix;
(3) for each account information, taking the row of the node similarity strength matrix in which the account information is located, attempt to move the account information into other communities in descending order of similarity strength; if the modularity difference after moving the account information from the p-th community to the q-th community is positive, assign the account information to the q-th community and stop;
(4) repeat until the community structure no longer changes.
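Steps (1) to (4) can be sketched as the loop below. It is an illustration under assumptions: the rows of the similarity strength matrix are supplied as input, and the modularity difference is obtained by recomputing the standard weighted modularity before and after a trial move, rather than by a closed-form ΔQ expression. The names `modularity` and `divide` are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Standard weighted modularity of an undirected graph.

    edges: dict (u, v) -> weight, each edge listed once.
    community: dict node -> community label.
    """
    total = sum(edges.values())
    internal, degree = defaultdict(float), defaultdict(float)
    for (u, v), w in edges.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2
               for c in degree)

def divide(edges, strength):
    """strength: dict node -> {neighbor: similarity strength} (one matrix row per node)."""
    community = {node: node for node in strength}  # step (1): one community each
    changed = True
    while changed:  # step (4): repeat until nothing moves
        changed = False
        for node, row in strength.items():  # step (3)
            # Try the communities of the neighbors, most similar neighbor first.
            for neighbor in sorted(row, key=row.get, reverse=True):
                target = community[neighbor]
                if target == community[node]:
                    continue
                trial = dict(community)
                trial[node] = target
                gain = modularity(edges, trial) - modularity(edges, community)
                if gain > 1e-12:  # accept only a positive modularity difference
                    community = trial
                    changed = True
                    break
    return community
```

Recomputing the modularity from scratch keeps the sketch short but is costly on large networks; it is used here only to make the acceptance test explicit.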
Optionally, the calculating unit 304 is further specifically configured to:
Calculate the similarity strength s_i,j between the i-th account information and the j-th account information according to formula (4):
Figure PCTCN2017105985-appb-000062
where Γ(i) denotes the neighbor set of the i-th account information, Γ(i)∩Γ(j) denotes the set of common neighbors of the i-th account information and the j-th account information, and w_ai,z is the sum of the weights of the edges between any account information ai and the z-th account information.
Referring to FIG. 4, an embodiment of the present invention provides an electronic device that can be applied to the above embodiments of the present invention.
The electronic device may include one or more processors 410 and a memory 420; one processor 410 is taken as an example in FIG. 4.
The device that performs the community division method based on a feature matching network may further include an input device 430 and an output device 440.
The processor 410, the memory 420, the input device 430, and the output device 440 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 4.
As a non-volatile computer-readable storage medium, the memory 420 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the community division method based on a feature matching network in the embodiments of the present application. By running the non-volatile software programs, instructions, and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the server, that is, implements the community division method based on a feature matching network of the above method embodiments.
The memory 420 may include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function, and the data storage area may store data created through use of the community division apparatus based on a feature matching network. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 420 optionally includes memory located remotely from the processor 410, and this remote memory may be connected over a network to the community division apparatus based on a feature matching network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the community division apparatus based on a feature matching network. The output device 440 may include a display device such as a display screen.
The one or more modules are stored in the memory 420 and, when executed by the one or more processors 410, perform the community division method based on a feature matching network in any of the above method embodiments.
The processor 410 is configured with one or more executable programs, and the one or more executable programs are used to perform the following process: determining, according to K preset hash functions, a K-bit hash vector corresponding to each account information; dividing the hash vector corresponding to each account information, in order, into m=K/k classes of sub-hash vectors; for each class, dividing the account information having the same sub-hash vector into the same group; calculating the similarity between the account information in the same group; if the similarity between the account information is greater than a threshold, establishing interconnecting edges between the account information to form a feature matching network; and performing community division on the account information according to the feature matching network.
Preferably, the processor 410 is specifically configured to:
If the i-th account information and the j-th account information are in the same group in n classes, use n/m as the similarity between the i-th account information and the j-th account information, where the i-th account information and the j-th account information are any of the account information.
Preferably, the processor 410 is specifically configured to:
If the i-th account information and the j-th account information are in the same group, count the number h of positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, where the i-th account information and the j-th account information are any of the account information;
The similarity between the i-th account information and the j-th account information is s=h/K.
Preferably, the processor 410 is specifically configured to:
Determine the K-bit hash vector corresponding to each account information according to formula (1):
Figure PCTCN2017105985-appb-000063
Figure PCTCN2017105985-appb-000064
where 2'b indicates that [Figure PCTCN2017105985-appb-000065] is a binary number; [Figure PCTCN2017105985-appb-000066] is one of the K preset hash functions; [Figure PCTCN2017105985-appb-000067] [Figure PCTCN2017105985-appb-000068] denotes the feature vector of the account information, where [Figure PCTCN2017105985-appb-000069] c_1, c_2, …, c_d denote the feature attributes of the account information; and [Figure PCTCN2017105985-appb-000070] denotes a randomly selected non-zero vector, [Figure PCTCN2017105985-appb-000071].
Preferably, the processor 410 is specifically configured to:
(1) assign each account information to a different community in the feature matching network;
(2) calculate the similarity strength of each account information according to the similarities between the account information, thereby generating a node similarity strength matrix;
(3) for each account information, taking the row of the node similarity strength matrix in which the account information is located, attempt to move the account information into other communities in descending order of similarity strength; if the modularity difference after moving the account information from the p-th community to the q-th community is positive, assign the account information to the q-th community and stop;
(4) repeat until the community structure no longer changes.
Preferably, the processor 410 is specifically configured to:
Calculate the similarity strength s_i,j between the i-th account information and the j-th account information according to formula (2):
Figure PCTCN2017105985-appb-000072
where Γ(i) denotes the neighbor set of the i-th account information, Γ(i)∩Γ(j) denotes the set of common neighbors of the i-th account information and the j-th account information, and w_ai,z is the sum of the weights of the edges between any account information ai and the z-th account information.
As can be seen from the above, the embodiments of the present invention provide a community division apparatus based on a feature matching network, which: determines, according to K preset hash functions, a K-bit hash vector corresponding to each account information; divides the hash vector corresponding to each account information, in order, into classes of sub-hash vectors; for each class, divides the account information having the same sub-hash vector into the same group; calculates the similarity between the account information in the same group; if the similarity between the account information is greater than a threshold, establishes interconnecting edges between the account information to form a feature matching network; and performs community division on the account information according to the feature matching network and the similarities between the account information. In the embodiments of the present invention, a K-bit hash vector corresponding to each account information is first determined according to the K preset hash functions. For the huge amount of account information in a network, a hash function that produces only two hash values is not sufficient, so determining a K-bit hash vector for each account information makes it possible to handle complex network account information. Then, for each class, the account information having the same sub-hash vector is divided into one group, and the similarity between any account information in the same group is calculated. This avoids the very large amount of computation that would result from calculating the similarity between arbitrary account information across the whole network: the technical solution of the present invention effectively reduces the amount of similarity computation by calculating only the similarities between account information in the same group. Finally, where the similarity between account information is determined to be greater than the threshold, interconnecting edges are established between the account information to form a feature matching network, and community division is performed on the account information according to the feature matching network. The account information can thus be divided into communities more precisely, which not only makes the relationships between communities clear, but also allows the resulting communities to be analyzed to find abnormal communities, so that the accounts in an abnormal community can be checked for abnormal accounts, fraudulent accounts can be found in a more targeted way, and fraudulent accounts can be dealt with more efficiently. In addition, if account information needs to be added to the communities already divided, it is only necessary to repeat the few simple steps above for the added account information and update it to the corresponding position, so no update difficulty arises.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.

Claims (15)

  1. A community division method based on a feature matching network, characterized by comprising:
    determining, according to K preset hash functions, a K-bit hash vector corresponding to each account information;
    dividing the hash vector corresponding to each account information, in order, into m=K/k classes of sub-hash vectors;
    for each class, dividing the account information having the same sub-hash vector into the same group;
    calculating the similarity between the account information in the same group;
    if the similarity between the account information is greater than a threshold, establishing interconnecting edges between the account information to form a feature matching network;
    performing community division on the account information according to the feature matching network.
  2. The method according to claim 1, characterized in that calculating the similarity between the account information in the same group comprises:
    if the i-th account information and the j-th account information are in the same group in n classes, using n/m as the similarity between the i-th account information and the j-th account information, where the i-th account information and the j-th account information are any of the account information.
  3. The method according to claim 1, characterized in that calculating the similarity between the account information in the same group comprises:
    if the i-th account information and the j-th account information are in the same group, counting the number h of positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same value, where the i-th account information and the j-th account information are any of the account information;
    the similarity between the i-th account information and the j-th account information being s=h/K.
  4. The method according to claim 1, characterized in that determining, according to K preset hash functions, a K-bit hash vector corresponding to each account information comprises:
    determining the K-bit hash vector corresponding to each account information according to formula (1):
    Figure PCTCN2017105985-appb-100001
    Figure PCTCN2017105985-appb-100002
    where 2'b indicates that [Figure PCTCN2017105985-appb-100003] is a binary number; [Figure PCTCN2017105985-appb-100004] is one of the K preset hash functions; [Figure PCTCN2017105985-appb-100005] [Figure PCTCN2017105985-appb-100006] denotes the feature vector of the account information, where [Figure PCTCN2017105985-appb-100007] c_1, c_2, …, c_d denote the feature attributes of the account information; and [Figure PCTCN2017105985-appb-100008] denotes a randomly selected non-zero vector, [Figure PCTCN2017105985-appb-100009].
  5. The method according to any one of claims 1 to 4, characterized in that performing community division on the account information according to the feature matching network comprises:
    (1) assigning each account information to a different community in the feature matching network;
    (2) calculating the similarity strength of each account information according to the similarities between the account information, thereby generating a node similarity strength matrix;
    (3) for each account information, taking the row of the node similarity strength matrix in which the account information is located, attempting to move the account information into other communities in descending order of similarity strength; if the modularity difference after moving the account information from the p-th community to the q-th community is positive, assigning the account information to the q-th community and stopping;
    (4) repeating until the community structure no longer changes.
  6. The method according to claim 5, characterized in that calculating the similarity strength of each account information according to the similarities between the account information comprises:
    calculating the similarity strength s_i,j between the i-th account information and the j-th account information according to formula (2):
    Figure PCTCN2017105985-appb-100010
    where w(z)=w_ai,z    formula (2)
    where Γ(i) denotes the neighbor set of the i-th account information, Γ(i)∩Γ(j) denotes the set of common neighbors of the i-th account information and the j-th account information, and w_ai,z is the sum of the weights of the edges between any account information ai and the z-th account information.
  7. A community division apparatus based on a feature matching network, characterized by comprising:
    a determining unit, configured to determine, according to K preset hash functions, a K-bit hash vector corresponding to each account information;
    a first dividing unit, configured to divide the hash vector corresponding to each account information, in order, into m=K/k classes of sub-hash vectors;
    a second dividing unit, configured to divide, for each class, the account information having the same sub-hash vector into the same group;
    a calculating unit, configured to calculate the similarity between the account information in the same group;
    a network forming unit, configured to establish, if the similarity between the account information is greater than a threshold, interconnecting edges between the account information to form a feature matching network;
    a third dividing unit, configured to perform community division on the account information according to the feature matching network.
  8. The apparatus according to claim 7, wherein
    the calculating unit is specifically configured to, if the i-th account information and the j-th account information fall in the same group in n of the classes, take n/m as the similarity between the i-th account information and the j-th account information, the i-th account information and the j-th account information each being any one of the pieces of account information.
  9. The apparatus according to claim 7, wherein
    the calculating unit is further specifically configured to, if the i-th account information and the j-th account information are in the same group, count the number h of positions at which the hash vector of the i-th account information and the hash vector of the j-th account information hold the same hash value, the i-th account information and the j-th account information each being any one of the pieces of account information;
    the similarity between the i-th account information and the j-th account information is s = h/K.
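Claims 8 and 9 give two similarity measures over the same hash vectors: a coarse one counted per class (n/m) and a finer one counted per bit (h/K). The short Python sketch below compares them on one pair of vectors. The function names and the sample vectors are illustrative assumptions, and both inputs are assumed to be tuples of equal, non-zero length.

```python
def band_similarity(bits_i, bits_j, band_size):
    """Claim 8 style: n / m, where n counts classes whose sub-hash vectors are identical."""
    if len(bits_i) != len(bits_j) or len(bits_i) % band_size != 0:
        raise ValueError("hash vectors must share a length K that is divisible by k")
    m = len(bits_i) // band_size
    n = sum(
        bits_i[s:s + band_size] == bits_j[s:s + band_size]
        for s in range(0, len(bits_i), band_size)
    )
    return n / m


def bit_similarity(bits_i, bits_j):
    """Claim 9 style: h / K, where h counts positions holding the same hash value."""
    if len(bits_i) != len(bits_j):
        raise ValueError("hash vectors must have the same length K")
    h = sum(a == b for a, b in zip(bits_i, bits_j))
    return h / len(bits_i)


u = (1, 0, 1, 1, 0, 0)
v = (1, 0, 0, 1, 0, 0)
coarse = band_similarity(u, v, band_size=2)  # 2 of the 3 classes match
fine = bit_similarity(u, v)                  # 5 of the 6 bits match
```

A single differing bit costs a whole class under the first measure but only 1/K under the second, so the per-bit measure separates near-duplicates more finely.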
  10. The apparatus according to claim 7, wherein the determining unit is configured to determine, according to formula (3), the K-bit hash vector [Figure PCTCN2017105985-appb-100011] corresponding to each piece of account information:
    Figure PCTCN2017105985-appb-100012
    where 2'b indicates that [Figure PCTCN2017105985-appb-100013] is a binary number, and [Figure PCTCN2017105985-appb-100014] is one of the K preset hash functions,
    Figure PCTCN2017105985-appb-100015
    [Figure PCTCN2017105985-appb-100016] denotes the feature vector of the account information, where [Figure PCTCN2017105985-appb-100017], $c_1, c_2, \ldots, c_d$ denote the feature attributes of the account information, [Figure PCTCN2017105985-appb-100018] denotes a randomly selected non-zero vector, and [Figure PCTCN2017105985-appb-100019].
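Formula (3) and the definition of the hash functions are available only as images, so the sketch below is an assumption rather than the claimed formula. It uses sign random projection, a common scheme that fits the textual description (a d-dimensional feature vector, one randomly selected non-zero vector per hash function, one binary digit per function). The names `make_hash_functions` and `hash_vector`, the Gaussian sampling, and the convention that a non-negative projection maps to 1 are all illustrative choices.

```python
import random


def make_hash_functions(num_functions, dim, seed=0):
    """Draw one random non-zero vector per hash function (Gaussian entries assumed)."""
    rng = random.Random(seed)
    vectors = []
    while len(vectors) < num_functions:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if any(x != 0.0 for x in v):  # reject the zero vector
            vectors.append(v)
    return vectors


def hash_vector(features, random_vectors):
    """K-bit hash vector: a bit is 1 when the projection onto the random vector is >= 0."""
    return tuple(
        1 if sum(f * x for f, x in zip(features, v)) >= 0 else 0
        for v in random_vectors
    )


planes = make_hash_functions(num_functions=8, dim=3, seed=42)
account_features = [0.9, 0.1, 0.3]
code_a = hash_vector(account_features, planes)
# Scaling the feature vector by a positive factor leaves every sign unchanged.
code_b = hash_vector([2 * f for f in account_features], planes)
```

Under this assumed scheme, feature vectors pointing in similar directions agree on most bits, which is the property the grouping by identical sub-hash vectors relies on.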
  11. The apparatus according to any one of claims 7 to 10, wherein
    the third dividing unit is specifically configured to: (1) place each piece of account information in a different community of the feature matching network;
    (2) calculate the similarity strength of each piece of account information according to the similarities between the pieces of account information, thereby generating a node similarity strength matrix;
    (3) for each piece of account information, take the row of the node similarity strength matrix in which that account information is located and, in descending order of similarity strength, attempt to move the account information into other communities; if the modularity difference after moving the account information from the p-th community to the q-th community is positive, move the account information into the q-th community and stop;
    (4) repeat the above until the community structure no longer changes.
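Steps (1) to (4) describe a greedy local-move procedure, sketched below in Python under stated assumptions. The graph is an adjacency dict of edge weights, modularity is the standard Newman modularity recomputed from scratch for each trial move (simple rather than efficient), and the similarity strength matrix is stood in by the edge weights themselves because formula (4) is not recoverable here. The names `modularity` and `divide_communities` are hypothetical.

```python
def modularity(graph, community):
    """Newman modularity of a weighted undirected graph for a node -> label mapping."""
    two_w = sum(sum(nbrs.values()) for nbrs in graph.values())  # twice the total edge weight
    if two_w == 0:
        return 0.0
    internal, degree = {}, {}
    for u, nbrs in graph.items():
        c = community[u]
        degree[c] = degree.get(c, 0.0) + sum(nbrs.values())
        for v, w in nbrs.items():
            if community[v] == c:
                internal[c] = internal.get(c, 0.0) + w
    return sum(internal.get(c, 0.0) / two_w - (degree[c] / two_w) ** 2 for c in degree)


def divide_communities(graph, strength):
    """Greedy community division; strength[u][v] is the similarity strength of u and v."""
    community = {u: u for u in graph}            # step (1): one community per account
    changed = True
    while changed:                               # step (4): repeat until nothing moves
        changed = False
        for u in graph:
            base = modularity(graph, community)
            row = strength.get(u, {})
            # step (3): try candidates in descending order of similarity strength
            for v in sorted(row, key=row.get, reverse=True):
                target, old = community[v], community[u]
                if target == old:
                    continue
                community[u] = target
                # small tolerance guards against floating-point noise
                if modularity(graph, community) - base > 1e-12:
                    changed = True
                    break                        # keep the first improving move
                community[u] = old               # otherwise undo the trial move
    return community


# Two triangles joined by one weak edge (c-d).
graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 0.2},
    "d": {"c": 0.2, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}
result = divide_communities(graph, strength=graph)
```

Every accepted move strictly increases modularity, which is bounded above, so the loop terminates; on this small graph it settles on the two triangles as separate communities.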
  12. The apparatus according to claim 11, wherein
    the calculating unit is further specifically configured to calculate the similarity strength $s_{i,j}$ between the i-th account information and the j-th account information according to formula (4):
    Figure PCTCN2017105985-appb-100020
    where $w(z) = w_{a_i,z}$    formula (4)
    where $\Gamma(i)$ denotes the neighbor set of the i-th account information, $\Gamma(i) \cap \Gamma(j)$ denotes the common neighbor set of the i-th account information and the j-th account information, and $w_{a_i,z}$ is the sum of the weights of the edges between any account information $a_i$ and the z-th account information.
  13. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 6.
  14. A non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being used to cause a computer to perform the method according to any one of claims 1 to 6.
  15. A computer program product, comprising a computer program stored on a non-volatile computer storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 6.
PCT/CN2017/105985 2016-12-06 2017-10-13 Method and apparatus for grouping communities on the basis of feature matching network, and electronic device WO2018103456A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611110731.7A CN106709800B (en) 2016-12-06 2016-12-06 Community division method and device based on feature matching network
CN201611110731.7 2016-12-06

Publications (1)

Publication Number Publication Date
WO2018103456A1 true WO2018103456A1 (en) 2018-06-14

Family

ID=58937536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/105985 WO2018103456A1 (en) 2016-12-06 2017-10-13 Method and apparatus for grouping communities on the basis of feature matching network, and electronic device

Country Status (3)

Country Link
CN (1) CN106709800B (en)
TW (1) TWI662421B (en)
WO (1) WO2018103456A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109598509A (en) * 2018-10-17 2019-04-09 阿里巴巴集团控股有限公司 The recognition methods of risk clique and device
CN109816535A (en) * 2018-12-13 2019-05-28 中国平安财产保险股份有限公司 Cheat recognition methods, device, computer equipment and storage medium
CN109859054A (en) * 2018-12-13 2019-06-07 平安科技(深圳)有限公司 Network community method for digging, device, computer equipment and storage medium
CN110046929A (en) * 2019-03-12 2019-07-23 平安科技(深圳)有限公司 A kind of recognition methods of fraud clique, device, readable storage medium storing program for executing and terminal device
CN110297948A (en) * 2019-05-28 2019-10-01 阿里巴巴集团控股有限公司 Relational network construction method and device
CN111292171A (en) * 2020-02-28 2020-06-16 中国工商银行股份有限公司 Financial product pushing method and device
CN111343012A (en) * 2020-02-17 2020-06-26 平安科技(深圳)有限公司 Cache server deployment method and device of cloud platform and computer equipment
CN111666501A (en) * 2020-06-30 2020-09-15 腾讯科技(深圳)有限公司 Abnormal community identification method and device, computer equipment and storage medium
CN112926991A (en) * 2021-03-30 2021-06-08 顶象科技有限公司 Cascade group severity grade dividing method and system
CN115001971A (en) * 2022-04-14 2022-09-02 西安交通大学 Virtual network mapping method for improving community discovery under heaven-earth integrated information network

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709800B (en) * 2016-12-06 2020-08-11 中国银联股份有限公司 Community division method and device based on feature matching network
CN107194623B (en) * 2017-07-20 2021-01-05 深圳市分期乐网络科技有限公司 Group partner fraud discovery method and device
CN107871277B (en) * 2017-07-25 2021-04-13 平安普惠企业管理有限公司 Server, client relationship mining method and computer readable storage medium
CN110019193B (en) * 2017-09-25 2022-10-14 腾讯科技(深圳)有限公司 Similar account number identification method, device, equipment, system and readable medium
CN108295476B (en) * 2018-03-06 2021-12-28 网易(杭州)网络有限公司 Method and device for determining abnormal interaction account
CN110227268B (en) * 2018-03-06 2022-06-07 腾讯科技(深圳)有限公司 Method and device for detecting illegal game account
CN108829769B (en) * 2018-05-29 2021-08-06 创新先进技术有限公司 Suspicious group discovery method and device
CN109191107A (en) * 2018-06-29 2019-01-11 阿里巴巴集团控股有限公司 Transaction abnormality recognition method, device and equipment
CN109559218A (en) * 2018-11-07 2019-04-02 北京先进数通信息技术股份公司 A kind of determination method, apparatus traded extremely and storage medium
CN110688540B (en) * 2019-10-08 2022-06-10 腾讯科技(深圳)有限公司 Cheating account screening method, device, equipment and medium
CN113034296B (en) * 2019-12-24 2023-09-22 腾讯科技(深圳)有限公司 User account selection method, device, computer equipment and storage medium
CN111444454B (en) * 2020-03-24 2023-05-05 哈尔滨工程大学 Dynamic community division method based on spectrum method
CN111552842A (en) * 2020-03-30 2020-08-18 贝壳技术有限公司 Data processing method, device and storage medium
CN111784528A (en) * 2020-05-27 2020-10-16 平安科技(深圳)有限公司 Abnormal community detection method and device, computer equipment and storage medium
CN112149000B (en) * 2020-09-09 2021-12-17 浙江工业大学 Online social network user community discovery method based on network embedding
CN113761080A (en) * 2021-04-01 2021-12-07 京东城市(北京)数字科技有限公司 Community division method, device, equipment and storage medium
CN113326178A (en) * 2021-06-22 2021-08-31 北京奇艺世纪科技有限公司 Abnormal account number propagation method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090281971A1 (en) * 2008-05-09 2009-11-12 International Business Machines Corporation System and method for classifying data streams with very large cardinality
EP2611101A1 (en) * 2011-12-29 2013-07-03 Verisign, Inc. Systems and methods for detecting similarities in network traffic
CN106095813A (en) * 2016-05-31 2016-11-09 北京奇艺世纪科技有限公司 A kind of identification method of user identifier and device
CN106709800A (en) * 2016-12-06 2017-05-24 中国银联股份有限公司 Community partitioning method and device based on characteristic matching network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6996273B2 (en) * 2001-04-24 2006-02-07 Microsoft Corporation Robust recognizer of perceptually similar content
US8086605B2 (en) * 2005-06-28 2011-12-27 Yahoo! Inc. Search engine with augmented relevance ranking by community participation
CN103999082B (en) * 2011-12-19 2017-09-12 国际商业机器公司 Method, computer program and computer for detecting the community in social media
WO2013178286A1 (en) * 2012-06-01 2013-12-05 Qatar Foundation A method for processing a large-scale data set, and associated apparatus
US20150120583A1 (en) * 2013-10-25 2015-04-30 The Mitre Corporation Process and mechanism for identifying large scale misuse of social media networks

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109598509B (en) * 2018-10-17 2023-09-01 创新先进技术有限公司 Identification method and device for risk group partner
CN109598509A (en) * 2018-10-17 2019-04-09 阿里巴巴集团控股有限公司 The recognition methods of risk clique and device
CN109816535A (en) * 2018-12-13 2019-05-28 中国平安财产保险股份有限公司 Cheat recognition methods, device, computer equipment and storage medium
CN109859054A (en) * 2018-12-13 2019-06-07 平安科技(深圳)有限公司 Network community method for digging, device, computer equipment and storage medium
CN109859054B (en) * 2018-12-13 2024-03-05 平安科技(深圳)有限公司 Network community mining method and device, computer equipment and storage medium
CN110046929B (en) * 2019-03-12 2023-06-20 平安科技(深圳)有限公司 Fraudulent party identification method and device, readable storage medium and terminal equipment
CN110046929A (en) * 2019-03-12 2019-07-23 平安科技(深圳)有限公司 A kind of recognition methods of fraud clique, device, readable storage medium storing program for executing and terminal device
CN110297948B (en) * 2019-05-28 2023-08-22 创新先进技术有限公司 Relational network construction method and device
CN110297948A (en) * 2019-05-28 2019-10-01 阿里巴巴集团控股有限公司 Relational network construction method and device
CN111343012A (en) * 2020-02-17 2020-06-26 平安科技(深圳)有限公司 Cache server deployment method and device of cloud platform and computer equipment
CN111292171B (en) * 2020-02-28 2023-06-27 中国工商银行股份有限公司 Financial product pushing method and device
CN111292171A (en) * 2020-02-28 2020-06-16 中国工商银行股份有限公司 Financial product pushing method and device
CN111666501A (en) * 2020-06-30 2020-09-15 腾讯科技(深圳)有限公司 Abnormal community identification method and device, computer equipment and storage medium
CN111666501B (en) * 2020-06-30 2024-04-12 腾讯科技(深圳)有限公司 Abnormal community identification method, device, computer equipment and storage medium
CN112926991A (en) * 2021-03-30 2021-06-08 顶象科技有限公司 Cascade group severity grade dividing method and system
CN112926991B (en) * 2021-03-30 2024-04-30 中国银联股份有限公司 Method and system for grading severity level of cash-out group
CN115001971A (en) * 2022-04-14 2022-09-02 西安交通大学 Virtual network mapping method for improving community discovery under heaven-earth integrated information network

Also Published As

Publication number Publication date
TW201822022A (en) 2018-06-16
CN106709800B (en) 2020-08-11
CN106709800A (en) 2017-05-24
TWI662421B (en) 2019-06-11

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17878551; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 17878551; Country of ref document: EP; Kind code of ref document: A1)