CN111598713A - Cluster recognition method and device based on similarity weight updating and electronic equipment - Google Patents


Info

Publication number
CN111598713A
Authority
CN
China
Prior art keywords
user
sub
black seed
users
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010724429.0A
Other languages
Chinese (zh)
Other versions
CN111598713B (en
Inventor
宋孟楠
苏绥绥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qiyu Information Technology Co Ltd
Original Assignee
Beijing Qiyu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qiyu Information Technology Co Ltd filed Critical Beijing Qiyu Information Technology Co Ltd
Priority to CN202010724429.0A priority Critical patent/CN111598713B/en
Publication of CN111598713A publication Critical patent/CN111598713A/en
Application granted granted Critical
Publication of CN111598713B publication Critical patent/CN111598713B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a group identification method and device based on similarity weight updating, and electronic equipment. The method comprises: acquiring a black seed user sub-relationship graph according to the social relationship network of a black seed user; updating the weight of each observation dimension Si according to the relationship between the observation dimensions Si in the black seed user sub-relationship graph; determining the similarity of the users in the black seed user sub-relationship graph according to the updated weights of the observation dimensions Si; and determining risk groups according to the similarity of the users. Because the similarity between users is determined from weights of the observation dimensions Si that are updated in real time, the accuracy of the similarity calculation used in group identification is improved, so that risk groups can be identified promptly and accurately, the risk control requirements of the business are met, and the economic loss of enterprises is reduced.

Description

Cluster recognition method and device based on similarity weight updating and electronic equipment
Technical Field
The invention relates to the technical field of computer information processing, in particular to a method and a device for identifying a group based on similarity weight updating, electronic equipment and a computer readable medium.
Background
With the rapid development of the internet and the popularization of intelligent terminals, people can handle many services, such as online shopping, online transfers and online loans, over the network without leaving home. At the same time, in pursuit of profit, lawless persons increasingly form groups and commit fraud by forging information or impersonating other persons.
Group fraud causes greater economic loss to internet enterprises than personal fraud, and therefore how to identify and avoid group fraud so as to reduce economic loss is a problem to be solved urgently by internet enterprises.
Disclosure of Invention
The invention aims to solve the technical problem that the existing network technology cannot intelligently, quickly and accurately identify the group cheating behavior.
In order to solve the above technical problem, a first aspect of the present invention provides a method for group identification based on similarity weight update, where the method includes:
acquiring a black seed user sub-relationship graph according to the social relationship network of a black seed user;
updating the weight of each observation dimension Si according to the relationship between the observation dimensions Si in the black seed user sub-relationship graph;
determining the similarity of the users in the black seed user sub-relationship graph according to the updated weights of the observation dimensions Si;
determining risk groups according to the similarity of the users;
wherein i is a natural number.
According to a preferred embodiment of the present invention, updating the weight of the observation dimension Si according to the relationship between the observation dimensions Si in the black seed user sub-relationship graph includes:
generating an n-order similarity matrix Di according to the relationship between the users on the observation dimension Si;
performing spectral clustering on the similarity matrices Di to obtain a final similarity matrix D;
determining the number of shared edges according to the final similarity matrix D;
updating the weight of Si according to the number of shared edges until the objective function meets the condition;
wherein n is the number of users contained in the black seed user sub-relationship graph.
According to a preferred embodiment of the present invention, the final similarity matrix D is obtained by the following formula:
D=PiDi+T;
wherein Pi is a random value and T is a predetermined matrix.
According to a preferred embodiment of the present invention, determining the similarity of the users in the black seed user sub-relationship graph according to the updated weight of the observation dimension Si includes:
determining the user observation dimension value ri in the black seed user sub-relationship graph;
and determining the similarity of the users in the black seed user sub-relationship graph according to the user observation dimension value ri and the updated weight of the corresponding observation dimension Si.
According to a preferred embodiment of the present invention, acquiring the black seed user sub-relationship graph according to the social relationship network of the black seed user includes:
determining black seed users according to user history data;
diffusing the contacts of the black seed user according to the group size to obtain a diffusion relationship graph;
and segmenting the diffusion relationship graph to obtain the black seed user sub-relationship graph.
According to a preferred embodiment of the invention, the observation dimensions comprise at least one of: the home location of the user's ID number, the operating system of the device used by the user, and the longitude and latitude of the user's location.
In order to solve the above technical problem, a second aspect of the present invention provides a group identification apparatus updated based on similarity weight, the apparatus comprising:
an acquisition module, configured to acquire a black seed user sub-relationship graph according to the social relationship network of a black seed user;
an updating module, configured to update the weight of each observation dimension Si according to the relationship between the observation dimensions Si in the black seed user sub-relationship graph;
a first determining module, configured to determine the similarity of the users in the black seed user sub-relationship graph according to the updated weights of the observation dimensions Si;
a second determining module, configured to determine risk groups according to the user similarity;
wherein i is a natural number.
According to a preferred embodiment of the present invention, the update module includes:
a generating module, configured to generate an n-order similarity matrix Di according to the relationship between the users on the observation dimension Si;
a clustering module, configured to perform spectral clustering on the similarity matrices Di to obtain a final similarity matrix D;
a sub-determining module, configured to determine the number of shared edges according to the final similarity matrix D;
a sub-updating module, configured to update the weight of Si according to the number of shared edges until the objective function meets the condition;
wherein n is the number of users contained in the black seed user sub-relationship graph.
According to a preferred embodiment of the present invention, the clustering module obtains a final similarity matrix D by the following formula:
D=PiDi+T;
wherein Pi is a random value and T is a predetermined matrix.
According to a preferred embodiment of the present invention, the first determining module includes:
a first sub-determining module, configured to determine the user observation dimension value ri in the black seed user sub-relationship graph;
and a second sub-determining module, configured to determine the similarity of the users in the black seed user sub-relationship graph according to the user observation dimension value ri and the updated weight of the corresponding observation dimension Si.
According to a preferred embodiment of the present invention, the obtaining module includes:
a third sub-determining module, configured to determine black seed users according to user history data;
a diffusion module, configured to diffuse the contacts of the black seed user according to the group size to obtain a diffusion relationship graph;
and a segmentation module, configured to segment the diffusion relationship graph to obtain the black seed user sub-relationship graph.
According to a preferred embodiment of the invention, the observation dimensions comprise at least one of: the home location of the user's ID number, the operating system of the device used by the user, and the longitude and latitude of the user's location.
To solve the above technical problem, a third aspect of the present invention provides an electronic device, comprising:
a processor; and
a memory storing computer executable instructions that, when executed, cause the processor to perform the method described above.
In order to solve the above technical problem, a fourth aspect of the present invention proposes a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs that, when executed by a processor, implement the above method.
According to the invention, a black seed user sub-relationship graph, formed by the closely related users among the contacts of a black seed user, is first acquired according to the social relationship network; the weight of each observation dimension Si is then updated according to the relationship between the observation dimensions Si in the black seed user sub-relationship graph; the similarity of every two users in the black seed user sub-relationship graph is determined according to the updated weights of the observation dimensions Si; and risk groups are determined according to the user similarity. Because the similarity between users is determined from weights of the observation dimensions Si that are updated in real time, the accuracy of the similarity calculation used in group identification is improved, so that risk groups can be identified promptly and accurately, the risk control requirements of the business are met, and the economic loss of enterprises is reduced.
Drawings
In order to make the technical problems solved by the present invention, the technical means adopted and the technical effects obtained more clear, the following will describe in detail the embodiments of the present invention with reference to the accompanying drawings. It should be noted, however, that the drawings described below are only illustrations of exemplary embodiments of the invention, from which other embodiments can be derived by those skilled in the art without inventive step.
Fig. 1 is a schematic flow chart of a group identification method based on similarity weight updating according to the present invention;
Fig. 2 is a schematic flow chart of updating the weights of the observation dimensions Si according to the relationship between the observation dimensions Si in the black seed user sub-relationship graph according to the present invention;
Fig. 3 is a schematic illustration of determining shared edges according to the present invention;
Fig. 4 is a schematic structural framework diagram of a group identification device based on similarity weight updating according to the present invention;
Fig. 5 is a block diagram of an exemplary embodiment of an electronic device according to the present invention;
Fig. 6 is a schematic diagram of one embodiment of a computer-readable medium according to the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The invention may, however, be embodied in many specific forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art.
The structures, properties, effects or other characteristics described in a certain embodiment may be combined in any suitable manner in one or more other embodiments, while still complying with the technical idea of the invention.
In describing particular embodiments, specific details of structures, properties, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by one skilled in the art. However, it is not excluded that a person skilled in the art may implement the invention in a specific case without the above-described structures, performances, effects or other features.
The flow chart in the drawings is only an exemplary flow demonstration, and does not represent that all the contents, operations and steps in the flow chart are necessarily included in the scheme of the invention, nor does it represent that the execution is necessarily performed in the order shown in the drawings. For example, some operations/steps in the flowcharts may be divided, some operations/steps may be combined or partially combined, and the like, and the execution order shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.
The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The same reference numerals denote the same or similar elements, components, or parts throughout the drawings, and thus a repetitive description thereof may be omitted hereinafter. It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, or sections, these elements, components, or sections should not be limited by these terms. That is, these terms are used only to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention. Furthermore, the term "and/or" is intended to include all combinations of any one or more of the listed items.
Aiming at group fraud in internet enterprises, the invention identifies risk groups in combination with the specific scenario characteristics of internet services and provides the identification result to the staff of the internet enterprise. The staff can then handle resource applications from the persons concerned by rejecting the application (for example, rejecting the resource request) or by adding manual review, so as to reduce the risk of economic loss to the internet enterprise.
Referring to Fig. 1, which is a flowchart of a group identification method based on similarity weight updating according to the present invention, the method includes:
S1, acquiring a black seed user sub-relationship graph according to the social relationship network of the black seed user;
Illustratively, this step includes:
S11, determining black seed users according to user history data;
In the invention, a black seed user is a user with bad behavior, such as a fraud record or a record of unreturned resources. Specifically, users with fraud records or records of unreturned funds can be identified from the user history data and marked as black seed users.
The user history data may include user service information, user identification information, user contact information, and the like. The user service information records the service data of a user: for a loan service, it records the user's borrowing and repayment data; for online shopping, it records the user's ordering, payment, return and refund data. The user identification information uniquely identifies the user and may be the user's identity (ID) number. The user contact information may include an email address, a telephone number, a social APP account, an address, a device fingerprint, login IP information, and the like.
S12, diffusing the contact of the black seed user according to the group scale to obtain a diffusion relation graph;
According to the invention, the social relationship network of the black seed user is diffused, and the users who have a first-degree or second-degree contact relationship with the black seed user form a diffusion relationship graph, which balances the efficiency and the accuracy of risk group identification.
The first-degree contact relation means that two users have a direct association relation, and the second-degree contact relation means that two users have an indirect association relation. For example, user a and user B have a first degree contact relationship, and user B and user C also have a first degree contact relationship, then user a and user C have an indirect association relationship through user B, that is, user a and user C have a second degree contact relationship.
The invention adds the black seed user into a pre-established social relationship network as one node. Each node in the social relationship network is used for representing different users, and connecting lines among the nodes are used for representing contact relationships among the users. In the invention, as the black seed user is a user with fraud records, the user who has a contact relation with the black seed user is suspected of fraud.
In practical application, there are many methods for calculating the relationship between two users in a social contact network, and the embodiment of the present application is not particularly limited. If any mail communication, conversation, same equipment, same IP login or social communication and the like exist between the two users, the contact relation between the two users can be regarded as existing, and therefore the contact relation between the two users can be calculated according to the contact information of the users.
Specifically, a weight is set for each of the multiple types of contact information of the black seed user, and the number of times the black seed user and a first user have established contact through each type of contact information is counted, where the first user is any user other than the black seed user in the social relationship network. The contact degree between the black seed user and the first user is then calculated from the number of contacts made through each type of contact information and the weight of that type of contact information. If the calculated contact degree meets a preset condition, the first user is determined to have a first-degree contact relationship with the black seed user. In an optional embodiment, the preset condition may be that the contact degree is greater than 1; that is, if the calculated contact degree is greater than 1, the black seed user has a first-degree contact relationship with the first user.
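The contact-degree calculation can be sketched as follows. This is a minimal illustration that assumes the contact degree is a weighted sum of contact counts; the text does not fix the exact formula, and the channel names and weight values are hypothetical.

```python
from typing import Dict


def contact_degree(counts: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of contact counts over contact channels (assumed formula)."""
    return sum(weights.get(channel, 0.0) * n for channel, n in counts.items())


def is_first_degree(counts: Dict[str, int], weights: Dict[str, float],
                    threshold: float = 1.0) -> bool:
    """Preset condition from the optional embodiment: contact degree > 1."""
    return contact_degree(counts, weights) > threshold


# Hypothetical per-channel weights and contact counts between the black seed
# user and one candidate "first user".
weights = {"phone": 0.5, "email": 0.25, "same_device": 1.0}
counts = {"phone": 2, "email": 1}
print(contact_degree(counts, weights))   # 1.25
print(is_first_degree(counts, weights))  # True
```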
In addition, if the first user has a first-degree contact relationship with the black seed user, and a second user has a first-degree contact relationship with the first user but not with the black seed user, the second user is determined to have a second-degree contact relationship with the black seed user, where the second user is any user in the social relationship network other than the black seed user and the first user.
After the social relationship network of the black seed user is established, whether to diffuse only the first-degree contacts, or both the first-degree and second-degree contacts, of the black seed user is determined according to the group size;
The group size refers to the number of persons included in a risk group and can be set according to business experience. In one example, the first-degree contacts of the black seed user are diffused when the group size is 3 persons or fewer, and both the first-degree and second-degree contacts are diffused when the group size is greater than 3 persons.
In the invention, if the first-degree contact of the black seed user is diffused, all the first-degree contacts of the black seed user are searched, and a diffusion relation graph is formed by all the first-degree contacts of the black seed user; each node in the diffusion relation graph is used for representing different users, connecting lines between the nodes are used for representing contact person relations between the users, and the users in the diffusion relation graph are black seed users or first-degree contact persons of the black seed users. And if the first-degree contact and the second-degree contact of the black seed user are diffused, searching the first-degree contact and the second-degree contact of the black seed user to obtain a diffusion relation graph. The users in the diffusion relation graph are black seed users, first degree contacts of the black seed users, or second degree contacts of the black seed users.
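The diffusion rule above can be sketched as a depth-limited breadth-first search. This is an illustrative sketch, not the patented implementation; the adjacency-dictionary representation and the function name `diffuse` are assumptions, and the size threshold of 3 is taken from the example in the text.

```python
from typing import Dict, Set


def diffuse(graph: Dict[str, Set[str]], seed: str, group_size: int) -> Set[str]:
    """Return the users of the diffusion relationship graph around `seed`.

    Depth 1 (first-degree contacts) when group_size <= 3, otherwise
    depth 2 (first- and second-degree contacts).
    """
    max_depth = 1 if group_size <= 3 else 2
    visited = {seed}
    frontier = {seed}
    for _ in range(max_depth):
        next_frontier = set()
        for user in frontier:
            for neighbour in graph.get(user, set()):
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_frontier.add(neighbour)
        frontier = next_frontier
    return visited


# Hypothetical social relationship network: A - B - C - D, with A as black seed.
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(sorted(diffuse(network, "A", group_size=3)))  # ['A', 'B']
print(sorted(diffuse(network, "A", group_size=5)))  # ['A', 'B', 'C']
```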
S13, segmenting the diffusion relation graph to obtain a black seed user sub-relation graph;
By segmenting the diffusion relationship graph, the invention completes a first, coarse-grained stage of group identification. After segmentation, closely related users in the diffusion relationship graph fall into the same black seed user sub-relationship graph, and each black seed user sub-relationship graph corresponds to a suspected risk group; users not assigned to any sub-relationship graph do not form a risk group.
The method for segmenting the diffusion relation graph can adopt the existing heap method or the method for constructing a confidence network and segmenting connected subgraphs through the confidence network.
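One way to read the second option is to keep only sufficiently confident edges and take the connected components of what remains. The sketch below follows that reading; the confidence scores, the 0.5 threshold and the rule that single-user components are dropped are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def split_into_subgraphs(users: List[str],
                         edges: Dict[Tuple[str, str], float],
                         min_confidence: float = 0.5) -> List[Set[str]]:
    """Connected components of the graph restricted to confident edges."""
    adjacency = {user: set() for user in users}
    for (a, b), confidence in edges.items():
        if confidence >= min_confidence:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in users:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            user = stack.pop()
            if user in seen:
                continue
            seen.add(user)
            component.add(user)
            stack.extend(adjacency[user] - seen)
        if len(component) > 1:  # isolated users form no suspected group
            components.append(component)
    return components


# Hypothetical confidence scores on the edges of a diffusion relationship graph.
edges = {("A", "B"): 0.9, ("B", "C"): 0.2, ("C", "D"): 0.8}
print(split_into_subgraphs(["A", "B", "C", "D", "E"], edges))
```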
S2, updating the weight of the observation dimension Si according to the relation between the observation dimensions Si in the black seed user sub-relation graph;
The invention performs a second stage of risk group identification based on calculating the user similarity in the black seed user sub-relationship graph, so as to improve the accuracy of group identification. When the user similarity is calculated, the weight of each observation dimension Si is updated in real time according to the relationship between the observation dimensions Si in the black seed user sub-relationship graph, which effectively improves the accuracy of the similarity calculation used in group identification.
The observation dimensions may be user information of different dimensions, and specifically may include: the home location of the ID number, the operating system of the device used by the user, and the longitude and latitude of the user's location, as well as the account name, the registration time, the IP address used at registration, the device information of the device used at registration, and the like.
Illustratively, as shown in fig. 2, the present step includes:
S21, generating an n-order similarity matrix Di according to the relationship between the users on the observation dimension Si;
wherein n is the number of users contained in the black seed user sub-relationship graph, and i is the index of the observation dimension. In the similarity matrix Di corresponding to each observation dimension Si, 0 indicates that two users differ on the observation dimension Si, and 1 indicates that two users are similar on the observation dimension Si.
Take as an example a black seed user sub-relationship graph that includes three users, user 1, user 2 and user 3 (i.e., n = 3), with the home location of the ID number selected as the observation dimension S1. Suppose the ID numbers of user 1 and user 2 have different home locations, those of user 1 and user 3 have the same home location, and those of user 2 and user 3 have the same home location. A 3rd-order similarity matrix D1 is then generated from the relationship between the home locations of the users' ID numbers:
[Image: matrix D1, with off-diagonal entries D1(1,2) = 0, D1(1,3) = 1 and D1(2,3) = 1]
In the same way, the similarity matrix Di corresponding to each observation dimension Si can be obtained.
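A minimal sketch of building one such matrix is given below. It assumes that "similar" means equal values on the dimension and that each user counts as similar to itself (diagonal entries of 1); neither is stated in the text. The sample values are hypothetical and are not the example from the description.

```python
from typing import List, Sequence


def similarity_matrix(values: Sequence[str]) -> List[List[int]]:
    """n-order 0/1 matrix: entry (a, b) is 1 when users a and b share a value."""
    n = len(values)
    return [[1 if values[a] == values[b] else 0 for b in range(n)]
            for a in range(n)]


# Hypothetical ID-number home locations for user 1, user 2 and user 3.
home_locations = ["Beijing", "Shanghai", "Beijing"]
for row in similarity_matrix(home_locations):
    print(row)
# [1, 0, 1]
# [0, 1, 0]
# [1, 0, 1]
```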
S22, performing spectral clustering on the similarity matrix Di to obtain a final similarity matrix D;
The main idea of spectral clustering is to treat all data as points in space that can be connected by edges. The edge between two points that are far apart has a low weight, and the edge between two points that are close together has a high weight. The graph formed by all data points is then cut so that the sum of the edge weights between different subgraphs is as low as possible and the sum of the edge weights within each subgraph is as high as possible, which achieves the purpose of clustering. Spectral clustering is based on the spectral partitioning of a graph: the set of objects to be clustered is taken as the vertex set of a weighted graph, and the clustering result is obtained by analyzing the eigenvectors and eigenvalues of a matrix associated with that weighted graph.
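For reference, the quantities behind this description can be written in the conventional notation of spectral clustering. The symbols W (edge weight matrix), Deg (degree matrix), L (graph Laplacian) and f (a vector assigning one value per vertex) are standard textbook notation and do not appear in this document.

```latex
\[
\operatorname{cut}(A,\bar{A}) = \sum_{u \in A,\; v \in \bar{A}} w_{uv},
\qquad
L = \mathrm{Deg} - W,
\qquad
\mathrm{Deg}_{uu} = \sum_{v} w_{uv},
\]
\[
f^{\top} L f = \frac{1}{2} \sum_{u,v} w_{uv}\,(f_u - f_v)^2 .
\]
```

The second identity shows why the eigenvectors of L with small eigenvalues are informative: a vector f makes the quadratic form small only if it takes nearly equal values on vertices joined by heavy edges, which is the "low weight between subgraphs, high weight within subgraphs" criterion described above.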
In the invention, an n-order final similarity matrix D is obtained by the following formula:
D=PiDi+T;
wherein Pi is a random value, preferably 1/n, and T is a preset matrix.
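The formula can be read in more than one way. The sketch below takes one possible reading, in which the per-dimension matrices are combined as a sum over i of Pi·Di plus T; the summation over dimensions is an assumption, as are the function name and the sample values.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine(matrices: Sequence[Matrix], coefficients: Sequence[float],
            offset: Matrix) -> Matrix:
    """Compute sum_i P_i * D_i + T entry by entry (assumed reading of D=PiDi+T)."""
    n = len(offset)
    result = [row[:] for row in offset]  # start from a copy of T
    for p, matrix in zip(coefficients, matrices):
        for a in range(n):
            for b in range(n):
                result[a][b] += p * matrix[a][b]
    return result


# Two hypothetical 2-user dimension matrices, equal coefficients, zero offset T.
d1 = [[1, 0], [0, 1]]
d2 = [[1, 1], [1, 1]]
t = [[0.0, 0.0], [0.0, 0.0]]
print(combine([d1, d2], [0.5, 0.5], t))  # [[1.0, 0.5], [0.5, 1.0]]
```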
After spectral clustering, it is known which users are clustered into one class and which are not. Correspondingly, in the final similarity matrix D, 1 indicates that two users are clustered into one class, and 0 indicates that they are not. Illustratively, if the final similarity matrix is:
[Image: final similarity matrix D, with off-diagonal entries D(1,2) = 0, D(1,3) = 0 and D(2,3) = 1]
then user 2 and user 3 are clustered into one class, and user 1 is not clustered with either user 2 or user 3.
S23, determining the number of the shared edges according to the final similarity matrix D;
A shared edge is an edge formed by an observation dimension that is the same for two users clustered into one class in the black seed user sub-relationship graph. Specifically, the users clustered into one class are determined from the final similarity matrix D, and the number of observation dimensions that are the same for two users clustered into one class is taken as the number of shared edges. As shown in Fig. 3, the black seed user sub-relationship graph includes three users, user 1, user 2 and user 3, and the selected observation dimensions are S1, S2 and S3. User 1 and user 2 are the same on observation dimensions S2 and S3, user 1 and user 3 are the same on observation dimension S2, and user 2 and user 3 are the same on observation dimensions S1 and S3. If the final similarity matrix D shows that user 1 and user 2 are clustered into one class, the number of observation dimensions on which user 1 and user 2 are the same is taken as the number of shared edges, that is, two shared edges.
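The counting in the Fig. 3 example can be sketched as below. The input format (a mapping from user pairs to the dimensions on which they are the same, plus a list of pairs clustered together) is an assumption chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[int, int]


def shared_edge_counts(same_dims: Dict[Pair, Set[str]],
                       clustered_pairs: Iterable[Pair]) -> Counter:
    """Count shared edges per observation dimension over the clustered pairs."""
    counts: Counter = Counter()
    for pair in clustered_pairs:
        counts.update(same_dims.get(pair, set()))
    return counts


# Dimensions on which each pair of users is the same, as in the Fig. 3 example.
same_dims = {(1, 2): {"S2", "S3"}, (1, 3): {"S2"}, (2, 3): {"S1", "S3"}}
counts = shared_edge_counts(same_dims, clustered_pairs=[(1, 2)])
print(sum(counts.values()))  # 2 shared edges, contributed by S2 and S3
```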
S24, updating the Si weight according to the number of the shared edges until the objective function meets the condition;
The objective function is used to judge whether the result of the spectral clustering meets the condition for stopping the iterative update of the Si weights, and can be set as needed. Steps S21 to S24 are executed in a loop, and the iterative update stops when the objective function meets that condition. In general, the result of the spectral clustering converges within 3 to 5 iterations, so the iterative update of the Si weights can be stopped after 3 to 5 iterations.
Specifically, for users clustered into one class, the Si weights may be updated according to the proportion of the total number of shared edges contributed by each observation dimension. For example, if user 1 and user 2 are clustered into one class and have four shared edges, of which S1 contributes 1, S2 contributes 2 and S3 contributes 1, then the weight of S1 is 1/4, the weight of S2 is 2/4, and the weight of S3 is 1/4.
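The proportional update in this example amounts to normalizing the per-dimension shared-edge counts. A minimal sketch follows; the function name and the choice to raise an error when there are no shared edges are assumptions.

```python
from typing import Dict


def update_weights(edge_counts: Dict[str, int]) -> Dict[str, float]:
    """Weight of each dimension = its share of the total number of shared edges."""
    total = sum(edge_counts.values())
    if total == 0:
        raise ValueError("no shared edges: weights are undefined")
    return {dim: count / total for dim, count in edge_counts.items()}


# The example from the text: four shared edges between user 1 and user 2.
print(update_weights({"S1": 1, "S2": 2, "S3": 1}))
# {'S1': 0.25, 'S2': 0.5, 'S3': 0.25}
```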
S3, determining the similarity of the users in the black seed sub-user sub-relationship graph according to the updated weight of the observation dimension Si;
Illustratively, this step includes:
S31, determining the user observation dimension value ri in the black seed user sub-relationship graph;
If two users are the same on the observation dimension Si, the corresponding observation dimension value ri is 1; if they differ on the observation dimension Si, the corresponding observation dimension value ri is 0.
And S32, determining the similarity of the users in the black seed sub-user relationship graph according to the user observation dimension value ri and the weight of the corresponding updated observation dimension Si.
Specifically, the similarity of the users in the black seed user sub-relationship graph can be obtained by multiplying each observation dimension value ri by the weight of the corresponding observation dimension Si and summing the products.
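Steps S31 and S32 amount to a weighted sum of 0/1 indicators. The sketch below illustrates this under assumptions: the user records, the attribute values, and the mapping of S1, S2, and S3 to ID-number attribution, operating system, and location are hypothetical, as are the function names.

```python
def observation_values(user_a, user_b, dimensions):
    """ri = 1 if both users have the same value for dimension Si, else 0."""
    return {
        d: 1 if d in user_a and d in user_b and user_a[d] == user_b[d] else 0
        for d in dimensions
    }


def similarity(user_a, user_b, weights):
    """Weighted sum of the observation dimension values ri."""
    r = observation_values(user_a, user_b, weights)
    return sum(weights[d] * r[d] for d in weights)


# Hypothetical records: S1 = ID-number attribution, S2 = OS, S3 = location.
user_1 = {"S1": "Beijing", "S2": "Android", "S3": (39.9, 116.4)}
user_2 = {"S1": "Shanghai", "S2": "Android", "S3": (39.9, 116.4)}
weights = {"S1": 0.25, "S2": 0.5, "S3": 0.25}

score = similarity(user_1, user_2, weights)
print(score)
```

Here the two users match on S2 and S3 only, so the similarity is the sum of those two weights.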
S4, determining risk groups according to the user similarity;
Specifically, if the similarities of two users are equal, the two users are determined to belong to one risk group; the similarity of each of the two users is then compared with that of the other users in the sub-relationship graph, so that the risk group is finally determined.
In this embodiment, the risk group data obtained in step S4 is preliminary group data, which may include data of some normal accounts, for example the alternate (secondary) accounts of a streamer. In this case, a white list may be generated from user remarks, for example a remark indicating that an account is an alternate account. If an account in the risk group data exists in the white list, that account is deleted from the risk group data to obtain the final risk group data. Therefore, the present invention may further perform the following steps:
s5, determining whether a preset white list contains users in the risk group or not;
and S6, if the preset white list contains users in the risk group, deleting from the risk group the users contained in the preset white list.
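Steps S5 and S6 can be sketched as a simple filter. The account identifiers and the function name `remove_whitelisted` are hypothetical and chosen only for illustration.

```python
def remove_whitelisted(risk_group, whitelist):
    """Return the risk group without accounts that appear on the white list."""
    allowed = set(whitelist)
    # Preserve the original order of the remaining risk group accounts.
    return [account for account in risk_group if account not in allowed]


# Hypothetical account identifiers.
preliminary_group = ["acct_a", "acct_b", "acct_c", "acct_d"]
white_list = ["acct_c", "acct_x"]

final_group = remove_whitelisted(preliminary_group, white_list)
print(final_group)
```

Only the account that appears in both lists is removed; white-listed accounts that are not in the risk group have no effect.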
In an embodiment, after finally determining the risk group data, the embodiment may further include the following steps:
and sending the risk group data to a risk control platform, which subsequently triggers monitoring of the accounts in the risk group data automatically and, after some of the accounts in the group violate the rules, automatically associates and blocks all accounts of the group.
For example, whether an account has been banned can be determined from data fed back by the business itself or by a third party (for example, an account is banned if it is reported by other accounts for violations in daily business, is found in violation during an inspection, or triggers high-risk behavior). If the risk group data contains 10 accounts, of which 7 have been banned, and the banned proportion exceeds a preset threshold, the group can be considered a high-risk group and the remaining 3 accounts in the group are banned.
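The proportion check in this example can be sketched as follows. The threshold value of 0.6, the account identifiers, and the function name `accounts_to_block` are assumptions; the original text only states that a preset threshold exists.

```python
def accounts_to_block(group, banned, threshold):
    """Return the not-yet-banned accounts of a group whose banned
    proportion exceeds the threshold; otherwise return an empty list."""
    if not group:
        return []
    banned_set = set(banned)
    banned_in_group = [a for a in group if a in banned_set]
    ratio = len(banned_in_group) / len(group)
    if ratio > threshold:
        # High-risk group: block the remaining accounts as well.
        return [a for a in group if a not in banned_set]
    return []


# 10 accounts in the risk group, 7 of which have already been banned.
group = [f"acct_{i}" for i in range(10)]
banned = group[:7]

remaining = accounts_to_block(group, banned, threshold=0.6)  # assumed threshold
print(remaining)
```

With 7 of 10 accounts banned, the proportion of 0.7 exceeds the assumed threshold, so the 3 remaining accounts are returned for blocking.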
Fig. 4 is a schematic diagram of an architecture of a group recognition device based on similarity weight update according to the present invention, as shown in fig. 4, the device includes:
an obtaining module 41, configured to obtain a black seed sub-user sub-relationship graph according to the black seed sub-user social relationship network;
an updating module 42, configured to update the weight of the observation dimension Si according to the relationship between the observation dimensions Si in the black seed user sub-relationship graph; wherein the observation dimensions include: at least one of the attribution of the user ID number, the operating system of the device used by the user and the longitude and latitude of the position where the user is located.
A first determining module 43, configured to determine similarity of users in the black seed sub-user sub-relationship graph according to the updated weight of the observation dimension Si;
a second determining module 44, configured to determine risk groups according to the user similarity;
wherein i is a natural number.
In a specific embodiment, the obtaining module 41 includes:
a third sub-determining module 411, configured to determine a black seed sub-user according to the user history data;
the diffusion module 412 is configured to diffuse the contact of the black seed user according to the group scale to obtain a diffusion relation graph;
and the segmentation module 413 is configured to segment the diffusion relation graph to obtain a black seed user sub-relation graph.
The update module 42 includes:
the generating module 421 is configured to generate an n-order similarity matrix Di according to a relationship between the user observation dimensions Si;
the clustering module 422 is configured to perform spectral clustering on the similarity matrix Di to obtain a final similarity matrix D;
the sub-determining module 423 is configured to determine the number of the shared edges of the black seed sub-user sub-relational graph according to the final similarity matrix D;
a sub-updating module 424, configured to update the Si weight according to the number of the shared edges until the objective function meets the condition;
wherein n is the number of users contained in the black seed sub-user sub-relationship graph.
Specifically, the clustering module 422 obtains the final similarity matrix D by the following formula:
D=PiDi+T;
wherein Pi is a random value and T is a predetermined matrix.
The first determination module 43 includes:
a first sub-determining module 431, configured to determine a user observation dimension value ri in the black seed sub-user sub-relationship graph;
a second sub-determining module 432, configured to determine the similarity of the users in the black seed sub-user sub-relationship graph according to the user observation dimension value ri and the weight of the corresponding updated observation dimension Si.
Those skilled in the art will appreciate that the modules in the above-described embodiments of the apparatus may be distributed as described in the apparatus, and may be correspondingly modified and distributed in one or more apparatuses other than the above-described embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
In the following, embodiments of the electronic device of the present invention are described, which may be regarded as an implementation in physical form for the above-described embodiments of the method and apparatus of the present invention. Details described in the embodiments of the electronic device of the invention should be considered supplementary to the embodiments of the method or apparatus described above; for details which are not disclosed in embodiments of the electronic device of the invention, reference may be made to the above-described embodiments of the method or the apparatus.
Fig. 5 is a block diagram of an exemplary embodiment of an electronic device according to the present invention. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the electronic device 500 of the exemplary embodiment is represented in the form of a general-purpose data processing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one memory unit 520, a bus 530 connecting different electronic device components (including the memory unit 520 and the processing unit 510), a display unit 540, and the like.
The storage unit 520 stores a computer readable program, which may be a code of a source program or a read-only program. The program may be executed by the processing unit 510 such that the processing unit 510 performs the steps of various embodiments of the present invention. For example, the processing unit 510 may perform the steps as shown in fig. 1.
The memory unit 520 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 5201 and/or a cache memory unit 5202, and may further include a read only memory unit (ROM) 5203. The memory unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 530 may be one or more of any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 300 (e.g., keyboard, display, network device, bluetooth device, etc.), enable a user to interact with the electronic device 500 via the external devices 300, and/or enable the electronic device 500 to communicate with one or more other data processing devices (e.g., router, modem, etc.). Such communication can occur via input/output (I/O) interfaces 550, and can also occur via network adapter 560 to one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet. The network adapter 560 may communicate with other modules of the electronic device 500 via the bus 530. It should be appreciated that although not shown in FIG. 5, other hardware and/or software modules may be used in the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
FIG. 6 is a schematic diagram of one computer-readable medium embodiment of the present invention. As shown in fig. 6, the computer program may be stored on one or more computer readable media. The computer readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. The computer program, when executed by one or more data processing devices, enables the computer-readable medium to implement the above-described method of the invention, namely: acquiring a black seed sub-user sub-relationship graph according to the black seed sub-user social relationship network; updating the weight of the observation dimension Si according to the relationship between the observation dimensions Si in the black seed user sub-relationship graph; determining the similarity of the users in the black seed sub-user sub-relationship graph according to the updated weight of the observation dimension Si; and determining risk groups according to the similarity of the users.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments of the present invention described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a computer-readable storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a data processing device (which can be a personal computer, a server, or a network device, etc.) execute the above-mentioned method according to the present invention.
The computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In summary, the present invention can be implemented as a method, an apparatus, an electronic device, or a computer-readable medium executing a computer program. Some or all of the functions of the present invention may be implemented in practice using a general purpose data processing device such as a microprocessor or a Digital Signal Processor (DSP).
While the foregoing embodiments have described the objects, aspects and advantages of the present invention in further detail, it should be understood that the present invention is not inherently related to any particular computer, virtual machine or electronic device, and various general-purpose machines may be used to implement the present invention. The invention is not limited to the specific embodiments described above; all modifications, changes and equivalents that come within the spirit and scope of the invention are to be understood as falling within its scope of protection.

Claims (8)

1. A method for group recognition based on similarity weight update, the method comprising:
acquiring a black seed sub-user sub-relationship graph according to the black seed sub-user social relationship network;
generating an n-order similarity matrix Di according to the relation between the user observation dimensions Si;
performing spectral clustering on the similarity matrix Di to obtain a final similarity matrix D;
determining the number of the shared edges according to the final similarity matrix D;
updating Si weight according to the number of the shared edges until an objective function meets a condition, wherein n is the number of users contained in the black seed user sub-relational graph;
determining the similarity of the users in the black seed sub-user sub-relationship graph according to the updated weight of the observation dimension Si;
determining risk groups according to the similarity of the users;
wherein i is a natural number.
2. The method of claim 1, wherein the final similarity matrix D is obtained by the following formula:
D=PiDi+T;
wherein Pi is a random value and T is a predetermined matrix.
3. The method according to claim 1, wherein the determining the similarity of the users in the black seed sub-user relationship graph according to the updated weights of the observation dimensions Si comprises:
determining a user observation dimension value ri in the black seed sub-user sub-relation graph;
and determining the similarity of the users in the black seed sub-user sub-relationship graph according to the user observation dimension value ri and the weight of the corresponding updated observation dimension Si.
4. The method of claim 1, wherein obtaining the black seed sub-user sub-relationship graph according to the black seed sub-user social relationship network comprises:
determining black seed users according to user historical data;
diffusing the contact persons of the black seed users according to the group partner scale to obtain a diffusion relation graph;
and segmenting the diffusion relation graph to obtain a black seed user sub-relation graph.
5. The method of claim 1, wherein the observation dimensions comprise: at least one of the attribution of the user ID number, the operating system of the device used by the user and the longitude and latitude of the position where the user is located.
6. A group recognition apparatus based on similarity weight update, the apparatus comprising:
the acquisition module is used for acquiring a black seed sub-user sub-relationship graph according to the black seed sub-user social relationship network;
the updating module is used for updating the weight of the observation dimension Si according to the relationship between the observation dimensions Si in the black seed user sub-relationship graph;
the first determining module is used for determining the similarity of users in the black seed sub-user sub-relationship graph according to the updated weight of the observation dimension Si;
the second determining module is used for determining risk groups according to the user similarity;
wherein i is a natural number;
the update module includes:
the generating module is used for generating an n-order similarity matrix Di according to the relation between the user observation dimensions Si;
the clustering module is used for carrying out spectral clustering on the similarity matrix Di to obtain a final similarity matrix D;
a sub-determination module, configured to determine the number of shared edges according to the final similarity matrix D;
the sub-updating module is used for updating the Si weight according to the number of the shared edges until the objective function meets the condition;
wherein n is the number of users contained in the black seed sub-user sub-relationship graph.
7. An electronic device, comprising:
a processor; and
a memory storing computer-executable instructions that, when executed, cause the processor to perform the method of any of claims 1-5.
8. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-5.
CN202010724429.0A 2020-07-24 2020-07-24 Cluster recognition method and device based on similarity weight updating and electronic equipment Active CN111598713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010724429.0A CN111598713B (en) 2020-07-24 2020-07-24 Cluster recognition method and device based on similarity weight updating and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010724429.0A CN111598713B (en) 2020-07-24 2020-07-24 Cluster recognition method and device based on similarity weight updating and electronic equipment

Publications (2)

Publication Number Publication Date
CN111598713A true CN111598713A (en) 2020-08-28
CN111598713B CN111598713B (en) 2021-12-14

Family

ID=72191888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010724429.0A Active CN111598713B (en) 2020-07-24 2020-07-24 Cluster recognition method and device based on similarity weight updating and electronic equipment

Country Status (1)

Country Link
CN (1) CN111598713B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423982A (en) * 2016-05-24 2017-12-01 阿里巴巴集团控股有限公司 Service implementation method and device based on account
CN108492173A (en) * 2018-03-23 2018-09-04 上海氪信信息技术有限公司 A kind of anti-Fraud Prediction method of credit card based on dual-mode network figure mining algorithm
CN108681936A (en) * 2018-04-26 2018-10-19 浙江邦盛科技有限公司 A kind of fraud clique recognition methods propagated based on modularity and balance label
CN108764917A (en) * 2018-05-04 2018-11-06 阿里巴巴集团控股有限公司 It is a kind of fraud clique recognition methods and device
CN109146707A (en) * 2018-08-27 2019-01-04 罗孚电气(厦门)有限公司 Power consumer analysis method, device and electronic equipment based on big data analysis
CN109784636A (en) * 2018-12-13 2019-05-21 中国平安财产保险股份有限公司 Fraudulent user recognition methods, device, computer equipment and storage medium
CN109816519A (en) * 2019-01-25 2019-05-28 宜人恒业科技发展(北京)有限公司 A kind of recognition methods of fraud clique, device and equipment
CN110083791A (en) * 2019-05-05 2019-08-02 北京三快在线科技有限公司 Target group detection method, device, computer equipment and storage medium
WO2019196545A1 (en) * 2018-04-12 2019-10-17 阿里巴巴集团控股有限公司 Data processing method, apparatus and device for insurance fraud identification, and server
CN111222976A (en) * 2019-12-16 2020-06-02 北京淇瑀信息科技有限公司 Risk prediction method and device based on network diagram data of two parties and electronic equipment
CN111401468A (en) * 2020-03-26 2020-07-10 上海海事大学 Weight self-updating multi-view spectral clustering method based on shared neighbor

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215690A (en) * 2020-09-08 2021-01-12 北京数美时代科技有限公司 Black product group analysis method and device based on multi-association network and computer equipment
CN113297389A (en) * 2021-04-29 2021-08-24 上海淇玥信息技术有限公司 Method and device for association relationship between devices and electronic device
CN116934507A (en) * 2023-09-19 2023-10-24 国任财产保险股份有限公司 Intelligent claim settlement method and system based on big data driving
CN116934507B (en) * 2023-09-19 2023-12-26 国任财产保险股份有限公司 Intelligent claim settlement method and system based on big data driving

Also Published As

Publication number Publication date
CN111598713B (en) 2021-12-14

Similar Documents

Publication Publication Date Title
CN111598713B (en) Cluster recognition method and device based on similarity weight updating and electronic equipment
CN108009915B (en) Marking method and related device for fraudulent user community
WO2020192184A1 (en) Gang fraud detection based on graph model
WO2022126970A1 (en) Method and device for financial fraud risk identification, computer device, and storage medium
CN111222976B (en) Risk prediction method and device based on network map data of two parties and electronic equipment
CN112738102B (en) Asset identification method, device, equipment and storage medium
WO2024007599A1 (en) Heterogeneous graph neural network-based method and apparatus for determining target service
CN114186626A (en) Abnormity detection method and device, electronic equipment and computer readable medium
CN111092999A (en) Data request processing method and device
CN111598714A (en) Two-stage unsupervised group partner identification method and device and electronic equipment
CN116739605A (en) Transaction data detection method, device, equipment and storage medium
CN116703555A (en) Early warning method, early warning device, electronic equipment and computer readable medium
CN115545088B (en) Model construction method, classification method, device and electronic equipment
CN111210109A (en) Method and device for predicting user risk based on associated user and electronic equipment
CN116468281A (en) Abnormal user group identification method and device, storage medium and electronic equipment
CN116228384A (en) Data processing method, device, electronic equipment and computer readable medium
US11348115B2 (en) Method and apparatus for identifying risky vertices
CN115809853A (en) Configuration optimization method, system and storage medium for enterprise business process
CN114356712A (en) Data processing method, device, equipment, readable storage medium and program product
US20200125325A1 (en) Identification of users across multiple platforms
CN110895564A (en) Potential customer data processing method and device
CN113609451B (en) Risk equipment identification method and device based on relational network feature derivation
CN115022002B (en) Verification mode determining method and device, storage medium and electronic equipment
CN110351116B (en) Abnormal object monitoring method, device, medium and electronic equipment
US20230099510A1 (en) Network topology monitoring

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant