CN111598714A - Two-stage unsupervised group partner identification method and device and electronic equipment


Info

Publication number
CN111598714A
Authority
CN
China
Prior art keywords
users
user
black seed
relation graph
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010724458.7A
Other languages
Chinese (zh)
Inventor
宋孟楠
苏绥绥
郑彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qilu Information Technology Co Ltd
Original Assignee
Beijing Qilu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qilu Information Technology Co Ltd
Priority to CN202010724458.7A
Publication of CN111598714A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Evolutionary Biology (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Security & Cryptography (AREA)
  • Evolutionary Computation (AREA)
  • Finance (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a two-stage unsupervised group identification method, a device and electronic equipment. The method comprises: determining black seed users according to user historical data; diffusing the contacts of the black seed users according to the group scale to obtain a diffusion relation graph; segmenting the diffusion relation graph to obtain sub-relation graphs; calculating the similarity of the users in each sub-relation graph; and determining risk groups according to the similarity of the users. Because the invention identifies risk groups from the social relationship network of the black seed users, it can identify them promptly and accurately, meet business risk control requirements, and reduce the economic loss of enterprises.

Description

Two-stage unsupervised group partner identification method and device and electronic equipment
Technical Field
The invention relates to the technical field of computer information processing, in particular to a two-stage unsupervised group partner identification method and device, electronic equipment and a computer readable medium.
Background
With the rapid development of the internet and the popularization of intelligent terminals, people can handle many services online, such as online shopping, online transfers and online loans, without leaving home. At the same time, in pursuit of profit, criminals increasingly collude in groups and commit fraud with false information forged through other persons.
Group fraud causes greater economic loss to internet enterprises than individual fraud, so how to identify and prevent group fraud in order to reduce economic loss is an urgent problem for internet enterprises.
Disclosure of Invention
The invention aims to solve the technical problem that existing network technology cannot intelligently, quickly and accurately identify group fraud.
In order to solve the above technical problem, a first aspect of the present invention provides a two-stage unsupervised group identification method, including:
determining black seed users according to user historical data;
diffusing the contact persons of the black seed users according to the group partner scale to obtain a diffusion relation graph;
segmenting the diffusion relation graph to obtain a sub-relation graph;
calculating the similarity of the users in the sub-relation graph;
and determining risk groups according to the similarity of the users.
According to a preferred embodiment of the present invention, diffusing the contacts of the black seed user according to the group scale to obtain a diffusion relation graph includes:
determining, according to the group scale, whether to diffuse only the first-degree contacts or both the first-degree and second-degree contacts of the black seed user;
if only the first-degree contacts of the black seed user are to be diffused, searching for the first-degree contacts of the black seed user to obtain the diffusion relation graph;
and if both the first-degree and second-degree contacts of the black seed user are to be diffused, searching for the first-degree and second-degree contacts of the black seed user to obtain the diffusion relation graph.
According to a preferred embodiment of the present invention, the calculating the similarity of the users in the sub-relationship graph includes:
acquiring feature data of any two users in the sub-relation graph;
and determining the similarity of the two users according to the characteristic data.
According to a preferred embodiment of the present invention, the similarity S between two users is determined by the following formula:
[Formula image not reproduced in this text: S expressed in terms of D and σ]
wherein, D is the Euclidean distance between two users, and sigma is a preset parameter.
According to a preferred embodiment of the invention, the method further comprises:
determining whether there are users in the risk group in a preset white list;
and if users in the risk group exist in the preset white list, removing from the risk group the users contained in the preset white list.
According to a preferred embodiment of the present invention, the black seed user is a user who has a fraudulent record or a record of unreturned resources.
In order to solve the above technical problem, a second aspect of the present invention provides a two-stage unsupervised group identification device, comprising:
the first determining module is used for determining black seed users according to user historical data;
the diffusion module is used for diffusing the contact of the black seed user according to the group scale to obtain a diffusion relation graph;
the segmentation module is used for segmenting the diffusion relation graph to obtain a sub-relation graph;
the calculation module is used for calculating the similarity of the users in the sub-relation graph;
and the second determining module is used for determining the risk group according to the user similarity.
According to a preferred embodiment of the present invention, the diffusion module comprises:
the first sub-determination module is used for determining, according to the group scale, whether to diffuse only the first-degree contacts or both the first-degree and second-degree contacts of the black seed user;
the sub-diffusion module is used for searching the first-degree contact of the black seed user to obtain the diffusion relation graph if the first-degree contact of the black seed user is diffused; and if the first-degree contact and the second-degree contact of the black seed user are diffused, searching the first-degree contact and the second-degree contact of the black seed user to obtain the diffusion relation graph.
According to a preferred embodiment of the invention, the calculation module comprises:
the acquisition module is used for acquiring the feature data of any two users in the sub-relation graph;
and the second sub-determination module is used for determining the similarity of the two users according to the characteristic data.
According to a preferred embodiment of the present invention, the second sub-determining module determines the similarity S between two users according to the following formula:
[Formula image not reproduced in this text: S expressed in terms of D and σ]
wherein, D is the Euclidean distance between two users, and sigma is a preset parameter.
According to a preferred embodiment of the invention, the device further comprises:
a third determining module, configured to determine whether there is a user in the risk group in a preset white list;
and the deleting module is used for removing from the risk group the users contained in the preset white list if users in the risk group exist in the preset white list.
According to a preferred embodiment of the present invention, the black seed user is a user who has a fraudulent record or a record of unreturned resources.
To solve the above technical problem, a third aspect of the present invention provides an electronic device, comprising:
a processor; and
a memory storing computer executable instructions that, when executed, cause the processor to perform the method described above.
In order to solve the above technical problem, a fourth aspect of the present invention proposes a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs that, when executed by a processor, implement the above method.
The invention first determines black seed users according to user historical data and diffuses their contacts based on the black seed users' relationship network to obtain a diffusion relation graph. Segmenting the diffusion relation graph yields sub-relation graphs formed by closely related users among the black seed users' contacts. The similarity of every two users in a sub-relation graph is then calculated, and risk groups are determined according to that similarity. Because the invention identifies risk groups from the social relationship network of the black seed users, it can identify them promptly and accurately, meet business risk control requirements, and reduce the economic loss of enterprises.
Drawings
In order to make the technical problems solved by the present invention, the technical means adopted and the technical effects obtained more clear, the following will describe in detail the embodiments of the present invention with reference to the accompanying drawings. It should be noted, however, that the drawings described below are only illustrations of exemplary embodiments of the invention, from which other embodiments can be derived by those skilled in the art without inventive step.
Fig. 1 is a flowchart of a two-stage unsupervised group identification method according to the present invention;
Fig. 2 is a schematic flowchart of the step of calculating the similarity of users in a sub-relation graph according to the present invention;
Fig. 3 is a schematic structural diagram of a two-stage unsupervised group identification device according to the present invention;
Fig. 4 is a block diagram of an exemplary embodiment of an electronic device in accordance with the present invention;
Fig. 5 is a schematic diagram of one embodiment of a computer-readable medium of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The invention may, however, be embodied in many specific forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art.
The structures, properties, effects or other characteristics described in a certain embodiment may be combined in any suitable manner in one or more other embodiments, while still complying with the technical idea of the invention.
In describing particular embodiments, specific details of structures, properties, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by one skilled in the art. However, it is not excluded that a person skilled in the art may implement the invention in a specific case without the above-described structures, performances, effects or other features.
The flow chart in the drawings is only an exemplary flow demonstration, and does not represent that all the contents, operations and steps in the flow chart are necessarily included in the scheme of the invention, nor does it represent that the execution is necessarily performed in the order shown in the drawings. For example, some operations/steps in the flowcharts may be divided, some operations/steps may be combined or partially combined, and the like, and the execution order shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.
The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The same reference numerals denote the same or similar elements, components, or parts throughout the drawings, and thus a repetitive description thereof may be omitted hereinafter. It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, or sections, these elements, components, or sections should not be limited by these terms. That is, these terms are used only to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention. Furthermore, the term "and/or" is intended to include all combinations of any one or more of the listed items.
To address the group fraud faced by internet enterprises, the invention identifies risk groups using the characteristics of the specific internet service scenario and provides the identification result to the enterprise's staff. The staff can then handle resource applications from the persons involved, for example by rejecting the application (such as rejecting the resource request) or adding manual review, so as to reduce the internet enterprise's risk of economic loss.
Fig. 1 is a flowchart of a two-stage unsupervised group identification method according to the present invention. As shown in Fig. 1, the method includes:
S1, determining black seed users according to the user historical data;
In the invention, a black seed user is a user with bad behavior, such as a fraud record or a record of unreturned resources. Specifically, users with fraud records or records of unreturned funds can be identified from user history data and marked as black seed users.
The user history data may include user service information, user identification information, user contact information, and the like. The user service information records the service data of a user: for a loan service, it records the user's borrowing and repayment data; for online shopping, it records the user's ordering, payment, return and refund data. The user identification information uniquely identifies the user and may be the user's identity (ID) number. The user contact information may include an email address, a telephone number, a social APP account, an address, a device fingerprint, login IP information, and the like.
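Step S1 can be illustrated with a minimal sketch. The record layout below (`UserHistory` and its two boolean flags) is a hypothetical simplification introduced only for illustration; the source does not specify how fraud records or unreturned-resource records are stored.

```python
from dataclasses import dataclass


@dataclass
class UserHistory:
    # Hypothetical, simplified history record; field names are assumptions.
    user_id: str
    has_fraud_record: bool = False
    has_unreturned_resources: bool = False


def find_black_seed_users(history):
    """Return the IDs of users with a fraud record or unreturned resources."""
    return {
        h.user_id
        for h in history
        if h.has_fraud_record or h.has_unreturned_resources
    }


history = [
    UserHistory("u1", has_fraud_record=True),
    UserHistory("u2"),
    UserHistory("u3", has_unreturned_resources=True),
]
black_seeds = find_black_seed_users(history)
```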
S2, diffusing the contacts of the black seed users according to the group scale to obtain a diffusion relation graph;
The invention diffuses over the social relationship network of the black seed users: users who have a first-degree or second-degree contact relationship with a black seed user form the diffusion relation graph. Limiting diffusion to these two degrees balances the efficiency and the accuracy of risk group identification.
The first-degree contact relation means that two users have a direct association relation, and the second-degree contact relation means that two users have an indirect association relation. For example, user a and user B have a first degree contact relationship, and user B and user C also have a first degree contact relationship, then user a and user C have an indirect association relationship through user B, that is, user a and user C have a second degree contact relationship.
The invention adds the black seed user into a pre-established social relationship network as one node. Each node in the social relationship network is used for representing different users, and connecting lines among the nodes are used for representing contact relationships among the users. In the invention, as the black seed user is a user with fraud records, the user who has a contact relation with the black seed user is suspected of fraud.
In practical application, there are many methods for calculating the contact relationship between two users in the social relationship network, and the embodiment of the present application is not particularly limited. If any mail communication, conversation, same equipment, same IP login or social communication and the like exist between the two users, the contact relation between the two users can be regarded as existing, and therefore the contact relation between the two users can be calculated according to the contact information of the users.
Specifically, a weight is set for each kind of contact information of the black seed user, and the number of times the black seed user and a first user have established contact through each kind of contact information is counted, where the first user is any user in the social relationship network other than the black seed user. The contact degree between the black seed user and the first user is then calculated from these counts and the weights of the corresponding contact information. If the calculated contact degree meets a preset condition, the first user is determined to have a first-degree contact relationship with the black seed user. In an optional embodiment, the preset condition may be that the contact degree is greater than 1.
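The contact-degree calculation can be sketched as follows. The source states only that the degree is computed from the contact counts and the per-channel weights; combining them as a weighted sum is an assumption made here, and the channel names and weight values are hypothetical.

```python
def contact_degree(counts, weights):
    """Weighted sum of contact counts per channel (the weighted sum is an assumption)."""
    return sum(weights.get(channel, 0.0) * n for channel, n in counts.items())


def is_first_degree(counts, weights, threshold=1.0):
    """Apply the optional preset condition from the text: contact degree > 1."""
    return contact_degree(counts, weights) > threshold


# Hypothetical channel weights and observed contact counts.
weights = {"phone": 0.5, "email": 0.2, "same_device": 1.5}
weak_contact = {"phone": 1, "email": 2}        # 0.5 + 0.4, below the threshold
strong_contact = {"phone": 1, "same_device": 1}  # 0.5 + 1.5, above the threshold
```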
In addition, if the first user has a first-degree contact relationship with the black seed user, and a second user has a first-degree contact relationship with the first user but not with the black seed user, the second user is determined to have a second-degree contact relationship with the black seed user, where the second user is any user in the social relationship network other than the black seed user and the first user.
After the social relationship network of the black seed user is established, whether to diffuse only the first-degree contacts or both the first-degree and second-degree contacts of the black seed user is determined according to the group scale.
Here, the group scale refers to the number of persons included in a risk group, and it can be set according to business experience. In one example, the first-degree contacts of the black seed user are diffused when the group scale is 3 persons or fewer, and both the first-degree and second-degree contacts are diffused when the group scale is greater than 3 persons.
In the invention, if the first-degree contact of the black seed user is diffused, all the first-degree contacts of the black seed user are searched, and a diffusion relation graph is formed by all the first-degree contacts of the black seed user; each node in the diffusion relation graph is used for representing different users, connecting lines between the nodes are used for representing contact person relations between the users, and the users in the diffusion relation graph are black seed users or first-degree contact persons of the black seed users. And if the first-degree contact and the second-degree contact of the black seed user are diffused, searching the first-degree contact and the second-degree contact of the black seed user to obtain a diffusion relation graph. The users in the diffusion relation graph are black seed users, first degree contacts of the black seed users, or second degree contacts of the black seed users.
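The diffusion step can be sketched as a breadth-first search that stops after one or two hops from the black seed users. The adjacency-dictionary representation, the function names and the sample graph are assumptions for illustration; the cutoff of 3 persons follows the example in the text.

```python
from collections import deque


def diffusion_depth(group_scale, cutoff=3):
    """One hop for small groups, two hops otherwise (cutoff from the text's example)."""
    return 1 if group_scale <= cutoff else 2


def diffuse(graph, seeds, depth):
    """Breadth-first search from the seeds, keeping users within `depth` hops.

    Returns the diffusion relation graph as an adjacency dict restricted
    to the users that were reached.
    """
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        if dist[u] == depth:
            continue  # do not expand beyond the allowed degree
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    kept = set(dist)
    return {u: sorted(v for v in graph.get(u, ()) if v in kept) for u in kept}


# Hypothetical social relationship network.
graph = {
    "seed": {"a", "b"},
    "a": {"seed", "c"},
    "b": {"seed"},
    "c": {"a", "d"},
    "d": {"c"},
}
one_hop = diffuse(graph, ["seed"], diffusion_depth(3))
two_hop = diffuse(graph, ["seed"], diffusion_depth(5))
```

With this sample graph, user "d" is three hops from the seed and is therefore excluded in both cases.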
S3, segmenting the diffusion relation graph to obtain a sub-relation graph;
Segmenting the diffusion relation graph completes the first, coarse-grained stage of group identification. After segmentation, closely related users in the diffusion relation graph fall into the same sub-relation graph and correspond to one suspected risk group, while users that are not assigned to any sub-relation graph do not form a risk group.
The method for segmenting the diffusion relation graph can adopt the existing heap method or the method for constructing a confidence network and segmenting connected subgraphs through the confidence network.
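The text names the segmentation methods without detailing them. As a stand-in, the sketch below assumes each edge carries a closeness weight, drops edges below a threshold, and returns the remaining connected components with more than one user. The edge weights, the threshold and the function name are hypothetical and are not the method prescribed by the source.

```python
def segment(edges, min_weight=0.5):
    """Split users into connected components over edges with weight >= min_weight.

    Users left alone after weak edges are dropped are not returned,
    since they do not form a suspected group.
    """
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if w >= min_weight:
            adj[u].add(v)
            adj[v].add(u)

    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first traversal of one component
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        if len(comp) > 1:
            components.append(comp)
    return components


# Hypothetical weighted edges of a diffusion relation graph.
edges = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.8,
    ("c", "d"): 0.1,
    ("d", "e"): 0.7,
    ("e", "f"): 0.2,
}
sub_graphs = segment(edges)
```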
S4, calculating the similarity of the users in the sub-relation graph;
The second stage of risk group identification is based on the similarity of the users within a sub-relation graph, which improves the accuracy of group identification. Illustratively, as shown in Fig. 2, this step includes:
S41, acquiring feature data of any two users in the sub-relation graph;
In the present invention, the feature data of a user may include multi-dimensional user information, specifically: the attribution of the ID number, the operating system of the device used by the user, the longitude and latitude of the user's location, the account name, the registration time, the IP address used at registration, the device information of the device used at registration, and the like.
S42, determining the similarity of the two users according to the feature data.
In the invention, user similarity can be divided into a category type and a numerical type according to its value range: category-type similarity takes a value between 0 and 1, while numerical similarity may take any value.
For category-type similarity, each item of feature data is compared, and two users are considered similar on an item if their values for it are the same. For example, if the ID numbers of two users have the same attribution, the two users are similar in ID number attribution. The weight of each feature is defined according to business experience; in principle, a low-probability event is given a large weight and a high-probability event a small weight. For example, a match in ID number attribution is weighted more heavily than a match in operating system, because two users having the same operating system is a more probable event than two users having the same ID number attribution. Finally, the similarity of the users is determined from the weight of each item of feature data and whether the users are similar on that item.
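A minimal sketch of category-type similarity follows. The text does not say how the weights and the per-feature matches are combined; summing the weights of matching features and dividing by the total weight is an assumption chosen here so that the result stays between 0 and 1. Feature names and weight values are hypothetical.

```python
def category_similarity(user_a, user_b, weights):
    """Share of total feature weight on which two users have identical values."""
    total = sum(weights.values())
    matched = sum(
        w
        for feature, w in weights.items()
        if feature in user_a and user_a.get(feature) == user_b.get(feature)
    )
    return matched / total


# Rare coincidences (same ID attribution) weigh more than common ones (same OS).
weights = {"id_attribution": 0.6, "os": 0.1, "register_ip": 0.3}
u1 = {"id_attribution": "110101", "os": "Android", "register_ip": "10.0.0.1"}
u2 = {"id_attribution": "110101", "os": "Android", "register_ip": "10.0.0.2"}
score = category_similarity(u1, u2, weights)
```

Here the two users match on ID attribution and operating system but not on registration IP, so the score is about 0.7.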
For numerical similarity, the similarity S of two users is determined by the following formula:
[Formula image not reproduced in this text: S expressed in terms of D and σ]
wherein, D is the Euclidean distance between two users, and sigma is a preset parameter.
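Since the formula itself is not available in this text, the sketch below uses a Gaussian kernel, S = exp(-D² / (2σ²)), purely as an assumed example of a similarity that depends on a Euclidean distance D and a preset parameter σ. It should not be read as the formula of the source; the feature vectors are hypothetical numeric encodings of user features.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance D between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def numerical_similarity(x, y, sigma=1.0):
    """ASSUMED Gaussian kernel; the source's actual formula is not reproduced."""
    d = euclidean_distance(x, y)
    return math.exp(-(d ** 2) / (2 * sigma ** 2))


# Hypothetical numeric feature vectors (e.g. scaled latitude and longitude).
near = numerical_similarity([0.0, 0.0], [0.1, 0.1])
far = numerical_similarity([0.0, 0.0], [3.0, 4.0])
```

Under this assumption, identical users score 1 and the score decays toward 0 as the distance grows; a larger σ makes the decay slower.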
S5, determining the risk group according to the similarity of the users.
Specifically, for category-type similarity, if the similarity of two users is equal, the two users are determined to belong to one risk group; the two users are then each compared with the other users in the sub-relation graph, and the risk group is finally determined.
For numerical similarity, when the difference between the similarity of two users is within a preset range, the two users are determined to belong to one risk group; the two users are then each compared with the other users in the sub-relation graph, and the risk group is finally determined.
In this embodiment, the risk group data obtained in step S5 is preliminary and may include some normal accounts, for example the secondary account of a streamer. In this case, a white list may be generated from users' remarks, for example a remark that an account is a secondary account. If an account in the risk group data exists in the white list, that account can be removed from the risk group data to obtain the final risk group data. Therefore, the present invention may further perform the following steps:
S6, determining whether the preset white list contains users in the risk group;
S7, if the preset white list contains users in the risk group, removing those users from the risk group.
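Steps S6 and S7 amount to a set difference, as the following minimal sketch shows. The function name and the account identifiers are hypothetical.

```python
def remove_whitelisted(risk_group, white_list):
    """Return the risk group without any users that appear in the white list."""
    return set(risk_group) - set(white_list)


preliminary_group = {"acct_1", "acct_2", "acct_3"}
white_list = {"acct_2", "acct_9"}  # e.g. accounts remarked as secondary accounts
final_group = remove_whitelisted(preliminary_group, white_list)
```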
In an embodiment, after finally determining the risk group data, the embodiment may further include the following steps:
sending the risk group data to a risk control platform, which subsequently monitors the accounts in the risk group data automatically and, once some of the group's accounts commit violations, automatically associates and bans the accounts of the whole group.
For example, whether an account has been banned can be determined from data fed back by the business itself or by a third party (for example, an account is banned if it is reported by other accounts for violations in daily business, is flagged during inspection, or triggers high-risk behavior). If the risk group data contains 10 accounts, 7 of which have been banned, and this banned proportion exceeds a preset threshold, the risk group can be considered a high-risk group and the remaining 3 accounts in the group are banned.
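The proportion check in this example can be sketched as follows. The threshold value of 0.6 and the account names are hypothetical; the text says only that the threshold is preset.

```python
def accounts_to_ban(group, banned, threshold=0.6):
    """Return the not-yet-banned accounts of a group whose banned share exceeds the threshold."""
    group = set(group)
    already_banned = group & set(banned)
    if group and len(already_banned) / len(group) > threshold:
        return group - already_banned
    return set()


group = {f"acct_{i}" for i in range(10)}
banned = {f"acct_{i}" for i in range(7)}  # 7 of the 10 accounts are already banned
remaining = accounts_to_ban(group, banned)
```

With 7 of 10 accounts banned, the banned share of 0.7 exceeds the assumed threshold, so the remaining 3 accounts are returned for banning.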
Fig. 3 is a schematic structural diagram of a two-stage unsupervised group identification device according to the present invention. As shown in Fig. 3, the device includes:
a first determining module 31, configured to determine black seed users according to user history data; wherein, the black seed user is a user with fraud record or resource unreturned record.
The diffusion module 32 is configured to diffuse the contact of the black seed user according to the group scale to obtain a diffusion relation graph;
a segmentation module 33, configured to segment the diffusion relation graph to obtain a sub-relation graph;
a calculating module 34, configured to calculate similarity of users in the sub-relationship graph;
and a second determining module 35, configured to determine risk groups according to the user similarity.
In one embodiment, the diffusion module 32 includes:
the first sub-determining module 321 is configured to determine, according to the group scale, whether to diffuse only the first-degree contacts or both the first-degree and second-degree contacts of the black seed user;
the sub-diffusion module 322 is configured to search the first-degree contact of the black seed user to obtain the diffusion relation graph if the first-degree contact of the black seed user is diffused; and if the first-degree contact and the second-degree contact of the black seed user are diffused, searching the first-degree contact and the second-degree contact of the black seed user to obtain the diffusion relation graph.
The calculation module 34 includes:
an obtaining module 341, configured to obtain feature data of any two users in the sub-relationship graph;
a second sub-determining module 342, configured to determine similarity between the two users according to the feature data.
Specifically, the second sub-determining module 342 determines the similarity S between two users according to the following formula:
[Formula image not reproduced in this text: S expressed in terms of D and σ]
wherein, D is the Euclidean distance between two users, and sigma is a preset parameter.
Further, the apparatus further comprises:
a third determining module, configured to determine whether there is a user in the risk group in a preset white list;
and the deleting module is used for removing from the risk group the users contained in the preset white list if users in the risk group exist in the preset white list.
Those skilled in the art will appreciate that the modules in the above-described embodiments of the apparatus may be distributed as described in the apparatus, and may be correspondingly modified and distributed in one or more apparatuses other than the above-described embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
In the following, embodiments of the electronic device of the present invention are described, which may be regarded as an implementation in physical form for the above-described embodiments of the method and apparatus of the present invention. Details described in the embodiments of the electronic device of the invention should be considered supplementary to the embodiments of the method or apparatus described above; for details which are not disclosed in embodiments of the electronic device of the invention, reference may be made to the above-described embodiments of the method or the apparatus.
Fig. 4 is a block diagram of an exemplary embodiment of an electronic device according to the present invention. The electronic device shown in fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in fig. 4, the electronic device 400 of the exemplary embodiment takes the form of a general-purpose data processing device. The components of the electronic device 400 may include, but are not limited to: at least one processing unit 410, at least one storage unit 420, a bus 430 connecting different electronic device components (including the storage unit 420 and the processing unit 410), a display unit 440, and the like.
The storage unit 420 stores a computer-readable program, which may be the code of a source program or of a read-only program. When executed by the processing unit 410, the program causes the processing unit 410 to perform the steps of the various embodiments of the present invention, for example the steps shown in fig. 1.
The storage unit 420 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 4201 and/or a cache memory unit 4202, and may further include a read-only memory unit (ROM) 4203. The storage unit 420 may also include a program/utility 4204 having a set (at least one) of program modules 4205, such program modules 4205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 430 may be any bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 400 may also communicate with one or more external devices 300 (e.g., a keyboard, display, network device, or Bluetooth device), enable a user to interact with the electronic device 400 via the external devices 300, and/or enable the electronic device 400 to communicate with one or more other data processing devices (e.g., a router or modem). Such communication may occur via input/output (I/O) interfaces 450, and may also occur via a network adapter 460 with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet). The network adapter 460 may communicate with other modules of the electronic device 400 via the bus 430. It should be appreciated that, although not shown in fig. 4, other hardware and/or software modules may be used in the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID electronics, tape drives, and data backup storage electronics, among others.
FIG. 5 is a schematic diagram of one embodiment of a computer-readable medium of the present invention. As shown in fig. 5, the computer program may be stored on one or more computer-readable media. The computer-readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. The computer program, when executed by one or more data processing devices, causes those devices to implement the above-described method of the invention, namely: determining black seed users according to user historical data; diffusing the contacts of the black seed users according to the group scale to obtain a diffusion relation graph; segmenting the diffusion relation graph to obtain sub-relation graphs; calculating the similarity of the users in each sub-relation graph; and determining risk groups according to the similarity of the users.
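The last three steps of the method (segmentation, similarity calculation, and risk group determination) can be sketched as follows. Several choices here are assumptions that this section does not confirm: the diffusion relation graph is split into connected components, similarity uses a Gaussian kernel of the Euclidean distance, and two users are flagged when their similarity reaches a hypothetical `threshold`. All function and parameter names are illustrative.

```python
import math
from itertools import combinations


def connected_components(graph):
    """Split an undirected graph (dict: user -> set of neighbours) into parts.

    Assumes the adjacency mapping is symmetric.
    """
    seen, parts = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, part = [start], set()
        while stack:
            user = stack.pop()
            part.add(user)
            for neighbour in graph[user]:
                if neighbour in graph and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        parts.append(part)
    return parts


def similarity(x, y, sigma):
    """Assumed Gaussian similarity of two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def risk_groups(graph, features, sigma, threshold):
    """Within each sub-relation graph, collect users that are highly similar."""
    groups = []
    for part in connected_components(graph):
        flagged = set()
        for u, v in combinations(sorted(part), 2):
            if similarity(features[u], features[v], sigma) >= threshold:
                flagged.update((u, v))
        if flagged:
            groups.append(flagged)
    return groups
```

A call such as `risk_groups(graph, features, sigma=1.0, threshold=0.9)` returns one set of users per sub-relation graph that contains at least one sufficiently similar pair; sub-relation graphs with no such pair yield no risk group.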
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments of the present invention described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a computer-readable storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a data processing device (which can be a personal computer, a server, or a network device, etc.) execute the above-mentioned method according to the present invention.
The computer-readable signal medium may include a propagated data signal, in baseband or as part of a carrier wave, that carries readable program code. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
In summary, the present invention can be implemented as a method, an apparatus, an electronic device, or a computer-readable medium executing a computer program. Some or all of the functions of the present invention may be implemented in practice using a general purpose data processing device such as a microprocessor or a Digital Signal Processor (DSP).
While the foregoing embodiments have described the objects, aspects, and advantages of the present invention in further detail, it should be understood that the present invention is not inherently tied to any particular computer, virtual machine, or electronic device, and various general-purpose machines may be used to implement it. The invention is not limited to the specific embodiments described; all modifications, changes, and equivalents that come within the spirit and scope of the invention are intended to be covered.

Claims (9)

1. A two-stage unsupervised group partner identification method, the method comprising:
determining black seed users according to user historical data;
diffusing the contact persons of the black seed users according to the group partner scale to obtain a diffusion relation graph;
segmenting the diffusion relation graph to obtain a sub-relation graph;
calculating the similarity of the users in the sub-relation graph;
and determining risk groups according to the similarity of the users.
2. The method of claim 1, wherein the diffusing the contacts of the black seed user according to the group scale to obtain a diffusion relation graph comprises:
determining, according to the group partner scale, whether to diffuse only the first-degree contacts of the black seed user or both the first-degree and second-degree contacts;
if only the first-degree contacts of the black seed user are to be diffused, searching the first-degree contacts of the black seed user to obtain the diffusion relation graph;
and if both the first-degree and second-degree contacts of the black seed user are to be diffused, searching the first-degree and second-degree contacts of the black seed user to obtain the diffusion relation graph.
3. The method according to any of claims 1-2, wherein the calculating the similarity of the users in the sub-relation graph comprises:
acquiring feature data of any two users in the sub-relation graph;
and determining the similarity of the two users according to the characteristic data.
4. The method of claim 3, wherein the similarity S between two users is determined by the following formula:
[Formula image not reproduced in the text: S expressed in terms of D and σ]
where D is the Euclidean distance between the two users and σ is a preset parameter.
5. The method of claim 1, further comprising:
determining whether any user in the risk group is in a preset white list;
and if any user in the risk group is in the preset white list, deleting from the risk group the users contained in the preset white list.
6. The method of claim 1, wherein the black seed user is a user with a fraud record or a record of unreturned resources.
7. A two-stage unsupervised group partner identification device, the device comprising:
the first determining module is used for determining black seed users according to user historical data;
the diffusion module is used for diffusing the contact of the black seed user according to the group scale to obtain a diffusion relation graph;
the segmentation module is used for segmenting the diffusion relation graph to obtain a sub-relation graph;
the calculation module is used for calculating the similarity of the users in the sub-relation graph;
and the second determining module is used for determining the risk group according to the user similarity.
8. An electronic device, comprising:
a processor; and
a memory storing computer-executable instructions that, when executed, cause the processor to perform the method of any of claims 1-6.
9. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-6.
CN202010724458.7A 2020-07-24 2020-07-24 Two-stage unsupervised group partner identification method and device and electronic equipment Pending CN111598714A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010724458.7A CN111598714A (en) 2020-07-24 2020-07-24 Two-stage unsupervised group partner identification method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111598714A true CN111598714A (en) 2020-08-28

Family

ID=72184648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010724458.7A Pending CN111598714A (en) 2020-07-24 2020-07-24 Two-stage unsupervised group partner identification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111598714A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116993371A (en) * 2023-09-25 2023-11-03 中邮消费金融有限公司 Abnormality detection method and system based on biological characteristics

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764917A (en) * 2018-05-04 2018-11-06 阿里巴巴集团控股有限公司 Fraud clique identification method and device
CN109816519A (en) * 2019-01-25 2019-05-28 宜人恒业科技发展(北京)有限公司 Fraud clique identification method, device and equipment
CN110135916A (en) * 2019-05-23 2019-08-16 北京优网助帮信息技术有限公司 Similar crowd identification method and system
CN110263227A (en) * 2019-05-15 2019-09-20 阿里巴巴集团控股有限公司 Clique discovery method and system based on graph neural network
CN110472050A (en) * 2019-07-24 2019-11-19 阿里巴巴集团控股有限公司 Clique clustering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination