CN111049809A - Risk user identification method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN111049809A
Authority
CN
China
Prior art keywords
group
risk
vertex
users
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911183921.5A
Other languages
Chinese (zh)
Inventor
刘利
郭鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN201911183921.5A priority Critical patent/CN111049809A/en
Publication of CN111049809A publication Critical patent/CN111049809A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing

Abstract

The embodiments of the invention disclose a method and an apparatus for identifying risk users, a computer device, and a storage medium. The method belongs to the technical field of network security and comprises the following steps: constructing a graph from a user data sample; obtaining a node vector for each vertex of the graph; obtaining the distances between the node vectors of the vertices; determining each associated vertex group as a group according to those distances and a preset distance threshold; determining the proportion of preset risk users in the group according to the group's user storage table and a pre-stored risk user storage table; judging whether the group is a risk group according to the proportion of preset risk users in the group; and, if so, obtaining the users in the risk group other than the preset risk users as target users and marking the target users as suspicious risk users. Because the suspicious risk users belong to the same group as the risk users and are therefore likely to be risk users themselves, suspicious risk users are identified with higher accuracy.

Description

Risk user identification method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of network security technologies, and in particular, to a method and an apparatus for identifying a risky user, a computer device, and a storage medium.
Background
In the prior art, anti-cheating schemes mainly identify risk users with models built on manually defined rules. Specifically, strong rules are established from certain transaction characteristics and are used to identify risk users.
However, manual rules are highly limited: an illegitimate user can adjust their cheating behavior to the configured rules and so evade the anti-cheating model. As a result, existing anti-cheating models identify suspicious risk users with low accuracy.
Disclosure of Invention
The embodiment of the invention provides a risk user identification method, a risk user identification device, computer equipment and a storage medium, and aims to solve the problem that in the prior art, an identification method for suspicious risk users is low in accuracy.
In a first aspect, an embodiment of the present invention provides a method for identifying a risky user, including:
constructing a graph according to a user data sample, wherein the user data sample comprises a plurality of user data, and a vertex of the graph is one user data of the user data sample;
acquiring a node vector of each vertex of the graph;
obtaining the distance between the node vectors of each vertex of the graph;
determining an associated vertex group as a group according to the distance between the node vectors of the vertexes of the graph and a preset distance threshold, wherein if the distance between the node vectors of the vertexes corresponding to two users is smaller than the distance threshold, the two users are confirmed to belong to the same associated vertex group;
determining the proportion of preset risk users in the group according to a user storage table of the group and a prestored risk user storage table, wherein the user storage table is used for storing a list of the users in the group, and the risk user storage table is used for storing a preset list of risk users;
judging whether the group is a risk group according to the proportion of preset risk users in the group;
if the group is a risk group, acquiring users except for preset risk users in the risk group as target users;
and marking the target user as a suspicious risk user.
In a second aspect, an embodiment of the present invention further provides an apparatus for identifying a risky user, including:
a constructing unit, configured to construct a graph according to a user data sample, where the user data sample comprises a plurality of user data, and a vertex of the graph is one user data of the user data sample;
a first obtaining unit configured to obtain a node vector of each vertex of the graph;
a second obtaining unit configured to obtain a distance between node vectors of vertices of the graph;
a first determining unit, configured to determine, according to a distance between node vectors of vertices of the graph and a preset distance threshold, an associated vertex group as a group, where if the distance between node vectors of vertices corresponding to two users is smaller than the distance threshold, it is determined that the two users belong to the same associated vertex group;
a second determining unit, configured to determine a preset risk user proportion in the group according to a user storage table of the group and a pre-stored risk user storage table, where the user storage table is used to store a list of users of the group, and the risk user storage table is used to store a preset list of risk users;
a first judging unit, configured to judge whether the group is a risk group according to the proportion of preset risk users in the group;
a third obtaining unit, configured to obtain, if the group is a risk group, users other than preset risk users in the risk group as target users;
and the marking unit is used for marking the target user as a suspicious risk user.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the above method when executing the computer program.
In a fourth aspect, the present invention also provides a computer-readable storage medium, which stores a computer program, and the computer program can implement the above method when being executed by a processor.
The embodiments of the invention provide a risk user identification method and apparatus, a computer device, and a storage medium. The method comprises: constructing a graph from a user data sample, where the user data sample comprises a plurality of user data and each vertex of the graph is one user data of the sample; acquiring a node vector for each vertex of the graph; obtaining the distances between the node vectors of the vertices; determining each associated vertex group as a group according to those distances and a preset distance threshold, where two users are confirmed to belong to the same associated vertex group if the distance between the node vectors of their vertices is smaller than the distance threshold; determining the proportion of preset risk users in the group according to the group's user storage table and a pre-stored risk user storage table, where the user storage table stores the list of users in the group and the risk user storage table stores a preset list of risk users; and judging whether the group is a risk group according to the proportion of preset risk users in the group. In short, the scheme constructs a graph from the user data sample, acquires a node vector for each vertex, forms groups using the distances between node vectors and the preset distance threshold, determines risk groups from the proportion of preset risk users in each group, and then determines the suspicious risk users. Instead of relying on narrow, one-sided manual rules, the scheme determines risk groups from already confirmed risk users and then determines suspicious risk users from the risk groups. Because the suspicious risk users belong to the same group as the risk users and have similar behavior characteristics, they are likely to be risk users themselves, so suspicious risk users are identified with higher accuracy.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for identifying a risky user according to an embodiment of the present invention;
Fig. 2 is a schematic sub-flow diagram of a method for identifying a risky user according to an embodiment of the present invention;
Fig. 3 is a schematic sub-flow diagram of a method for identifying a risky user according to an embodiment of the present invention;
Fig. 4 is a schematic sub-flow diagram of a method for identifying a risky user according to an embodiment of the present invention;
Fig. 5 is a schematic block diagram of an apparatus for identifying a risky user according to an embodiment of the present invention;
Fig. 6 is a schematic block diagram of a first obtaining unit of an apparatus for identifying a risky user according to an embodiment of the present invention;
Fig. 7 is a schematic block diagram of a first obtaining unit of an apparatus for identifying a risky user according to an embodiment of the present invention;
Fig. 8 is a schematic block diagram of a second obtaining unit of an apparatus for identifying a risky user according to an embodiment of the present invention;
Fig. 9 is a schematic block diagram of a first judging unit of an apparatus for identifying a risky user according to an embodiment of the present invention;
Fig. 10 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for identifying a risky user according to an embodiment of the present invention. The risk user identification method provided by the embodiment of the invention can be applied to a terminal. As shown, the method includes the following steps S1-S8.
S1, constructing a graph according to the user data sample, wherein the user data sample comprises a plurality of user data, and each vertex of the graph is one user data of the user data sample.
A graph is a more complex data structure than a linear list or a tree: the relationships between vertices are arbitrary, and any two vertices may be related, so a graph is a many-to-many data structure. It consists of a vertex set and an edge set, where the edges reflect the relationships between vertices. If two vertices of the graph are associated, an edge exists between them; otherwise, no edge exists between them.
Specifically, a graph is composed of a finite, non-empty set of vertices and a set of edges between the vertices, usually written G(V, E), where G denotes the graph, V is the set of vertices of G, and E is the set of edges of G.
In this scheme, a graph is constructed from a user data sample. The user data sample comprises a plurality of user data, and each user data comprises the user's mobile phone number, home address, company address, the mobile phone number of the user's emergency contact, and device-related information such as the device ID, WiFi/MAC address, and GPS coordinates. In a specific implementation, each user data serves as a vertex of the graph. Whether an edge exists between two vertices is then decided from the association between their data: if the two vertices share an identical data item, an edge exists between them; otherwise, no edge exists. For example, in one embodiment, vertex A and vertex B have an edge between them if their device IDs (or some other data item) are the same.
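The edge rule of step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field names in `FIELDS`, the dictionary layout of a user record, and the choice to compare values field by field are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical field names; the text lists these kinds of data but gives no schema.
FIELDS = ("phone", "home_address", "company_address", "emergency_phone", "device_id")


def build_graph(users):
    """Return an adjacency map: user id -> set of user ids sharing any field value."""
    adjacency = {user_id: set() for user_id in users}
    # Index users by (field, value) so that users sharing a value land in one bucket.
    buckets = defaultdict(set)
    for user_id, record in users.items():
        for field in FIELDS:
            value = record.get(field)
            if value is not None:
                buckets[(field, value)].add(user_id)
    # Every pair of users inside a bucket shares that value, so they get an edge.
    for members in buckets.values():
        for u, v in combinations(sorted(members), 2):
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


users = {
    "A": {"phone": "111", "device_id": "dev-1"},
    "B": {"phone": "222", "device_id": "dev-1"},
    "C": {"phone": "333", "device_id": "dev-2"},
}
graph = build_graph(users)
```

Here A and B share a device ID and are connected, while C shares nothing and remains an isolated vertex. Bucketing by value avoids comparing every pair of users directly, which matters once the sample is large.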
S2, acquiring the node vector of each vertex of the graph.
In a specific implementation, each vertex of the graph is represented as a vector (that is, a node vector is computed for each vertex) by means of network representation learning. The resulting vectors support representation and inference in a vector space and can conveniently serve as input to machine learning models. They can therefore be applied to common social network tasks such as visualization, vertex classification, link prediction, and community discovery, and can also serve as social side information in other common tasks such as recommendation systems.
Referring to FIG. 2, in one embodiment, the above step S2 specifically includes the following steps S21-S22.
S21, starting from one vertex of the graph, performing a random walk along the edges between the vertex and other vertices to obtain a vertex sequence of a preset length.
In a specific implementation, the DeepWalk algorithm is used to obtain the node vectors of the vertices of the graph. Specifically, starting from a vertex of the graph, a random walk is performed along the edges between vertices to obtain a vertex sequence of a preset fixed length.
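The walk in step S21 can be sketched as a uniform random walk over an adjacency map. This is a minimal sketch under assumptions: the adjacency-map representation, the function name `random_walk`, and the decision to stop early at an isolated vertex are illustrative choices that the text does not specify.

```python
import random


def random_walk(adjacency, start, walk_length, rng):
    """Walk uniformly at random along edges until walk_length vertices are visited."""
    walk = [start]
    while len(walk) < walk_length:
        # Sort the neighbor set so that a seeded rng gives reproducible walks.
        neighbors = sorted(adjacency[walk[-1]])
        if not neighbors:
            break  # isolated vertex: the walk cannot continue
        walk.append(rng.choice(neighbors))
    return walk


adjacency = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}, "D": set()}
rng = random.Random(0)
walk = random_walk(adjacency, "A", 5, rng)
isolated = random_walk(adjacency, "D", 5, rng)
```

In practice several walks are usually started from every vertex, so that each vertex appears in enough sequences to be learned.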
S22, inputting the vertex sequence into a preset word vector training model for learning to obtain a node vector.
In a specific implementation, the obtained vertex sequence is treated as analogous to a sentence in natural language (the vertex sequence is the sentence, and the vertices in it are the words), and it is input into a word vector training model (for example, a skip-gram model) for learning, so as to obtain the node vectors of the vertices.
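To make the sentence analogy concrete, the sketch below shows how a skip-gram model sees a vertex sequence: as (center, context) pairs taken from a sliding window. Only the pair generation is shown; the window size and the function name `skipgram_pairs` are assumptions, and the actual embedding training would normally be delegated to an existing word2vec implementation rather than written by hand.

```python
def skipgram_pairs(walk, window):
    """List (center, context) pairs for every vertex and its neighbors within the window."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


pairs = skipgram_pairs(["A", "B", "C"], window=1)
# pairs == [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")]
```

Vertices that often appear near each other in walks produce many shared pairs, which is what pulls their learned vectors close together.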
Referring to fig. 3, in an embodiment, the above step S2 specifically includes the following steps S210 to S220.
S210, starting from one vertex of the graph, performing a walk along the edges between the vertex and other vertices to obtain a vertex sequence of a preset length, wherein during the walk the probability of returning to the original vertex is a preset return probability parameter, and the probability of not returning to the original vertex is a preset departure probability parameter.
In a specific implementation, the node2vec algorithm is used to obtain the node vectors of the vertices of the graph. Specifically, a return probability parameter (return parameter) p, i.e., the probability of returning to the original vertex, and a departure probability parameter (in-out parameter) q, i.e., the probability of not returning to the original vertex, are defined in advance. Based on p and q, a walk is performed from one vertex of the graph along the edges between vertices (during the walk, the probability of returning to the previous vertex is p and the probability of not returning to it is q), and a vertex sequence of a preset fixed length is obtained.
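The biased walk can be sketched exactly as this paragraph describes it: at each step the walk returns to the previous vertex with probability p and otherwise moves to a different neighbor. This is a simplified reading, with q = 1 - p assumed; the function name `biased_walk` and the handling of dead ends are also assumptions. Note that the published node2vec algorithm uses 1/p and 1/q as unnormalized transition weights rather than as direct probabilities, and also distinguishes neighbors by their distance from the previous vertex, so the sketch follows the text and is not a faithful node2vec implementation.

```python
import random


def biased_walk(adjacency, start, walk_length, return_prob, rng):
    """Walk along edges, stepping back to the previous vertex with probability return_prob."""
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbors = sorted(adjacency[current])
        if not neighbors:
            break  # isolated vertex: the walk cannot continue
        previous = walk[-2] if len(walk) > 1 else None
        forward = [n for n in neighbors if n != previous]
        # Return when it is the only option, or with probability return_prob.
        if previous is not None and (not forward or rng.random() < return_prob):
            walk.append(previous)
        else:
            walk.append(rng.choice(forward))
    return walk


path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
always_return = biased_walk(path, "A", 5, 1.0, random.Random(0))
never_return = biased_walk(path, "A", 5, 0.0, random.Random(0))
```

A high return probability keeps the walk close to its starting neighborhood, while a low one pushes it outward along the graph.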
S220, inputting the vertex sequence into a preset word vector training model for learning to obtain a node vector.
In a specific implementation, the obtained vertex sequence is treated as analogous to a sentence in natural language (the vertex sequence is the sentence, and the vertices in it are the words), and it is input into a word vector training model (for example, a skip-gram model) for learning, so as to obtain the node vectors of the vertices.
Alternatively, in other embodiments, other network representation learning algorithms may be used to obtain the node vectors of the vertices of the graph, and the present solution is not limited in particular.
S3, acquiring the distance between the node vectors of each vertex of the graph.
In a specific implementation, after the node vectors of the vertices of the graph are obtained, the distances between the node vectors are computed. The distance between two node vectors represents their similarity: the smaller the distance, the higher the similarity, and the larger the distance, the lower the similarity.
In one embodiment, the distance d between a node vector $a = (x_{11}, x_{12}, \dots, x_{1n})$ and a node vector $b = (x_{21}, x_{22}, \dots, x_{2n})$ can be calculated by the following formula:
[Formula image BDA0002291958620000061 in the original; not recoverable from the extracted text]
where $x_{11}, x_{12}, \dots, x_{1n}$ are the components of the node vector a, and $x_{21}, x_{22}, \dots, x_{2n}$ are the components of the node vector b.
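Because the formula survives only as an image reference, the sketch below assumes the Euclidean distance, $d = \sqrt{\sum_{k=1}^{n} (x_{1k} - x_{2k})^2}$, which is one common choice for comparing embedding vectors. This is an assumption for illustration; the original formula may differ.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two node vectors of equal dimension (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("node vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
# d == 5.0
```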
S4, determining the associated vertex group as a group according to the distance between the node vectors of the vertices of the graph and a preset distance threshold, wherein if the distance between the node vectors of the vertices corresponding to two users is smaller than the distance threshold, the two users are confirmed to belong to the same associated vertex group.
In a specific implementation, a distance threshold is set first. Then, for each pair of users, the distance between the node vectors of their vertices is calculated and compared with the distance threshold; if the distance is smaller than the threshold, the two users are judged to belong to the same associated vertex group. The users in each associated vertex group form a group.
It should be noted that the distance threshold may be set by a person skilled in the art based on experience; this solution does not specifically limit it.
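One way to turn the pairwise rule of step S4 into groups is to treat "belongs to the same group" as transitive and merge users with a union-find structure. The transitive reading, the Euclidean metric (via `math.dist`), and the example threshold of 0.6 are assumptions for this sketch; the text fixes none of them.

```python
import math
from itertools import combinations


def group_by_distance(vectors, threshold):
    """Merge users whose node vectors are closer than threshold into groups."""
    parent = {user: user for user in vectors}

    def find(user):
        # Follow parent links to the group representative, shortening the path on the way.
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    for u, v in combinations(vectors, 2):
        if math.dist(vectors[u], vectors[v]) < threshold:
            parent[find(u)] = find(v)

    groups = {}
    for user in vectors:
        groups.setdefault(find(user), set()).add(user)
    return list(groups.values())


vectors = {"A": (0.0, 0.0), "B": (0.5, 0.0), "C": (1.0, 0.0), "D": (5.0, 5.0)}
groups = group_by_distance(vectors, threshold=0.6)
```

In this example A and C are farther apart than the threshold but still end up in one group, because each is close to B. Whether such chaining is desirable depends on the data; comparing all pairs is also quadratic in the number of users, so a large sample would need an approximate nearest-neighbor search instead.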
S5, determining the proportion of preset risk users in the group according to the user storage table of the group and the pre-stored risk user storage table.
The user storage table is used for storing a list of users of the group, and the risk user storage table is used for storing a preset list of risk users.
In a specific implementation, a risk user storage table (which stores a list of risk users) is stored in advance. After a group is determined, the list of its users is stored in a user storage table. The group's user storage table is then compared with the risk user storage table to determine the number of risk users in the group, from which the proportion of risk users among all users of the group is obtained.
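The comparison of the two tables amounts to a set intersection followed by a division. The sketch below models both storage tables as plain Python collections of user identifiers, which is an assumption; the text does not say how the tables are stored.

```python
def risk_user_proportion(group_users, risk_users):
    """Fraction of the group's users that also appear in the risk user table."""
    group = set(group_users)
    if not group:
        return 0.0  # an empty group has no risk users
    return len(group & set(risk_users)) / len(group)


proportion = risk_user_proportion(["A1", "A2", "A3", "A4"], ["A1", "Z9"])
# proportion == 0.25
```

Risk users that are not members of the group, such as the hypothetical "Z9" above, do not affect the result.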
S6, judging whether the group is a risk group according to the proportion of preset risk users in the group.
In a specific implementation, whether the group is a risk group is judged from the proportion of preset risk users in the group.
Referring to FIG. 4, in one embodiment, the above step S6 specifically includes the following steps S61-S63.
S61, judging whether the proportion of preset risk users in the group is greater than a preset proportion threshold.
In a specific implementation, it is judged whether the proportion of preset risk users in the group is greater than a preset proportion threshold. If so, the proportion of risk users in the group is high, and the group is judged to be a risk group; otherwise, the proportion of risk users in the group is low, and the group is judged to be a non-risk group.
It should be noted that the proportion threshold can be set empirically by a person skilled in the art; for example, in one embodiment, the proportion threshold is set to 10%.
S62, if the proportion of preset risk users in the group is greater than the preset proportion threshold, determining that the group is a risk group.
In a specific implementation, if the proportion of preset risk users in the group is greater than the preset proportion threshold, the group is determined to be a risk group. In a risk group, users not marked as risk users are suspicious risk users, who are also very likely to be cheating users.
S63, if the proportion of preset risk users in the group is not greater than the preset proportion threshold, determining that the group is a non-risk group.
In a specific implementation, if the proportion of preset risk users in the group is not greater than the preset proportion threshold, the group is determined to be a non-risk group.
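Steps S61 to S63 reduce to a single strict comparison. The sketch below uses the 10% example value from the text as a default; the constant and function names are illustrative.

```python
PROPORTION_THRESHOLD = 0.10  # the 10% example value given in the text


def is_risk_group(proportion, threshold=PROPORTION_THRESHOLD):
    """A group is a risk group only if its risk user proportion exceeds the threshold."""
    return proportion > threshold
```

Because the comparison is strict, a group whose proportion equals the threshold exactly is treated as a non-risk group, which matches the "not greater than" wording of S63.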
S7, if the group is a risk group, acquiring the users in the risk group other than the preset risk users as target users.
In a specific implementation, if the group is a risk group, the users in the risk group other than the preset risk users are obtained as target users. For example, in one embodiment, a risk group includes users A1, A2, and A3, where A1 is a preset risk user; A2 and A3 are then the target users.
S8, marking the target users as suspicious risk users.
In a specific implementation, the target users are marked as suspicious risk users, who are also very likely to be cheating users.
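Steps S7 and S8 can be sketched together using the A1, A2, A3 example from the text. The label string `"suspicious_risk"` and the dictionary output are hypothetical; the text only says that the target users are marked.

```python
def mark_suspicious(group_users, risk_users):
    """Label every user of a risk group who is not already a preset risk user."""
    risk = set(risk_users)
    return {user: "suspicious_risk" for user in group_users if user not in risk}


labels = mark_suspicious(["A1", "A2", "A3"], ["A1"])
# labels == {"A2": "suspicious_risk", "A3": "suspicious_risk"}
```

The function assumes that the caller has already established that the group is a risk group.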
In this technical scheme, a graph is constructed from a user data sample; a node vector is acquired for each vertex of the graph; each associated vertex group is determined as a group according to the distances between the node vectors and a preset distance threshold; risk groups are determined from the proportion of preset risk users in each group; and suspicious risk users are then determined. Instead of relying on narrow, one-sided manual rules, the scheme determines risk groups from already confirmed risk users and then determines suspicious risk users from the risk groups. Because the suspicious risk users belong to the same group as the risk users and have similar behavior characteristics, they are likely to be risk users themselves, so suspicious risk users are identified with higher accuracy.
Fig. 5 is a schematic block diagram of an apparatus 60 for identifying a risky user according to an embodiment of the present invention. As shown in fig. 5, the present invention further provides a risky user identification apparatus 60 corresponding to the above risky user identification method. The risky user identification apparatus 60 includes means for performing the risky user identification method described above, and the apparatus may be configured in a desktop computer, a tablet computer, a laptop computer, or the like. Specifically, referring to fig. 5, the risky user identifying apparatus 60 includes a constructing unit 61, a first obtaining unit 62, a second obtaining unit 63, a first determining unit 64, a second determining unit 65, a first judging unit 66, a third obtaining unit 67, and a marking unit 68.
The constructing unit 61 is configured to construct a graph according to a user data sample, where the user data sample includes a plurality of user data, and a vertex of the graph is one user data of the user data sample.
A first obtaining unit 62 is configured to obtain a node vector of each vertex of the graph.
A second obtaining unit 63, configured to obtain distances between node vectors of vertices of the graph.
A first determining unit 64, configured to determine, according to a distance between node vectors of vertices of the graph and a preset distance threshold, an associated vertex group as a group, where if the distance between node vectors of vertices corresponding to two users is smaller than the distance threshold, it is determined that the two users belong to the same associated vertex group.
A second determining unit 65, configured to determine a preset risk user proportion in the group according to a user storage table of the group and a pre-stored risk user storage table, where the user storage table is used to store a list of users of the group, and the risk user storage table is used to store a preset list of risk users.
The first judging unit 66 is configured to judge whether the group is a risk group according to the proportion of preset risk users in the group.
A third obtaining unit 67, configured to obtain, if the group is a risk group, users other than the preset risk user in the risk group as target users.
A marking unit 68, configured to mark the target user as a suspicious risk user.
In one embodiment, as shown in fig. 6, the first obtaining unit 62 includes a first walk unit 621 and a first input unit 622.
The first walk unit 621 is configured to perform, starting from one vertex of the graph, a random walk along the edges between the vertex and other vertices to obtain a vertex sequence of a preset length.
The first input unit 622 is configured to input the vertex sequence into a preset word vector training model for learning to obtain a node vector.
In an embodiment, as shown in fig. 7, the first obtaining unit 62 includes a second walk unit 623 and a second input unit 624.
The second walk unit 623 is configured to perform, starting from one vertex of the graph, a walk along the edges between the vertex and other vertices to obtain a vertex sequence of a preset length, where during the walk the probability of returning to the original vertex is a preset return probability parameter, and the probability of not returning to the original vertex is a preset departure probability parameter.
The second input unit 624 is configured to input the vertex sequence into a preset word vector training model for learning to obtain a node vector.
In an embodiment, as shown in fig. 8, the second obtaining unit 63 comprises a calculating unit 631.
The calculating unit 631 is configured to calculate the distance between two node vectors by the following formula:
[Formula image BDA0002291958620000091 in the original; not recoverable from the extracted text]
where x11, x12, …, x1n are the components of the node vector a, and x21, x22, …, x2n are the components of the node vector b.
In one embodiment, as shown in fig. 9, the first judging unit 66 includes a second judging unit 661, a first determining unit 662, and a second determining unit 663.
The second judging unit 661 is configured to judge whether the proportion of preset risk users in the group is greater than a preset proportion threshold.
The first determining unit 662 is configured to determine that the group is a risk group if the proportion of preset risk users in the group is greater than the preset proportion threshold.
The second determining unit 663 is configured to determine that the group is a non-risk group if the proportion of preset risk users in the group is not greater than the preset proportion threshold.
It should be noted that, as can be clearly understood by those skilled in the art, the detailed implementation process of the risk user identification apparatus 60 and each unit may refer to the corresponding description in the foregoing method embodiment, and for convenience and brevity of description, no further description is provided herein.
The risky user identification apparatus 60 may be implemented in the form of a computer program that can run on a computer device as shown in fig. 10.
Referring to fig. 10, fig. 10 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a terminal, wherein the terminal may be an electronic device with a communication function, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, and a wearable device.
Referring to fig. 10, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, causes the processor 502 to perform the risk user identification method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 is caused to perform the risk user identification method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the configuration shown in fig. 10 is a block diagram of only the part of the configuration relevant to the present application and does not limit the computer device 500 to which the present application is applied; a particular computer device 500 may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps:
constructing a graph according to a user data sample, wherein the user data sample comprises a plurality of items of user data, and a vertex of the graph is one item of user data in the user data sample;
acquiring a node vector of each vertex of the graph;
obtaining the distance between the node vectors of each vertex of the graph;
determining an associated vertex group as a group according to the distance between the node vectors of the vertexes of the graph and a preset distance threshold, wherein if the distance between the node vectors of the vertexes corresponding to two users is smaller than the distance threshold, the two users are confirmed to belong to the same associated vertex group;
determining the proportion of preset risk users in the group according to a user storage table of the group and a prestored risk user storage table, wherein the user storage table is used for storing a list of the users in the group, and the risk user storage table is used for storing a preset list of risk users;
judging whether the group is a risk group according to the proportion of preset risk users in the group;
if the group is a risk group, acquiring users other than the preset risk users in the risk group as target users;
and marking the target user as a suspicious risk user.
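The grouping step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `group_by_distance`, the `pair_distance` dictionary of precomputed distances, and the sample values are all hypothetical. It also assumes that group membership is transitive (if A is close to B and B is close to C, all three share a group), which the text does not state explicitly; a union-find structure is used to realize that assumption.

```python
def group_by_distance(users, pair_distance, threshold):
    """Group users whose node-vector distance is below `threshold`.

    Assumption (not stated in the source): membership is transitive, so the
    groups are the connected components of the "distance < threshold" relation.
    """
    parent = {u: u for u in users}

    def find(u):
        # Follow parent links to the representative, halving the path on the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for (u, v), d in pair_distance.items():
        if d < threshold:
            parent[find(u)] = find(v)  # merge the two groups

    groups = {}
    for u in users:
        groups.setdefault(find(u), set()).add(u)
    return list(groups.values())


# Hypothetical precomputed distances between the node vectors of four users.
users = ["u1", "u2", "u3", "u4"]
pair_distance = {
    ("u1", "u2"): 0.2, ("u2", "u3"): 0.4, ("u1", "u3"): 0.9,
    ("u1", "u4"): 2.0, ("u2", "u4"): 2.1, ("u3", "u4"): 1.8,
}
groups = group_by_distance(users, pair_distance, threshold=0.5)
```

With these sample values, u1, u2 and u3 end up in one associated vertex group and u4 forms a group of its own.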
In an embodiment, when the processor 502 implements the step of obtaining the node vector of each vertex of the graph, the following steps are specifically implemented:
starting from one vertex of the graph, carrying out random walk according to edges between the vertex and other vertices to obtain a vertex sequence with a preset length;
and inputting the vertex sequence into a preset word vector training model for learning to obtain a node vector.
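As a rough sketch of the random-walk step, the snippet below generates one vertex sequence of a preset length per starting vertex. The adjacency-list representation, the function name `random_walk`, and the sample graph are assumptions made for illustration. The resulting sequences would then be fed to the preset word vector training model (for example a skip-gram style model), which is not implemented here.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Uniform random walk returning a vertex sequence of at most `length` vertices."""
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:
            break  # dead end: stop early
        walk.append(rng.choice(neighbors))  # pick the next vertex uniformly
    return walk


# Hypothetical undirected user graph stored as adjacency lists.
adjacency = {
    "u1": ["u2", "u3"],
    "u2": ["u1", "u3"],
    "u3": ["u1", "u2", "u4"],
    "u4": ["u3"],
}
rng = random.Random(0)  # seeded only to make the sketch repeatable
walks = [random_walk(adjacency, v, length=5, rng=rng) for v in adjacency]
```

Each walk plays the role of a "sentence" whose "words" are vertices, which is what allows a word vector model to learn one vector per vertex.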
In an embodiment, when the processor 502 implements the step of obtaining the node vector of each vertex of the graph, the following steps are specifically implemented:
starting from one vertex of the graph, performing a walk along the edges between the vertex and other vertices to obtain a vertex sequence with a preset length, wherein during the walk, the probability of returning to the original vertex is a preset return probability parameter, and the probability of not returning to the original vertex is a preset departure probability parameter;
and inputting the vertex sequence into a preset word vector training model for learning to obtain a node vector.
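The biased walk can be sketched as below, under two assumptions that the text leaves open: "original vertex" is read as the vertex visited immediately before the current one, and the two preset parameters are treated as unnormalized weights (`return_weight` for stepping back, `depart_weight` for every other neighbor). The function name and sample graph are hypothetical.

```python
import random


def biased_walk(adjacency, start, length, return_weight, depart_weight, rng):
    """Walk that weights stepping back to the previous vertex differently
    from moving on to any other neighbor."""
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:
            break
        previous = walk[-2] if len(walk) > 1 else None
        weights = [return_weight if n == previous else depart_weight
                   for n in neighbors]
        if sum(weights) <= 0:
            break  # no admissible move under the given weights
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical triangle graph; a zero return weight forbids immediate backtracking.
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
walk = biased_walk(triangle, "a", length=6,
                   return_weight=0.0, depart_weight=1.0,
                   rng=random.Random(1))
```

A larger return weight keeps the walk near its recent position, while a larger departure weight pushes it outward through the graph.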
In an embodiment, when the step of obtaining the distance between the node vectors of the vertices of the graph is implemented, the processor 502 specifically implements the following steps:
calculating the distance between two node vectors by the following formula:
Figure BDA0002291958620000111
where x11, x12, …, x1n are the components of node vector a, and x21, x22, …, x2n are the components of node vector b.
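The formula itself survives here only as an image reference, so it cannot be quoted. Purely as an illustration, the sketch below assumes the Euclidean distance between the two component lists; the formula in the original figure may differ, and the function name `node_vector_distance` is hypothetical.

```python
import math


def node_vector_distance(a, b):
    """Euclidean distance between node vectors a and b (an assumed metric)."""
    if len(a) != len(b):
        raise ValueError("node vectors must have the same number of components")
    return math.sqrt(sum((x1 - x2) ** 2 for x1, x2 in zip(a, b)))


# Components x11..x1n of vector a and x21..x2n of vector b (sample values).
a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
d = node_vector_distance(a, b)  # differences are 3, 4, 0
```

Whatever metric is used, the result is compared with the preset distance threshold to decide whether two users belong to the same associated vertex group.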
In an embodiment, when implementing the step of judging whether the group is a risk group according to the proportion of preset risk users in the group, the processor 502 specifically implements the following steps:
judging whether the proportion of preset risk users in the group is greater than a preset proportion threshold;
if the proportion of the preset risk users in the group is larger than a preset proportion threshold value, judging the group as a risk group;
and if the proportion of the preset risk users in the group is not greater than a preset proportion threshold value, determining that the group is a non-risk group.
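The judgment and marking steps can be sketched together as follows. The function name `find_suspicious_users` and the sample user IDs are hypothetical, and the two storage tables are represented simply as Python collections. The comparison is strict, matching "greater than" in the text.

```python
def find_suspicious_users(group_users, risk_users, ratio_threshold):
    """Return the users to mark as suspicious risk users for one group.

    The group is a risk group only if the proportion of preset risk users
    in it is strictly greater than `ratio_threshold`.
    """
    group = set(group_users)
    if not group:
        return set()
    known_risk = group & set(risk_users)      # preset risk users in this group
    proportion = len(known_risk) / len(group)
    if proportion > ratio_threshold:
        return group - known_risk             # target users: everyone else
    return set()                              # non-risk group: nothing to mark


risk_table = {"u1", "u2", "u3", "u9"}          # hypothetical risk user storage table
risky = find_suspicious_users({"u1", "u2", "u3", "u4"}, risk_table, 0.5)
clean = find_suspicious_users({"u5", "u6"}, risk_table, 0.5)
borderline = find_suspicious_users({"u1", "u7"}, risk_table, 0.5)
```

In the first group three of four users are preset risk users, so the remaining user u4 is marked; the borderline group has a proportion exactly equal to the threshold and is therefore treated as a non-risk group.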
It should be understood that, in the embodiment of the present application, the processor 502 may be a Central Processing Unit (CPU), and the processor 502 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program may be stored in a storage medium, which is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program. The computer program, when executed by a processor, causes the processor to perform the steps of:
constructing a graph according to a user data sample, wherein the user data sample comprises a plurality of items of user data, and a vertex of the graph is one item of user data in the user data sample;
acquiring a node vector of each vertex of the graph;
obtaining the distance between the node vectors of each vertex of the graph;
determining an associated vertex group as a group according to the distance between the node vectors of the vertexes of the graph and a preset distance threshold, wherein if the distance between the node vectors of the vertexes corresponding to two users is smaller than the distance threshold, the two users are confirmed to belong to the same associated vertex group;
determining the proportion of preset risk users in the group according to a user storage table of the group and a prestored risk user storage table, wherein the user storage table is used for storing a list of the users in the group, and the risk user storage table is used for storing a preset list of risk users;
judging whether the group is a risk group according to the proportion of preset risk users in the group;
if the group is a risk group, acquiring users other than the preset risk users in the risk group as target users;
and marking the target user as a suspicious risk user.
In an embodiment, when the step of obtaining the node vector of each vertex of the graph is implemented by the processor executing the computer program, the following steps are specifically implemented:
starting from one vertex of the graph, carrying out random walk according to edges between the vertex and other vertices to obtain a vertex sequence with a preset length;
and inputting the vertex sequence into a preset word vector training model for learning to obtain a node vector.
In an embodiment, when the step of obtaining the node vector of each vertex of the graph is implemented by the processor executing the computer program, the following steps are specifically implemented:
starting from one vertex of the graph, performing a walk along the edges between the vertex and other vertices to obtain a vertex sequence with a preset length, wherein during the walk, the probability of returning to the original vertex is a preset return probability parameter, and the probability of not returning to the original vertex is a preset departure probability parameter;
and inputting the vertex sequence into a preset word vector training model for learning to obtain a node vector.
In an embodiment, when the processor executes the computer program to implement the step of obtaining distances between node vectors of vertices of the graph, the processor specifically implements the following steps:
calculating the distance between two node vectors by the following formula:
Figure BDA0002291958620000131
where x11, x12, …, x1n are the components of node vector a, and x21, x22, …, x2n are the components of node vector b.
In an embodiment, when the processor executes the computer program to implement the step of judging whether the group is a risk group according to the proportion of preset risk users in the group, the following steps are specifically implemented:
judging whether the proportion of preset risk users in the group is greater than a preset proportion threshold;
if the proportion of the preset risk users in the group is larger than a preset proportion threshold value, judging the group as a risk group;
and if the proportion of the preset risk users in the group is not greater than a preset proportion threshold value, determining that the group is a non-risk group.
The storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether such functions are implemented in hardware or software depends upon the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, and some features may be omitted or not implemented.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be merged, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, while the invention has been described with respect to the above-described embodiments, it will be understood that the invention is not limited thereto but may be embodied with various modifications and changes.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for identifying an at-risk user, comprising:
constructing a graph according to a user data sample, wherein the user data sample comprises a plurality of items of user data, and a vertex of the graph is one item of user data in the user data sample;
acquiring a node vector of each vertex of the graph;
obtaining the distance between the node vectors of each vertex of the graph;
determining an associated vertex group as a group according to the distance between the node vectors of the vertexes of the graph and a preset distance threshold, wherein if the distance between the node vectors of the vertexes corresponding to two users is smaller than the distance threshold, the two users are confirmed to belong to the same associated vertex group;
determining the proportion of preset risk users in the group according to a user storage table of the group and a prestored risk user storage table, wherein the user storage table is used for storing a list of the users in the group, and the risk user storage table is used for storing a preset list of risk users;
judging whether the group is a risk group according to the proportion of preset risk users in the group;
if the group is a risk group, acquiring users other than the preset risk users in the risk group as target users;
and marking the target user as a suspicious risk user.
2. The method of claim 1, wherein obtaining a node vector for each vertex of the graph comprises:
starting from one vertex of the graph, carrying out random walk according to edges between the vertex and other vertices to obtain a vertex sequence with a preset length;
and inputting the vertex sequence into a preset word vector training model for learning to obtain a node vector.
3. The method of claim 1, wherein obtaining a node vector for each vertex of the graph comprises:
starting from one vertex of the graph, performing a walk along the edges between the vertex and other vertices to obtain a vertex sequence with a preset length, wherein during the walk, the probability of returning to the original vertex is a preset return probability parameter, and the probability of not returning to the original vertex is a preset departure probability parameter;
and inputting the vertex sequence into a preset word vector training model for learning to obtain a node vector.
4. The method of claim 1, wherein obtaining distances between node vectors of vertices of the graph comprises:
calculating the distance between two node vectors by the following formula:
Figure FDA0002291958610000011
where x11, x12, …, x1n are the components of node vector a, and x21, x22, …, x2n are the components of node vector b.
5. The method according to claim 1, wherein the judging whether the group is a risk group according to the proportion of preset risk users in the group comprises:
judging whether the proportion of preset risk users in the group is greater than a preset proportion threshold;
and if the proportion of preset risk users in the group is greater than the preset proportion threshold, determining that the group is a risk group.
6. The method according to claim 5, wherein the judging whether the group is a risk group according to the proportion of preset risk users in the group further comprises:
and if the proportion of the preset risk users in the group is not greater than a preset proportion threshold value, determining that the group is a non-risk group.
7. An apparatus for identifying an at-risk user, comprising:
a construction unit, configured to construct a graph according to a user data sample, wherein the user data sample comprises a plurality of items of user data, and a vertex of the graph is one item of user data in the user data sample;
a first obtaining unit configured to obtain a node vector of each vertex of the graph;
a second obtaining unit configured to obtain a distance between node vectors of vertices of the graph;
a first determining unit, configured to determine, according to a distance between node vectors of vertices of the graph and a preset distance threshold, an associated vertex group as a group, where if the distance between node vectors of vertices corresponding to two users is smaller than the distance threshold, it is determined that the two users belong to the same associated vertex group;
a second determining unit, configured to determine the proportion of preset risk users in the group according to a user storage table of the group and a pre-stored risk user storage table, where the user storage table is used to store a list of users of the group, and the risk user storage table is used to store a preset list of risk users;
a first judgment unit, configured to judge whether the group is a risk group according to the proportion of preset risk users in the group;
a third obtaining unit, configured to obtain, if the group is a risk group, users other than preset risk users in the risk group as target users;
and the marking unit is used for marking the target user as a suspicious risk user.
8. The apparatus according to claim 7, wherein the first obtaining unit comprises:
a first wandering unit, configured to start from one vertex of the graph and perform a random walk along the edges between the vertex and other vertices to obtain a vertex sequence with a preset length; and
a first input unit, configured to input the vertex sequence into a preset word vector training model for learning to obtain a node vector.
9. A computer device, characterized in that the computer device comprises a memory having stored thereon a computer program and a processor implementing the method according to any of claims 1-6 when executing the computer program.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when being executed by a processor, is adapted to carry out the method according to any one of claims 1-6.
CN201911183921.5A 2019-11-27 2019-11-27 Risk user identification method and device, computer equipment and storage medium Pending CN111049809A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911183921.5A CN111049809A (en) 2019-11-27 2019-11-27 Risk user identification method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911183921.5A CN111049809A (en) 2019-11-27 2019-11-27 Risk user identification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111049809A true CN111049809A (en) 2020-04-21

Family

ID=70233781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911183921.5A Pending CN111049809A (en) 2019-11-27 2019-11-27 Risk user identification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111049809A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200421
