WO2021159766A1 - Data identification method, apparatus, device, and readable storage medium - Google Patents

Data identification method, apparatus, device, and readable storage medium

Info

Publication number
WO2021159766A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
node
abnormal
users
target user
Prior art date
Application number
PCT/CN2020/126055
Other languages
English (en)
French (fr)
Inventor
郑巧玲
石志林
应秋芳
胡彬
张�浩
张纪红
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2021159766A1
Priority to US17/672,814 (US20220172090A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/316 User authentication by observing the pattern of computer usage, e.g. typical user behaviour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data

Definitions

  • This application relates to the field of computer technology, and in particular to a data recognition method, device, equipment, and readable storage medium.
  • Abnormal users are identified mainly from their behavior characteristic data: if a user's behavior characteristic data matches that of an abnormal user, the user is determined to be an abnormal user. However, some abnormal users imitate the legitimate behavior of normal users, which brings their behavior characteristic data close to legitimate behavior characteristic data. Such abnormal users are then identified as normal users, so the recognition accuracy is not high.
  • The embodiments of the present application provide a data recognition method, device, equipment, and readable storage medium, which can improve the accuracy of data recognition.
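  • One aspect of the embodiments of the present application provides a data identification method, including:
  • obtaining a target user set, the target user set including at least two users who have a social association relationship;
  • obtaining a default abnormal user, and determining an abnormal user in the target user set according to the default abnormal user;
  • determining the state of the target user set according to the abnormal user; and
  • if the state of the target user set is an abnormal state, identifying a diffusion abnormal user among the users to be confirmed according to the social association relationship between the abnormal user and the users to be confirmed in the target user set, the users to be confirmed being the users in the target user set other than the abnormal user.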
  • One aspect of the embodiments of the present application provides a data recognition device, including:
  • a target user set obtaining module, configured to obtain a target user set, the target user set including at least two users who have a social association relationship;
  • an abnormal user determination module, configured to obtain a default abnormal user and determine an abnormal user in the target user set according to the default abnormal user;
  • a behavior state detection module, configured to determine the state of the target user set according to the abnormal user; and
  • a diffusion abnormal user identification module, configured to, if the state of the target user set is an abnormal state, identify a diffusion abnormal user among the users to be confirmed according to the social association relationship between the abnormal user and the users to be confirmed in the target user set, the users to be confirmed being the users in the target user set other than the abnormal user.
  • The abnormal user determination module includes:
  • an abnormal user determining unit, configured to match the users in the target user set against the default abnormal user, and determine a user whose matching rate reaches the matching threshold as an abnormal user in the target user set.
  • The behavior state detection module includes:
  • a user count acquiring unit, configured to acquire the number of abnormal users and the total number of users in the target user set;
  • an abnormal concentration determination unit, configured to determine the abnormal concentration of the target user set according to the number of abnormal users and the total number of users in the target user set;
  • a first state determining unit, configured to determine the state of the target user set as a normal state if the abnormal concentration is less than the concentration threshold;
  • the first state determining unit being further configured to determine the state of the target user set as an abnormal state if the abnormal concentration is greater than or equal to the concentration threshold.
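  • The concentration check described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the text only says the abnormal concentration is determined "according to" the two counts, so treating it as the ratio of abnormal users to total users is an assumption, and the name `target_set_state` is hypothetical.

```python
def target_set_state(num_abnormal: int, total_users: int,
                     concentration_threshold: float) -> str:
    """Classify a target user set from its abnormal concentration."""
    if total_users <= 0:
        raise ValueError("the target user set must contain at least one user")
    # Assumption: concentration is the fraction of abnormal users in the set.
    concentration = num_abnormal / total_users
    # Per the text: below the threshold is normal, at or above it is abnormal.
    return "abnormal" if concentration >= concentration_threshold else "normal"
```

  • For instance, under this assumption a set of 10 users with 3 abnormal users and a threshold of 0.25 would be classified as abnormal.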
  • The behavior state detection module includes:
  • a behavior feature acquiring unit, configured to acquire a user social behavior feature set, the user social behavior feature set including the social behavior features of each user in the user group;
  • a feature distribution determining unit, configured to determine a first feature distribution of the abnormal user according to the social behavior features in the user social behavior feature set, the first feature distribution characterizing the number of types of social behavior features that the abnormal user has;
  • the feature distribution determining unit being further configured to determine a second feature distribution of the users in the target user set according to the social behavior features in the user social behavior feature set, the second feature distribution characterizing the number of types of social behavior features that the users in the target user set have;
  • a feature distribution difference determining unit, configured to determine the feature distribution difference between the abnormal user and the users in the target user set according to the first feature distribution and the second feature distribution;
  • a second state determining unit, configured to determine the state of the target user set according to the first feature distribution and the feature distribution difference;
  • the second state determining unit being further configured to determine the state of the target user set as a normal state if the feature distribution difference is less than the difference threshold and the first feature distribution is less than the distribution threshold;
  • the second state determining unit being further configured to determine the state of the target user set as a normal state if the feature distribution difference is greater than or equal to the difference threshold and the first feature distribution is greater than or equal to the distribution threshold;
  • the second state determining unit being further configured to determine the state of the target user set as an abnormal state if the feature distribution difference is greater than or equal to the difference threshold and the first feature distribution is less than the distribution threshold.
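  • The three threshold cases stated for the second state determining unit can be written as a small decision function. This is only a sketch: how the feature distribution and the difference are computed is not given in this passage, so both are taken as already-computed numbers, and the fourth combination (small difference, large first feature distribution) is returned as `None` because the text does not state its outcome. The function name is hypothetical.

```python
from typing import Optional


def state_from_feature_distribution(difference: float,
                                    first_distribution: float,
                                    difference_threshold: float,
                                    distribution_threshold: float) -> Optional[str]:
    """Map the two measurements to a state using only the cases the text states."""
    small_difference = difference < difference_threshold
    small_distribution = first_distribution < distribution_threshold
    if small_difference and small_distribution:
        return "normal"
    if not small_difference and not small_distribution:
        return "normal"
    if not small_difference and small_distribution:
        return "abnormal"
    # Small difference with a large first feature distribution: not specified.
    return None
```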
  • The target user set obtaining module includes:
  • a relationship topology graph obtaining unit, configured to obtain the relationship topology graph corresponding to a user group;
  • the relationship topology graph includes N nodes k in one-to-one correspondence with the users in the user group, where N is the number of users in the user group; the edge weight between two nodes k is determined based on the social association relationship between the two corresponding users in the user group;
  • a sampling path obtaining unit, configured to obtain the sampling path corresponding to node k in the relationship topology graph according to the number of path samples;
  • a jump probability determination unit, configured to determine the jump probability between node k and the associated nodes in the sampling path according to the edge weights in the relationship topology graph, the associated nodes being the nodes in the sampling path other than node k;
  • a target user set determining unit, configured to update the relationship topology graph according to the jump probability to obtain an updated relationship topology graph, and determine the target user set in the updated relationship topology graph.
  • The relationship topology graph obtaining unit includes:
  • a user group acquisition subunit, configured to acquire the user group and treat each user in the user group as a node k;
  • a weight setting subunit, configured to connect with edges the nodes k corresponding to users who have a social association relationship, and set initial weights for the edges between the nodes k according to the social behavior records between those users;
  • a probability conversion subunit, configured to perform probability conversion on the initial weights to obtain the edge weights;
  • a relationship topology graph generating subunit, configured to generate the relationship topology graph according to the nodes k corresponding to the user group and the edge weights.
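  • A minimal sketch of the graph construction above, under stated assumptions: the social behavior records are taken to be `(user_a, user_b, count)` tuples with positive counts, the initial weight of an edge is the summed count, and "probability conversion" is interpreted as normalizing each node's outgoing weights so they sum to 1. None of these details are fixed by the text, and `build_relationship_graph` is a hypothetical name.

```python
from collections import defaultdict


def build_relationship_graph(behavior_records):
    """Return {node: {neighbor: edge_weight}} with weights normalized per node."""
    initial = defaultdict(dict)
    for user_a, user_b, count in behavior_records:
        # Connect the two users and accumulate the initial (raw) weight.
        initial[user_a][user_b] = initial[user_a].get(user_b, 0) + count
        initial[user_b][user_a] = initial[user_b].get(user_a, 0) + count
    graph = {}
    for node, neighbors in initial.items():
        total = sum(neighbors.values())
        # Assumed probability conversion: divide by the node's total weight.
        graph[node] = {nbr: w / total for nbr, w in neighbors.items()}
    return graph
```

  • Note that after this normalization the weight from a to b generally differs from the weight from b to a, so the resulting graph is effectively directed.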
  • The jump probability determination unit includes:
  • an intermediate node obtaining subunit, configured to, if there is no edge between node k and the associated node, obtain an intermediate node between node k and the associated node in the sampling path, node k being able to reach the associated node through the intermediate node;
  • a connecting node pair determining subunit, configured to take every two nodes that share an edge among node k, the intermediate node, and the associated node as a connecting node pair, and obtain the edge weight corresponding to each connecting node pair;
  • a jump probability determination subunit, configured to determine the jump probability between node k and the associated node according to the edge weights corresponding to the connecting node pairs.
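  • One plausible reading of the jump probability computation is sketched below. The text says only that the jump probability is determined "according to" the edge weights of the connecting node pairs; multiplying the weights along the path from node k through the intermediate nodes to the associated node is an assumption, as are the function name and the graph layout (`{node: {neighbor: weight}}`).

```python
def jump_probability(graph, path):
    """Multiply edge weights along path = [k, intermediate..., associated]."""
    probability = 1.0
    for src, dst in zip(path, path[1:]):
        # Each consecutive pair is a connecting node pair; a missing edge
        # means the associated node cannot be reached along this path.
        probability *= graph.get(src, {}).get(dst, 0.0)
    return probability
```

  • With edge weights k→m = 0.5 and m→j = 0.5, this sketch gives node k a jump probability of 0.25 to node j.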
  • The target user set determining unit includes:
  • a node edge updating subunit, configured to update the edges in the relationship topology graph according to node k and the associated nodes to obtain a transition relationship topology graph, in which node k and each associated node are connected by an edge;
  • an edge weight setting subunit, configured to set the jump probability between node k and the associated node as the edge weight between node k and the associated node in the transition relationship topology graph, to obtain a target relationship topology graph;
  • a target user set determining subunit, configured to determine the target user set in the target relationship topology graph.
  • The target user set determining subunit is further configured to exponentially increase the jump probability, transform the jump probability obtained after the exponential increase to obtain a target probability, and update the edge weight between node k and the associated node according to the target probability;
  • the target user set determining subunit is further configured to determine an associated node whose updated edge weight is greater than the weight threshold as an important associated node of node k;
  • the target user set determining subunit is further configured to divide the target relationship topology graph into at least two community topology graphs based on node k and the important associated nodes, and obtain a target community topology graph from the at least two community topology graphs as the target user set.
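  • The exponential increase and thresholding steps can be sketched as below. The text does not specify the exponent or the transformation; raising each jump probability to a power and renormalizing (similar in spirit to the inflation step of Markov clustering) is an assumption made for illustration, and both function names are hypothetical.

```python
def inflate(jump_probs, power=2.0):
    """Raise each jump probability to `power`, then renormalize to sum to 1."""
    raised = {node: p ** power for node, p in jump_probs.items()}
    total = sum(raised.values())
    if total == 0:
        return raised
    return {node: value / total for node, value in raised.items()}


def important_associated_nodes(jump_probs, power, weight_threshold):
    """Associated nodes whose updated edge weight exceeds the weight threshold."""
    target_probs = inflate(jump_probs, power)
    return {node for node, weight in target_probs.items()
            if weight > weight_threshold}
```

  • The effect of this choice is that strong links get relatively stronger and weak links get weaker, which makes the later division into communities sharper.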
  • The diffusion abnormal user identification module includes:
  • a first associated user determination unit, configured to, if the state of the target user set is an abnormal state, determine among the users to be confirmed the users who have a social association relationship with the abnormal user;
  • a first diffusion abnormal user determination unit, configured to determine the users who have a social association relationship with the abnormal user as diffusion abnormal users.
  • The diffusion abnormal user identification module includes:
  • a second associated user determination unit, configured to, if the state of the target user set is an abnormal state, determine among the users to be confirmed the users who have a social association relationship with the abnormal user;
  • a second diffusion abnormal user determination unit, configured to obtain the abnormal user node corresponding to the abnormal user, obtain the associated user nodes corresponding to the users who have a social association relationship with the abnormal user, determine an associated user node whose edge weight with the abnormal user node is greater than the association threshold as a diffusion abnormal node, and determine the user corresponding to the diffusion abnormal node as a diffusion abnormal user.
  • a to-be-identified user set determining module, configured to determine the target user set in an abnormal state as the user set to be identified;
  • a key text data extraction module, configured to obtain user text data of the users in the user set to be identified, and extract key text data from the user text data;
  • a sensitive source data acquisition module, configured to acquire sensitive source data;
  • an abnormal category determination module, configured to match the key text data against the sensitive source data, and determine the abnormal category of the user set to be identified according to the matching result.
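  • As a rough illustration of the abnormal category determination, the sketch below matches extracted key terms against sensitive source data. The data shapes are assumptions: the key text data is taken to be a collection of terms, the sensitive source data a mapping from category name to sensitive terms, and the "matching result" the category with the most overlapping terms. The text does not specify any of these, and `abnormal_category` is a hypothetical name.

```python
def abnormal_category(key_text_terms, sensitive_sources):
    """Return the category whose sensitive terms overlap most, or None."""
    key_terms = set(key_text_terms)
    best_category, best_hits = None, 0
    for category, terms in sensitive_sources.items():
        hits = len(key_terms & set(terms))
        # Keep the category with strictly more matching terms.
        if hits > best_hits:
            best_category, best_hits = category, hits
    return best_category
```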
  • One aspect of the embodiments of the present application provides a computer device, including a processor and a memory;
  • the memory stores a computer program which, when executed by the processor, causes the processor to execute the method in the embodiments of the present application.
  • The embodiments of the present application provide a computer-readable storage medium. The computer-readable storage medium stores a computer program that includes program instructions which, when executed by a processor, perform the method in the embodiments of the present application.
  • The embodiments of the present application obtain a target user set that includes at least two users with a social association relationship; obtain a default abnormal user and determine the abnormal user in the target user set according to the default abnormal user; determine the state of the target user set according to the abnormal user; and, if the state of the target user set is an abnormal state, identify diffusion abnormal users among the users to be confirmed according to the social association relationship between the abnormal user and the users to be confirmed in the target user set, where the users to be confirmed are the users in the target user set other than the abnormal user.
  • FIG. 1 is a network architecture diagram provided by an embodiment of the present application.
  • FIG. 2A is a schematic diagram of a scenario for determining diffusion abnormal users provided by an embodiment of the present application.
  • FIG. 2B is a schematic diagram of a scenario for determining diffusion abnormal users provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a data identification method provided by an embodiment of the present application.
  • FIG. 4A is a schematic diagram of a scenario for determining the state of a target user set provided by an embodiment of the present application.
  • FIG. 4B is a schematic diagram of a scenario for determining the state of a target user set provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a process for obtaining a target user set according to an embodiment of the present application.
  • FIG. 6A is a schematic diagram of a node relationship list provided by an embodiment of the present application.
  • FIG. 6B is a schematic diagram of a node relationship provided by an embodiment of the present application.
  • FIG. 6C is a schematic diagram of a node relationship including initial weights provided by an embodiment of the present application.
  • FIG. 6D is a schematic diagram of a relationship topology graph provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a scenario for dividing community topology graphs provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a process for determining the abnormal category of a target user set in an abnormal state according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a data identification device provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 1 is a network architecture diagram provided by an embodiment of the present application.
  • The network architecture may include a business server 1000 and a backend server cluster.
  • The backend server cluster may include multiple backend servers; as shown in FIG. 1, it may include, for example, a backend server 100a, a backend server 100b, a backend server 100c, ..., and a backend server 100n.
  • The backend servers 100a, 100b, 100c, ..., 100n can each be connected to the business server 1000 over a network, so that each backend server can exchange data with the business server 1000 and the business server 1000 can receive business data from each backend server.
  • Each backend server shown in FIG. 1 corresponds to a user terminal and can be used to store the business data of the corresponding user terminal.
  • Each user terminal can have the target application installed.
  • When the target application runs in a user terminal, the backend server corresponding to that user terminal can store the business data provided by the target application and exchange data with the business server 1000 shown in FIG. 1.
  • The target application may include an application capable of displaying data information such as text, images, audio, and video.
  • The application can be a payment application, used to transfer funds between users, or a social application such as an instant messaging application, used for communication between users.
  • The business server 1000 in this application can collect data from the backends of these applications (such as the backend server cluster).
  • The data can be user identity information (such as a user id) that identifies users, and transfer records between users.
  • The business server 1000 can use the users in the data as user nodes in a community and can also determine the social association relationships between these user nodes. A social association relationship here means that the users have engaged in any information transmission behavior with each other while using the target application.
  • Information transmission behavior, also known as social behavior, includes but is not limited to at least one of the following: user information transmission behavior (such as adding users as contacts or following users), content information transmission behavior (such as instant chat, audio/video calls, content forwarding, leaving messages, and replying to messages), and fund transactions (such as payment and transfer).
  • the method of each embodiment may be executed by one or more computing devices, for example, one or more computing devices in the business server 1000 and the backend server cluster shown in FIG. 1.
  • The computing device can divide the user group into at least two user sets (hereinafter also referred to as communities) according to the social association relationships and social behavior records between users in the user group. For example, based on the social behaviors collected among a large number of users, the computing device can divide these users into multiple user sets such that each user's social relationships with the users in the user's own set are closer than those with users in other user sets.
  • The computing device can identify abnormal users in each user set based on existing abnormal user samples, and determine whether the user set is in a normal state or an abnormal state based on the abnormal users in that user set. If the user set is in an abnormal state, the computing device determines the diffusion abnormal users in the user set according to the social association relationship between the abnormal users and the other users in the user set.
  • One user terminal among the multiple user terminals may be selected as the target user terminal.
  • The target user terminal may include smart phones, tablet computers, desktop computers, and other smart terminals capable of displaying and playing data information.
  • For example, the user terminal corresponding to the backend server 100a shown in FIG. 1 may be used as the target user terminal, and the target application may be installed on the target user terminal.
  • The backend server 100a corresponding to the target user terminal can exchange data with the business server 1000.
  • The business server 1000 can detect and collect the social association relationships among a large number of users through the backend servers.
  • The business server 1000 may determine that there is a social association relationship between user A and user B, and that the social association relationship is a communication relationship. After detecting a large number of users and determining the social association relationships between them, the business server 1000 may treat these users as a user group, treat each user in the user group as a node, and connect with an edge the nodes corresponding to users who have a social association relationship. Edge weights are set for the edges between nodes according to the social behavior records between users who have a social association relationship. A relationship topology graph can then be constructed from the user group and the edge weights.
  • The business server 1000 may divide the user group into at least two communities according to the social association relationships and social behavior records between users in the user group. Based on the existing abnormal user samples, the business server 1000 can then identify abnormal users in these communities and, based on the abnormal users in each community, determine whether the community is in a normal state or an abnormal state. If the community is in an abnormal state, the business server 1000 may obtain the abnormal users in the abnormal community.
  • The business server 1000 can determine the diffusion abnormal users among the non-abnormal users in the abnormal community.
  • The purpose of determining diffusion abnormal users is to identify a larger range of abnormal users. The pre-detected abnormal user samples may be few in number and cover only a small share of abnormal users, so the abnormal users identified in the abnormal community from those samples also have low coverage, and some abnormal users go unidentified. Therefore, to improve the recognition accuracy and expand the coverage, diffusion abnormal users can be determined according to the social association relationships of the abnormal users already identified in the abnormal community.
  • The business server 1000 may determine diffusion abnormal users in the following manner.
  • The business server 1000 may select one of the divided community topology graphs as the target user set; that is, the target user set includes at least two users who have a social association relationship.
  • The business server 1000 may obtain the default abnormal user (that is, an existing abnormal user sample). According to the default abnormal user, the business server 1000 can determine the abnormal users in the target user set, and according to the number of abnormal users and the total number of users in the target user set, the business server 1000 can detect the state of the target user set.
  • If the target user set is in an abnormal state, the business server 1000 may identify diffusion abnormal users among the users to be confirmed according to the social association relationship between the abnormal user and the users to be confirmed in the target user set, and then treat the diffusion abnormal users as abnormal users as well.
  • The users to be confirmed are the users in the target user set other than the abnormal user.
  • The business server 1000 may generate a recognition result according to the abnormal users in each relationship topology graph, and return the recognition result to the backend server.
  • The backend server may also treat the large number of users corresponding to the respective user terminals as a user group, divide the user group into different community topology graphs to obtain different user sets, and identify the abnormal users and diffusion abnormal users in each user set. For how the backend server identifies abnormal users and diffusion abnormal users, refer to the description above of the business server doing so, which is not repeated here.
  • The method provided in the embodiments of the present application may be executed by a computer device, which includes but is not limited to a terminal or a server.
  • FIG. 2A is a schematic diagram of a scenario for determining a diffusion abnormal user provided by an embodiment of the present application.
  • The business server 2000 can obtain the existing default abnormal user (that is, the existing abnormal user sample), match the default abnormal user against the users corresponding to the nodes in the target user set 200a, and treat the users whose matching rate reaches the matching threshold as abnormal users.
  • The business server 2000 may determine the state of the target user set 200a as an abnormal state; that is, the target user set 200a is an abnormal community.
  • Diffusion abnormal users can then be determined in the abnormal target user set 200a. For example, there is an edge between user d and user e with an edge weight of 0.8, which is greater than the association threshold of 0.75; this indicates that user e is strongly associated with abnormal user d and is very likely an abnormal user.
  • The edge weight between user d and user c is 0.56, which is smaller than the association threshold of 0.75; the association is weak and the probability that user c is an abnormal user is small, so user c can be regarded as a non-abnormal user.
  • Similarly, the edge weight between user k and user g is 0.5, which is smaller than the association threshold of 0.75, so user g can be regarded as a non-abnormal user. There is an edge between user k and user e, but it is not an edge from user k to user e, so user k is considered unable to reach user e.
  • The business server 2000 may determine user e as a diffusion abnormal user. The business server 2000 may then determine the abnormal users in the target user set 200a, which may include diffusion abnormal user e, abnormal user d, and abnormal user k.
  • Fig. 2B is a schematic diagram of a scenario for determining a proliferation abnormal user provided by an embodiment of the present application.
  • the service server 2000 may identify the user d and the user k as abnormal users in the target user set 200a.
  • For how the service server 2000 recognizes user d and user k as abnormal users in the target user set 200a, refer to the description of FIG. 2A above; it is not repeated here.
  • the service server 2000 can determine that the target user set 200a is in an abnormal state.
  • Based on the social association relationships of abnormal user d and abnormal user k in the target user set 200a (that is, whether each of them has an edge to another user), the proliferation abnormal users can be determined. For example, if there is an edge between abnormal user d and user e, there is a social association relationship between user e and abnormal user d.
  • In that case, the service server 2000 can determine user e to be a proliferation abnormal user.
  • Likewise, the service server 2000 may determine user c to be a proliferation abnormal user. Similarly, if there is an edge between abnormal user k and user g, the service server 2000 may determine user g to be a proliferation abnormal user.
  • The service server 2000 may then determine the abnormal users in the target user set 200a: proliferation abnormal user e, abnormal user d, abnormal user k, proliferation abnormal user c, and proliferation abnormal user g.
  • FIG. 3 is a schematic flowchart of a data identification method provided by an embodiment of the present application. As shown in Figure 3, the process of the method may include the following steps.
  • Step S101 Obtain a target user set, where the target user set includes at least two users who have a social relationship.
  • the target user set can be determined from multiple users.
  • the multiple users may be multiple users screened out according to preset conditions, or multiple users corresponding to a certain background server, or all users of a social application (also referred to as a user group).
  • The determined target user set satisfies the following condition: the closeness of the social association relationships between users inside the target user set is higher than the closeness of the social association relationships between users inside the target user set and users outside it.
  • the closeness of the social relationship between users can be determined according to the user's social behavior records.
  • the social behavior record may include, but is not limited to, the frequency of information interaction between users, the number of information interactions, the duration of information interaction, the amount of information interacted, and the transaction amount, etc.
  • the target user set may be a community topology map.
  • the community topology graph includes the nodes corresponding to the users, the edges between the nodes, and the edge weight of each edge. Among them, the edges between nodes are used to indicate the social relationship between nodes (users), and the edge weight is used to indicate the degree of association. If two users have a social association relationship, the nodes corresponding to the two users have edges. The closer the relationship between the two users, the greater the degree of association and the greater the edge weight.
  • the community topology graph can be used to indicate whether there is a social association relationship between nodes, and the degree of association between two nodes that have a social association relationship.
  • the social association relationship here can be payment relationship, communication friend relationship, device association relationship, etc.
  • For example, user a and user b can be determined to have a device association relationship.
  • Other forms of relationship can also be used (for example, two social accounts are not friends but have had a conversation through those accounts). This application does not limit the scope of social association relationships.
  • the target user set can be obtained from the relationship topology map corresponding to the user group, that is, the nodes in the target user set are some nodes in the relationship topology map of the user group.
  • The relationship topology graph can be divided to obtain at least two community topology graphs, and any one of the at least two community topology graphs can be used as the target user set.
  • the user group can be divided into at least two communities according to the social association relationship and the degree of association between users in the user group, where the degree of association between users in each community is close.
  • Step S102 Obtain a default abnormal user, and determine the abnormal user in the above-mentioned target user set according to the above-mentioned default abnormal user.
  • the default abnormal user may be a preset abnormal user sample, and the abnormal user sample may be a pre-detected abnormal user.
  • There may be at least two default abnormal users.
  • The default abnormal users can carry user attribute information (such as id, name, or fingerprint). Taking the id as the attribute information, the id of each user in the target user set can be matched against the ids of the default abnormal users, and the users in the target user set whose matching rate reaches the matching threshold can be determined to be the abnormal users in the target user set.
  • For example, the default abnormal users are {<default abnormal user 1, 1>, <default abnormal user 2, 2>}, that is, the id of default abnormal user 1 is 1 and the id of default abnormal user 2 is 2, and the target user set is {<user A, 1>, <user B, 4>, <user C, 6>}. The ids of the default abnormal users (1 and 2) can then be matched against the user ids of the target user set (1, 4, and 6). The result is that id 1 of user A matches id 1 of default abnormal user 1, so user A can be determined to be an abnormal user in the target user set.
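  • The id-matching example above can be sketched as follows. This is only an illustration: exact id equality is assumed to stand in for "the matching rate reaches the matching threshold", and the function name `find_abnormal_users` and the dictionary layout are hypothetical.

```python
def find_abnormal_users(target_users, default_abnormal_ids):
    """Return the names of target users whose id equals a default abnormal id.

    Assumption: exact id equality stands in for the matching-rate test.
    """
    abnormal_ids = set(default_abnormal_ids)
    return [name for name, user_id in target_users.items() if user_id in abnormal_ids]


# Data from the example: default abnormal ids 1 and 2; users A, B, C with ids 1, 4, 6.
target_users = {"user A": 1, "user B": 4, "user C": 6}
abnormal = find_abnormal_users(target_users, [1, 2])
print(abnormal)  # ['user A']
```

A real matcher would probably compare several attributes and compute a matching rate, which the text does not spell out.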
  • Step S103 Determine the state of the above-mentioned target user set according to the above-mentioned abnormal users.
  • the status of the target user set can be determined according to the number of abnormal users and the total number of users in the target user set.
  • The abnormal concentration of the target user set can be determined, where the abnormal concentration is the ratio of the number of abnormal users in the target user set to the total number of users. If the abnormal concentration is less than the concentration threshold, the proportion of abnormal users in the target user set is low, and the state of the target user set can be determined to be normal. If the abnormal concentration is greater than the concentration threshold, the proportion of abnormal users in the target user set is high, and the state of the target user set can be determined to be abnormal.
  • The abnormal concentration of the target user set can be determined as shown in formula (1):
      $$C = \frac{N}{M} \qquad (1)$$
  • where C represents the abnormal concentration of the target user set, N represents the number of abnormal users in the target user set, and M represents the total number of users in the target user set.
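  • A minimal sketch of the abnormal-concentration check follows, assuming formula (1) is the ratio C = N / M described above. The function names and the threshold value 0.3 are hypothetical, and the case where C equals the threshold, which the text leaves open, is treated as normal here.

```python
def abnormal_concentration(num_abnormal, num_total):
    """Formula (1): C = N / M."""
    if num_total <= 0:
        raise ValueError("the target user set must contain at least one user")
    return num_abnormal / num_total


def state_by_concentration(num_abnormal, num_total, concentration_threshold):
    """Return "abnormal" when C exceeds the threshold, otherwise "normal".

    Assumption: C == threshold is unspecified in the text and treated as normal.
    """
    c = abnormal_concentration(num_abnormal, num_total)
    return "abnormal" if c > concentration_threshold else "normal"


# Hypothetical numbers: 2 abnormal users among 6 users, threshold 0.3 (assumed).
print(state_by_concentration(2, 6, 0.3))  # abnormal
```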
  • Alternatively, the state of the target user set may be determined from a user social behavior feature set, which is obtained first.
  • The user social behavior feature set includes the social behavior characteristics of each user in the aforementioned user group, that is, it may include historical data on the detected social behavior characteristics of each user in the user group.
  • For example, if user A has been to the central park and the flower town, these two social behavior characteristics of user A can be stored in the user social behavior feature set.
  • the user's social behavior feature set may include the communication device used by the user, the wireless network, and the user's behavior (such as frequent visits to the same place).
  • the types and quantity of social behavior features of abnormal users in the target user set can be counted.
  • From these counts, the information entropy can be determined. The smaller the information entropy, the more concentrated the distribution of the abnormal users over the social behavior characteristics.
  • The information entropy can be determined as shown in formula (2):
      $$H(x) = -\sum_{i} P(x_i) \log P(x_i) \qquad (2)$$
  • where H(x) represents the information entropy and P(x_i) represents the distribution degree of each social behavior characteristic of the users.
  • For example, the social behavior feature set includes three social behavior characteristics: wireless network, user behavior, and communication device. The index i in formula (2) can then be 1, 2, and 3.
  • The social behavior characteristic of wireless network can be represented by x_1, that of user behavior by x_2, and that of communication device by x_3.
  • For the social behavior characteristic of wireless network, the distribution degree of the abnormal users is P(wireless network) (that is, the value of P(x_1) is P(wireless network)).
  • For the social behavior characteristic of user behavior, suppose that among 50 abnormal users, 30 visited the same coffee shop more than 10 times on the same day and the other 20 visited 20 different other places on the same day. The count for the social behavior characteristic of user behavior is then 21 (that is, 1 coffee shop + 20 other places). Because 30 of the 50 abnormal users went to the same coffee shop on the same day, the abnormal users are relatively concentrated on this characteristic, and the distribution degree P(user behavior) can be obtained (that is, the value of P(x_2) is P(user behavior)).
  • For the social behavior characteristic of communication device, based on the devices used to log in to accounts, the count is 37 (that is, 1 communication device A + 1 communication device B + 35 other communication devices). Because 35 of the 50 abnormal users all use different communication devices, the devices are numerous and vary widely, which indicates that the abnormal users are scattered on this characteristic, that is, the concentration is low. The distribution degree P(communication device) can be obtained (that is, the value of P(x_3) is P(communication device)).
  • According to the distribution degree P(wireless network), the distribution degree P(user behavior), the distribution degree P(communication device), and formula (2), the first feature distribution degree H(x) of the abnormal users can be obtained.
  • The first feature distribution degree H(x) is a total distribution value of the abnormal users over the three social behavior characteristics of wireless network, user behavior, and communication device.
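  • The entropy idea above can be sketched for a single characteristic. The sketch assumes one possible reading of P(x_i), namely the fraction of abnormal users that share a given value, and uses the natural logarithm; the text fixes neither choice. `shannon_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (natural log) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# From the example: 50 abnormal users, 30 at the same coffee shop, 20 at 20 different places.
concentrated = ["coffee shop"] * 30 + [f"place {i}" for i in range(20)]
# Hypothetical contrast: 50 abnormal users who each visited a different place.
scattered = [f"place {i}" for i in range(50)]

# The concentrated distribution has the smaller entropy.
print(shannon_entropy(concentrated) < shannon_entropy(scattered))  # True
```

How the per-characteristic values are combined into the single first feature distribution degree is not recoverable from the text, so the sketch stops at one characteristic.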
  • the second feature distribution degree of the users (including abnormal users) in the target user set can be determined, that is, the feature distribution degree of the target user set as a whole.
  • For how the second feature distribution degree is determined, refer to the description above of determining the first feature distribution degree; it is not repeated here.
  • According to the first feature distribution degree and the second feature distribution degree, the feature distribution difference degree between the abnormal users and the users in the target user set (that is, the difference between the first and the second feature distribution degree) can be determined. Three cases follow.
  • If the feature distribution difference degree is less than the difference degree threshold and the first feature distribution degree is less than the distribution degree threshold, the social behavior characteristics of the abnormal users are concentrated and differ little from those of the target user set as a whole. The social behavior characteristics of the abnormal users are then normal, popular characteristics, and the target user set is in a normal state.
  • If the feature distribution difference degree is greater than or equal to the difference degree threshold and the first feature distribution degree is greater than or equal to the distribution degree threshold, the social behavior characteristics of the abnormal users are scattered and differ greatly from those of the target user set as a whole, and the characteristics of the abnormal users are also inconsistent with one another. The social behavior characteristics of the abnormal users are then niche characteristics, and the target user set is in a normal state.
  • If the feature distribution difference degree is greater than or equal to the difference degree threshold and the first feature distribution degree is less than the distribution degree threshold, the social behavior characteristics of the abnormal users are concentrated and relatively consistent, yet differ greatly from those of the non-abnormal users in the target user set. The target user set is then in an abnormal state.
  • The feature distribution difference degree can be determined, for example, as shown in formula (3):
      $$D_{KL}(P \parallel Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)} \qquad (3)$$
  • where D_KL(P‖Q) represents the feature distribution difference degree, P(i) represents the first feature distribution (that is, the distribution of the social behavior characteristics of the abnormal users), and Q(i) represents the second feature distribution (that is, the distribution of the overall social behavior characteristics of the users in the target user set).
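  • The difference degree can be illustrated with the standard Kullback-Leibler divergence, which is what the notation D_KL(P‖Q) conventionally denotes. The two distributions below are hypothetical, the natural logarithm is an assumption, and `kl_divergence` is an illustrative helper rather than the application's own routine.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)), natural log.

    Terms with P(i) == 0 contribute 0; Q(i) must be positive wherever P(i) > 0.
    """
    if len(p) != len(q):
        raise ValueError("P and Q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i <= 0:
            raise ValueError("Q(i) must be positive wherever P(i) > 0")
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical distributions over three values of one social behavior characteristic.
p_abnormal = [0.6, 0.2, 0.2]  # abnormal users
q_overall = [0.2, 0.4, 0.4]   # target user set as a whole
print(kl_divergence(p_abnormal, q_overall))
```

The divergence is zero when the two distributions coincide and grows as the abnormal users' distribution departs from that of the whole set.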
  • The state of the target user set can be determined by the abnormal concentration of the target user set, by the users' social behavior characteristics, or by combining the two. In the combined approach, the abnormal concentration is determined first, and the social behavior characteristics are examined only after the abnormal concentration is found to exceed the concentration threshold. That is, the state of the target user set is determined to be abnormal when the abnormal concentration is greater than the concentration threshold, the first feature distribution degree is less than the distribution degree threshold, and the feature distribution difference degree is greater than or equal to the difference degree threshold.
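  • The combined rule can be written as one predicate. This is a sketch under two assumptions: every combination other than the one named above is treated as a normal state, and all threshold values in the usage lines are hypothetical. `target_set_state` is an illustrative name.

```python
def target_set_state(concentration, first_distribution, difference,
                     concentration_threshold, distribution_threshold,
                     difference_threshold):
    """Return "abnormal" only when all three conditions of the combined rule hold."""
    is_abnormal = (
        concentration > concentration_threshold          # many abnormal users
        and first_distribution < distribution_threshold  # their features are concentrated
        and difference >= difference_threshold           # and differ from the whole set
    )
    return "abnormal" if is_abnormal else "normal"


# Hypothetical thresholds: concentration 0.3, distribution 1.0, difference 0.2.
print(target_set_state(0.5, 0.6, 0.4, 0.3, 1.0, 0.2))  # abnormal
print(target_set_state(0.5, 1.5, 0.4, 0.3, 1.0, 0.2))  # normal (features scattered)
```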
  • Step S104 If the state of the target user set is abnormal, identify the proliferation abnormal users among the users to be confirmed according to the social association relationships between the abnormal users and the users to be confirmed in the target user set. The users to be confirmed are the users other than the abnormal users in the target user set.
  • A user who has a social association relationship with an abnormal user can be determined among the users to be confirmed, and that user is determined to be a proliferation abnormal user.
  • having a social association relationship may mean that in the community topology graph where the node corresponding to the abnormal user is located, there is an edge starting from the abnormal user between the node corresponding to the abnormal user and the node corresponding to the user to be confirmed.
  • For example, the abnormal users are user d and user k.
  • From node d, node e and node c can be reached, and from node k, node g can be reached. User e corresponding to node e, user c corresponding to node c, and user g corresponding to node g are then all determined to be proliferation abnormal users.
  • Alternatively, the users who have a social association relationship with the abnormal user are determined among the users to be confirmed, and the abnormal user node corresponding to the abnormal user is obtained, together with the associated user nodes corresponding to those users.
  • An associated user node whose edge weight with the abnormal user node is greater than the association threshold is determined to be a proliferation abnormal node, and the user corresponding to the proliferation abnormal node is determined to be a proliferation abnormal user.
  • For example, the abnormal users are user d and user k.
  • From node d, node e and node c can be reached, so node e and node c can be determined to be the associated user nodes of node d.
  • The weight of the edge from node d to associated user node e is 0.8, which is greater than the association threshold of 0.75, and the weight of the edge from node d to associated user node c is 0.56, which is less than the association threshold of 0.75, so associated user node e can be determined to be a proliferation abnormal node.
  • From node k, node g can be reached, so node g can be determined to be the associated user node of node k.
  • The weight of the edge from node k to associated user node g is 0.5, which is less than the association threshold of 0.75, so associated user node g is not a proliferation abnormal node.
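  • Both variants of Step S104 (edge existence, and edge weight above the association threshold) can be sketched with one helper. The dictionary-of-dictionaries graph layout and the name `proliferation_abnormal_users` are assumptions made for illustration; the edge weights and the threshold 0.75 come from the example above.

```python
def proliferation_abnormal_users(out_edges, abnormal_users, association_threshold=None):
    """Collect users reachable by one edge starting from an abnormal user.

    out_edges maps node -> {neighbor: edge weight} for edges starting at node.
    With association_threshold=None any edge qualifies; otherwise the edge
    weight must be greater than the threshold.
    """
    found = set()
    for user in abnormal_users:
        for neighbor, weight in out_edges.get(user, {}).items():
            if neighbor in abnormal_users:
                continue  # already known to be abnormal
            if association_threshold is None or weight > association_threshold:
                found.add(neighbor)
    return found


out_edges = {"d": {"e": 0.8, "c": 0.56}, "k": {"g": 0.5}}
abnormal = {"d", "k"}
print(sorted(proliferation_abnormal_users(out_edges, abnormal)))        # ['c', 'e', 'g']
print(sorted(proliferation_abnormal_users(out_edges, abnormal, 0.75)))  # ['e']
```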
  • Fig. 4A is a schematic diagram of a scenario for determining the status of a target user set provided by an embodiment of the present application. As shown in Figure 4A, taking the target user set 400a as an example, the abnormal users in the target user set 400a are user e and user f.
  • The service server can count the number of abnormal users. Given user a, user b, user c, user d, user e, and user f in the target user set 400a, the service server can count that the total number of users in the target user set 400a is 6; the abnormal concentration of the target user set 400a then follows from formula (1).
  • Fig. 4B is a schematic diagram of a scenario for determining the status of a target user set provided by an embodiment of the present application.
  • the abnormal users in the target user set 400b are user e, user f, user g, user h, and user i.
  • Suppose the user social behavior feature set includes wifi and user equipment. According to the user social behavior feature set, the wifi name used by abnormal user h is "Z", the wifi name used by abnormal user i is "X", and the wifi name used by abnormal user e, abnormal user f, and abnormal user g is "W". For the social behavior characteristic of wifi, 60% of the abnormal users therefore use the same wifi.
  • The distribution is relatively concentrated, and from it the distribution degree of the abnormal users on the social behavior characteristic of wifi can be obtained as P(wifi). Similarly, according to the user social behavior feature set, the devices used by abnormal user e are device A and device B, the devices used by abnormal user f are device B and device C, the device used by abnormal user g is device D, the devices used by abnormal user h are device A and device E, and the devices used by abnormal user i are device B and device F. Three abnormal users have therefore used the same device B, and two abnormal users have used the same device A, so the distribution of the abnormal users on the social behavior characteristic of user equipment is relatively concentrated, and the corresponding distribution degree is P(user equipment).
  • According to the distribution degree P(wifi), the distribution degree P(user equipment), and formula (2) above, the first feature distribution degree of the abnormal users on the social behavior characteristics can be obtained as A. In the same way, the second feature distribution degree of the overall social behavior characteristics of the users in the target user set (including abnormal user e, abnormal user f, abnormal user g, abnormal user h, and abnormal user i) can be obtained as B.
  • From these, the feature distribution difference degree between the social behavior characteristic distribution of the abnormal users and that of the target user set 400b as a whole can be obtained as C. Because the first feature distribution degree A is less than the distribution degree threshold D and the feature distribution difference degree C is greater than the difference degree threshold E, the service server can determine the state of the target user set 400b to be abnormal.
  • The plurality of users may be divided into at least two user sets according to the collected social association relationships and social behaviors among them, so that the closeness of the social association relationships between users in the same user set is higher than the closeness of the social association relationships between users in different user sets. Each of the user sets is then taken as the target user set.
  • A relationship topology graph may be determined according to the social association relationships and social behaviors among the multiple users.
  • In the relationship topology graph, each node corresponds to one of the multiple users, and an edge connecting two nodes indicates that the users corresponding to the two nodes have a social association relationship. The closeness of the social association relationship between two users is determined according to the social association relationships and social behaviors among the multiple users.
  • The weight of the edge between the nodes corresponding to two users is determined according to that closeness. A clustering algorithm is used to divide the relationship topology graph into at least two sub-topology graphs, and the set of users corresponding to the nodes in one of the at least two sub-topology graphs is used as the target user set.
  • Fig. 5 is a schematic diagram of a process for acquiring a target user set provided by an embodiment of the present application. As shown in Figure 5, the process may include the following steps.
  • Step S201 Obtain a relationship topology map corresponding to the user group.
  • The relationship topology graph may include N nodes k, which correspond one-to-one to the users in the user group, where N is the number of users in the user group. The edge weight between two nodes k is determined by the social association relationship between the corresponding two users in the user group.
  • Each user in the user group can be regarded as a node k; for example, user A is regarded as node A and user B as node B.
  • The edge weight between two nodes k in the relationship topology graph can then be determined.
  • For example, there are N users in the user group, and each user corresponds to a node k. If two users have a social association relationship, the two corresponding nodes k are connected by an edge.
  • Initial weights can be set for the edges between these nodes k, the initial weights can be converted into probabilities, and the result of the probability conversion is used as the edge weight of each edge between nodes k. From the nodes k corresponding to the user group and the edge weights, the relationship topology graph corresponding to the user group can be generated.
  • The social behavior record here can be the transfer amount, transfer frequency, communication frequency, and communication duration between users who have a social association relationship. The greater the transfer amount, transfer frequency, communication frequency, or communication duration between two users, the greater the initial weight set for the edge between them.
  • the probability conversion here can refer to the standardization of the initial weight of each edge.
  • W ij represents the initial weight between node i and node j
  • the social relationship between users shows the relationship between node A, node B, node C, and node D in the form of a list.
  • the list shown in FIG. 6A can be used to represent a list of node relationships corresponding to users.
  • the node relationship list may be composed of a first header parameter, a second header parameter, and data corresponding to the first header parameter and the second header parameter.
  • the data corresponding to the first header parameter and the second header parameter may include edge weight data.
  • One edge weight data corresponds to two nodes, and the edge weight data can be used to indicate the degree of association between the two nodes. The greater the edge weight, the greater the degree of association between the two nodes.
  • the first header parameter may be a row parameter, and the second header parameter may be a column parameter; or, the first header parameter may be a column parameter, and the second header parameter may be a row parameter.
  • an adjacency matrix A1 used to characterize the association relationship between node A, node B, node C, and node D can be obtained.
  • the adjacency matrix A1 is as shown in the following matrix:
  • the adjacency matrix A1 is a 4 ⁇ 4 matrix.
  • The value 1 in the adjacency matrix A1 indicates that there is a social association relationship between two users (that is, there is an edge between the nodes), and the value 0 indicates that there is no social association relationship between two users (that is, there is no edge between the nodes).
  • For example, the edge weight data 12 corresponding to node A and node B can be set to 1. There is no social association relationship between user D and user A, so no edge is needed between node D and node A, and the edge weight data 41 corresponding to node D and node A can be set to 0.
  • A self-loop is added to each node, that is, each node gets an edge to itself, so the edge weight data 11, the edge weight data 22, the edge weight data 33, and the edge weight data 44 are all set to 1.
  • The node relationship graph corresponding to user A, user B, user C, and user D can then be obtained as shown in Figure 6B (Figure 6B is obtained by connecting the nodes that correspond to the value 1 in the adjacency matrix A1).
  • The self-loop edge is added to each node because the edge weight of the self-loop edge (which is 1) is needed in the subsequent calculation. Only the edge weight of each self-loop edge needs to be known, so the self-loop edges are not shown in Figure 6B.
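  • A 0/1 adjacency matrix with self-loops can be built as sketched below. The matrix A1 itself is not reproduced in this text, so the edge list used here is an assumption for illustration; only the A-B edge and the absence of a D-A edge are stated above. The graph is also assumed to be undirected, and `adjacency_with_self_loops` is a hypothetical helper name.

```python
def adjacency_with_self_loops(nodes, edges):
    """Build a symmetric 0/1 adjacency matrix and set every diagonal entry to 1."""
    index = {name: i for i, name in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0] * size for _ in range(size)]
    for i in range(size):
        matrix[i][i] = 1  # self-loop on every node
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # assumption: edges are undirected
    return matrix


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]  # assumed edge list
for row in adjacency_with_self_loops(nodes, edges):
    print(row)
```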
  • an initial weight can be set for each edge.
  • For user A and user B, if user A transfers money to user B twice and each transfer amount reaches 100,000, the initial weight of the edge between node A and node B can be set to 10. For user A and user C, if there is no social behavior record between them (that is, no transfer behavior and no call behavior), the initial weight of the edge between node A and node C can be set to 1.
  • For user B and user C, if they communicate frequently and each call lasts more than 20 minutes, the initial weight of the edge between node B and node C can be set to 8.
  • The initial weight of the edge between node B and node D can be set to 9.
  • The node relationship graph containing the initial weights, shown in Figure 6C, can then be obtained.
  • According to the initial weights and the adjacency matrix A1, an adjacency matrix A2 can be obtained that characterizes both the association relationships and the degrees of association between node A, node B, node C, and node D. The adjacency matrix A2 is shown in the following matrix:
  • the adjacency matrix A2 is a 4 ⁇ 4 matrix.
  • Probabilistic conversion (ie, standardization) can be performed on the elements in the adjacency matrix A2 (ie, the initial weight).
  • The probability conversion can be performed as follows, taking element M12 (that is, the initial weight of the edge from node A to node B) as an example.
  • The edge weights of the other edges can be obtained in the same way.
  • According to the adjacency matrix A2 and the edge weights obtained by the probability conversion of each element, a probability matrix A3 can be obtained that characterizes the association relationships and the degrees of association between node A, node B, node C, and node D. The probability matrix A3 is shown in the following matrix:
  • the probability matrix A3 is a 4 ⁇ 4 matrix.
  • the edge weights from each node to its own node that is, the element M11, the element M22, the element M33, and the element M44) do not need to undergo probability conversion.
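  • The probability conversion can be sketched as a row normalization. Formula (4) is not reproduced in this text, so the rule below (each off-diagonal initial weight divided by the sum of the off-diagonal weights in its row, with the self-loop entries left at 1 as stated above) is an assumption and may differ from the application's formula. The matrix `a2` is likewise assembled from the initial weights given in the text (A-B 10, A-C 1, B-C 8, B-D 9) under the assumption that edges are undirected.

```python
def row_normalize_off_diagonal(weights):
    """Divide each off-diagonal entry by its row's off-diagonal sum.

    Assumption: this stands in for formula (4); diagonal (self-loop)
    entries are copied unchanged.
    """
    size = len(weights)
    result = [list(row) for row in weights]
    for i in range(size):
        row_sum = sum(weights[i][j] for j in range(size) if j != i)
        if row_sum == 0:
            continue  # isolated node: nothing to convert
        for j in range(size):
            if j != i:
                result[i][j] = weights[i][j] / row_sum
    return result


# Rows and columns are ordered A, B, C, D; diagonal entries are the self-loops.
a2 = [
    [1, 10, 1, 0],
    [10, 1, 8, 9],
    [1, 8, 1, 0],
    [0, 9, 0, 1],
]
a3 = row_normalize_off_diagonal(a2)
print(a3[0])  # row of node A after conversion
```

Under this assumed rule the off-diagonal entries of each row sum to 1, so they can be read as jump probabilities to the other nodes.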
  • the corresponding relationship topology diagram of the user group (including the user A, the user B, the user C, and the user D) can be obtained as shown in FIG. 6D.
  • Step S202 according to the number of path samples, obtain the sampling path corresponding to the node k in the relationship topology diagram.
  • The jump probability from each node to the other nodes in the relationship topology graph can be calculated by walking, so that the community to which each node belongs can be obtained. For example, the calculation can be as shown in formula (5):
      $$(M^2)_{ij} = \sum_{k} M_{ik} M_{kj} \qquad (5)$$
  • where (M^2)_ij represents the jump probability from node i to node j, M_ik represents the jump probability (edge weight) from node i to node k, and M_kj represents the jump probability (edge weight) from node k to node j.
  • node A can walk 3 steps to reach node D (that is, node A-node B-node C-node D).
  • the weight of the edge from node A to node B is 0.2
  • the weight of the edge from node B to node C is 0.3
  • the weight of edge from node C to node D is 0.4.
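  • The three-step walk from node A to node D can be checked numerically. The sketch reads formula (5) as the usual matrix-product rule (summing M_ik times M_kj over the intermediate node k), which is an interpretation of the symbol definitions above. The matrix `m` is hypothetical: it contains only the three edges of the example, so the three-step probability reduces to the product of the three edge weights.

```python
import math


def mat_mul(a, b):
    """Plain matrix product: result[i][j] = sum over k of a[i][k] * b[k][j]."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


# Hypothetical chain A -> B -> C -> D (rows and columns ordered A, B, C, D).
m = [
    [0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.4],
    [0.0, 0.0, 0.0, 0.0],
]
m3 = mat_mul(mat_mul(m, m), m)  # three-step jump probabilities
print(math.isclose(m3[0][3], 0.2 * 0.3 * 0.4))  # True
```

Computing full matrix powers this way becomes expensive on a large graph, which is the motivation the text gives for sampling paths instead.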
  • This solution uses a Monte-Carlo (MCL) sampling walk to perform the calculation: the paths of each node are sampled, and the jump probabilities from each node to the other nodes on its sampling paths are calculated.
  • In the Monte-Carlo (MCL) sampling walk, the jump threshold is used to obtain the associated nodes in the sampling path, and the jump probability from each node to its associated nodes in the sampling path is then calculated.
  • the number of path samples in this application is a non-zero positive integer, and the number of path samples may be a manually specified value, or a value randomly generated by the server within the allowable range of the value.
  • the sampling path corresponding to each node k can be obtained in the relational topology map corresponding to the user group.
  • the sampling path refers to the paths, equal in number to the number of path samples, that are extracted from the paths with node k as the starting node.
  • the associated nodes of each node k can be determined in the sampling path of that node k, where an associated node is a node other than node k in the sampling path; for example, it can be a node that can be reached by jumping from node k within the jump threshold (inclusive). Taking the relationship topology diagram in the embodiment corresponding to FIG. 6D as an example, the paths with node A as the starting node include path ABC, path ABC, and path ACB. A number of path samples of 1 means that one path needs to be extracted from the paths of node A as the sampling path of node A.
  • if path ABC is extracted, path ABC is the sampling path of node A; the jump threshold is 1, that is to say, in path ABC, jumping 1 step from node A reaches node B, so in path ABC node B can be regarded as the associated node of node A.
  • the jump threshold refers to the maximum number of jump steps allowed in the sampling path. For each node k in the relationship topology graph, jumping starts from node k with the number of jump steps set to 1, and the number of jump steps is then increased one at a time until the jump threshold is reached.
  • for example, a sampling path of node c is cegkij and the jump threshold is 4. Starting from node c, jumping 1 step reaches node e; after the number of jump steps is increased from 1 to 2, jumping 2 steps (through node e) reaches node g; after it is increased from 2 to 3, jumping 3 steps (through node e and node g) reaches node k; after it is increased from 3 to 4, jumping 4 steps (through node e, node g, and node k) reaches node i. In the sampling path cegkij of node c, node e, node g, node k, and node i can therefore all be determined as the associated nodes of node c.
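  • the step counting described above can be sketched as follows; the function name and the list representation of a sampling path are assumptions made for illustration:

```python
def associated_nodes(sampling_path, jump_threshold):
    """Return the nodes reachable from the start of the path within the jump threshold."""
    reached = []
    steps = 1  # the number of jump steps starts at 1
    while steps <= jump_threshold and steps < len(sampling_path):
        reached.append(sampling_path[steps])  # node reached after `steps` jumps
        steps += 1  # increase the number of jump steps by one
    return reached


print(associated_nodes(list("cegkij"), 4))  # ['e', 'g', 'k', 'i']
print(associated_nodes(list("ABC"), 1))     # ['B']
```

  • node j is not returned for the path cegkij because reaching it would take 5 jumps, which exceeds the jump threshold of 4.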
  • Step S203 Determine the jump probability between the node k and the associated node in the sampling path according to the edge weights in the relationship topology graph; the associated node refers to nodes other than the node k in the sampling path.
  • the jump probability between node k and an associated node can be determined according to the edge weights in the relationship topology graph corresponding to the user group. For example, if there is no edge between node k and the associated node, the intermediate nodes between node k and the associated node can be obtained from the sampling path, node k being able to reach the associated node through the intermediate nodes. Among node k, the intermediate nodes, and the associated node, every two nodes joined by an edge are taken as a connected node pair, and the jump probability between node k and the associated node can be determined according to the edge weights corresponding to the connected node pairs.
  • the sampling path of node A is ABD
  • the jump threshold is 3
  • the number of jump steps can be 1 and 2.
  • the associated nodes of node A are node B and node D, where node A and node D
  • node B can be used as an intermediate node between node A and node D
  • if there is an edge between node B and node C, then node A and node B can be regarded as connected node pair AB, and node B and node C can be regarded as connected node pair BC.
  • the weight of the edge between connecting node pair AB can be obtained as 0.36
  • the weight of the edge between the connected node pair BC is 0.8
  • Step S204 Update the above-mentioned relationship topology diagram according to the above-mentioned jump probability to obtain an updated relationship topology diagram, and determine the above-mentioned target user set in the above-mentioned updated relationship topology diagram.
  • the above-mentioned relationship topology graph can be updated according to the jump probability. First, the connected edges in the relationship topology graph can be updated according to node k and its associated nodes: each node k is connected by an edge to every associated node with which it does not yet share an edge (that is, new edges are added to the relationship topology graph), to obtain the transitional relationship topology graph.
  • for example, the associated nodes of node A are node B and node D, where node A can reach node D through node B; node A and node D can then be connected by an edge, and a direction is added to the edge to indicate that it runs from node A to node D.
  • the jump probability between node k and the associated node can be set as the edge weight between node k and the associated node to obtain the target relationship topology diagram, which is the updated relationship topology diagram.
  • the sampling path of node A is ABD.
  • the sampling path of node B is BAC.
  • the sampling path of node C is CABD
  • Probability Matrix A4 is shown in the following matrix:
  • the probability matrix A4 is a 4 × 4 matrix.
  • the element 0 in the probability matrix A4 above indicates that one node cannot reach the other. Take the element M13 (that is, the weight of the edge from node A to node C) as an example: although in the probability matrix A3 the probability from node A to node C is 0.1 (that is, node A can reach node C, and there is an edge between node A and node C), the sampling path of node A is ABD, so the other, unsampled paths of node A are no longer considered; only node A to node B and node A to node D (that is, element M12 and element M14 of the probability matrix A4) are considered.
  • the edge weights (jump probabilities) in the target relationship topology graph can undergo a convex transformation, that is, the edge weights are increased exponentially, and probability conversion (that is, standardization processing) is performed on the jump probabilities obtained after the exponential increase.
  • the target probability can be obtained.
  • the edge weights between node k and the associated nodes of node k can be updated according to the target probability. An associated node whose updated edge weight is greater than or equal to the weight threshold can be determined to be an important associated node of node k.
  • the target relationship topology graph can be divided into at least two community topology maps, and a target community topology map can be obtained from the at least two community topology maps and used as the target user set.
  • the jump probability is increased exponentially, and the jump probability obtained after the exponential increase is subjected to probability conversion (standardization processing), that is, a convex transformation is performed on the jump probability; the method for obtaining the target probability can be as shown in the following formula:
  • Γ_r(M_ij) is used to represent the target probability from node i to node j
  • M_ij is used to represent the edge weight from node i to node j
  • (M_ij)^r is used to represent the edge weight from node i to node j after the exponential increase, where r is the exponent.
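  • one form consistent with these symbol definitions is the inflation step of Markov-style clustering, written here with the assumed notation Γ_r for the target probability. Normalizing over the entries that share the same second index j is an assumption; it is chosen because it reproduces the values quoted for the elements M21 and M41, and N (the number of nodes) is introduced only for illustration:

```latex
\Gamma_r(M_{ij}) = \frac{(M_{ij})^{r}}{\sum_{k=1}^{N} (M_{kj})^{r}}
```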
  • for example, the element M21 is 0.83, and its value after exponential increase and standardization is 0.968; the element M41 is 0.266, and its value after exponential increase and standardization is 0.032. It can be seen that exponentially increasing and standardizing the elements makes the larger element (edge weight) larger (0.83 becomes 0.968) and the smaller element (edge weight) smaller (0.266 becomes 0.032). In other words, through the MCL sampling walk method and the convex transformation, this solution makes tightly associated users more tightly associated and weakly associated users more weakly associated, which is more conducive to the division of communities and makes the division result more accurate.
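  • the two values above can be checked numerically with a short sketch. The exponent r = 3 and the choice to standardize M21 and M41 against each other only are assumptions; they are used because they reproduce 0.968 and 0.032. The function name is hypothetical:

```python
def inflate(weights, r):
    """Raise each weight to the power r, then rescale so the results sum to 1."""
    powered = [w ** r for w in weights]
    total = sum(powered)
    return [p / total for p in powered]


# Elements M21 and M41 from the text; r = 3 is an assumed exponent.
m21, m41 = inflate([0.83, 0.266], r=3)
print(round(m21, 3), round(m41, 3))  # 0.968 0.032
```

  • a larger r sharpens the contrast further, while r = 1 reduces to plain standardization.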
  • the number of iterations can be set, so that the steps from obtaining the sampling paths to calculating the target probability are repeated multiple times. That is to say, random sampling of each node k is performed for the first time, and the target probability between nodes is calculated.
  • the target probability can then be used as the edge weight between nodes, a second random sampling is performed, and the target probability between nodes is calculated again.
  • in each subsequent iteration, the target probability from the previous iteration can be used as the edge weight to calculate the new target probability between nodes.
  • the final target probability can be determined as a stable probability, and the community topology maps are then divided according to the stable target probability.
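  • the repeat-until-stable loop can be sketched as below. This is a simplified stand-in, not the sampled walk of this solution: it replaces random path sampling with a full two-step matrix product, and the toy matrix, the exponent r, the tolerance, and all function names are hypothetical:

```python
def normalize_columns(m):
    """Probability conversion: rescale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def expand(m):
    """Two-step jump probabilities via a plain matrix product (no sampling)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inflate(m, r):
    """Exponential increase followed by probability conversion."""
    return normalize_columns([[x ** r for x in row] for row in m])


def iterate(m, r=2, iterations=20, tol=1e-9):
    """Feed each round's target probabilities back in as edge weights."""
    m = normalize_columns(m)
    for done in range(1, iterations + 1):
        nxt = inflate(expand(m), r)
        change = max(abs(a - b) for ra, rb in zip(nxt, m) for a, b in zip(ra, rb))
        m = nxt
        if change < tol:  # probabilities no longer move: treat them as stable
            break
    return m, done


toy = [[1.0, 0.9, 0.1, 0.0],
       [0.9, 1.0, 0.0, 0.1],
       [0.1, 0.0, 1.0, 0.8],
       [0.0, 0.1, 0.8, 1.0]]
stable, rounds = iterate(toy)
print(rounds, [[round(x, 3) for x in row] for row in stable])
```

  • stopping early once the probabilities stop changing is a design choice added here; the text itself only requires a preset number of iterations.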
  • FIG. 7 is a schematic diagram of a scene of dividing a community topology provided by an embodiment of the present application.
  • the business server 1000 may determine user a corresponding to terminal A, user b corresponding to terminal B, ..., and user k corresponding to terminal K as a user group {a, b, c, e, f, g, i, j, k}. The business server 1000 can treat each user in the user group as a node and, according to the social relationships between users, connect nodes by edges to generate the relationship topology graph corresponding to the user group {a, b, c, e, f, g, i, j, k}; then, according to the social behavior records between users, edge weights can be determined for the edges in the relationship topology graph. As shown in Figure 7, the edge weight of node c and node e is 0.7, the edge weight of no
  • taking node b as an example (the sampling paths of the other nodes are obtained in the same way as for node b and are not described again here), there are four paths starting from node b: bij, bia, bikgec, and bikged.
  • the business server 1000 can extract bij and bikgec from the four paths bij, bia, bikgec, and bikged, and take bij and bikgec as the sampling paths of node b. Subsequently, the business server 1000 can obtain the jump threshold, which is 2. As shown in Figure 7, in the sampling path bij, jumping 2 times from node b (from node b to node i, which is connected to node b, and then from node i to node j, which is connected to node i) reaches node j, although there is no edge between node b and node j.
  • the business server 1000 can connect node b and node j by an edge, and add a direction to the edge to indicate that it runs from node b to node j.
  • the business server 1000 can obtain 0.4 as the edge weight of node b and node j; in the sampling path bikgec, starting from node b, the node that can be reached by jumping 2 times is node k.
  • the business server 1000 does not need to calculate the jump probabilities between node b and node g, node e, or node c; it only needs to calculate the jump probability between node b and node k. According to the edge weight of node b and node i, which is 0.5, and the edge weight of node i and node k, which is 0.4, the business server 1000 can obtain that the jump probability from node b to node k is 0.2. The business server 1000 can connect node b and node k by an edge, add a direction to the edge to indicate that it runs from node b to node k, and use 0.2 as the edge weight of node b and node k.
  • the business server 1000 can use the nodes other than node b in the sampling paths (that is, node i, node j, and node k) as the associated nodes of node b. After the paths of node b are sampled, the edge weights between node b and its associated nodes are 0.5 (node b to node i), 0.4 (node b to node j), and 0.2 (node b to node k). In the same way, the business server 1000 can obtain the sampling paths of the other nodes and their jump probabilities to their associated nodes; the sampling path of each node and the jump probability from the node to its associated nodes can be as shown in Table 1:
  • the column data is the starting node, and the row data is the arrival node. Taking node a as an example, the jump probability from node a to node b is 0.35, the jump probability from node a to node i is 0.7, and the jump probability from node a to node k is 0.28.
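  • the node b walk-through can be reproduced with a small sketch. The edge weights for b-i and i-k are the ones quoted in the text; the i-j weight of 0.8 is inferred from the stated result of 0.4 for node b to node j and is therefore an assumption, as are the function and variable names:

```python
# Edge weights used by the node b example; the i-j value is inferred, not quoted.
EDGE_WEIGHTS = {("b", "i"): 0.5, ("i", "j"): 0.8, ("i", "k"): 0.4}


def jump_probabilities(path, jump_threshold, weights):
    """Map each associated node on the path to its jump probability from path[0]."""
    result = {}
    prob = 1.0
    for step in range(1, min(jump_threshold, len(path) - 1) + 1):
        prob *= weights[(path[step - 1], path[step])]  # multiply along the walk
        result[path[step]] = prob
    return result


def updated_edges(sampling_paths, jump_threshold, weights):
    """Collect directed edges (start node, associated node) -> jump probability."""
    edges = {}
    for path in sampling_paths:
        for node, prob in jump_probabilities(path, jump_threshold, weights).items():
            edges[(path[0], node)] = prob
    return edges


edges_from_b = updated_edges(["bij", "bikgec"], 2, EDGE_WEIGHTS)
print(edges_from_b)
```

  • with a jump threshold of 2, only the first two hops of bikgec are walked, so the weights of the remaining edges on that path are never needed; the result holds the directed edges from node b to node i, node j, and node k.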
  • the edge weights greater than or equal to the weight threshold 0.5 are the following jump probabilities: node a to node i, 0.7; node b to node i, 0.5; node c to node d, 0.56; node c to node e, 0.7; node d to node c, 0.56; node d to node e, 0.8; node e to node d, 0.8; node e to node g, 0.6; node g to node k, 0.5; node i to node a, 0.7; node j to node a, 0.7; and node j to node i, 0.8. The business server 1000 can then use the jump probabilities as the edge weights of the edges to obtain the target relationship top
  • the community topology map (that is, community) 200a and the community topology map (that is, community) 200b can be obtained from the target relationship topology map (after sampling) 20b. As shown in Figure 7, the edge weights between nodes in community 200a and nodes in community 200b are all smaller than the weight threshold, or there is no edge between the two nodes (that is, the degree of association between users in the two communities is low). Take node k and node i as an example.
  • the edge weight of node k and node i is 0.4, which is less than the weight threshold 0.5, which can indicate that the degree of association between user k corresponding to node k and user i corresponding to node i is low, and the user k and user i are divided into different communities.
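  • one way to sketch the division into communities is to keep only the edges whose weight reaches the weight threshold and group the nodes that remain linked. Using connected components for this grouping is an assumption, not a rule stated in the text; the edge list below is the one quoted for the threshold 0.5, plus the k-i edge of 0.4:

```python
from collections import defaultdict

EDGES = {
    ("a", "i"): 0.7, ("b", "i"): 0.5, ("c", "d"): 0.56, ("c", "e"): 0.7,
    ("d", "c"): 0.56, ("d", "e"): 0.8, ("e", "d"): 0.8, ("e", "g"): 0.6,
    ("g", "k"): 0.5, ("i", "a"): 0.7, ("j", "a"): 0.7, ("j", "i"): 0.8,
    ("k", "i"): 0.4,  # below the threshold, so it does not join two communities
}


def communities(edges, weight_threshold):
    """Group nodes linked by edges whose weight reaches the threshold."""
    neighbours = defaultdict(set)
    for (src, dst), weight in edges.items():
        neighbours[src]  # touching the key registers the node even if isolated
        neighbours[dst]
        if weight >= weight_threshold:
            neighbours[src].add(dst)
            neighbours[dst].add(src)
    seen, groups = set(), []
    for node in sorted(neighbours):
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:  # depth-first walk over the retained edges
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


print(communities(EDGES, 0.5))  # [['a', 'b', 'i', 'j'], ['c', 'd', 'e', 'g', 'k']]
```

  • node k and node i land in different groups because their only connection has weight 0.4, which matches the example above.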
  • FIG. 8 is a schematic diagram of a process for determining an abnormal category of a set of target users in an abnormal state according to an embodiment of the present application. As shown in Figure 8, the process may include the following steps.
  • Step S301 Determine the set of target users in the abnormal state as the set of users to be identified.
  • Step S302 Obtain user text data of users in the set of users to be identified, and extract key text data from the user text data.
  • the user text data can be remarks when the user makes a transfer, dialogue information when making a call, etc., and keyword recognition can be performed on the user text data to extract key text data. For example, if the user's remarks when transferring money is "Gambling Debt Repayment", the keyword "Gambling Debt" can be extracted.
  • Step S303 Acquire sensitive source data.
  • the sensitive source data is a set of preset abnormal categories
  • the sensitive source data may include abnormal categories such as gambling, cash out, fraud, robbery, and theft.
  • Step S304 Match the above-mentioned key text data with the above-mentioned sensitive source data, and determine the abnormal category of the above-mentioned set of users to be identified according to the matching result.
  • the above-mentioned key text data can be matched with the above-mentioned sensitive source data.
  • for example, the key text data is "gambling debt".
  • the matching rate of "gambling debt" and "gambling" can be obtained; if it reaches 90%, the abnormal category of the set of users to be identified can be determined as "gambling".
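  • steps S302 to S304 can be sketched as follows. The text does not define how the matching rate is computed, so the rate used here (the share of the category name covered by its longest common substring with the key text) is an assumption, as are the function names; the category list and the 90% threshold come from the text:

```python
from difflib import SequenceMatcher

SENSITIVE_CATEGORIES = ["gambling", "cash out", "fraud", "robbery", "theft"]


def matching_rate(key_text, category):
    """Share of the category name covered by its longest common substring with the key text."""
    a, b = key_text.lower(), category.lower()
    match = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return match.size / len(b)


def abnormal_category(key_text, categories, threshold=0.9):
    """Return the best-matching category, or None if no rate reaches the threshold."""
    best = max(categories, key=lambda c: matching_rate(key_text, c))
    return best if matching_rate(key_text, best) >= threshold else None


print(abnormal_category("gambling debt", SENSITIVE_CATEGORIES))  # gambling
```

  • a real system would likely use a more robust text-similarity measure; the sketch only shows where the threshold comparison sits.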
  • FIG. 9 is a schematic structural diagram of a data identification device provided by an embodiment of the present application.
  • the above-mentioned data recognition device may be a computer program (including program code) running in a computer device, for example, the data recognition device is an application software; the device may be used to execute corresponding steps in the method provided in the embodiments of the present application.
  • the data recognition device 1 may include: a target user set acquisition module 11, an abnormal user determination module 12, a behavior state detection module 13, and a diffusion abnormal user identification module 14.
  • the target user set obtaining module 11 is configured to obtain a target user set; the foregoing target user set includes at least two users who have a social relationship;
  • the abnormal user determination module 12 is used to obtain the default abnormal user, and determine the abnormal user in the above-mentioned target user set according to the above-mentioned default abnormal user;
  • the behavior state detection module 13 is configured to determine the state of the above-mentioned target user set according to the above-mentioned abnormal users;
  • the diffusion abnormal user identification module 14 is configured to, if the state of the above-mentioned target user set is an abnormal state, identify diffusion abnormal users among the users to be confirmed according to the social association relationship between the above-mentioned abnormal users and the users to be confirmed in the above-mentioned target user set; the above-mentioned users to be confirmed are users other than the above-mentioned abnormal users in the above-mentioned target user set.
  • for the implementation of the target user set acquisition module 11, the abnormal user determination module 12, the behavior state detection module 13, and the diffusion abnormal user identification module 14, refer to the description of step S101 to step S104 in the embodiment corresponding to FIG. 3, which will not be repeated here.
  • the abnormal user determining module 12 may include: an abnormal user determining unit 121.
  • the abnormal user determining unit 121 is configured to match users in the target user set with the default abnormal user, and determine the user whose matching rate reaches the matching threshold in the target user set as the abnormal user in the target user set.
  • for the implementation of the abnormal user determining unit 121, refer to the description of step S102 in the embodiment corresponding to FIG. 4, which will not be repeated here.
  • the behavior state detection module 13 may include: a total number of users acquiring unit 131, an abnormal concentration determination unit 132, and a first state determination unit 133.
  • the total number of users acquiring unit 131 is configured to acquire the number of abnormal users and the total number of users in the target user set;
  • the abnormal concentration determination unit 132 is configured to determine the abnormal concentration of the target user set according to the number of abnormal users and the total number of users in the target user set;
  • the first state determining unit 133 is configured to determine the state of the target user set as a normal state if the abnormal concentration is less than the concentration threshold;
  • the first state determining unit 133 is further configured to determine the state of the target user set as an abnormal state if the abnormal concentration is greater than or equal to the concentration threshold.
  • the implementation of the total number of users acquiring unit 131, the abnormal concentration determining unit 132, and the first state determining unit 133 can refer to the description of step S103 in the embodiment corresponding to FIG. 3, which will not be repeated here.
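  • the threshold rule of units 131 to 133 can be sketched as below, assuming the abnormal concentration is the ratio of abnormal users to all users in the target user set (the text says only that it is determined from these two counts); the function name and the example threshold of 0.3 are hypothetical:

```python
def target_set_state(abnormal_count, total_count, concentration_threshold):
    """Return 'abnormal' when the share of abnormal users reaches the threshold."""
    if total_count <= 0:
        raise ValueError("the target user set must contain at least one user")
    concentration = abnormal_count / total_count  # assumed definition: a simple ratio
    return "abnormal" if concentration >= concentration_threshold else "normal"


print(target_set_state(4, 10, 0.3))  # abnormal
print(target_set_state(2, 10, 0.3))  # normal
```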
  • the behavior state detection module 13 may include: a behavior feature acquisition unit 134, a feature distribution degree determination unit 135, a feature distribution difference degree determination unit 136, and a second state determination unit 137.
  • the behavior feature acquiring unit 134 is configured to acquire a user's social behavior feature set; the aforementioned user's social behavior feature set includes the social behavior feature of each user in the aforementioned user group;
  • the feature distribution degree determining unit 135 is configured to determine the first feature distribution degree of the abnormal user according to the social behavior features in the user's social behavior feature set; the first feature distribution degree is used to characterize the number of types of social behavior features of the abnormal user;
  • the above-mentioned characteristic distribution degree determining unit 135 is further configured to determine a second characteristic distribution degree of a user in the above-mentioned target user set according to the social behavior characteristics in the above-mentioned user social behavior characteristic set; the above-mentioned second characteristic distribution degree is used to characterize the above-mentioned target user The number of types of social behavior characteristics of users in the collection;
  • the characteristic distribution difference degree determining unit 136 is configured to determine the characteristic distribution difference degree between the abnormal user and the users in the target user set according to the first characteristic distribution degree and the second characteristic distribution degree;
  • the second state determining unit 137 is configured to determine the state of the target user set according to the first characteristic distribution degree and the characteristic distribution difference degree.
  • the second state determining unit 137 is further configured to determine the state of the target user set as a normal state if the difference degree of the feature distribution is less than the difference degree threshold, and the first feature distribution degree is less than the distribution threshold;
  • the second state determining unit 137 is further configured to determine the state of the target user set as a normal state if the characteristic distribution difference degree is greater than or equal to the difference degree threshold, and the first characteristic distribution degree is greater than or equal to the distribution threshold;
  • the second state determining unit 137 is further configured to determine the state of the target user set as an abnormal state if the characteristic distribution difference degree is greater than or equal to the difference degree threshold, and the first characteristic distribution degree is less than the distribution threshold.
  • for the implementation of the behavior characteristic acquisition unit 134, the characteristic distribution degree determination unit 135, the characteristic distribution difference degree determination unit 136, and the second state determination unit 137, refer to the description of step S103 in the above-mentioned embodiment, which will not be repeated here.
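  • the three rules of the second state determining unit 137 can be written as a small decision function. The combination in which the difference degree is below its threshold while the first distribution degree reaches the distribution threshold is not covered by the rules above, so the sketch returns None for it rather than guessing; the function and parameter names are hypothetical:

```python
def state_from_features(difference, first_distribution,
                        difference_threshold, distribution_threshold):
    """Apply the three stated rules; return None for the combination they do not cover."""
    if difference >= difference_threshold and first_distribution < distribution_threshold:
        return "abnormal"
    if difference < difference_threshold and first_distribution >= distribution_threshold:
        return None  # not specified by the three rules
    return "normal"


print(state_from_features(0.8, 2, 0.5, 5))  # abnormal
print(state_from_features(0.8, 7, 0.5, 5))  # normal
print(state_from_features(0.2, 2, 0.5, 5))  # normal
```

  • in words: the set is flagged only when abnormal users behave differently from the set as a whole and their behavior is concentrated in few feature types.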
  • the target user set acquisition module 11 may include: a relationship topology map acquisition unit 111, a sampling path acquisition unit 112, a jump probability determination unit 113, and a target user set determination unit 114.
  • the relationship topology diagram obtaining unit 111 is configured to obtain a relationship topology diagram corresponding to a user group; the above relationship topology diagram includes N nodes k, the N nodes k correspond one-to-one to the users in the user group, and N is the number of users in the user group; the edge weight between two nodes k is determined based on the social relationship between the two corresponding users in the above-mentioned user group;
  • the sampling path obtaining unit 112 is configured to obtain the sampling path corresponding to the node k in the relationship topology diagram according to the number of path samples;
  • the jump probability determination unit 113 is configured to determine the jump probability between the node k and the associated node in the sampling path according to the edge weight in the above-mentioned relationship topology graph; the above-mentioned associated node refers to a node other than the above-mentioned node k in the sampling path;
  • the target user set determining unit 114 is configured to update the above-mentioned relationship topology diagram according to the above-mentioned jump probability to obtain an updated relationship topology diagram, and determine the above-mentioned target user set in the above-mentioned updated relationship topology diagram.
  • the implementation of the relational topology map obtaining unit 111, the sampling path obtaining unit 112, the jumping probability determining unit 113, and the target user set determining unit 114 can refer to the description of step S101 in the embodiment corresponding to FIG. 3, which will not be repeated here.
  • the relationship topology graph obtaining unit 111 may include: a user group obtaining subunit 1111, a weight setting subunit 1112, a probability conversion subunit 1113, and a relationship topology graph generating subunit 1114.
  • the user group obtaining subunit 1111 is used to obtain a user group, and each user in the above user group is regarded as a node k;
  • the weight setting subunit 1112 is used to connect, by edges, the nodes k corresponding to the users with the social relationship, and set initial weights for the edges between the nodes k according to the social behavior records between the users with the social relationship;
  • the probability conversion subunit 1113 is configured to perform probability conversion on the aforementioned initial weights to obtain the aforementioned edge weights;
  • the relational topology graph generating subunit 1114 is configured to generate the aforementioned relational topology graph according to the node k corresponding to the aforementioned user group and the aforementioned edge weight.
  • for the implementation of the user group obtaining subunit 1111, the weight setting subunit 1112, the probability conversion subunit 1113, and the relational topology graph generating subunit 1114, refer to the description of step S101 in the above-mentioned embodiment, which will not be repeated here.
  • the jumping probability determining unit 113 may include: an intermediate node obtaining subunit 1131, a connected node pair determining subunit 1132, and a jumping probability determining subunit 1133.
  • the intermediate node obtaining subunit 1131 is configured to obtain an intermediate node between the node k and the associated node in the sampling path if there is no edge between the node k and the associated node; the node k can reach the above-mentioned associated node through the intermediate node;
  • the connecting node pair determining subunit 1132 is configured to use two nodes with edges as connecting node pairs among the aforementioned node k, the aforementioned intermediate node, and the aforementioned associated nodes, and obtain the edge weights corresponding to the aforementioned connecting node pairs;
  • the jump probability determination subunit 1133 is configured to determine the jump probability between the above-mentioned node k and the above-mentioned associated node according to the edge weight corresponding to the above-mentioned connected node pair.
  • for the implementation of the intermediate node acquisition subunit 1131, the connection node pair determination subunit 1132, and the jump probability determination subunit 1133, refer to the description of determining the jump probability in step S101 in the embodiment corresponding to FIG. 3, which will not be repeated here.
  • the target user set determining unit 114 may include: an update node edge subunit 1141, an edge weight setting subunit 1142, and a target user set determining subunit 1143.
  • the update node edge subunit 1141 is configured to update the connected edges in the above-mentioned relationship topology diagram according to the above-mentioned node k and the above-mentioned associated node to obtain a transitional relationship topology diagram; in the above-mentioned transitional relationship topology diagram, the above-mentioned node k and the above-mentioned associated nodes are all connected by edges;
  • the edge weight setting subunit 1142 is used to set the jump probability between the node k and the associated node as the edge weight between the node k and the associated node in the transition relationship topology graph to obtain the target relationship Topology;
  • the target user set determining subunit 1143 is configured to determine the target user set in the target relationship topology diagram.
  • the above-mentioned target user set determining subunit 1143 is also used to exponentially increase the above-mentioned jump probability, perform probability conversion on the jump probability obtained after the exponential increase to obtain the target probability, and update the edge weights between the above-mentioned node k and the associated nodes of node k according to the above-mentioned target probability;
  • the above-mentioned target user set determining subunit 1143 is further configured to determine the associated node with the updated edge weight greater than the weight threshold as the important associated node of the above-mentioned node k;
  • the target user set determining subunit 1143 is further configured to divide the target relationship topology map into at least two community topology maps according to the node k and the important associated nodes, and obtain the target community topology map from the at least two community topology maps as the above-mentioned target user set.
  • the implementation of the update node edge subunit 1141, the edge weight setting subunit 1142, and the target user set determining subunit 1143 can refer to the description of step S101 in the embodiment corresponding to FIG. 3, which will not be repeated here.
  • the diffusion abnormal user identification module 14 may include: a first association user determination unit 141 and a first diffusion abnormal user determination unit 142.
  • the first association user determination unit 141 is configured to determine, among the users to be confirmed, users who have a social association relationship with the abnormal user if the state of the target user set is an abnormal state;
  • the first diffusion abnormal user determination unit 142 is configured to determine the user who has a social relationship with the abnormal user as the diffusion abnormal user.
  • the implementation of the first association user determination unit 141 and the first diffusion abnormal user determination unit 142 can refer to the description of step S104 in the embodiment corresponding to FIG. 3, and will not be repeated here.
  • the diffusion abnormal user identification module 14 may include: a second association user determination unit 143 and a second diffusion abnormal user determination unit 144.
  • the second association user determination unit 143 is configured to, if the status of the target user set is an abnormal state, determine users who have a social association relationship with the abnormal user among the users to be confirmed;
  • the second diffusion abnormal user determination unit 144 is configured to obtain the abnormal user nodes corresponding to the abnormal users, obtain the associated user nodes corresponding to the users who have a social relationship with the abnormal users, determine the associated user nodes whose edge weights with the abnormal user nodes are greater than the association threshold as diffusion abnormal nodes, and determine the users corresponding to the diffusion abnormal nodes as the diffusion abnormal users.
  • the implementation of the second association user determination unit 143 and the second diffusion abnormal user determination unit 144 may refer to the description of step S104 in the embodiment corresponding to FIG. 3, and details will not be repeated here.
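  • both variants of identifying diffusion abnormal users can be sketched with one function: without a threshold, every user to be confirmed who is linked to an abnormal user is returned (units 141 and 142); with an association threshold, only links whose edge weight exceeds it count (units 143 and 144). The edge data, the user names, and the function name are hypothetical, and treating an edge in either direction as a social relationship is an assumption:

```python
def diffusion_abnormal_users(edge_weights, abnormal_users, association_threshold=None):
    """Users to be confirmed that are linked to an abnormal user.

    When a threshold is given, only links whose weight exceeds it are counted.
    """
    abnormal = set(abnormal_users)
    found = set()
    for (u, v), weight in edge_weights.items():
        if association_threshold is not None and weight <= association_threshold:
            continue  # link too weak to spread the abnormal label
        if u in abnormal and v not in abnormal:
            found.add(v)
        elif v in abnormal and u not in abnormal:
            found.add(u)
    return sorted(found)


# Hypothetical target user set: u1 is a known abnormal user.
weights = {("u1", "u2"): 0.9, ("u1", "u3"): 0.2, ("u3", "u4"): 0.8}
print(diffusion_abnormal_users(weights, ["u1"]))       # ['u2', 'u3']
print(diffusion_abnormal_users(weights, ["u1"], 0.5))  # ['u2']
```

  • the thresholded variant trades recall for precision: u3 is linked to u1 only weakly, so it is not labeled.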
  • The data recognition device 1 may include a target user set acquisition module 11, an abnormal user determination module 12, a behavior state detection module 13, and a diffusion abnormal user identification module 14. It may also include a to-be-identified user set determination module 15, a key text data extraction module 16, a sensitive source data acquisition module 17, and an abnormal category determination module 18.
  • The to-be-identified user set determination module 15 is configured to determine the target user set that is in an abnormal state as the user set to be identified;
  • The key text data extraction module 16 is configured to obtain the user text data of the users in the user set to be identified, and extract key text data from the user text data;
  • The sensitive source data acquisition module 17 is configured to acquire sensitive source data;
  • The abnormal category determination module 18 is configured to match the key text data against the sensitive source data, and determine the abnormal category of the user set to be identified according to the matching result.
  • For the implementation of the to-be-identified user set determination module 15, the key text data extraction module 16, the sensitive source data acquisition module 17, and the abnormal category determination module 18, refer to the description of step S201 to step S204 in the embodiment corresponding to FIG. 5; details are not repeated here.
  • The embodiment of the present application obtains a target user set, the target user set including at least two users with a social association relationship; obtains default abnormal users, and determines the abnormal users in the target user set according to the default abnormal users; determines the state of the target user set according to the abnormal users; and, if the state of the target user set is an abnormal state, identifies diffusion abnormal users among the users to be confirmed according to the social association relationship between the abnormal users and the users to be confirmed in the target user set, the users to be confirmed being the users in the target user set other than the abnormal users.
  • FIG. 10 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the apparatus 1 in the embodiment corresponding to FIG. 9 may be applied to the computer device 1000.
  • the computer device 1000 may include a processor 1001, a network interface 1004, and a memory 1005.
  • The computer device 1000 may also include a user interface 1003 and at least one communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • The user interface 1003 may include a display screen (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include a standard wired interface and a wireless interface (such as a WI-FI interface) in some embodiments.
  • The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. In some embodiments, the memory 1005 may also be at least one storage device located remotely from the foregoing processor 1001. As shown in FIG. 10, the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 1004 can provide network communication functions;
  • the user interface 1003 is mainly used to provide an input interface for the user; and
  • The processor 1001 can be used to call the device control application program stored in the memory 1005 to implement the following:
  • the target user set includes at least two users who have a social association relationship;
  • diffusion abnormal users are identified among the users to be confirmed; the users to be confirmed are the users in the target user set other than the abnormal users.
  • The computer device 1000 described in the embodiment of the present application can execute the data recognition method described in the foregoing embodiments corresponding to FIG. 3 to FIG.
  • the description of the data processing device 1 will not be repeated here.
  • the description of the beneficial effects of using the same method will not be repeated.
  • The embodiments of the present application also provide a computer-readable storage medium that stores the computer program executed by the aforementioned computer device 1000.
  • The computer program includes program instructions.
  • When the processor executes the program instructions, it can execute the data recognition method described in the foregoing embodiments corresponding to FIG. 3 to FIG.
  • the description of the beneficial effects of using the same method will not be repeated.
  • For technical details not disclosed in the embodiment of the computer-readable storage medium of this application, refer to the description of the method embodiments of this application.
  • The computer-readable storage medium may be an internal storage unit of the data recognition apparatus provided in any of the foregoing embodiments or of the foregoing computer device, such as the hard disk or memory of the computer device.
  • The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device.
  • the computer-readable storage medium may also include both an internal storage unit of the computer device and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the computer device.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
  • Each process and/or block in the method flowcharts and/or structural schematic diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device for implementing the specified functions.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram.

Abstract

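A data recognition method, apparatus, device, and readable storage medium, belonging to the field of computer technology. The method includes: obtaining a target user set, the target user set including at least two users having a social association relationship (S101); obtaining default abnormal users, and determining the abnormal users in the target user set according to the default abnormal users (S102); determining the state of the target user set according to the abnormal users (S103); and, if the state of the target user set is an abnormal state, identifying diffusion abnormal users among the users to be confirmed according to the social association relationship between the abnormal users and the users to be confirmed in the target user set, the users to be confirmed being the users in the target user set other than the abnormal users (S104). The method improves the accuracy of data recognition.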

Description

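Data recognition method, apparatus, device, and readable storage medium
This application claims priority to Chinese patent application No. 2020210086855.6, filed with the China Patent Office on February 11, 2020 and entitled "Data recognition method, apparatus, device, and readable storage medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer technology, and in particular to a data recognition method, apparatus, device, and readable storage medium.
Background
Gambling and fraud are common in daily life. To reduce such incidents, it is necessary to identify abnormal users efficiently and quickly.
In the prior art, abnormal users are identified mainly from their behavioral feature data: if a user's behavioral feature data matches that of abnormal users, the user is determined to be an abnormal user. However, abnormal users may imitate the legitimate behavior of normal users, so that their behavioral feature data comes close to legitimate behavioral feature data. Users who should be identified as abnormal are then identified as normal, so the recognition accuracy is low.
Summary
The embodiments of this application provide a data recognition method, apparatus, device, and readable storage medium that can improve the accuracy of data recognition.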
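In one aspect, an embodiment of this application provides a data recognition method, including:
obtaining a target user set, the target user set including at least two users having a social association relationship;
obtaining default abnormal users, and determining the abnormal users in the target user set according to the default abnormal users;
determining the state of the target user set according to the abnormal users; and
if the state of the target user set is an abnormal state, identifying diffusion abnormal users among the users to be confirmed according to the social association relationship between the abnormal users and the users to be confirmed in the target user set, the users to be confirmed being the users in the target user set other than the abnormal users.
In one aspect, an embodiment of this application provides a data recognition apparatus, including:
a target user set acquisition module, configured to obtain a target user set, the target user set including at least two users having a social association relationship;
an abnormal user determination module, configured to obtain default abnormal users and determine the abnormal users in the target user set according to the default abnormal users;
a behavior state detection module, configured to determine the state of the target user set according to the abnormal users; and
a diffusion abnormal user identification module, configured to, if the state of the target user set is an abnormal state, identify diffusion abnormal users among the users to be confirmed according to the social association relationship between the abnormal users and the users to be confirmed in the target user set, the users to be confirmed being the users in the target user set other than the abnormal users.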
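The abnormal user determination module includes:
an abnormal user determination unit, configured to match the users in the target user set against the default abnormal users, and determine the users whose matching rate reaches a matching threshold as the abnormal users in the target user set.
The behavior state detection module includes:
a total user number obtaining unit, configured to obtain the number of abnormal users and the total number of users in the target user set;
an abnormal concentration determination unit, configured to determine the abnormal concentration of the target user set according to the number of abnormal users and the total number of users in the target user set;
a first state determination unit, configured to determine the state of the target user set as a normal state if the abnormal concentration is less than a concentration threshold;
the first state determination unit being further configured to determine the state of the target user set as an abnormal state if the abnormal concentration is greater than or equal to the concentration threshold.
The behavior state detection module includes:
a behavior feature obtaining unit, configured to obtain a user social behavior feature set, the user social behavior feature set including the social behavior features of each user in the user group;
a feature distribution degree determination unit, configured to determine a first feature distribution degree of the abnormal users according to the social behavior features in the user social behavior feature set, the first feature distribution degree characterizing the number of kinds of social behavior features that the abnormal users have;
the feature distribution degree determination unit being further configured to determine a second feature distribution degree of the users in the target user set according to the social behavior features in the user social behavior feature set, the second feature distribution degree characterizing the number of kinds of social behavior features that the users in the target user set have;
a feature distribution difference degree determination unit, configured to determine the feature distribution difference degree between the abnormal users and the users in the target user set according to the first feature distribution degree and the second feature distribution degree;
a second state determination unit, configured to determine the state of the target user set according to the first feature distribution degree and the feature distribution difference degree.
The second state determination unit is further configured to determine the state of the target user set as a normal state if the feature distribution difference degree is less than a difference threshold and the first feature distribution degree is less than a distribution threshold;
the second state determination unit is further configured to determine the state of the target user set as a normal state if the feature distribution difference degree is greater than or equal to the difference threshold and the first feature distribution degree is greater than or equal to the distribution threshold;
the second state determination unit is further configured to determine the state of the target user set as an abnormal state if the feature distribution difference degree is greater than or equal to the difference threshold and the first feature distribution degree is less than the distribution threshold.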
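The target user set acquisition module includes:
a relationship topology graph obtaining unit, configured to obtain the relationship topology graph corresponding to a user group; the relationship topology graph includes N nodes k in one-to-one correspondence with the users in the user group, N being the number of users in the user group; the edge weight between two nodes k is determined based on the social association relationship between the two corresponding users in the user group;
a sampled path obtaining unit, configured to obtain the sampled paths corresponding to node k in the relationship topology graph according to a path sampling number;
a jump probability determination unit, configured to determine the jump probability between node k and the associated nodes on the sampled path according to the edge weights in the relationship topology graph, an associated node being a node on the sampled path other than node k;
a target user set determination unit, configured to update the relationship topology graph according to the jump probability to obtain an updated relationship topology graph, and determine the target user set in the updated relationship topology graph.
The relationship topology graph obtaining unit includes:
a user group obtaining subunit, configured to obtain a user group and take each user in the user group as a node k;
a weight setting subunit, configured to connect the nodes k of users having a social association relationship with edges, and set initial weights on the edges between the nodes k according to the social behavior records between the users having the social association relationship;
a probability conversion subunit, configured to perform probability conversion on the initial weights to obtain the edge weights;
a relationship topology graph generation subunit, configured to generate the relationship topology graph according to the nodes k corresponding to the user group and the edge weights.
The jump probability determination unit includes:
an intermediate node obtaining subunit, configured to, if there is no edge between node k and the associated node, obtain the intermediate nodes between node k and the associated node on the sampled path, node k being able to reach the associated node through the intermediate nodes;
a connected node pair determination subunit, configured to take every two nodes that share an edge among node k, the intermediate nodes, and the associated node as a connected node pair, and obtain the edge weight corresponding to each connected node pair;
a jump probability determination subunit, configured to determine the jump probability between node k and the associated node according to the edge weights corresponding to the connected node pairs.
The target user set determination unit includes:
a node edge updating subunit, configured to update the edges in the relationship topology graph according to node k and the associated nodes to obtain a transitional relationship topology graph, in which node k is connected to each of its associated nodes by an edge;
an edge weight setting subunit, configured to set, in the transitional relationship topology graph, the jump probability between node k and an associated node as the edge weight between node k and that associated node, to obtain a target relationship topology graph;
a target user set determination subunit, configured to determine the target user set in the target relationship topology graph.
The target user set determination subunit is further configured to apply exponential growth to the jump probability, perform probability conversion on the result to obtain a target probability, and update the edge weight between node k and the associated node according to the target probability;
the target user set determination subunit is further configured to determine the associated nodes whose updated edge weight is greater than a weight threshold as important associated nodes of node k;
the target user set determination subunit is further configured to divide the target relationship topology graph into at least two community topology graphs according to node k and the important associated nodes, and obtain a target community topology graph from the at least two community topology graphs as the target user set.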
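The diffusion abnormal user identification module includes:
a first associated user determination unit, configured to determine, among the users to be confirmed, the users who have a social association relationship with the abnormal user if the state of the target user set is an abnormal state;
a first diffusion abnormal user determination unit, configured to determine the users who have a social association relationship with the abnormal user as the diffusion abnormal users.
The diffusion abnormal user identification module includes:
a second associated user determination unit, configured to determine, among the users to be confirmed, the users who have a social association relationship with the abnormal user if the state of the target user set is an abnormal state;
a second diffusion abnormal user determination unit, configured to obtain the abnormal user node corresponding to the abnormal user, obtain the associated user nodes corresponding to the users who have a social association relationship with the abnormal user, determine the associated user nodes whose edge weight with the abnormal user node is greater than an association threshold as diffusion abnormal nodes, and determine the users corresponding to the diffusion abnormal nodes as the diffusion abnormal users.
The apparatus further includes:
a to-be-identified user set determination module, configured to determine the target user set that is in an abnormal state as the user set to be identified;
a key text data extraction module, configured to obtain the user text data of the users in the user set to be identified, and extract key text data from the user text data;
a sensitive source data acquisition module, configured to acquire sensitive source data;
an abnormal category determination module, configured to match the key text data against the sensitive source data, and determine the abnormal category of the user set to be identified according to the matching result.
In one aspect, an embodiment of this application provides a computer device, including a processor and a memory;
the memory stores a computer program which, when executed by the processor, causes the processor to execute the method in the embodiments of this application.
In one aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program; the computer program includes program instructions which, when executed by a processor, execute the method in the embodiments of this application.
The embodiments of this application obtain a target user set, the target user set including at least two users having a social association relationship; obtain default abnormal users, and determine the abnormal users in the target user set according to the default abnormal users; determine the state of the target user set according to the abnormal users; and, if the state of the target user set is an abnormal state, identify diffusion abnormal users among the users to be confirmed according to the social association relationship between the abnormal users and the users to be confirmed in the target user set, the users to be confirmed being the users in the target user set other than the abnormal users. As can be seen, users having social association relationships are grouped into the target user set. Once the abnormal users in the target user set have been determined and the target user set is in an abnormal state, the users in the set who have a social association relationship with the abnormal users can be obtained and directly taken as diffusion abnormal users. There is no need to perform feature matching on every user; diffusion abnormal users can be identified through social association relationships alone. Therefore, even if a diffusion abnormal user has features similar to those of normal users, the user can still be identified because of the social association relationship with an abnormal user, which improves recognition accuracy.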
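Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a network architecture diagram provided by an embodiment of this application;
FIG. 2A is a schematic diagram of a scenario of determining diffusion abnormal users provided by an embodiment of this application;
FIG. 2B is a schematic diagram of a scenario of determining diffusion abnormal users provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of a data recognition method provided by an embodiment of this application;
FIG. 4A is a schematic diagram of a scenario of determining the state of a target user set provided by an embodiment of this application;
FIG. 4B is a schematic diagram of a scenario of determining the state of a target user set provided by an embodiment of this application;
FIG. 5 is a schematic flowchart of obtaining a target user set provided by an embodiment of this application;
FIG. 6A is a schematic diagram of a node relationship list provided by an embodiment of this application;
FIG. 6B is a schematic diagram of node relationships provided by an embodiment of this application;
FIG. 6C is a schematic diagram of node relationships including initial weights provided by an embodiment of this application;
FIG. 6D is a schematic diagram of a relationship topology graph provided by an embodiment of this application;
FIG. 7 is a schematic diagram of a scenario of partitioning community topology graphs provided by an embodiment of this application;
FIG. 8 is a schematic flowchart of determining the abnormal category of a target user set in an abnormal state provided by an embodiment of this application;
FIG. 9 is a schematic structural diagram of a data recognition apparatus provided by an embodiment of this application;
FIG. 10 is a schematic structural diagram of a computer device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
FIG. 1 is a network architecture diagram provided by an embodiment of this application. As shown in FIG. 1, the network architecture may include a service server 1000 and a backend server cluster. The backend server cluster may include multiple backend servers, for example backend server 100a, backend server 100b, backend server 100c, ..., and backend server 100n. As shown in FIG. 1, backend servers 100a, 100b, 100c, ..., 100n can each establish a network connection with the service server 1000, so that each backend server can exchange data with the service server 1000 over that connection and the service server 1000 can receive service data from each backend server.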
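Each backend server shown in FIG. 1 corresponds to a user terminal and can store the service data of that terminal. A target application can be installed on each user terminal. When the target application runs on a user terminal, the corresponding backend server can store the service data provided by the target application and exchange data with the service server 1000 shown in FIG. 1. The target application may be any application capable of displaying data such as text, images, audio, and video. For example, it may be a payment application used to transfer funds between users, or a social application such as an instant messaging application used for communication between users. The service server 1000 in this application can collect data from the backends of these applications (such as the backend server cluster described above), for example user identity information (such as a user id), transfer records between users, and communication records between users. From the collected data, the service server 1000 can treat the users as user nodes in a community and determine the social association relationships between those nodes. A social association relationship therefore refers here to a relationship in which any information transfer behavior has occurred between users while using the target application. Information transfer behaviors, also called social behaviors, include but are not limited to at least one of the following: transfer of user information (for example, adding a user as a contact or following a user), transfer of content information (for example, instant chat, audio/video calls, content forwarding, leaving messages, and replying to messages), and fund transactions (for example, payments and transfers). When implementing the embodiments, one or more of these social behaviors or social association relationships can be selected as the basis for recognition, depending on factors such as the social functions the target application provides and the data to be recognized.
The methods of the embodiments can be executed by one or more computing devices, for example one or more of the service server 1000 and the backend server cluster shown in FIG. 1. The computing device can divide a user group into at least two user sets (also called communities below) according to the social association relationships and social behavior records between the users in the group. For example, the computing device can divide a large number of users into multiple user sets according to the collected social behaviors between them, so that each user's social association relationships with users in the same user set are closer than those with users in other user sets. The computing device can identify abnormal users in each user set according to existing abnormal user samples, and determine from the abnormal users in each user set whether that set is in a normal or abnormal state. If a user set is in an abnormal state, the computing device determines the diffusion abnormal users in that set according to the social association relationships between the abnormal users and the other users in the set.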
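In the embodiments of this application, one of multiple user terminals can be selected as a target user terminal. The target user terminal may be a smart terminal capable of displaying and playing data, such as a smartphone, tablet computer, or desktop computer. For example, the user terminal corresponding to backend server 100a in FIG. 1 can be taken as the target user terminal, and the target application can be installed on it; the backend server 100a corresponding to the target user terminal can then exchange data with the service server 1000. When a large number of users use the applications on their user terminals, the service server 1000 can detect and collect the social association relationships between them through the backend servers. For example, if user A and user B have a communication record, the service server 1000 can determine that user A and user B have a social association relationship, namely a communication relationship. After detecting a large number of users and determining the social association relationships between them, the service server 1000 can take these users as a user group, take each user in the group as a node, and connect the nodes of users who have a social association relationship with an edge. Edge weights are set according to the social behavior records between the associated users. A relationship topology graph can be built from the user group and the edge weights, and at least two different community topology graphs can be partitioned out of it according to the edge weights between nodes. In other words, the service server 1000 can divide the user group into at least two communities according to the social association relationships and social behavior records between users. Then, according to existing abnormal user samples, the service server 1000 can identify the abnormal users in each community and determine from them whether the community is in a normal or abnormal state. If a community is abnormal, the service server 1000 can obtain its abnormal users and, according to the social association relationships between the abnormal users and the non-abnormal users in that community, determine diffusion abnormal users among the non-abnormal users. The purpose of determining diffusion abnormal users is to identify a wider range of abnormal users: the previously detected abnormal user samples may be few and may cover only a small part of the abnormal users, so the abnormal users identified in an abnormal community from those samples have low coverage and some abnormal users go unidentified. To improve recognition accuracy and extend coverage, diffusion abnormal users can be determined from the social association relationships of the abnormal users already identified in the abnormal community.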
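Taking the determination of diffusion abnormal users in one community topology graph as an example, the service server 1000 can proceed as follows. The service server 1000 can select one of the partitioned community topology graphs as the target user set; that is, the target user set includes at least two users having a social association relationship. The service server 1000 can obtain the default abnormal users (the existing abnormal user samples) and, according to them, determine the abnormal users in the target user set. From the number of abnormal users and the total number of users in the target user set, the service server 1000 can detect the state of the target user set. When the target user set is in an abnormal state, the service server 1000 can identify diffusion abnormal users among the users to be confirmed according to the social association relationships between the abnormal users and the users to be confirmed in the target user set, and also treat the diffusion abnormal users as abnormal users. The users to be confirmed are the users in the target user set other than the abnormal users. After determining the abnormal users (including diffusion abnormal users) in each relationship topology graph, the service server 1000 can generate a recognition result from the abnormal users of each relationship topology graph and return the result to the backend servers.
In some embodiments, a backend server can determine the large number of users of its corresponding user terminals as a user group, divide the user group into different community topology graphs to obtain different user sets, and identify abnormal users and diffusion abnormal users in the user sets. For how the backend server identifies abnormal users and diffusion abnormal users, refer to the description above of how the service server does so; details are not repeated here.
The method provided by the embodiments of this application can be executed by a computer device, which includes but is not limited to a terminal or a server.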
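FIG. 2A is a schematic diagram of a scenario of determining diffusion abnormal users provided by an embodiment of this application. As shown in FIG. 2A, taking target user set 200a as an example, the service server 2000 can obtain the existing default abnormal users (the existing abnormal user samples), match them against the users corresponding to the nodes in target user set 200a, and take the users whose matching rate reaches the matching threshold as abnormal users. For example, if the matching rates of user d and user k in target user set 200a with the default abnormal users exceed the matching threshold, user d and user k are taken as abnormal users. The total number of users in target user set 200a is 5 (user c + user e + user d + user g + user k) and the number of abnormal users is 2 (abnormal user d and abnormal user k), so the abnormal concentration of target user set 200a is 40%, which is greater than the concentration threshold of 30%. The service server 2000 can therefore determine the state of target user set 200a as abnormal, that is, target user set 200a is an abnormal community. Then, diffusion abnormal users can be determined in the abnormal target user set 200a according to the social association relationships of abnormal users d and k (that is, whether they have edges in target user set 200a). For example, user d and user e share an edge whose weight is 0.8, greater than the association threshold 0.75. This indicates that user e is strongly associated with abnormal user d and is very likely also an abnormal user, so user e can be taken as a diffusion abnormal user. User d and user c also share an edge, but its weight is 0.56, well below the association threshold 0.75; although users d and c have a social association relationship, the association is weak and user c is unlikely to be abnormal, so user c can be taken as a non-abnormal user. Likewise, user k and user g share an edge, but its weight is 0.5, well below 0.75, so user g can be taken as a non-abnormal user. User k and user e share an edge, but it is not an edge from user k to user e, so user k is regarded as unable to reach user e. From the perspective of user k, user e is a non-abnormal user, but from the perspective of user d, user e is a diffusion abnormal user, so the service server 2000 can determine user e as a diffusion abnormal user. The service server 2000 can then determine the abnormal users in target user set 200a, which include diffusion abnormal user e, abnormal user d, and abnormal user k.
FIG. 2B is a schematic diagram of a scenario of determining diffusion abnormal users provided by an embodiment of this application. As shown in FIG. 2B, taking the target user set 200a of the embodiment corresponding to FIG. 2A as an example, the service server 2000 can identify user d and user k as abnormal users in target user set 200a. For how the service server 2000 does so, refer to the description of FIG. 2A above; details are not repeated here. From abnormal users d and k, the service server 2000 can determine that target user set 200a is in an abnormal state. Diffusion abnormal users can then be determined according to the social association relationships of abnormal users d and k (that is, whether they have edges in target user set 200a). For example, abnormal user d and user e share an edge, which indicates that user e has a social association relationship with abnormal user d and has some probability of being an accomplice of abnormal user d, so the service server 2000 can determine user e as a diffusion abnormal user. Likewise, abnormal user d and user c share an edge, so the service server 2000 can determine user c as a diffusion abnormal user, and abnormal user k and user g share an edge, so the service server 2000 can determine user g as a diffusion abnormal user. The service server 2000 can then determine the abnormal users in target user set 200a, namely diffusion abnormal user e, abnormal user d, abnormal user k, diffusion abnormal user c, and diffusion abnormal user g.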
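FIG. 3 is a schematic flowchart of a data recognition method provided by an embodiment of this application. As shown in FIG. 3, the method may include the following steps.
Step S101: obtain a target user set, the target user set including at least two users having a social association relationship.
In this step, the target user set can be determined from multiple users. The multiple users may be users filtered by a preset condition, the users corresponding to a particular backend server, or all users of a social application (also called a user group). The determined target user set satisfies the following condition: the closeness of the social association relationships among the users in the target user set is higher than the closeness of the social association relationships between users in the target user set and users outside it. The closeness of the social association relationship between users can be determined from the users' social behavior records, which may include, without limitation, the frequency of information exchange, the number of exchanges, the duration of exchanges, the amount of information exchanged, and transaction amounts.
In the embodiments of this application, the target user set may be a community topology graph. The community topology graph includes the nodes corresponding to users, the edges between nodes, and the weight of each edge. An edge between nodes represents the social association relationship between the nodes (users), and the edge weight represents the degree of association. If two users have a social association relationship, their nodes share an edge; the closer the two users are, the higher the degree of association and the larger the edge weight. The community topology graph can thus indicate whether nodes have a social association relationship and, if so, how strongly they are associated. The social association relationship here may be a payment relationship, a communication friend relationship, a device association relationship, and so on. For example, if user a has logged in to an account using user b's communication device (such as a smartphone), user a and user b can be determined to have a device association relationship. Besides payment, communication friend, and device association relationships, a social association relationship may take other forms (for example, two users whose social accounts are not friends but who have conversed through those accounts). This application does not limit the scope of social association relationships.
The target user set can be obtained from the relationship topology graph of the user group; that is, the nodes in the target user set are a subset of the nodes in the relationship topology graph of the user group. The relationship topology graph can be partitioned according to the edge weights between its nodes (the degree of association between users) to obtain at least two community topology graphs, any one of which can serve as the target user set. In other words, the user group can be divided into at least two communities according to the social association relationships and degrees of association between its users, such that the users within each community are closely associated.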
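Step S102: obtain default abnormal users, and determine the abnormal users in the target user set according to the default abnormal users.
In the embodiments of this application, the default abnormal users may be preset abnormal user samples, that is, abnormal users detected in advance. There may be at least two default abnormal users, and each may include attribute information of the user (such as an id, name, or fingerprint). Taking the id as the attribute information, the id of each user in the target user set can be matched against the ids of the default abnormal users, and the users in the target user set whose matching rate reaches the matching threshold are determined as the abnormal users in the target user set.
For example, suppose the default abnormal users are <default abnormal user 1, 1> and <default abnormal user 2, 2>, that is, default abnormal user 1 with id 1 and default abnormal user 2 with id 2, and the target user set is {<user A, 1>, <user B, 4>, <user C, 6>}. The ids of the default abnormal users (1 and 2) can be matched against the ids of the users in the target user set (1, 4, and 6). The result is that the id 1 of user A matches the id 1 of default abnormal user 1, so user A can be determined as an abnormal user in the target user set.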
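The id-matching example above can be sketched in a few lines of Python. This is a minimal illustration, not the application's implementation: it assumes that "matching" means exact equality of ids (the text also allows a matching rate compared with a threshold, which is not modelled here), and the function name `find_abnormal_users` is hypothetical.

```python
def find_abnormal_users(target_users, default_abnormal_ids):
    """Return the names of target-set users whose id equals a default abnormal id.

    Assumption: matching is exact id equality; a matching-rate threshold
    is not modelled in this sketch.
    """
    abnormal_ids = set(default_abnormal_ids)
    return [name for name, user_id in target_users if user_id in abnormal_ids]


# Data taken from the example in the text.
target_user_set = [("A", 1), ("B", 4), ("C", 6)]
default_abnormal_ids = [1, 2]

print(find_abnormal_users(target_user_set, default_abnormal_ids))
```

With the data from the text, only user A is reported, because id 1 is the only id shared by both lists.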
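Step S103: determine the state of the target user set according to the abnormal users.
In this application, the state of the target user set can be determined from the number of abnormal users and the total number of users in the target user set. From these two numbers, the abnormal concentration of the target user set can be determined, where the abnormal concentration is the proportion of abnormal users among all users in the target user set. If the abnormal concentration is less than the concentration threshold, the proportion of abnormal users in the target user set is low, and the state of the target user set can be determined as normal. If the abnormal concentration is greater than or equal to the concentration threshold, the proportion of abnormal users is high, and the state of the target user set can be determined as abnormal. The abnormal concentration of the target user set can be determined as shown in formula (1):
C=N/M                                   (1)
where C denotes the abnormal concentration of the target user set, N denotes the number of abnormal users in the target user set, and M denotes the total number of users in the target user set.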
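Formula (1) and the threshold comparison can be illustrated with a short Python sketch. The function names are hypothetical, and the comparison uses "greater than or equal to" for the abnormal state, following the apparatus description; the numbers come from the FIG. 2A scenario (2 abnormal users out of 5, concentration threshold 30%).

```python
def abnormal_concentration(num_abnormal, num_total):
    """Formula (1): C = N / M."""
    if num_total <= 0:
        raise ValueError("the target user set must contain at least one user")
    return num_abnormal / num_total


def state_by_concentration(num_abnormal, num_total, concentration_threshold):
    """Return 'abnormal' when C >= threshold, otherwise 'normal'."""
    c = abnormal_concentration(num_abnormal, num_total)
    return "abnormal" if c >= concentration_threshold else "normal"


# FIG. 2A scenario: 2 abnormal users among 5, threshold 0.30.
print(abnormal_concentration(2, 5))
print(state_by_concentration(2, 5, 0.30))
```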
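In some embodiments, the state of the target user set can be determined from a user social behavior feature set. The user social behavior feature set includes the social behavior features of each user in the user group; that is, it may include historical data of the detected social behavior features of each user in the user group. For example, if user A has been to Central Park and Flower Town, these two social behavior features of user A can be stored in the user social behavior feature set. The user social behavior feature set may include the communication devices and wireless networks used by users, and user behaviors (such as frequently visiting the same place). From the user social behavior feature set, the kinds and number of social behavior features of the abnormal users in the target user set can be counted, and an information entropy can be determined from the distribution degree of each social behavior feature of the abnormal users. The smaller the information entropy, the more concentrated the abnormal users are in their social behavior features. The information entropy can be determined as shown in formula (2):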
$$H(x) = -\sum_{i=1}^{n} P(x_i)\,\log P(x_i) \tag{2}$$
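where H(x) denotes the information entropy and P(x_i) denotes the distribution degree of each social behavior feature of the users.
For example, suppose the social behavior feature set includes three social behavior features: wireless network, user behavior, and communication device. Then i in formula (2) takes the values 1, 2, and 3, and each of the three features can be assigned to any of x_1, x_2, and x_3. In the following, the wireless network is denoted by x_1, user behavior by x_2, and the communication device by x_3. For the wireless network feature, suppose there are 50 abnormal users, of whom 48 use the same wireless network A and 2 use two other, different wireless networks (B and C); the number of distinct values of this feature is then 3 (wireless network A + wireless network B + wireless network C). Because 48 of the 50 abnormal users use the same wireless network A, the number of networks is small and their variety is low, which indicates that the abnormal users are concentrated on the wireless network feature. This gives the distribution degree P(wireless network) of the abnormal users on this feature (that is, P(x_1) = P(wireless network)). For the user behavior feature, 30 abnormal users went to the same coffee shop more than 10 times on the same day, and 20 abnormal users went to 20 different other places on that day, so the number of distinct values on this feature is 21 (1 coffee shop + 20 other places). Because 30 of the 50 abnormal users went to the same coffee shop on the same day, the abnormal users are fairly concentrated on the user behavior feature. This gives the distribution degree P(user behavior) (that is, P(x_2) = P(user behavior)). For the communication device feature, 10 abnormal users logged in to their accounts with the same communication device A, 5 with the same communication device B, and 35 with 35 different other devices, so the number of distinct values is 37 (communication device A + communication device B + 35 other devices). Because 35 of the 50 abnormal users use different devices, the number of devices is large and their variety is high, which indicates that the abnormal users are dispersed on the communication device feature, that is, concentration is low. This gives the distribution degree P(communication device) (that is, P(x_3) = P(communication device)). From P(wireless network), P(user behavior), P(communication device), and formula (2), the first feature distribution degree H(x) of the abnormal users is obtained. In other words, the first feature distribution degree H(x) is an overall distribution value of the abnormal users over the three social behavior features wireless network, user behavior, and communication device.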
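The text does not spell out how each distribution degree P(x_i) is computed from the raw counts, nor the logarithm base in formula (2). The sketch below is therefore only one plausible reading, stated as an assumption: it computes the Shannon entropy (natural logarithm) of the empirical value distribution of a single feature, and shows that the concentrated wireless network feature yields a lower entropy than the dispersed communication device feature. The function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Entropy -sum(p * log p) of the empirical distribution of `values`.

    Assumption: natural logarithm; the text does not state the base.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# 50 abnormal users, using the counts from the example in the text.
wireless = ["A"] * 48 + ["B", "C"]                      # concentrated
devices = ["A"] * 10 + ["B"] * 5 + [f"other{i}" for i in range(35)]  # dispersed

print(shannon_entropy(wireless) < shannon_entropy(devices))
```

Under this reading, a smaller value corresponds to a more concentrated feature, which agrees with the statement that a smaller information entropy indicates a more concentrated distribution.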
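Similarly, from the social behavior features in the user social behavior feature set, the second feature distribution degree of the users in the target user set (including the abnormal users) can be determined, that is, the feature distribution degree of the target user set as a whole. For the specific way of determining the second feature distribution degree, refer to the description of the first feature distribution degree above; details are not repeated here. From the first and second feature distribution degrees, the feature distribution difference degree between the abnormal users and the users in the target user set (the difference between the first and second feature distribution degrees) can be determined. If the feature distribution difference degree is less than the difference threshold and the first feature distribution degree is less than the distribution threshold, the social behavior features of the abnormal users are concentrated and differ little from the distribution of the target user set as a whole; the social behavior features of the abnormal users are normal and common, and the target user set is in a normal state. If the feature distribution difference degree is greater than or equal to the difference threshold and the first feature distribution degree is greater than or equal to the distribution threshold, the social behavior features of the abnormal users are dispersed and differ greatly from the distribution of the target user set as a whole; the features are inconsistent among the abnormal users and also between abnormal and non-abnormal users, so the social behavior features of the abnormal users are niche, and the target user set is in a normal state. If the feature distribution difference degree is greater than or equal to the difference threshold and the first feature distribution degree is less than the distribution threshold, the social behavior features of the abnormal users are concentrated and fairly consistent among the abnormal users, yet differ greatly from those of the non-abnormal users in the target user set, and the target user set is in an abnormal state. The feature distribution difference degree can be determined as shown in formula (3):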
$$D_{KL}(P \parallel Q) = \sum_{i} P(i)\,\log\frac{P(i)}{Q(i)} \tag{3}$$
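where D_KL(P∥Q) denotes the feature distribution difference degree, P(i) denotes the first feature distribution degree (the distribution degree of the social behavior features of the abnormal users), and Q(i) denotes the second feature distribution degree (the distribution degree of the social behavior features of all users in the target user set).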
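The following Python sketch illustrates formula (3) together with the three state rules described above. It is a hedged illustration: the function names and threshold values are hypothetical, the natural logarithm is assumed, and the fourth combination (difference below its threshold while the first feature distribution degree is at or above its threshold) is not covered by the text, so the sketch returns `None` for it rather than guessing.

```python
import math


def kl_divergence(p, q):
    """Formula (3): sum of P(i) * log(P(i) / Q(i)).

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def state_by_features(first_distribution, difference,
                      distribution_threshold, difference_threshold):
    """Apply the three rules given in the text; None means 'not specified'."""
    concentrated = first_distribution < distribution_threshold
    different = difference >= difference_threshold
    if different and concentrated:
        return "abnormal"
    if (not different and concentrated) or (different and not concentrated):
        return "normal"
    return None  # case not described in the text


# Hypothetical distributions and thresholds, for illustration only.
p = [0.9, 0.1]   # abnormal users
q = [0.5, 0.5]   # whole target user set
print(kl_divergence(p, q) > 0)
print(state_by_features(0.2, kl_divergence(p, q), 0.5, 0.3))
```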
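In some embodiments, the state of the target user set can be determined from its abnormal concentration, from the user social behavior features, or from a combination of the two. In the combined approach, the abnormal concentration is determined first and, once it exceeds the concentration threshold, the user social behavior features are examined. That is, the state of the target user set is determined as abnormal only when the abnormal concentration is greater than the concentration threshold, the first feature distribution degree is less than the distribution threshold, and the feature distribution difference degree is greater than or equal to the difference threshold.
Step S104: if the state of the target user set is an abnormal state, identify diffusion abnormal users among the users to be confirmed according to the social association relationship between the abnormal users and the users to be confirmed in the target user set, the users to be confirmed being the users in the target user set other than the abnormal users.
In this application, if the state of the target user set is abnormal, the users who have a social association relationship with an abnormal user can be determined among the users to be confirmed, and those users are determined as diffusion abnormal users. Having a social association relationship here can mean that, in the community topology graph containing the node of the abnormal user, there is an edge starting from the abnormal user's node to the node of the user to be confirmed.
Taking FIG. 2B as an example, the abnormal users are user d and user k. Node d can reach node e and node c, and node k can reach node g, so user e of node e, user c of node c, and user g of node g can all be determined as diffusion abnormal users.
In some embodiments, if the state of the target user set is abnormal, the users who have a social association relationship with an abnormal user are determined among the users to be confirmed; the abnormal user node corresponding to the abnormal user and the associated user nodes corresponding to those users are obtained; the associated user nodes whose edge weight with the abnormal user node is greater than the association threshold are determined as diffusion abnormal nodes; and the users corresponding to the diffusion abnormal nodes are determined as the diffusion abnormal users.
Taking the embodiment corresponding to FIG. 2A as an example, the abnormal users are user d and user k. Node d can reach node e and node c, so node e and node c can be determined as the associated user nodes of node d. The edge weight from node d to associated user node e is 0.8, greater than the association threshold 0.75, and the edge weight from node d to associated user node c is 0.56, well below 0.75, so associated user node e can be determined as a diffusion abnormal node. Node k can reach node g, so node g can be determined as the associated user node of node k. The edge weight from node k to associated user node g is 0.5, well below the association threshold 0.75, so associated user node g is not a diffusion abnormal node.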
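Both variants of step S104 (FIG. 2B without a threshold, FIG. 2A with an association threshold) can be sketched with one small Python function. The function name and the dictionary representation of directed edges are assumptions made for illustration; only the edges whose weights are quoted in the text are included, and the edge between node k and node e is omitted because its weight is not given and it does not start from node k.

```python
def find_diffusion_users(directed_edges, abnormal_users, association_threshold=None):
    """Return users reachable by one edge from an abnormal user.

    directed_edges maps (source, target) to an edge weight. When an
    association threshold is given, only edges whose weight is strictly
    greater than it are followed (the FIG. 2A variant).
    """
    diffusion = set()
    for (source, target), weight in directed_edges.items():
        if source not in abnormal_users or target in abnormal_users:
            continue
        if association_threshold is None or weight > association_threshold:
            diffusion.add(target)
    return diffusion


# Edge weights quoted in the text for target user set 200a.
edges = {("d", "e"): 0.8, ("d", "c"): 0.56, ("k", "g"): 0.5}
abnormal = {"d", "k"}

print(sorted(find_diffusion_users(edges, abnormal)))        # FIG. 2B variant
print(sorted(find_diffusion_users(edges, abnormal, 0.75)))  # FIG. 2A variant
```

Without a threshold, users c, e, and g are all reported; with the threshold 0.75, only user e remains, matching the two scenarios described in the text.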
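As can be seen, users having social association relationships are grouped into the target user set. Once the abnormal users in the target user set have been determined and the target user set is in an abnormal state, the users in the set who have a social association relationship with the abnormal users can be obtained and directly taken as diffusion abnormal users. There is no need to perform feature matching on every user; diffusion abnormal users can be identified through social association relationships alone. Even if a diffusion abnormal user has the same features as non-abnormal users, the user can still be identified because of the social association relationship with an abnormal user, which improves recognition accuracy.
FIG. 4A is a schematic diagram of a scenario of determining the state of a target user set provided by an embodiment of this application. As shown in FIG. 4A, taking target user set 400a as an example, the abnormal users in target user set 400a are user e and user f. From abnormal users e and f, the service server counts 2 abnormal users; from user a, user b, user c, user d, user e, and user f, the service server counts a total of 6 users in target user set 400a. The abnormal concentration of target user set 400a is therefore 2/6 = 33%. Because 33% is greater than the concentration threshold of 20%, the service server can determine the state of target user set 400a as abnormal.
FIG. 4B is a schematic diagram of a scenario of determining the state of a target user set provided by an embodiment of this application. As shown in FIG. 4B, taking target user set 400b as an example, the abnormal users in target user set 400b are user e, user f, user g, user h, and user i, and the user social behavior feature set includes wifi and user device. From the user social behavior feature set, it is known that abnormal user h uses the wifi named "Z", abnormal user i uses the wifi named "X", and abnormal users e, f, and g all use the wifi named "W". For the wifi feature, 60% of the abnormal users use the same wifi, so the abnormal users are fairly concentrated on this feature; from this distribution, the distribution degree P(wifi) of the abnormal users on the wifi feature is obtained. Similarly, from the user social behavior feature set, it is known that abnormal user e has used devices A and B, abnormal user f has used devices B and C, abnormal user g has used device D, abnormal user h has used devices A and E, and abnormal user i has used devices B and F. Three abnormal users have used the same device B and two have used the same device A, so the abnormal users are fairly concentrated on the user device feature; from this distribution, the distribution degree P(user device) is obtained. From P(wifi), P(user device), and formula (2), the first feature distribution degree A of the abnormal users is obtained. Similarly, the second feature distribution degree B of the social behavior features of all users in the target user set (including abnormal users e, f, g, h, and i) can be obtained. From the first feature distribution degree A, the second feature distribution degree B, and formula (3), the difference between the social behavior feature distribution of the abnormal users and that of target user set 400b as a whole is obtained, that is, the feature distribution difference degree C of the abnormal users. Because the first feature distribution degree A is less than the distribution threshold D and the feature distribution difference degree C is greater than the difference threshold E, the service server can determine the state of target user set 400b as abnormal.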
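In the embodiments, when the target user set is determined from multiple users, the multiple users can be divided into at least two user sets according to the collected social relationships and social behaviors between them, so that the closeness of the social association relationships among the users in each user set is higher than the closeness of the social association relationships between users in different user sets; each of the user sets is then taken as the target user set.
In some embodiments, when multiple users are divided into multiple user sets, a relationship topology graph can be determined from the social relationships and social behaviors between the users. In the relationship topology graph, each node corresponds to one of the users, and an edge connecting two nodes indicates that the corresponding users have a social relationship. The closeness of the social association relationship between two users is determined from the social relationships and social behaviors between the users, and the weight of the edge between the corresponding nodes is determined from that closeness. A clustering algorithm divides the relationship topology graph into at least two sub-graphs, and the set of users corresponding to the nodes of one of the sub-graphs is taken as the target user set.
FIG. 5 is a schematic flowchart of obtaining a target user set provided by an embodiment of this application. As shown in FIG. 5, the procedure may include the following steps.
Step S201: obtain the relationship topology graph corresponding to the user group. The relationship topology graph may include N nodes k in one-to-one correspondence with the users in the user group, N being the number of users in the user group; the edge weight between two nodes k is determined based on the social association relationship between the two corresponding users in the user group.
In this application, N may be the number of users in the user group. After the user group is obtained, each user in it can be taken as a node k, for example user A as node A and user B as node B, and the edge weight between two nodes k in the relationship topology graph can be determined from the social association relationship between the two corresponding users. A user group has N users, and each user corresponds to one node k. If two users have a social association relationship, their two nodes k can be connected by an edge. An initial weight can be set on each such edge according to the social behavior records between the associated users, and probability conversion is performed on the initial weight; the result of the probability conversion is used as the edge weight between the nodes k. The relationship topology graph of the user group can be generated from the nodes k and the edge weights. The social behavior records here may be the transfer amount, transfer frequency, communication frequency, communication duration, and so on between users having a social association relationship; the larger the transfer amount, transfer frequency, communication frequency, or communication duration between two users, the larger the initial weight set on their edge. Probability conversion here can mean normalizing the initial weight of each edge. For example, if there is an edge between node i and node j, the weight of that edge can be denoted M_ij, and the probability conversion for M_ij can be as shown in formula (4):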
$$M_{ij} = \frac{W_{ij}}{\sum_{k=1}^{n} W_{kj}} \tag{4}$$
where W_ij denotes the initial weight between node i and node j, and \sum_{k=1}^{n} W_{kj} denotes the sum of the initial weights between the n nodes and node j.
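Take a user group consisting of user A, user B, user C, and user D as an example, with user A as node A, user B as node B, user C as node C, and user D as node D. To show the social association relationships between users intuitively, the associations between nodes A, B, C, and D are presented below as a list. The list shown in FIG. 6A is the node relationship list of the users. It consists of first header parameters, second header parameters, and the data jointly corresponding to a first header parameter and a second header parameter. This data may include edge weight data. One piece of edge weight data corresponds to two nodes and indicates the degree of association between them: the larger the edge weight, the higher the degree of association. The first header parameters may be row parameters and the second header parameters column parameters, or the first header parameters may be column parameters and the second header parameters row parameters.
From the node relationship list shown in FIG. 6A, an adjacency matrix A1 characterizing the associations between nodes A, B, C, and D can be obtained, as shown below: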
Figure PCTCN2020126055-appb-000005
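  Adjacency matrix A1
Adjacency matrix A1 is a 4×4 matrix. The value 1 in A1 indicates that two users have a social association relationship (the nodes are connected by an edge), and the value 0 indicates that they do not (the nodes are not connected). For example, user A and user B have a social association relationship, so node A and node B need to be connected by an edge, and the edge weight data 12 jointly corresponding to node A and node B is set to 1. User D and user A have no social association relationship, so node D and node A need not be connected, and the edge weight data 41 jointly corresponding to node D and node A is set to 0. A self-loop is added to every node, that is, an edge from each node to itself, so edge weight data 11, 22, 33, and 44 are all set to 1. From adjacency matrix A1, the node relationship graph of users A, B, C, and D can be obtained, as shown in FIG. 6B (connecting the nodes whose entry in A1 is 1 yields FIG. 6B). Self-loop edges are added because their edge weights (equal to 1) are needed in later calculations; since only these weights are needed, the self-loop edges are not drawn in FIG. 6B.
Further, an initial weight can be set on each edge according to the social behavior records between users A, B, C, and D. User A has transferred money to user B twice, and one of the transfers reached 100,000, so the initial weight of the edge between node A and node B can be set to 10. User A and user C have no social behavior records (no transfers and no calls between them), so the initial weight of the edge between node A and node C can be set to 1. User B and user C communicate frequently, and each call lasts more than 20 minutes, so the initial weight of the edge between node B and node C can be set to 8. User B transfers money to user D frequently, so the initial weight of the edge between node B and node D can be set to 9. From the social behavior records, the node relationship graph with initial weights in FIG. 6C is obtained, and from the initial weights and adjacency matrix A1, an adjacency matrix A2 characterizing both the associations and the degrees of association between nodes A, B, C, and D is obtained, as shown below: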
Figure PCTCN2020126055-appb-000006
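  Adjacency matrix A2
Adjacency matrix A2 is a 4×4 matrix.
Probability conversion (normalization) can be performed on the elements (initial weights) of adjacency matrix A2. Taking element M12 (the initial weight of the edge from node A to node B) as an example, the conversion works as follows. First obtain the initial weight 10 of the edge from node A to node B (element M12), then the initial weight 1 of the edge from node B to node B, the initial weight 8 of the edge from node C to node B, and the initial weight 9 of the edge from node D to node B; that is, obtain elements M12, M22, M32, and M42 in the column of A2 that contains M12. Adding the values of M12, M22, M32, and M42 gives 28. From the value 10 of M12 and the sum 28, the result of the probability conversion of M12 is 10/28 = 0.36, so 0.36 is used as the edge weight from node A to node B. The edge weights of the other edges are obtained in the same way. From adjacency matrix A2 and the converted edge weights, a probability matrix A3 characterizing the associations and degrees of association between nodes A, B, C, and D is obtained, as shown below: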
Figure PCTCN2020126055-appb-000007
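  Probability matrix A3
Probability matrix A3 is a 4×4 matrix.
The edge weights from each node to itself (elements M11, M22, M33, and M44) do not need probability conversion.
From nodes A, B, C, and D and the edge weights between them, the relationship topology graph of the user group (users A, B, C, and D) is obtained, as shown in FIG. 6D.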
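The probability conversion of formula (4) can be checked numerically with the sketch below. The matrix figures are not reproduced in this text, so the initial weight matrix here is reconstructed from the prose (edges A-B = 10, A-C = 1, B-C = 8, B-D = 9, self-loops = 1, weights assumed symmetric); treat it as an assumption. The sketch also normalizes the diagonal along with everything else, whereas the text says self-loop weights need no conversion, so only the off-diagonal results should be compared with the values quoted in the text.

```python
def column_normalize(matrix):
    """Formula (4): M[i][j] = W[i][j] / sum over k of W[k][j]."""
    n = len(matrix)
    column_sums = [sum(matrix[k][j] for k in range(n)) for j in range(n)]
    return [
        [matrix[i][j] / column_sums[j] if column_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Rows and columns are ordered A, B, C, D (reconstructed from the prose).
initial_weights = [
    [1, 10, 1, 0],
    [10, 1, 8, 9],
    [1, 8, 1, 0],
    [0, 9, 0, 1],
]

probabilities = column_normalize(initial_weights)
print(round(probabilities[0][1], 2))  # node A to node B: 10 / 28
print(round(probabilities[1][0], 2))  # node B to node A: 10 / 12
```

The rounded off-diagonal values agree with the edge weights used later in the text (for example 0.36 for node A to node B, 0.83 for node B to node A, and 0.9 for node B to node D).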
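Step S202: obtain the sampled paths corresponding to node k in the relationship topology graph according to the path sampling number.
In this application, for each node in the relationship topology graph, the jump probability from that node to the other nodes in the graph can be calculated by walking the graph, which yields the community membership of each node. The calculation can be as shown in formula (5):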
$$\mathrm{Expa}(M_{ij}) = \sum_{k=1}^{n} M_{ik}\,M_{kj} \tag{5}$$
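where Expa(M_ij) denotes the jump probability from node i to node j, M_ik denotes the probability (edge weight) from node i to node k, and M_kj denotes the probability (edge weight) from node k to node j.
For example, node A and node D are not connected by an edge, but node A is connected to node B, node B to node C, and node C to node D, so node A can reach node D in 3 steps (node A, node B, node C, node D). If the edge weight from node A to node B is 0.2, from node B to node C is 0.3, and from node C to node D is 0.4, then according to formula (5) the jump probability from node A to node D is 0.2×0.3×0.4 = 0.024.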
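The two calculations described above can be sketched as follows: `expand` applies formula (5) to a whole matrix (one expansion step, equivalent to multiplying the matrix by itself), and `path_probability` multiplies the edge weights along a single path, as in the worked example from node A to node D. Both function names are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math


def expand(matrix):
    """Formula (5): Expa(M)[i][j] = sum over k of M[i][k] * M[k][j]."""
    n = len(matrix)
    return [
        [sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def path_probability(path, edge_weight):
    """Product of the directed edge weights along consecutive nodes of `path`."""
    return math.prod(edge_weight[(u, v)] for u, v in zip(path, path[1:]))


# Edge weights from the worked example in the text.
weights = {("A", "B"): 0.2, ("B", "C"): 0.3, ("C", "D"): 0.4}
print(round(path_probability(["A", "B", "C", "D"], weights), 3))
```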
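Because the user group contains a very large number of users, that is, a very large number of nodes, calculating the jump probability from every node to every other node in the relationship topology graph would be a huge task and could waste time and space. To save time and space, this solution uses a Monte Carlo (MCL) sampling walk method: the paths of each node are sampled, and the jump probability is calculated only from each node to the other nodes on its sampled paths. In other words, this solution does not need to calculate the probability from each node to all other nodes. It only needs to sample the paths of each node according to the path sampling number to obtain that node's sampled paths, obtain the associated nodes on the sampled paths according to the jump threshold, and then calculate the jump probability from each node to its associated nodes. Because jump probabilities are calculated only to some of the nodes in the relationship topology graph rather than all of them, a large amount of computation is avoided, which reduces time and space consumption. The path sampling number and the number of jumps per node can be adjusted manually, so the error of the sampled result can also be kept within a controlled range. Because the data is sampled, the MCL sampling walk method can also complete the calculation quickly and with high accuracy when the user group, and therefore the data, is very large.
The path sampling number in this application is a non-zero positive integer. It may be a manually specified value or a value generated at random by the server within an allowed range. According to the path sampling number, the sampled paths of each node k can be obtained in the relationship topology graph of the user group; sampled paths are the subset of the paths starting at node k whose count equals the path sampling number. According to the jump threshold, the associated nodes of each node k can be determined on its sampled paths. An associated node is a node on the sampled path other than node k; specifically, it is a node that can be reached from node k within the jump threshold (inclusive). Taking the relationship topology graph of the embodiment corresponding to FIG. 6D as an example, the paths starting at node A are A-B-C, A-B-D, and A-C-B, and the path sampling number is 1, so one path must be drawn from the paths of node A as its sampled path, for example A-B-C. The jump threshold is 1; on path A-B-C, one jump from node A reaches node B, so node B is taken as the associated node of node A on path A-B-C. The jump threshold is the maximum number of jump steps allowed on a sampled path. For each node k in the relationship topology graph, jumping starts at node k with a step count of 1, and the step count is incremented after each jump. For example, suppose one sampled path of node c is c-e-g-k-i-j and the jump threshold is 4. Starting at node c, 1 jump reaches node e; incrementing the step count from 1 to 2, 2 jumps reach node g (through node e); incrementing from 2 to 3, 3 jumps reach node k (through nodes e and g); incrementing from 3 to 4, 4 jumps reach node i (through nodes e, g, and k). On the sampled path c-e-g-k-i-j, nodes e, g, k, and i can therefore all be determined as associated nodes of node c.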
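The way associated nodes follow from a sampled path and a jump threshold can be shown with a very small sketch. How paths are drawn at random is not specified in the text, so the sketch takes an already sampled path as input; the function name `associated_nodes` is hypothetical.

```python
def associated_nodes(sampled_path, jump_threshold):
    """Nodes reachable from sampled_path[0] within `jump_threshold` jumps.

    The start node itself is excluded, and nodes beyond the threshold
    are ignored even though they lie on the sampled path.
    """
    if jump_threshold < 1:
        raise ValueError("jump threshold must be a positive integer")
    return sampled_path[1:1 + jump_threshold]


# Sampled path c-e-g-k-i-j with a jump threshold of 4 (example from the text).
print(associated_nodes(["c", "e", "g", "k", "i", "j"], 4))
# Sampled path A-B-C with a jump threshold of 1.
print(associated_nodes(["A", "B", "C"], 1))
```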
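Step S203: determine the jump probability between node k and the associated nodes on the sampled path according to the edge weights in the relationship topology graph, an associated node being a node on the sampled path other than node k.
In this application, the jump probability between node k and an associated node can be determined from the edge weights in the relationship topology graph of the user group. Specifically, if node k and the associated node share no edge, the intermediate nodes between node k and the associated node can be obtained on the sampled path of node k; node k can reach the associated node through the intermediate nodes. Among node k, the intermediate nodes, and the associated node, every two nodes that share an edge are taken as a connected node pair, and the jump probability between node k and the associated node can be determined from the edge weights of the connected node pairs.
Taking FIG. 6D as an example, the sampled path of node A is A-B-D, the jump threshold is 3, and the jump step counts can be 1 and 2, so the associated nodes of node A are node B and node D. Node A and node D share no edge, but node A can reach node D through node B, so node B is the intermediate node between node A and node D. Node A and node B share an edge, and node B and node D share an edge, so nodes A and B form connected node pair AB and nodes B and D form connected node pair BD. From probability matrix A3, the edge weight of connected node pair AB is 0.36 and that of connected node pair BD is 0.9, so the jump probability between node A and node D is 0.36×0.9 = 0.324.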
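Step S204: update the relationship topology graph according to the jump probability to obtain an updated relationship topology graph, and determine the target user set in the updated relationship topology graph.
In this application, the relationship topology graph can be updated according to the jump probabilities. That is, the edges in the relationship topology graph can be updated according to node k and its associated nodes: each node k is connected by an edge to every associated node with which it shares no edge (new edges are added to the relationship topology graph), which yields a transitional relationship topology graph. Taking the embodiment corresponding to FIG. 6D as an example, the associated nodes of node A are node B and node D, and node A can reach node D through node B, so node A and node D can be connected by an edge, and the edge is given a direction to indicate that it goes from node A to node D. In the transitional relationship topology graph, the jump probability between node k and an associated node can be set as the edge weight between them to obtain the target relationship topology graph, which is the updated relationship topology graph.
Taking the embodiment corresponding to FIG. 6D as an example, the sampled path of node A is A-B-D, and from probability matrix A3 the jump probability from node A to node D is 0.36×0.9 = 0.324. The sampled path of node B is B-A-C, so the jump probability from node B to node C is 0.83×0.1 = 0.083. The sampled path of node C is C-A-B-D, so the jump probability from node C to node B is 0.08×0.36 = 0.029 and from node C to node D is 0.08×0.36×0.9 = 0.026. The sampled path of node D is D-B-A, so the jump probability from node D to node A is 0.32×0.83 = 0.266. Using these jump probabilities as edge weights, probability matrix A3 can be updated to obtain a probability matrix A4 characterizing the associations and degrees of association between nodes A, B, C, and D, as shown below: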
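Steps S203 and S204 for this example can be sketched as follows. The function accumulates the product of edge weights along a sampled path and records, for every associated node within the jump threshold, the jump probability from the start node. The function name and the dictionary layout are hypothetical; the edge weights are the rounded values of probability matrix A3 quoted in the text, and a jump threshold of 3 is assumed for all four nodes.

```python
def jump_probabilities(path, edge_weight, jump_threshold):
    """Map (start, associated node) to the jump probability along `path`."""
    start = path[0]
    probability = 1.0
    result = {}
    for hops, (u, v) in enumerate(zip(path, path[1:]), start=1):
        if hops > jump_threshold:
            break
        probability *= edge_weight[(u, v)]
        result[(start, v)] = probability
    return result


# Directed edge weights quoted in the text from probability matrix A3.
a3 = {
    ("A", "B"): 0.36, ("B", "D"): 0.9, ("B", "A"): 0.83,
    ("A", "C"): 0.1, ("C", "A"): 0.08, ("D", "B"): 0.32,
}

sampled_paths = [["A", "B", "D"], ["B", "A", "C"], ["C", "A", "B", "D"], ["D", "B", "A"]]
updated_edges = {}
for path in sampled_paths:
    updated_edges.update(jump_probabilities(path, a3, jump_threshold=3))

print(round(updated_edges[("A", "D")], 3))
print(round(updated_edges[("D", "A")], 3))
```

The resulting dictionary plays the role of the non-zero entries of probability matrix A4: pairs that do not appear in it correspond to entries of 0.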
Figure PCTCN2020126055-appb-000008
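  Probability matrix A4
Probability matrix A4 is a 4×4 matrix. An element 0 in A4 indicates that one node cannot reach the other. Take element M13 (the edge weight from node A to node C) as an example: in probability matrix A3 there is a probability of 0.1 from node A to node C (node A can reach node C, and the two nodes share an edge), but because the path drawn for node A is A-B-D, the other paths of node A that were not drawn are no longer considered. Only node A to node B and node A to node D (elements M12 and M14 in A4) need to be considered.
Further, a convex transformation can be applied to the edge weights (jump probabilities) in the target relationship topology graph: exponential growth is applied to each edge weight, and probability conversion (normalization) is then performed on the result. The convex transformation yields the target probability. The edge weights between node k and its associated nodes can be updated according to the target probability. Among the updated edge weights, the associated nodes whose updated edge weight is greater than or equal to the weight threshold can be determined as important associated nodes of node k. According to node k and its important associated nodes, the target relationship topology graph can be divided into at least two community topology graphs, and a target community topology graph obtained from them can serve as the target user set.
Applying exponential growth to the jump probability and then performing probability conversion (normalization) on the result, that is, applying the convex transformation to the jump probability to obtain the target probability, can be done as shown in formula (6):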
$$\Gamma_r(M_{ij}) = \frac{(M_{ij})^{r}}{\sum_{k=1}^{n} (M_{kj})^{r}} \tag{6}$$
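where Γ_r(M_ij) denotes the target probability from node i to node j, M_ij denotes the edge weight from node i to node j, (M_ij)^r denotes the edge weight from node i to node j raised to the power r, and \sum_{k=1}^{n} (M_{kj})^{r} denotes the sum of the edge weights from the n nodes to node j, each raised to the power r.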
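Take probability matrix A4 and r = 3 as an example. For the target probability from node B to node A (that is, Γ_r(M_21)), first raise M_21 to the power 3: 0.83×0.83×0.83 = 0.572. The sum of elements M_11, M_21, M_31, and M_41, each raised to the power 3, is 0^3 + 0.83^3 + 0.08^3 + 0.266^3 = 0.591, so Γ_r(M_21) is 0.572/0.591 = 0.968. For the target probability from node D to node A (that is, Γ_r(M_41)), first raise M_41 to the power 3: 0.266×0.266×0.266 = 0.019. The sum of elements M_11, M_21, M_31, and M_41, each raised to the power 3, is again 0^3 + 0.83^3 + 0.08^3 + 0.266^3 = 0.591, so Γ_r(M_41) is 0.019/0.591 = 0.032. Element M_21 is 0.83 and becomes 0.968 after exponential growth and normalization; element M_41 is 0.266 and becomes 0.032. Exponential growth followed by normalization thus makes large elements (edge weights) larger (0.83 becomes 0.968) and small elements smaller (0.266 becomes 0.032). In other words, through the MCL sampling walk method and the convex transformation, this solution makes close associations between users closer and weak associations weaker, which favors community partitioning and makes the partitioning result more accurate.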
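The convex transformation of formula (6) can be checked on the column of node A from the example. The sketch below is an illustration with a hypothetical function name; it works on one column at a time and uses unrounded intermediate values, so the result for node B to node A comes out at about 0.967 rather than the 0.968 obtained in the text from rounded intermediates.

```python
def inflate_column(column, r):
    """Formula (6) applied to one column: raise to the power r, then normalize."""
    powered = [value ** r for value in column]
    total = sum(powered)
    if total == 0:
        raise ValueError("column must contain at least one non-zero weight")
    return [value / total for value in powered]


# Column of node A in probability matrix A4: M11, M21, M31, M41.
column_a = [0.0, 0.83, 0.08, 0.266]
inflated = inflate_column(column_a, 3)

print([round(value, 3) for value in inflated])
```

As in the text, the large weight (0.83) grows toward 1 while the smaller weight (0.266) shrinks toward 0, which sharpens the contrast used for community partitioning.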
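In some embodiments, a number of iterations can be set before the community topology graphs are partitioned, so that the steps from obtaining sampled paths to calculating target probabilities are repeated several times. That is, after the first round of random sampling for each node k and the calculation of the target probabilities between nodes, the target probabilities are used as the edge weights between nodes for a second round of random sampling and calculation; on the sampled paths of the second round, new target probabilities are calculated with the previous target probabilities as edge weights. This repeats until the number of iterations is reached, at which point the final target probabilities are taken as stable probabilities, and the community topology graphs are partitioned using these stable target probabilities.
As can be seen, users having social association relationships are grouped into the target user set. Once the abnormal users in the target user set have been determined and the target user set is in an abnormal state, the users in the set who have a social association relationship with the abnormal users can be obtained and directly taken as diffusion abnormal users. There is no need to perform feature matching on every user; diffusion abnormal users can be identified through social association relationships alone. Therefore, even if a diffusion abnormal user has the same features as non-abnormal users, the user can still be identified because of the social association relationship with an abnormal user, which improves recognition accuracy.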
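For ease of understanding, refer further to Figure 7, which is a schematic diagram of a scenario of dividing community topology graphs according to an embodiment of this application. As shown in Figure 7, service server 1000 may determine user a corresponding to terminal A, user b corresponding to terminal B, ..., and user k corresponding to terminal K as a user group {a,b,c,e,f,g,i,j,k}. Service server 1000 may take each user in the user group as a node and, according to the social association relationships between users, connect nodes with edges to generate the relationship topology graph corresponding to the user group {a,b,c,e,f,g,i,j,k}. Subsequently, edge weights can be determined for the edges in the relationship topology graph according to the social behavior records between users. As shown in Figure 7, the edge weight between node c and node e is 0.7, between node e and node d is 0.8, between node e and node g is 0.6, between node g and node k is 0.5, between node k and node i is 0.4, between node i and node j is 0.8, between node i and node a is 0.7, and between node i and node b is 0.5. According to a sampling path quantity of 2, service server 1000 may perform path sampling on the nodes in relationship topology graph (before sampling) 20a to obtain the sampling paths of each node. Node b is used as the example below; sampling paths are obtained for the other nodes in the same way, which is not repeated here. There are four paths starting from node b: b-i-j, b-i-a, b-i-k-g-e-c, and b-i-k-g-e-d. Service server 1000 may sample two of these four paths, b-i-j and b-i-k-g-e-c, as the sampling paths of node b. Subsequently, service server 1000 may obtain a jump threshold of 2. According to the jump threshold of 2, as shown in Figure 7, on sampling path b-i-j, two jumps from node b (from node b to its neighbor node i, and then from node i to its neighbor node j) reach node j. That is, although there is no edge between node b and node j, they are indirectly connected, so service server 1000 may connect node b and node j with an edge and give the edge a direction to indicate that it runs from node b to node j. According to the edge weight 0.5 between node b and node i and the edge weight 0.8 between node i and node j, service server 1000 obtains an edge weight of 0.4 between node b and node j. On sampling path b-i-k-g-e-c, the node reached after two jumps from node b is node k. Although node g, node e, and node c are also on this sampling path, service server 1000 does not need to calculate the jump probabilities between node b and node g, node e, or node c; it only needs to calculate the jump probability from node b to node k. According to the edge weight 0.5 between node b and node i and the edge weight 0.4 between node i and node k, service server 1000 obtains a jump probability of 0.2 from node b to node k. Service server 1000 may connect node b and node k with an edge, give the edge a direction to indicate that it runs from node b to node k, and use 0.2 as the edge weight between node b and node k. Service server 1000 may take the nodes other than node b that are reached on the sampling paths (node i, node j, and node k) as the associated nodes of node b. After path sampling for node b, the edge weights between node b and its associated nodes are 0.5 (node b to node i), 0.4 (node b to node j), and 0.2 (node b to node k). Similarly, service server 1000 can obtain the sampling paths of the other nodes and the jump probabilities from those nodes to their associated nodes. The sampling paths of each node and the jump probabilities from that node to its associated nodes may be as shown in Table 1:
Table 1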
Figure PCTCN2020126055-appb-000011
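In Table 1, the column data are the start nodes and the row data are the arrival nodes. Taking node a as an example, the jump probability from node a to node b is 0.35, from node a to node i is 0.7, and from node a to node k is 0.28. As Table 1 shows, the edge weights greater than or equal to the weight threshold 0.5 are: node a to node i, 0.7; node b to node i, 0.5; node c to node d, 0.56; node c to node e, 0.7; node d to node c, 0.56; node d to node e, 0.8; node e to node d, 0.8; node e to node g, 0.6; node g to node k, 0.5; node i to node a, 0.7; node j to node a, 0.7; and node j to node i, 0.8. Service server 1000 may use the jump probabilities as the edge weights of the edges to obtain target relationship topology graph (after sampling) 20b, and may divide nodes connected by edge weights greater than or equal to the weight threshold into one community. That is, service server 1000 may divide node c, node e, node d, node g, and node k into one community, and node i, node j, node a, and node b into another community. In this way, community topology graph (community) 200a and community topology graph (community) 200b can be obtained from target relationship topology graph (after sampling) 20b. As shown in Figure 7, between a node in community 200a and a node in community 200b, either the edge weight is less than the weight threshold or there is no edge (that is, the users in the two communities are weakly associated). For example, the edge weight between node k and node i is 0.4, which is less than the weight threshold 0.5; this indicates a low degree of association between user k corresponding to node k and user i corresponding to node i, so user k and user i can be divided into different communities. As another example, there is no edge between node c and node j, so Table 1 contains no jump probability from node c to node j or from node j to node c; this indicates a low degree of association between node c and node j, so node c and node j can be divided into different communities.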
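As a non-limiting illustration, the Figure 7 example can be sketched in Python. The edge weights are the ones quoted for Figure 7. Two simplifications are assumptions of this sketch: Table 1 is not reproduced, so the partition step thresholds the quoted Figure 7 edges only, and communities are taken to be the connected components that remain after weak edges are dropped. All function names are hypothetical.

```python
# Undirected edge weights quoted for Figure 7.
EDGES = {
    ("c", "e"): 0.7, ("e", "d"): 0.8, ("e", "g"): 0.6, ("g", "k"): 0.5,
    ("k", "i"): 0.4, ("i", "j"): 0.8, ("i", "a"): 0.7, ("i", "b"): 0.5,
}


def weight(u, v):
    """Look up an undirected edge weight regardless of key order."""
    return EDGES.get((u, v), EDGES.get((v, u)))


def jump_probabilities(path, jump_threshold):
    """Multiply edge weights along `path`, stopping after `jump_threshold` hops."""
    probs, running = {}, 1.0
    for u, v in list(zip(path, path[1:]))[:jump_threshold]:
        running *= weight(u, v)
        probs[v] = round(running, 3)
    return probs


def communities(edges, weight_threshold):
    """Connected components of the graph restricted to edges >= weight_threshold."""
    neighbours = {}
    for (u, v), w in edges.items():
        neighbours.setdefault(u, set())
        neighbours.setdefault(v, set())
        if w >= weight_threshold:
            neighbours[u].add(v)
            neighbours[v].add(u)
    seen, result = set(), []
    for node in sorted(neighbours):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        result.append(sorted(component))
    return result


print(jump_probabilities(list("bij"), 2))      # sampled path b-i-j
print(jump_probabilities(list("bikgec"), 2))   # sampled path b-i-k-g-e-c
print(communities(EDGES, 0.5))
```

With a jump threshold of 2, node g, node e, and node c on the longer path are never reached, which mirrors the statement that only the jump probability from node b to node k needs to be calculated.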
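Figure 8 is a schematic flowchart of determining the abnormal category of a target user set in an abnormal state according to an embodiment of this application. As shown in Figure 8, the procedure may include the following steps.
Step S301: Determine the target user set in the abnormal state as a to-be-identified user set.
Step S302: Obtain user text data of the users in the to-be-identified user set, and extract key text data from the user text data.
In this application, user text data may be remark information entered by a user when making a transfer, conversation information during a call, and the like. Keyword recognition can be performed on the user text data to extract the key text data. For example, if a user's transfer remark is "gambling debt repayment", the keyword "gambling debt" can be extracted.
Step S303: Obtain sensitive source data.
In this application, the sensitive source data is a preset set of abnormal categories, which may include abnormal categories such as gambling, cash-out, fraud, robbery, and theft.
Step S304: Match the key text data against the sensitive source data, and determine the abnormal category of the to-be-identified user set according to the matching result.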
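In this application, the key text data can be matched against the sensitive source data. For example, if the key text data is "gambling debt" and, after matching against the sensitive source data, the matching rate between "gambling debt" and "gambling" reaches 90%, the abnormal category of the to-be-identified user set can be determined as "gambling".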
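As a non-limiting illustration, steps S302 to S304 can be sketched in Python. This application does not specify how the matching rate is computed, so everything below is an assumption made for the sketch: the sensitive source data is modeled as a hypothetical lexicon mapping each abnormal category to example keywords, the matching rate is the share of extracted keywords found in a category's lexicon, and `min_rate` stands in for the 90% figure in the example.

```python
# Hypothetical lexicon: abnormal category -> illustrative keywords.
SENSITIVE_SOURCE = {
    "gambling": {"gambling debt", "bet", "casino"},
    "cash-out": {"cash out", "card swipe"},
    "fraud": {"refund scam", "fake invoice"},
}


def match_rates(keywords, source):
    """Share of extracted keywords that appear in each category's lexicon."""
    keywords = [k.lower() for k in keywords]
    rates = {}
    for category, terms in source.items():
        hits = sum(1 for k in keywords if k in terms)
        rates[category] = hits / len(keywords) if keywords else 0.0
    return rates


def abnormal_category(keywords, source, min_rate=0.9):
    """Return the best-matching category, or None if no rate reaches min_rate."""
    rates = match_rates(keywords, source)
    if not rates:
        return None
    best = max(rates, key=rates.get)
    return best if rates[best] >= min_rate else None


print(abnormal_category(["gambling debt"], SENSITIVE_SOURCE))
```

A production system would more likely use a similarity model than exact keyword lookup; the sketch only shows where the matching result feeds into the category decision.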
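Figure 9 is a schematic structural diagram of a data identification apparatus according to an embodiment of this application. The data identification apparatus may be a computer program (including program code) running on a computer device; for example, the data identification apparatus is application software. The apparatus can be used to perform the corresponding steps of the method provided in the embodiments of this application. As shown in Figure 9, data identification apparatus 1 may include: target user set obtaining module 11, abnormal user determining module 12, behavior state detection module 13, and diffused abnormal user identification module 14.
Target user set obtaining module 11 is configured to obtain a target user set; the target user set includes at least two users having a social association relationship;
Abnormal user determining module 12 is configured to obtain default abnormal users and determine the abnormal users in the target user set according to the default abnormal users;
Behavior state detection module 13 is configured to determine the state of the target user set according to the abnormal users;
Diffused abnormal user identification module 14 is configured to, if the state of the target user set is an abnormal state, identify diffused abnormal users among the to-be-confirmed users according to the social association relationships between the abnormal users and the to-be-confirmed users in the target user set; the to-be-confirmed users are the users in the target user set other than the abnormal users.
For specific implementations of target user set obtaining module 11, abnormal user determining module 12, behavior state detection module 13, and diffused abnormal user identification module 14, refer to the description of steps S101 to S104 in the embodiment corresponding to Figure 3 above; details are not repeated here.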
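Referring to Figure 9, abnormal user determining module 12 may include: abnormal user determining unit 121.
Abnormal user determining unit 121 is configured to match the users in the target user set against the default abnormal users, and determine the users in the target user set whose matching rate reaches the matching threshold as the abnormal users in the target user set.
For a specific implementation of abnormal user determining unit 121, refer to the description of step S102 in the embodiment corresponding to Figure 4 above; details are not repeated here.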
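Referring to Figure 9, behavior state detection module 13 may include: total user quantity obtaining unit 131, abnormal concentration determining unit 132, and first state determining unit 133.
Total user quantity obtaining unit 131 is configured to obtain the quantity of the abnormal users and the total quantity of users in the target user set;
Abnormal concentration determining unit 132 is configured to determine the abnormal concentration of the target user set according to the quantity of the abnormal users and the total quantity of users in the target user set;
First state determining unit 133 is configured to determine the state of the target user set as a normal state if the abnormal concentration is less than the concentration threshold;
First state determining unit 133 is further configured to determine the state of the target user set as an abnormal state if the abnormal concentration is greater than or equal to the concentration threshold.
For specific implementations of total user quantity obtaining unit 131, abnormal concentration determining unit 132, and first state determining unit 133, refer to the description of step S103 in the embodiment corresponding to Figure 3 above; details are not repeated here.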
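As a non-limiting illustration, the decision made by units 131 to 133 can be sketched in Python. The passage above says only that the abnormal concentration is determined from the two quantities; treating it as the ratio of abnormal users to all users in the set is an assumption of this sketch, as are the function name and the example threshold.

```python
def state_by_concentration(abnormal_count, total_count, concentration_threshold):
    """Classify a target user set from its share of abnormal users.

    Assumes concentration = abnormal_count / total_count (not stated here).
    """
    if total_count <= 0:
        raise ValueError("the target user set must contain at least one user")
    concentration = abnormal_count / total_count
    # Greater than or equal to the threshold means abnormal; less means normal.
    return "abnormal" if concentration >= concentration_threshold else "normal"


print(state_by_concentration(3, 10, 0.3))
print(state_by_concentration(2, 10, 0.3))
```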
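Referring to Figure 9, behavior state detection module 13 may include: behavior feature obtaining unit 134, feature distribution degree determining unit 135, feature distribution difference degree determining unit 136, and second state determining unit 137.
Behavior feature obtaining unit 134 is configured to obtain a user social behavior feature set; the user social behavior feature set includes the social behavior features of each user in the user group;
Feature distribution degree determining unit 135 is configured to determine a first feature distribution degree of the abnormal users according to the social behavior features in the user social behavior feature set; the first feature distribution degree characterizes the number of kinds of social behavior features that the abnormal users have;
Feature distribution degree determining unit 135 is further configured to determine a second feature distribution degree of the users in the target user set according to the social behavior features in the user social behavior feature set; the second feature distribution degree characterizes the number of kinds of social behavior features that the users in the target user set have;
Feature distribution difference degree determining unit 136 is configured to determine, according to the first feature distribution degree and the second feature distribution degree, the feature distribution difference degree between the abnormal users and the users in the target user set;
Second state determining unit 137 is configured to determine the state of the target user set according to the first feature distribution degree and the feature distribution difference degree.
Second state determining unit 137 is further configured to determine the state of the target user set as a normal state if the feature distribution difference degree is less than the difference degree threshold and the first feature distribution degree is less than the distribution threshold;
Second state determining unit 137 is further configured to determine the state of the target user set as a normal state if the feature distribution difference degree is greater than or equal to the difference degree threshold and the first feature distribution degree is greater than or equal to the distribution threshold;
Second state determining unit 137 is further configured to determine the state of the target user set as an abnormal state if the feature distribution difference degree is greater than or equal to the difference degree threshold and the first feature distribution degree is less than the distribution threshold.
For specific implementations of behavior feature obtaining unit 134, feature distribution degree determining unit 135, feature distribution difference degree determining unit 136, and second state determining unit 137, refer to the description of step S103 in the embodiment corresponding to Figure 3 above; details are not repeated here.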
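As a non-limiting illustration, the three cases handled by second state determining unit 137 can be sketched in Python. How the feature distribution difference degree is computed is not given in this passage; the absolute difference between the two distribution degrees used below is purely an assumption, as are the feature names and thresholds. The fourth combination of conditions is not covered by the description, so the sketch returns `None` for it rather than guessing.

```python
def distribution_degree(users, features_by_user):
    """Number of distinct kinds of social behavior features held by `users`."""
    kinds = set()
    for user in users:
        kinds |= set(features_by_user.get(user, ()))
    return len(kinds)


def state_by_distribution(abnormal_users, all_users, features_by_user,
                          difference_threshold, distribution_threshold):
    first = distribution_degree(abnormal_users, features_by_user)
    second = distribution_degree(all_users, features_by_user)
    difference = abs(second - first)  # assumption: not specified in the text
    if difference >= difference_threshold and first < distribution_threshold:
        return "abnormal"
    if difference < difference_threshold and first < distribution_threshold:
        return "normal"
    if difference >= difference_threshold and first >= distribution_threshold:
        return "normal"
    return None  # combination not covered by the description


# Hypothetical data: abnormal users share one narrow behavior pattern.
FEATURES = {
    "u1": {"night_transfer"},
    "u2": {"night_transfer"},
    "u3": {"chat", "red_packet", "transfer"},
    "u4": {"game", "chat"},
}
print(state_by_distribution(["u1", "u2"], list(FEATURES), FEATURES, 3, 2))
```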
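Referring to Figure 9, target user set obtaining module 11 may include: relationship topology graph obtaining unit 111, sampling path obtaining unit 112, jump probability determining unit 113, and target user set determining unit 114.
Relationship topology graph obtaining unit 111 is configured to obtain the relationship topology graph corresponding to a user group; the relationship topology graph includes N nodes k, the N nodes k correspond one-to-one to the users in the user group, and N is the number of users in the user group; the edge weight between two nodes k is determined based on the social association relationship between the two corresponding users in the user group;
Sampling path obtaining unit 112 is configured to obtain, according to the path sampling quantity, the sampling paths corresponding to node k in the relationship topology graph;
Jump probability determining unit 113 is configured to determine, according to the edge weights in the relationship topology graph, the jump probabilities between node k and the associated nodes on the sampling paths; the associated nodes are the nodes on the sampling paths other than node k;
Target user set determining unit 114 is configured to update the relationship topology graph according to the jump probabilities to obtain an updated relationship topology graph, and determine the target user set in the updated relationship topology graph.
For specific implementations of relationship topology graph obtaining unit 111, sampling path obtaining unit 112, jump probability determining unit 113, and target user set determining unit 114, refer to the description of step S101 in the embodiment corresponding to Figure 3 above; details are not repeated here.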
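Referring to Figure 9, relationship topology graph obtaining unit 111 may include: user group obtaining subunit 1111, weight setting subunit 1112, probability conversion subunit 1113, and relationship topology graph generating subunit 1114.
User group obtaining subunit 1111 is configured to obtain a user group and take each user in the user group as a node k;
Weight setting subunit 1112 is configured to connect, with edges, the nodes k corresponding to users having a social association relationship, and set initial weights for the edges between the nodes k according to the social behavior records between those users;
Probability conversion subunit 1113 is configured to perform probability conversion on the initial weights to obtain the edge weights;
Relationship topology graph generating subunit 1114 is configured to generate the relationship topology graph according to the nodes k corresponding to the user group and the edge weights.
For specific implementations of user group obtaining subunit 1111, weight setting subunit 1112, probability conversion subunit 1113, and relationship topology graph generating subunit 1114, refer to the description of obtaining the relationship topology graph in step S101 in the embodiment corresponding to Figure 3 above; details are not repeated here.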
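Referring to Figure 9, jump probability determining unit 113 may include: intermediate node obtaining subunit 1131, connected node pair determining subunit 1132, and jump probability determining subunit 1133.
Intermediate node obtaining subunit 1131 is configured to, if there is no edge between node k and an associated node, obtain the intermediate nodes between node k and the associated node on the sampling path; node k can reach the associated node through the intermediate nodes;
Connected node pair determining subunit 1132 is configured to take, among node k, the intermediate nodes, and the associated node, each two nodes that share an edge as a connected node pair, and obtain the edge weight corresponding to each connected node pair;
Jump probability determining subunit 1133 is configured to determine the jump probability between node k and the associated node according to the edge weights corresponding to the connected node pairs.
For specific implementations of intermediate node obtaining subunit 1131, connected node pair determining subunit 1132, and jump probability determining subunit 1133, refer to the description of determining jump probabilities in step S101 in the embodiment corresponding to Figure 3 above; details are not repeated here.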
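Referring to Figure 9, target user set determining unit 114 may include: node edge updating subunit 1141, edge weight setting subunit 1142, and target user set determining subunit 1143.
Node edge updating subunit 1141 is configured to update the edges connected in the relationship topology graph according to node k and the associated nodes to obtain a transitional relationship topology graph; in the transitional relationship topology graph, node k is connected to each associated node by an edge;
Edge weight setting subunit 1142 is configured to set, in the transitional relationship topology graph, the jump probability between node k and each associated node as the edge weight between node k and that associated node, to obtain a target relationship topology graph;
Target user set determining subunit 1143 is configured to determine the target user set in the target relationship topology graph.
Target user set determining subunit 1143 is further configured to exponentiate the jump probabilities, perform probability conversion on the exponentiated jump probabilities to obtain target probabilities, and update the edge weights between node k and the associated nodes according to the target probabilities;
Target user set determining subunit 1143 is further configured to determine an associated node whose updated edge weight is greater than the weight threshold as an important associated node of node k;
Target user set determining subunit 1143 is further configured to divide the target relationship topology graph into at least two community topology graphs according to node k and the important associated nodes, and obtain a target community topology graph from the at least two community topology graphs as the target user set.
For specific implementations of node edge updating subunit 1141, edge weight setting subunit 1142, and target user set determining subunit 1143, refer to the description of step S101 in the embodiment corresponding to Figure 3 above; details are not repeated here.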
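Referring to Figure 9, diffused abnormal user identification module 14 may include: first associated user determining unit 141 and first diffused abnormal user determining unit 142.
First associated user determining unit 141 is configured to, if the state of the target user set is an abnormal state, determine, among the to-be-confirmed users, the users having a social association relationship with the abnormal users;
First diffused abnormal user determining unit 142 is configured to determine the users having a social association relationship with the abnormal users as the diffused abnormal users.
For specific implementations of first associated user determining unit 141 and first diffused abnormal user determining unit 142, refer to the description of step S104 in the embodiment corresponding to Figure 3 above; details are not repeated here.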
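Referring to Figure 9, diffused abnormal user identification module 14 may include: second associated user determining unit 143 and second diffused abnormal user determining unit 144.
Second associated user determining unit 143 is configured to, if the state of the target user set is an abnormal state, determine, among the to-be-confirmed users, the users having a social association relationship with the abnormal users;
Second diffused abnormal user determining unit 144 is configured to obtain the abnormal user nodes corresponding to the abnormal users, obtain the associated user nodes corresponding to the users having a social association relationship with the abnormal users, determine each associated user node whose edge weight with an abnormal user node is greater than the association threshold as a diffused abnormal node, and determine the users corresponding to the diffused abnormal nodes as the diffused abnormal users.
For specific implementations of second associated user determining unit 143 and second diffused abnormal user determining unit 144, refer to the description of step S104 in the embodiment corresponding to Figure 3 above; details are not repeated here.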
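As a non-limiting illustration, the two variants of diffused abnormal user identification (units 141 and 142 without a threshold, units 143 and 144 with an association threshold) can be sketched in one Python function. The edge weights reuse the values quoted for Figure 7; which user is abnormal, the threshold value, and the function name are hypothetical choices made for this sketch.

```python
def diffused_abnormal_users(edge_weights, abnormal_users, members,
                            association_threshold=None):
    """Users in `members` that are linked to an abnormal user.

    With association_threshold=None, any social association counts;
    otherwise the edge weight must be strictly greater than the threshold.
    """
    abnormal = set(abnormal_users)
    pending = set(members) - abnormal  # the to-be-confirmed users
    diffused = set()
    for (u, v), w in edge_weights.items():
        for src, dst in ((u, v), (v, u)):
            if src in abnormal and dst in pending:
                if association_threshold is None or w > association_threshold:
                    diffused.add(dst)
    return diffused


EDGES = {("i", "j"): 0.8, ("i", "a"): 0.7, ("i", "b"): 0.5, ("k", "i"): 0.4}
COMMUNITY = {"a", "b", "i", "j"}

print(sorted(diffused_abnormal_users(EDGES, {"i"}, COMMUNITY)))
print(sorted(diffused_abnormal_users(EDGES, {"i"}, COMMUNITY, 0.5)))
```

User k is linked to user i but is outside the target user set, so it is never considered, which reflects that diffusion is confined to the community already judged abnormal.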
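Referring to Figure 9, data identification apparatus 1 may include target user set obtaining module 11, abnormal user determining module 12, behavior state detection module 13, and diffused abnormal user identification module 14, and may further include: to-be-identified user set determining module 15, key text data extraction module 16, sensitive source data obtaining module 17, and abnormal category determining module 18.
To-be-identified user set determining module 15 is configured to determine the target user set in the abnormal state as a to-be-identified user set;
Key text data extraction module 16 is configured to obtain user text data of the users in the to-be-identified user set and extract key text data from the user text data;
Sensitive source data obtaining module 17 is configured to obtain sensitive source data;
Abnormal category determining module 18 is configured to match the key text data against the sensitive source data and determine the abnormal category of the to-be-identified user set according to the matching result.
For specific implementations of to-be-identified user set determining module 15, key text data extraction module 16, sensitive source data obtaining module 17, and abnormal category determining module 18, refer to the description of steps S301 to S304 in the embodiment corresponding to Figure 8 above; details are not repeated here.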
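In the embodiments of this application, a target user set is obtained, the target user set including at least two users having a social association relationship; default abnormal users are obtained, and the abnormal users in the target user set are determined according to the default abnormal users; the state of the target user set is determined according to the abnormal users; and if the state of the target user set is an abnormal state, diffused abnormal users are identified among the to-be-confirmed users according to the social association relationships between the abnormal users and the to-be-confirmed users in the target user set, the to-be-confirmed users being the users in the target user set other than the abnormal users. As can be seen from the above, users having social association relationships are divided into a target user set. When the abnormal users in the target user set have been determined and the target user set is in an abnormal state, the users in the target user set who have a social association relationship with the abnormal users can be obtained and directly taken as diffused abnormal users. Feature matching no longer needs to be performed for every user; diffused abnormal users can be identified through social association relationships alone. Therefore, even if a diffused abnormal user has the same features as a non-abnormal user, that user can still be identified because of the social association relationship with an abnormal user, which can improve identification accuracy.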
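Further, refer to Figure 10, which is a schematic structural diagram of a computer device according to an embodiment of this application. As shown in Figure 10, apparatus 1 in the embodiment corresponding to Figure 9 may be applied to computer device 1000. Computer device 1000 may include processor 1001, network interface 1004, and memory 1005; in addition, computer device 1000 further includes user interface 1003 and at least one communication bus 1002. Communication bus 1002 is used to implement connection and communication between these components. User interface 1003 may include a display and a keyboard; optionally, user interface 1003 may further include standard wired and wireless interfaces. In some embodiments, network interface 1004 may include standard wired and wireless interfaces (such as a Wi-Fi interface). Memory 1005 may be high-speed RAM or non-volatile memory, for example at least one disk memory. In some embodiments, memory 1005 may also be at least one storage apparatus located remotely from processor 1001. As shown in Figure 10, memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.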
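In computer device 1000 shown in Figure 10, network interface 1004 can provide the network communication function, user interface 1003 mainly provides an input interface for the user, and processor 1001 can be used to invoke the device control application program stored in memory 1005 to implement the following:
obtaining a target user set; the target user set includes at least two users having a social association relationship;
obtaining default abnormal users, and determining the abnormal users in the target user set according to the default abnormal users;
determining the state of the target user set according to the abnormal users;
if the state of the target user set is an abnormal state, identifying diffused abnormal users among the to-be-confirmed users according to the social association relationships between the abnormal users and the to-be-confirmed users in the target user set; the to-be-confirmed users are the users in the target user set other than the abnormal users.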
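It should be understood that computer device 1000 described in the embodiments of this application can perform the data identification method described in the embodiments corresponding to Figure 3 to Figure 8 above, and can also implement data identification apparatus 1 described in the embodiment corresponding to Figure 9 above; details are not repeated here. The beneficial effects of using the same method are likewise not repeated.
In addition, it should be noted that the embodiments of this application further provide a computer-readable storage medium. The computer-readable storage medium stores the computer program executed by the aforementioned computer device 1000 for data identification, and the computer program includes program instructions. When the processor executes the program instructions, it can perform the data identification method described in the embodiments corresponding to Figure 3 to Figure 8 above; details are therefore not repeated here. The beneficial effects of using the same method are likewise not repeated. For technical details not disclosed in the computer-readable storage medium embodiments of this application, refer to the description of the method embodiments of this application.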
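The computer-readable storage medium may be an internal storage unit of the data identification apparatus provided in any of the foregoing embodiments or of the computer device, for example a hard disk or memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.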
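The terms "first", "second", and the like in the specification, claims, and drawings of the embodiments of this application are used to distinguish different objects, not to describe a particular order. In addition, the term "include" and any variants of it are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product, or device that contains a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, apparatus, product, or device.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.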
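The methods and related apparatuses provided in the embodiments of this application are described with reference to the method flowcharts and/or schematic structural diagrams provided in the embodiments of this application. Specifically, each procedure and/or block in the method flowcharts and/or schematic structural diagrams, and combinations of procedures and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the schematic structural diagrams. These computer program instructions can also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, the instruction apparatus implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the schematic structural diagrams. These computer program instructions can also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the schematic structural diagrams.
What is disclosed above is merely preferred embodiments of this application and certainly cannot be used to limit the scope of rights of this application. Therefore, equivalent changes made according to the claims of this application still fall within the scope covered by this application.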

Claims (18)

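  1. A data identification method, performed by a computing device, comprising:
    determining a target user set from a plurality of users; the target user set including at least two users having a social association relationship, and the closeness of the social association relationships between users in the target user set being higher than the closeness of the social association relationships between users in the target user set and users outside the target user set;
    obtaining default abnormal users, and determining abnormal users in the target user set according to the default abnormal users;
    determining a state of the target user set according to the abnormal users;
    if the state of the target user set is an abnormal state, identifying diffused abnormal users among to-be-confirmed users according to the social association relationships between the abnormal users and the to-be-confirmed users in the target user set; the to-be-confirmed users being the users in the target user set other than the abnormal users.
  2. The method according to claim 1, wherein the obtaining default abnormal users, and determining abnormal users in the target user set according to the default abnormal users, comprises:
    matching the users in the target user set against the default abnormal users, and determining the users whose matching rate reaches a matching threshold as the abnormal users in the target user set.
  3. The method according to claim 1, wherein the determining a state of the target user set according to the abnormal users comprises:
    obtaining the quantity of the abnormal users, and obtaining the total quantity of users in the target user set;
    determining an abnormal concentration of the target user set according to the quantity of the abnormal users and the total quantity of users in the target user set;
    if the abnormal concentration is less than a concentration threshold, determining the state of the target user set as a normal state;
    if the abnormal concentration is greater than or equal to the concentration threshold, determining the state of the target user set as an abnormal state.
  4. The method according to claim 1, wherein the determining a state of the target user set according to the abnormal users comprises:
    obtaining a user social behavior feature set; the user social behavior feature set including the social behavior features of each user in the user group;
    determining a first feature distribution degree of the abnormal users according to the social behavior features in the user social behavior feature set; the first feature distribution degree characterizing the number of kinds of social behavior features that the abnormal users have;
    determining a second feature distribution degree of the users in the target user set according to the social behavior features in the user social behavior feature set; the second feature distribution degree characterizing the number of kinds of social behavior features that the users in the target user set have;
    determining, according to the first feature distribution degree and the second feature distribution degree, a feature distribution difference degree between the abnormal users and the users in the target user set;
    determining the state of the target user set according to the first feature distribution degree and the feature distribution difference degree.
  5. The method according to claim 4, wherein the determining the state of the target user set according to the first feature distribution degree and the feature distribution difference degree comprises:
    if the feature distribution difference degree is less than a difference degree threshold and the first feature distribution degree is less than a distribution threshold, determining the state of the target user set as a normal state;
    if the feature distribution difference degree is greater than or equal to the difference degree threshold and the first feature distribution degree is greater than or equal to the distribution threshold, determining the state of the target user set as a normal state;
    if the feature distribution difference degree is greater than or equal to the difference degree threshold and the first feature distribution degree is less than the distribution threshold, determining the state of the target user set as an abnormal state.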
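  6. The method according to claim 1, wherein the determining a target user set from a plurality of users comprises:
    dividing the plurality of users into at least two user sets according to collected social relationships and social behaviors among the plurality of users, so that the closeness of the social association relationships between users in each user set is higher than the closeness of the social association relationships between users in different user sets;
    taking one of the at least two user sets as the target user set.
  7. The method according to claim 6, wherein the dividing the plurality of users into at least two user sets comprises:
    determining a relationship topology graph according to the social relationships and social behaviors among the plurality of users, wherein in the relationship topology graph each node corresponds to one of the plurality of users, and an edge connecting two nodes indicates that the users corresponding to the two nodes have a social relationship;
    determining the closeness of the social association relationship between two users according to the social relationships and social behaviors among the plurality of users, and determining the weight of the edge between the nodes corresponding to the two users according to the closeness;
    dividing the relationship topology graph into at least two sub-topology graphs by using a clustering algorithm, and taking the set of users corresponding to the nodes in one of the at least two sub-topology graphs as the target user set.
  8. The method according to claim 7, wherein the dividing the relationship topology graph into at least two sub-topology graphs by using a clustering algorithm comprises:
    obtaining, according to a path sampling quantity, sampling paths corresponding to a first node in the relationship topology graph;
    determining, according to the edge weights in the relationship topology graph, jump probabilities between the first node and associated nodes on the sampling paths; the associated nodes being the nodes on the sampling paths other than the first node;
    updating the relationship topology graph according to the jump probabilities to obtain an updated relationship topology graph, and dividing the updated relationship topology graph to obtain the at least two sub-topology graphs.
  9. The method according to claim 7, wherein the determining the weight of the edge between the nodes corresponding to the two users according to the closeness comprises:
    setting the closeness as an initial weight of the edge between the two nodes;
    performing probability conversion on the initial weight to obtain the edge weight.
  10. The method according to claim 8, wherein the determining, according to the edge weights in the relationship topology graph, jump probabilities between the first node and associated nodes on the sampling paths comprises:
    if there is no edge between the first node and an associated node, obtaining the intermediate nodes between the first node and the associated node on the sampling path; the first node being able to reach the associated node through the intermediate nodes;
    among the first node, the intermediate nodes, and the associated node, taking each two nodes that share an edge as a connected node pair, and obtaining the edge weight corresponding to each connected node pair;
    determining the jump probability between the first node and the associated node according to the edge weights corresponding to the connected node pairs.
  11. The method according to claim 8, wherein the updating the relationship topology graph according to the jump probabilities comprises:
    updating the edges connected in the relationship topology graph according to the first node and the associated nodes to obtain a transitional relationship topology graph; in the transitional relationship topology graph, the first node being connected to each associated node by an edge;
    in the transitional relationship topology graph, setting the jump probability between the first node and each associated node as the edge weight between the first node and that associated node, to obtain the updated relationship topology graph.
  12. The method according to claim 8, wherein the dividing the updated relationship topology graph to obtain the at least two sub-topology graphs comprises:
    exponentiating the jump probabilities, performing probability conversion on the exponentiated jump probabilities to obtain target probabilities, and updating the edge weights between the first node and the associated nodes according to the target probabilities;
    determining an associated node whose updated edge weight is greater than a weight threshold as an important associated node of the first node;
    dividing the updated relationship topology graph into at least two sub-topology graphs according to the first node and the important associated nodes.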
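  13. The method according to claim 1, wherein the identifying, if the state of the target user set is an abnormal state, diffused abnormal users among the to-be-confirmed users according to the social association relationships between the abnormal users and the to-be-confirmed users in the target user set comprises:
    if the state of the target user set is an abnormal state, determining, among the to-be-confirmed users, the users having a social association relationship with the abnormal users;
    determining the users having a social association relationship with the abnormal users as the diffused abnormal users.
  14. The method according to claim 7, wherein the identifying, if the state of the target user set is an abnormal state, diffused abnormal users among the to-be-confirmed users according to the social association relationships between the abnormal users and the to-be-confirmed users in the target user set comprises:
    if the state of the target user set is an abnormal state, determining, among the to-be-confirmed users, the users having a social association relationship with the abnormal users;
    obtaining the abnormal user nodes corresponding to the abnormal users, obtaining the associated user nodes corresponding to the users having a social association relationship with the abnormal users, determining each associated user node whose edge weight with an abnormal user node is greater than an association threshold as a diffused abnormal node, and determining the users corresponding to the diffused abnormal nodes as the diffused abnormal users.
  15. The method according to claim 1, further comprising:
    determining the target user set in the abnormal state as a to-be-identified user set;
    obtaining user text data of the users in the to-be-identified user set, and extracting key text data from the user text data;
    obtaining sensitive source data;
    matching the key text data against the sensitive source data, and determining an abnormal category of the to-be-identified user set according to the matching result.
  16. A data identification apparatus, comprising:
    a target user set obtaining module, configured to obtain a target user set; the target user set including at least two users having a social association relationship;
    an abnormal user determining module, configured to obtain default abnormal users and determine abnormal users in the target user set according to the default abnormal users;
    a behavior state detection module, configured to determine a state of the target user set according to the abnormal users;
    a diffused abnormal user identification module, configured to, if the state of the target user set is an abnormal state, identify diffused abnormal users among to-be-confirmed users according to the social association relationships between the abnormal users and the to-be-confirmed users in the target user set; the to-be-confirmed users being the users in the target user set other than the abnormal users.
  17. A computer device, comprising: a processor and a memory;
    the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 15.
  18. A computer-readable storage medium, storing a computer program, the computer program including program instructions which, when executed by a processor, perform the method according to any one of claims 1 to 15.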
PCT/CN2020/126055 2020-02-11 2020-11-03 Data identification method and apparatus, and device, and readable storage medium WO2021159766A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/672,814 US20220172090A1 (en) 2020-02-11 2022-02-16 Data identification method and apparatus, and device, and readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010086855.6A CN111339436B (zh) 2020-02-11 2020-02-11 Data identification method and apparatus, and device, and readable storage medium
CN202010086855.6 2020-02-11

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/672,814 Continuation US20220172090A1 (en) 2020-02-11 2022-02-16 Data identification method and apparatus, and device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2021159766A1 (zh) 2021-08-19

Family

ID=71183384

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/126055 WO2021159766A1 (zh) 2020-02-11 2020-11-03 Data identification method and apparatus, and device, and readable storage medium

Country Status (3)

Country Link
US (1) US20220172090A1 (zh)
CN (1) CN111339436B (zh)
WO (1) WO2021159766A1 (zh)

Also Published As

Publication number Publication date
CN111339436A (zh) 2020-06-26
US20220172090A1 (en) 2022-06-02
CN111339436B (zh) 2021-05-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20918968

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 09/12/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20918968

Country of ref document: EP

Kind code of ref document: A1