WO2020224222A1 - Target group detection method, apparatus, computer device, and storage medium - Google Patents

Target group detection method, apparatus, computer device, and storage medium

Publication number
WO2020224222A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
group
matrix
user
similarity
Prior art date
Application number
PCT/CN2019/118114
Other languages
English (en)
French (fr)
Inventor
陈啟柱
陈振
黄剑飞
Original Assignee
北京三快在线科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京三快在线科技有限公司
Publication of WO2020224222A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Definitions

  • the embodiments of the present application relate to the field of network security technologies, and in particular, to a target group detection method, device, computer equipment, and storage medium.
  • unsupervised learning is usually used to detect fraudulent crowds, and the social relationships of the crowd to be detected are used to determine whether the crowd to be detected has fraudulent behaviors.
  • the embodiments of the present application provide a target group detection method, device, computer equipment and storage medium, which can solve the problem that fraudulent crowds are detected poorly because the existing technique is immature, depends only weakly on labels, and relies on social relationships.
  • the technical scheme is as follows:
  • a target group detection method includes:
  • grouping each feature column in the data to be detected to obtain multiple feature groups, where each feature column corresponds to at least one feature group, and each feature column includes features of the same feature dimension of different users;
  • obtaining a similarity matrix according to the indicator matrices and the feature association matrices corresponding to the multiple feature columns, where the elements of the similarity matrix represent the similarity between users among multiple users, the elements of the feature association matrix of each feature column represent the similarity between the feature groups in that feature column, and the elements of the indicator matrix of each feature column represent the feature groups to which the multiple users belong;
  • before obtaining the similarity matrix according to the indicator matrices and the feature association matrices corresponding to the multiple feature columns, the method further includes:
  • Each indicator matrix is input into the characteristic correlation function to obtain a corresponding characteristic correlation matrix.
  • the characteristic correlation function is used to obtain the corresponding characteristic correlation matrix according to the elements in the indicator matrix by means of machine learning.
  • the obtaining the similarity matrix according to the indicator matrix and the feature correlation matrix corresponding to the multiple feature columns includes:
  • the detecting based on the multiple feature groups and the multiple user groups to determine the target group among the multiple user groups includes:
  • a target group among the multiple user groups is determined.
  • the creating an edge between nodes that meet the target condition to obtain a graph model includes:
  • a third edge is created between nodes corresponding to the users that meet the third condition, and the weight of the third edge is the similarity between the users to obtain a graph model.
  • the performing feature extraction on the graph model according to the multiple user groups to obtain multiple group feature matrices includes:
  • the determining a target group in the multiple user groups according to the multiple feature vectors includes:
  • for each user group, when the evaluation value of the user group is greater than the target threshold, the user group is determined to be the target group; when the evaluation value of the user group is not greater than the target threshold, the user group is determined not to be the target group.
  • a target group detection device which includes:
  • the grouping module is configured to group each feature column in the data to be detected to obtain a plurality of feature groups, where each feature column corresponds to at least one feature group, and each feature column includes features of the same feature dimension of different users;
  • the first obtaining module is configured to obtain a similarity matrix according to the indicator matrices and the feature association matrices corresponding to the multiple feature columns, where the elements of the similarity matrix represent the similarity between users among multiple users, the elements of the feature association matrix of each feature column represent the similarity between the feature groups in that feature column, and the elements of the indicator matrix of each feature column represent the feature groups to which the multiple users belong;
  • a clustering module configured to perform clustering according to the similarity matrix to obtain multiple user groups;
  • the detection module is configured to perform detection based on the multiple feature groups and the multiple user groups, and determine a target group in the multiple user groups, where the target group is a group with target characteristics.
  • the device further includes:
  • the second obtaining module is configured to obtain the indicator matrix corresponding to each feature column to obtain multiple indicator matrices
  • the input module is configured to input each indicator matrix into the characteristic correlation function to obtain a corresponding characteristic correlation matrix.
  • the characteristic correlation function is used to obtain the corresponding characteristic correlation matrix according to the elements in the indicator matrix by means of machine learning.
  • the first obtaining module is further configured to input the indicator matrices and the feature association matrices corresponding to the multiple feature columns into the similarity calculation function to obtain the similarity matrix, where the similarity calculation function is used to obtain the similarity between users among the multiple users according to the elements of the indicator matrices and the elements of the feature association matrices.
  • the detection module is further configured to: take the multiple feature groups and the multiple users as nodes and create edges between nodes that meet the target condition to obtain a graph model; perform feature extraction on the graph model according to the multiple user groups to obtain multiple group feature matrices, each user group corresponding to one group feature matrix; obtain multiple corresponding feature vectors according to the multiple group feature matrices; and determine a target group in the multiple user groups according to the multiple feature vectors.
  • the detection module is further configured to: create a first edge between the node corresponding to a feature group and the node corresponding to a user that meet the first condition, where the weight of the first edge is the affiliation between the user and the feature group; create a second edge between the nodes corresponding to feature groups that meet the second condition, where the weight of the second edge is the similarity between the feature groups; and create a third edge between the nodes corresponding to users that meet the third condition, where the weight of the third edge is the similarity between the users, to obtain a graph model.
  • the detection module is further configured to: obtain, for each user group in the plurality of user groups, a group feature map corresponding to that user group, where the group feature map is a part of the graph model; and perform feature extraction on each node in the group feature map to obtain a corresponding group feature matrix, where the elements in the group feature matrix represent the features of the nodes in the group feature map.
  • the detection module is further configured to: obtain an average feature vector according to the multiple feature vectors, where the average feature vector is the average of the multiple feature vectors; obtain the evaluation value of each user group according to the average feature vector and the feature vector of the group feature matrix corresponding to that user group; and, for each user group, determine that the user group is the target group when its evaluation value is greater than the target threshold, and determine that the user group is not the target group when its evaluation value is not greater than the target threshold.
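  • the thresholding rule above can be sketched as follows. This is only an illustration: the text does not say how the evaluation value is computed from the average feature vector and a group's feature vector, so the Euclidean distance used here is an assumption, and the function names and the sample vectors are hypothetical.

```python
import math


def average_vector(vectors):
    """Element-wise mean of the group feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def evaluation_value(vector, average):
    # Assumption: the evaluation value is the Euclidean distance between a
    # group's feature vector and the average feature vector.
    return math.dist(vector, average)


def detect_target_groups(vectors, target_threshold):
    """Return one boolean per user group: True if it is a target group."""
    average = average_vector(vectors)
    return [evaluation_value(v, average) > target_threshold for v in vectors]


# Made-up feature vectors for three user groups.
group_vectors = [[0.0, 0.0], [0.0, 0.0], [3.0, 0.0]]
flags = detect_target_groups(group_vectors, target_threshold=1.5)
print(flags)
```

  • under this assumed distance measure, a group is flagged only when its feature vector lies far from the average of all groups.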
  • in one aspect, a computer device is provided, which includes one or more processors and one or more memories; at least one instruction is stored in the one or more memories, and the at least one instruction is loaded and executed by the one or more processors to implement the operations performed by the target group detection method in any of the foregoing possible implementations.
  • a computer-readable storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the operations performed by the target group detection method in any of the foregoing possible implementations.
  • by grouping each feature column in the data to be detected, multiple feature groups are obtained; the similarity matrix is obtained according to the indicator matrices and feature association matrices corresponding to the multiple feature columns; clustering is performed according to the similarity matrix to obtain multiple user groups; and detection is performed based on the multiple feature groups and the multiple user groups to determine a target group in the multiple user groups, where the target group is a group with target characteristics.
  • this application groups all feature dimensions of users, and obtains a similarity matrix containing the similarity between users, which is obtained by clustering the similarity matrix
  • FIG. 1 is a schematic diagram of data to be detected provided by an embodiment of the present application
  • FIG. 2 is a flowchart of a target group detection method provided by an embodiment of the present application
  • FIG. 3 is a flowchart of another target group detection method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a graph model provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a group feature map provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a target group detection device provided by an embodiment of the present application.
  • FIG. 7 is a structural block diagram of a computer device provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the embodiments of the present application can be applied to a scenario in which a group with target characteristics is selected from among users.
  • the above-mentioned users may be users who have been screened, or users who have not been screened, or users in a certain area, or users who have certain contacts, and this application does not limit this.
  • the above-mentioned target characteristic may be a certain specific behavior characteristic, such as a fraudulent behavior, or a certain specific attribute characteristic.
  • the above-mentioned data to be detected can be a data table. Each row of the data table holds the multiple features of one user, and each column of the data table is one feature dimension of the users. Every user has the same feature dimensions.
  • FIG. 1 is a schematic diagram of data to be detected according to an embodiment of the present application.
  • each row of data represents a user, and there are 10 users in total; each user has a unique identifier, userid, and three feature dimensions: phone, city, and uuid.
  • FIG. 2 is a flowchart of a target group detection method provided by an embodiment of the present application. Referring to FIG. 2, this embodiment includes:
  • in step 201, the computer device groups each feature column in the data to be detected to obtain multiple feature groups.
  • each feature column corresponds to at least one feature group, and each feature column includes features of the same feature dimension of different users.
  • in step 202, the computer device obtains the similarity matrix according to the indicator matrices and the feature association matrices corresponding to the multiple feature columns.
  • the elements of the similarity matrix represent the similarity between users among multiple users, the elements of the feature association matrix of each feature column represent the similarity between feature groups in that feature column, and the elements of the indicator matrix of each feature column indicate the feature groups to which the multiple users belong.
  • in step 203, the computer device performs clustering according to the similarity matrix to obtain multiple user groups.
  • in step 204, the computer device performs detection based on the multiple feature groups and the multiple user groups, and determines a target group in the multiple user groups, where the target group is a group with target characteristics.
  • the method provided by the embodiment of the application obtains multiple feature groups by grouping each feature column in the data to be detected, obtains the similarity matrix according to the indicator matrices and feature association matrices corresponding to the multiple feature columns, performs clustering according to the similarity matrix to obtain multiple user groups, and performs detection based on the multiple feature groups and the multiple user groups to determine a target group in the multiple user groups, where the target group is a group with target characteristics.
  • this application groups all feature dimensions of users, and obtains a similarity matrix containing the similarity between users, which is obtained by clustering the similarity matrix
  • FIG. 3 is a flowchart of another target group detection method provided by an embodiment of the present application. Referring to Figure 3, this embodiment includes:
  • in step 301, the computer device groups each feature column in the data to be detected to obtain multiple feature groups.
  • each feature column corresponds to at least one feature group, and each feature column includes features of the same feature dimension of different users.
  • the above-mentioned data to be detected can be data with N+1 rows and M+1 columns.
  • the first row of the data to be detected can be the header of a table, or a field in the database, or it can be empty.
  • N is a positive integer greater than zero.
  • each of the N rows can represent a user or any other individual to be detected. In this application, the case where each row represents a user is taken as an example.
  • that is, the data to be detected includes the data of N users.
  • the leftmost column of the data to be detected is a unique identification column, which is used to distinguish different individuals. When the individual is a user, it is used to distinguish different users.
  • M is a positive integer greater than zero, and each of the M columns starting from the second column may represent a feature dimension, that is, the data to be detected includes M feature dimensions, and each user has the same feature dimension.
  • the column corresponding to each feature dimension can be called a feature column. For any one of the feature columns, all feature data of the same feature dimension of the user is stored in the feature column.
  • the computer device can group each feature column according to preset rules, and each feature column can be divided into at least one feature group; that is, any one of the feature columns can be divided into one feature group, two feature groups, three feature groups, or more.
  • grouping each feature column may be a process of binning the data of that feature column: each feature group corresponds to a bucket, and each feature column corresponds to at least one bucket.
  • for example, the data to be detected in FIG. 1 has 11 rows and 4 columns, covering 10 users and 3 feature dimensions. Each user has the same 3 feature dimensions, namely phone, city, and uuid, each of which corresponds to one feature column.
  • the computer device can bucket each feature column as follows. When the computer device buckets the feature column corresponding to the phone dimension, it puts the data starting with 134, 135, and 136 into one bucket, puts the data starting with 170 and 171 into a second bucket, and puts the remaining data into a third bucket, obtaining three buckets.
  • when the computer device buckets the feature column corresponding to the city dimension, it puts Beijing and Tianjin into one bucket and Chongqing and Chengdu into another, obtaining two buckets.
  • when the computer device buckets the feature column corresponding to the uuid dimension, it puts c0**87 into one bucket, NULL into one bucket, and the remaining F6**32 into one bucket, obtaining three buckets. In this way, a total of eight buckets are obtained.
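  • the bucketing rules above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names are hypothetical, the phone numbers are made up, and the assignment of the labels c1, c2, u1, u2, and u3 to particular buckets is an assumption, since the text only says which values share a bucket.

```python
# Prefix rules taken from the phone example in the text.
PHONE_PREFIX_BUCKETS = {
    "p1": ("134", "135", "136"),
    "p2": ("170", "171"),
}
# Assumed label assignment: the text only says each city pair shares a bucket.
CITY_BUCKETS = {"Beijing": "c1", "Tianjin": "c1", "Chongqing": "c2", "Chengdu": "c2"}


def bucket_phone(phone):
    """Return the feature group of a phone number based on its prefix."""
    for group, prefixes in PHONE_PREFIX_BUCKETS.items():
        if phone.startswith(prefixes):
            return group
    return "p3"  # every remaining number falls into the third bucket


def bucket_city(city):
    """Return the feature group of a city from the lookup table."""
    return CITY_BUCKETS[city]


def bucket_uuid(uuid):
    """Return the feature group of a uuid; None stands for NULL."""
    if uuid is None:
        return "u2"
    return "u1" if uuid == "c0**87" else "u3"


sample_phones = ["13400000000", "17100000000", "15600000000"]  # made-up numbers
print([bucket_phone(p) for p in sample_phones])
```

  • keeping the rules in plain lookup tables makes it easy to swap in different preset rules per feature column.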
  • in step 302, the computer device obtains an indicator matrix corresponding to each feature column.
  • the indicator matrix corresponding to each feature column can also be said to be the indicator matrix corresponding to each feature dimension.
  • the computer device constructs the indicator matrix corresponding to a feature column according to whether each user's feature is included in each feature group of that feature column, where the elements of the indicator matrix corresponding to each feature column represent the feature groups to which the multiple users belong, and each element refers to one user.
  • for example, the computer device groups each feature column in the data to be detected in FIG. 1 to obtain eight feature groups.
  • the feature column corresponding to the phone dimension is divided into three feature groups, called p1, p2, and p3 for convenience of description, where p1 includes data starting with 134, 135, and 136, p2 includes data starting with 170 and 171, and p3 includes data starting with 156, 131, and 130.
  • the features of users with userid 0, 1, 2, and 3 are included in p1,
  • the features of users with userid 4, 5, and 6 are included in p2, and
  • the features of users with userid 7, 8, and 9 are included in p3.
  • from this, the indicator matrix A_phone can be obtained.
  • the elements in the first row of A_phone indicate that the phone-dimension features of the first four users belong to feature group p1, that is, the phone-dimension features of users whose userid is 0, 1, 2, and 3 belong to feature group p1;
  • the elements in the second row of A_phone indicate that the phone-dimension features of the middle three users belong to feature group p2, that is, the phone-dimension features of users whose userid is 4, 5, and 6 belong to feature group p2;
  • the elements in the third row of A_phone indicate that the phone-dimension features of the last three users belong to feature group p3, that is, the phone-dimension features of users whose userid is 7, 8, and 9 belong to feature group p3.
  • in the same way that it obtains the indicator matrix A_phone corresponding to the phone dimension, the computer device may obtain the indicator matrix A_city corresponding to the city dimension and the indicator matrix A_uuid corresponding to the uuid dimension.
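  • the construction of A_phone described above can be sketched as follows. The sketch assumes that the indicator matrix holds 0/1 entries, with one row per feature group and one column per user; the text describes the rows but the matrix itself is not reproduced here, so that layout is an assumption, as is the function name.

```python
def indicator_matrix(assignments, groups):
    """Build a 0/1 matrix: rows are feature groups, columns are users.

    assignments[u] is the feature group that user u's feature falls into.
    """
    return [[1 if assigned == group else 0 for assigned in assignments]
            for group in groups]


# Phone-dimension memberships from the example: users 0-3 in p1,
# users 4-6 in p2, users 7-9 in p3.
phone_assignments = ["p1"] * 4 + ["p2"] * 3 + ["p3"] * 3
A_phone = indicator_matrix(phone_assignments, ["p1", "p2", "p3"])

for row in A_phone:
    print(row)
```

  • because every user's phone falls into exactly one bucket, each column of this matrix contains a single 1.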
  • in step 303, the computer device inputs each indicator matrix into the feature correlation function to obtain a corresponding feature association matrix.
  • the feature correlation function is used to obtain the corresponding feature association matrix from the elements in the indicator matrix by means of machine learning.
  • each feature column corresponds to an indicator matrix, and
  • correspondingly, each feature column also corresponds to a feature association matrix.
  • the elements of the feature association matrix of each feature column are used to represent the similarity between feature groups in each feature column.
  • the computer device inputs the indicator matrix A_m corresponding to the m-th feature dimension into the feature correlation function f(Q_m), where m is a positive integer.
  • K is the number of training samples. For the k-th sample, the indicator matrix and S_k are known quantities: the indicator matrix is equivalent to the above-mentioned A_phone, A_city, and A_uuid, and S_k can be a square matrix with values of 0-1.
  • K training samples means that when the amount of data to be tested is very large, the data to be tested can be divided into K training samples according to rows, and each training sample contains a certain number of rows of data.
  • the number of rows contained in each training sample can be the same, that is, every training sample contains the same number of rows of data; the number of rows contained in each training sample can also be different.
  • alternatively, the feature association matrix corresponding to each feature column can be set through expert experience, and the computer device obtains the feature association matrix that is input.
  • this approach is suitable for data to be detected with a small number of rows and columns, for example, when the number of rows and columns is not more than 20, or not more than 50; this application does not specifically limit this.
  • taking the data to be detected in FIG. 1 and the indicator matrices A_phone, A_city, and A_uuid obtained in step 302 as an example, the feature column of the phone feature dimension corresponds to the feature association matrix Q_phone, whose elements represent the similarity between feature groups, that is, the similarity between p1, p2, and p3.
  • q_12 represents the similarity between p1 and p2,
  • q_23 represents the similarity between p2 and p3, and so on.
  • the value of Q_phone can be obtained according to expert experience, or can be obtained according to the above-mentioned feature correlation function f(Q_m).
  • a set of values of Q_phone is given here as an example for convenience of explanation.
  • q_11 represents the self-similarity of p1, set to 0.7;
  • q_22 represents the self-similarity of p2, set to 0.7;
  • q_33 represents the self-similarity of p3, set to 0.9;
  • the other similarities, matching the second-edge weights used in the graph model example, are set to q_12 = 0.2, q_13 = 0.5, and q_23 = 0.2.
  • the feature association matrices Q_city and Q_uuid can be obtained in the same way.
  • in step 304, the computer device obtains the similarity matrix according to the indicator matrices and the feature association matrices corresponding to the multiple feature columns.
  • the computer device inputs the indicator matrices and the feature association matrices corresponding to the multiple feature columns into the similarity calculation function to obtain the similarity matrix.
  • the similarity calculation function is used to obtain the similarity between users among the multiple users according to the elements of the indicator matrices and the feature association matrices. That is, the elements of the similarity matrix represent the similarity between users among multiple users.
  • since each feature dimension corresponds to one indicator matrix and one feature association matrix, the computer device obtains M indicator matrices and M feature association matrices from the data to be detected, and the M indicator matrices and the M feature association matrices are in one-to-one correspondence.
  • the computer device inputs the M indicator matrices and the M feature association matrices into the similarity calculation function.
  • the similarity calculation function may be formulated as an optimization problem whose optimal solution is the similarity matrix.
  • in this function, S represents the similarity matrix to be obtained,
  • Q_m represents the m-th feature association matrix among the M feature association matrices, and
  • A_m represents the m-th indicator matrix among the M indicator matrices.
  • the indicator matrices A_phone, A_city, and A_uuid obtained in step 302 and the feature association matrices Q_phone, Q_city, and Q_uuid obtained in step 303 are taken as examples.
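  • to make the roles of A_m and Q_m concrete, the sketch below combines them in a simple closed form. This is a stand-in and not the optimization problem the application refers to, whose formula is not reproduced here: it assumes 0/1 indicator matrices with one group per user and assumes S is the average over the M dimensions of A_m^T Q_m A_m. Only the phone dimension is used, and the off-diagonal values of Q_phone are taken from the second-edge weights in the graph model example.

```python
def group_pair_similarity(A, Q):
    """Compute A^T Q A for a 0/1 indicator matrix with one 1 per column.

    Entry (i, j) is the similarity between the feature groups that
    users i and j belong to.
    """
    n_groups, n_users = len(A), len(A[0])
    group_of = [next(g for g in range(n_groups) if A[g][u] == 1)
                for u in range(n_users)]
    return [[Q[group_of[i]][group_of[j]] for j in range(n_users)]
            for i in range(n_users)]


def similarity_matrix(indicator_matrices, association_matrices):
    # Assumption: average the per-dimension contributions over M dimensions.
    M = len(indicator_matrices)
    parts = [group_pair_similarity(A, Q)
             for A, Q in zip(indicator_matrices, association_matrices)]
    n = len(parts[0])
    return [[sum(p[i][j] for p in parts) / M for j in range(n)]
            for i in range(n)]


A_phone = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
]
Q_phone = [
    [0.7, 0.2, 0.5],
    [0.2, 0.7, 0.2],
    [0.5, 0.2, 0.9],
]
S = similarity_matrix([A_phone], [Q_phone])
print(S[0][2], S[1][5])
```

  • with all three dimensions supplied, the same function would average three contributions per user pair; whether that matches the application's optimal solution is not something the text here confirms.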
  • in step 305, the computer device performs clustering according to the similarity matrix to obtain multiple user groups.
  • the computer device clusters the similarity matrix obtained in step 304 with a clustering algorithm, grouping the users in the data to be detected to obtain multiple user groups.
  • each user group contains at least one user, and the same user does not belong to different user groups; that is, the user groups do not overlap each other.
  • the number of user groups can be represented by D.
  • the clustering algorithm can be spectral clustering, the modularity-based Girvan-Newman community discovery algorithm, the Fast Newman community algorithm, or the like.
  • the choice of clustering algorithm can be determined according to the actual scenario,
  • through a clustering configuration file, which is used to configure clustering parameters.
  • when a user group obtained by clustering contains fewer users than the first user number threshold, the user group can be merged: the user group with fewer users is merged into a user group with which it has higher similarity.
  • this avoids excessive splitting, in which users with higher similarity end up distributed across different groups,
  • so the merged user group is more in line with the actual situation.
  • when a user group obtained by clustering contains more users than the second user number threshold, the user group can be split further, so that users are divided more finely, the similarity granularity is finer, and the final detection result is more accurate.
  • the first user quantity threshold may be 3, 5, or 8, and the second user quantity threshold may be 15, 25, or 30, etc., which is not specifically limited in this application.
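  • the merging of undersized user groups can be sketched as follows. The text does not specify how the similarity between two user groups is measured, so the average pairwise user similarity used here is an assumption; the function name and the demo similarity values are hypothetical.

```python
def merge_small_groups(groups, S, min_size):
    """Merge each group with fewer than min_size users into the group
    it is most similar to, until no undersized group remains."""
    groups = [list(g) for g in groups]

    def average_similarity(g1, g2):
        # Assumption: group-to-group similarity is the mean pairwise
        # user similarity taken from the similarity matrix S.
        return sum(S[i][j] for i in g1 for j in g2) / (len(g1) * len(g2))

    while len(groups) > 1:
        small = [g for g in groups if len(g) < min_size]
        if not small:
            break
        g = min(small, key=len)
        others = [h for h in groups if h is not g]
        target = max(others, key=lambda h: average_similarity(g, h))
        target.extend(g)
        groups = others
    return [sorted(g) for g in groups]


# Made-up similarities: users 0-2 resemble each other, users 3-6 resemble each other.
left = {0, 1, 2}
S_demo = [[0.9 if (i in left) == (j in left) else 0.1 for j in range(7)]
          for i in range(7)]

merged = merge_small_groups([[0, 1, 2], [3, 4, 5], [6]], S_demo, min_size=3)
print(merged)
```

  • here min_size plays the role of the first user number threshold; splitting oversized groups would need a second pass with the chosen clustering algorithm and is not shown.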
  • in step 306, the computer device performs detection based on the multiple feature groups and the multiple user groups, and determines a target group in the multiple user groups, where the target group is a group with target characteristics.
  • after the computer device obtains multiple user groups, it can perform detection based on the multiple feature groups obtained in step 301. Correspondingly, this step can be implemented through the following steps (1) to (4).
  • the computer device uses the multiple feature groups and the multiple users as nodes and creates edges between nodes that meet the target condition to obtain a graph model.
  • when the computer device constructs the graph model, it can regard both the feature groups and the users as nodes of the graph model.
  • a first edge is created between the node corresponding to a feature group and the node corresponding to a user that satisfy the first condition, and the weight of the first edge is the affiliation between the user and the feature group. Since the elements in the indicator matrix represent the affiliation between a user's feature and a feature group, the computer device can create the first edge according to the indicator matrices obtained in the foregoing steps. When a certain feature of a user is included in a feature group, a first edge is created between the user and the feature group. In a possible implementation, the element value of the indicator matrix may be used as the weight of the first edge. In another possible implementation, the weights of the first edges are all set to 1.
  • the first edge between the node corresponding to the feature group and the node corresponding to the user can also be created according to the indicator matrix alone, with the weight of the first edge set according to the importance of the feature dimension: for feature dimensions with higher importance, the weight of the first edge can be set to a larger value, and for feature dimensions with lower importance, the weight of the first edge can be set to a smaller value.
  • the first edge between the nodes corresponding to users with userid 0, 1, 2, and 3 and the feature group p1 can be created, and the nodes corresponding to users with userid 4, 5, and 6 can also be created.
  • the first edge between the feature group p2, and the first edge between the node corresponding to the users whose userid is 7, 8, and 9 and the feature group p3 can also be created.
  • in one possible implementation, the weight of the first edge is set to 1.
  • in another possible implementation, the weight of the first edge is set to another value, such as 0.7, 0.5, or 1.3.
  • a second edge is created between the nodes corresponding to the feature groups that meet the second condition, and the weight of the second edge is the similarity between the feature groups. Since the elements in the feature incidence matrix are used to represent the similarity between feature groups in each feature column, when the computer device creates the second edge, it can be created based on the feature incidence matrix obtained in the above steps. In a possible implementation, when the similarity between the two feature groups is not zero, a second edge is created between the nodes corresponding to the two feature groups. In another possible implementation manner, when the similarity between two feature groups is greater than a preset feature group similarity threshold, a second edge is created between nodes corresponding to the two feature groups.
  • the second edge may be created only between nodes corresponding to feature groups in the same feature dimension. In another possible implementation manner, a second edge may be created between the nodes corresponding to all the obtained feature groups. In the embodiment of the present application, a second edge is created between the corresponding nodes of two feature groups that have a non-zero similarity and belong to the same feature dimension.
  • the second edge between the nodes corresponding to the feature group p1 and the feature group p2 can be created, and the weight of the second edge is 0.2; the feature group p1 and the node corresponding to the feature group p3 can be created The weight of the second edge is 0.5; the second edge between the nodes corresponding to the feature group p2 and the feature group p3 can be created, and the weight of the second edge is 0.2.
  • a third edge is created between nodes corresponding to users who meet the third condition, and the weight of the third edge is the similarity between the users. Since the elements of the similarity matrix are the similarities between users among the multiple users, the computer device can create the third edges from the similarity matrix obtained in the above steps. In one possible implementation, when the similarity between two users is not zero, a third edge is created between the nodes corresponding to the two users. In another possible implementation, when the similarity between two users is greater than a preset user similarity threshold, a third edge is created between the nodes corresponding to the two users.
  • the third edges among the 10 users contained in the data to be detected shown in Figure 1 can be created.
  • the similarity between the user with userid 0 and the user with userid 2 is (2.05 ⁇ (1/3)), so a third edge with weight (2.05 ⁇ (1/3)) is created between the nodes corresponding to these two users; the similarity between the user with userid 1 and the user with userid 5 is (0.7 ⁇ (1/3)), so a third edge with weight (0.7 ⁇ (1/3)) is created between the nodes corresponding to these two users; the third edges among the 10 users are created in turn in the same way.
  • FIG. 4 is a schematic diagram of a graph model provided by an embodiment of the present application, and the graph model is constructed according to the data to be detected in FIG. 1.
  • the graph model includes 10 user nodes, namely node 0, node 1, node 2, node 3, node 4, node 5, node 6, node 7, node 8, and node 9. It also includes 8 feature group nodes: node p1, node p2, and node p3, corresponding to the feature groups divided from the feature column of the phone feature dimension; node c1 and node c2, corresponding to the feature groups divided from the feature column of the city feature dimension; and node u1, node u2, and node u3, corresponding to the feature groups divided from the feature column of the uuid feature dimension.
  • the graph model shown in Figure 4 draws the first edge and the second edge, but does not draw the third edge between the nodes corresponding to the user.
  • the names first edge, second edge, and third edge are used only for convenience of description and to distinguish the different edges; they imply no order. The order in which the first, second, and third edges are created is not fixed, and any of them can be created first, which is not specifically limited in this application.
  • the computer device performs feature extraction on the graph model according to multiple user groups to obtain multiple group feature matrices, and each user group corresponds to a group feature matrix.
  • the computer device obtains the group feature map corresponding to each user group from the graph model.
  • the group feature map is a part of the graph model and includes only the nodes corresponding to the users in the user group and the nodes corresponding to the feature groups.
  • the number of nodes in the group feature map can be expressed as T.
  • after obtaining multiple group feature maps, the computer device performs feature extraction on each node in each group feature map to obtain the corresponding group feature matrix.
  • the elements of the group feature matrix are the features of the nodes in the group feature map.
  • extract the Egonet (Egocentric Network) feature of each node.
  • the Egonet feature of each node includes: the number of neighbor nodes of the node, the sum of the weights of the edges incident to the node, the number of triangles that have the node as a vertex, and so on.
  • the number of extracted features can be expressed as E, where E is a positive integer, for example 3, 5, or 8, which is not specifically limited in this application.
  • a T × E group feature matrix can be obtained for each group feature map, thereby obtaining multiple group feature matrices.
  • a user group includes three users whose userid is 0, 4, and 7, and the group feature map corresponding to the user group is obtained from the graph model shown in FIG. 4. FIG. 5 is a schematic diagram of a group feature map provided by an embodiment of this application; it includes node 0, node 4, node 7, node p1, node p2, node p3, node c1, node c2, node u1, node u2, and node u3.
  • the Egonet features extracted for node 0 are: 5 neighbor nodes, a weight sum of (4.71 ⁇ (1/3)), and two triangles.
  • an 11 × 3 group feature matrix is obtained.
  • the computer equipment obtains multiple corresponding feature vectors according to multiple group feature matrices.
  • the computer device calculates the feature vector of each group feature matrix to obtain multiple feature vectors.
  • the group feature matrix can be decomposed by SVD (Singular Value Decomposition) to obtain the principal feature vector of the group feature matrix, and the principal feature vector is used as the feature vector of the group feature matrix, where the feature vector may be a column vector of dimension E × 1.
  • the computer device determines the target group among the multiple user groups according to the multiple feature vectors, and the target group is a group with target characteristics.
  • this step can be implemented through the following steps (4-1) to (4-3).
  • the computer device can calculate the average of the multiple feature vectors to obtain the average feature vector, which can be expressed as V_avg.
  • the computer device can obtain the evaluation value of each user group according to the average feature vector and the feature vector of the group feature matrix corresponding to each user group.
  • after obtaining the average feature vector and the feature vector of each group feature matrix, the computer device obtains the evaluation value Z of each user group.
  • the feature vector of the group feature matrix corresponding to the d-th user group can be expressed as V_d
  • the evaluation value Z_d of the d-th user group can be expressed as:
  • when the evaluation value of a user group is greater than the target threshold, the computer device determines that the user group is the target group, the target group being a group with target characteristics; when the evaluation value of the user group is not greater than the target threshold, it determines that the user group is not the target group.
  • the target threshold may be a value between 0 and 2 and may be set according to the actual application scenario, which is not specifically limited in this application.
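  • in the method provided by the embodiments of the present application, each feature column in the data to be detected is grouped to obtain multiple feature groups; a similarity matrix is obtained according to the indicator matrices and feature association matrices corresponding to the multiple feature columns; clustering is performed according to the similarity matrix to obtain multiple user groups; and detection is performed based on the multiple feature groups and the multiple user groups to determine a target group among the multiple user groups, the target group being a group with target characteristics.
  • compared with grouping users by relying only on social relationships and labels, this application groups every feature dimension of the users, obtains a similarity matrix containing the similarities between users, and examines the multiple user groups obtained by clustering the similarity matrix to determine the group with target characteristics, which gives relatively high accuracy and a good detection effect.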
  • FIG. 6 is a schematic diagram of a target group detection device provided by an embodiment of the present application.
  • the device includes: a grouping module 601, a first obtaining module 602, a clustering module 603, and a detection module 604.
  • the grouping module 601 is configured to group each feature column in the data to be detected to obtain multiple feature groups, each feature column corresponds to at least one feature group, and each feature column includes features of the same feature dimension of different users;
  • the first obtaining module 602 is configured to obtain a similarity matrix according to the indicator matrices and the feature association matrices corresponding to the multiple feature columns, where the elements of the similarity matrix represent the similarity between users among multiple users, the elements of the feature association matrix of each feature column represent the similarity between the feature groups in that feature column, and the elements of the indicator matrix of each feature column represent the feature groups to which the multiple users belong;
  • the clustering module 603 is configured to perform clustering according to the similarity matrix to obtain multiple user groups;
  • the detection module 604 is configured to perform detection based on multiple feature groups and multiple user groups, and determine a target group in the multiple user groups, where the target group is a group with target characteristics.
  • the device further includes:
  • the second obtaining module is configured to obtain the indicator matrix corresponding to each feature column to obtain multiple indicator matrices;
  • the input module is configured to input each indicator matrix into the feature association function to obtain the corresponding feature association matrix.
  • the feature association function is used to obtain, by means of machine learning, the corresponding feature association matrix from the elements of the indicator matrix.
  • the first obtaining module 602 is further configured to input the indicator matrices and the feature association matrices corresponding to the multiple feature columns into the similarity calculation function to obtain the similarity matrix.
  • the similarity calculation function is used to obtain the similarity between users among the multiple users from the elements of the indicator matrices and the elements of the feature association matrices.
  • the detection module 604 is further configured to take the multiple feature groups and the multiple users as nodes and create edges between nodes that meet the target condition to obtain a graph model; perform feature extraction on the graph model according to the multiple user groups to obtain multiple group feature matrices, each user group corresponding to one group feature matrix; obtain multiple corresponding feature vectors according to the multiple group feature matrices; and determine the target group among the multiple user groups according to the multiple feature vectors.
  • the detection module 604 is further configured to create a first edge between the node corresponding to a feature group that meets the first condition and the node corresponding to a user, the weight of the first edge being the membership relationship between the user and the feature group; create a second edge between the nodes corresponding to feature groups that meet the second condition, the weight of the second edge being the similarity between the feature groups; and create a third edge between the nodes corresponding to users that meet the third condition, the weight of the third edge being the similarity between the users, to obtain the graph model.
  • the detection module 604 is further configured to obtain, for each user group in the multiple user groups, the group feature map corresponding to that user group, the group feature map being a part of the graph model; and to perform feature extraction on each node in each group feature map to obtain the corresponding group feature matrix.
  • the elements of the group feature matrix represent the features of the nodes in the group feature map.
  • the detection module 604 is further configured to obtain an average feature vector based on the multiple feature vectors, the average feature vector being the average of the multiple feature vectors; obtain the evaluation value of each user group according to the average feature vector and the feature vector of the group feature matrix corresponding to each user group; and, for each user group, determine that the user group is the target group when its evaluation value is greater than the target threshold, and determine that the user group is not the target group when its evaluation value is not greater than the target threshold.
  • when the target group detection device provided in the above embodiment detects the target group, the division into the above functional modules is only an example. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the computer device can be divided into different functional modules to complete all or part of the functions described above.
  • the target group detection device provided in the foregoing embodiment and the target group detection method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
  • FIG. 7 is a structural block diagram of a computer device 700 provided by an embodiment of the present application.
  • the computer device 700 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
  • the computer device 700 may also be called user equipment, portable terminal, laptop terminal, desktop terminal, and other names.
  • the computer device 700 includes a processor 701 and a memory 702.
  • the processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
  • the processor 701 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 701 may also include a main processor and a coprocessor.
  • the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state.
  • the processor 701 may be integrated with a GPU (Graphics Processing Unit), which is used to render and draw content that needs to be displayed on the display screen.
  • the processor 701 may also include an AI (Artificial Intelligence) processor, which is used to process computing operations related to machine learning.
  • the memory 702 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 702 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • the non-transitory computer-readable storage medium in the memory 702 is used to store at least one instruction, and the at least one instruction is executed by the processor 701 to implement the target group detection method provided in the method embodiments of the present application.
  • the computer device 700 may optionally further include: a peripheral device interface 703 and at least one peripheral device.
  • the processor 701, the memory 702, and the peripheral device interface 703 may be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 703 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 704, a touch display screen 705, a camera 706, an audio circuit 707, a positioning component 708, and a power supply 709.
  • the peripheral device interface 703 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 701 and the memory 702.
  • in some embodiments, the processor 701, the memory 702, and the peripheral device interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral device interface 703 can be implemented on separate chips or circuit boards, which is not limited in this embodiment.
  • the radio frequency circuit 704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 704 communicates with a communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 704 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • the radio frequency circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on.
  • the radio frequency circuit 704 can communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes but is not limited to: metropolitan area networks, the various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks.
  • the display screen 705 is used to display a UI (User Interface).
  • the UI can include graphics, text, icons, videos, and any combination thereof.
  • the display screen 705 also has the ability to collect touch signals on or above the surface of the display screen 705.
  • the touch signal may be input to the processor 701 as a control signal for processing.
  • the display screen 705 may also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • in some embodiments, there may be one display screen 705, provided on the front panel of the computer device 700; in other embodiments, there may be at least two display screens 705, respectively provided on different surfaces of the computer device 700 or in a folded design.
  • the display screen 705 may be a flexible display screen, which is arranged on the curved surface or the folding surface of the computer device 700.
  • the display screen 705 can also be set as a non-rectangular irregular figure, that is, a special-shaped screen.
  • the display screen 705 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
  • the camera assembly 706 is used to capture images or videos.
  • the camera assembly 706 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the computer device, and the rear camera is set on the back of the computer device.
  • the camera assembly 706 may also include a flash.
  • the flash can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, which can be used for light compensation under different color temperatures.
  • the audio circuit 707 may include a microphone and a speaker.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals and input them to the processor 701 for processing, or input to the radio frequency circuit 704 to implement voice communication.
  • the microphone can also be an array microphone or an omnidirectional acquisition microphone.
  • the speaker is used to convert the electrical signal from the processor 701 or the radio frequency circuit 704 into sound waves.
  • the speaker can be a traditional membrane speaker or a piezoelectric ceramic speaker.
  • when the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as distance measurement.
  • the audio circuit 707 may also include a headphone jack.
  • the positioning component 708 is used to locate the current geographic location of the computer device 700 to implement navigation or LBS (Location Based Service).
  • the positioning component 708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
  • the power supply 709 is used to supply power to various components in the computer device 700.
  • the power source 709 may be alternating current, direct current, disposable batteries, or rechargeable batteries.
  • the rechargeable battery may support wired charging or wireless charging.
  • the rechargeable battery can also be used to support fast charging technology.
  • the computer device 700 further includes one or more sensors 710.
  • the one or more sensors 710 include, but are not limited to, an acceleration sensor 711, a gyroscope sensor 712, a pressure sensor 713, a fingerprint sensor 714, an optical sensor 715, and a proximity sensor 716.
  • the acceleration sensor 711 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the computer device 700.
  • the acceleration sensor 711 may be used to detect the components of the gravitational acceleration on three coordinate axes.
  • the processor 701 may control the touch screen 705 to display the user interface in a horizontal view or a vertical view according to the gravity acceleration signal collected by the acceleration sensor 711.
  • the acceleration sensor 711 may also be used for the collection of game or user motion data.
  • the gyroscope sensor 712 can detect the body direction and rotation angle of the computer device 700, and the gyroscope sensor 712 can cooperate with the acceleration sensor 711 to collect the user's 3D actions on the computer device 700.
  • the processor 701 can implement the following functions according to the data collected by the gyroscope sensor 712: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 713 may be disposed on the side frame of the computer device 700 and/or the lower layer of the touch screen 705.
  • the processor 701 performs left and right hand recognition or quick operation according to the holding signal collected by the pressure sensor 713.
  • the processor 701 controls the operability controls on the UI interface according to the user's pressure operation on the touch display screen 705.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 714 is used to collect the user's fingerprint.
  • the processor 701 can identify the user's identity based on the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 can identify the user's identity based on the collected fingerprint. When it is recognized that the user's identity is a trusted identity, the processor 701 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings.
  • the fingerprint sensor 714 may be provided on the front, back or side of the computer device 700. When the computer device 700 is provided with a physical button or a manufacturer logo, the fingerprint sensor 714 can be integrated with the physical button or the manufacturer logo.
  • the optical sensor 715 is used to collect the ambient light intensity.
  • the processor 701 may control the display brightness of the touch screen 705 according to the ambient light intensity collected by the optical sensor 715. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 705 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 705 is decreased.
  • the processor 701 may also dynamically adjust the shooting parameters of the camera assembly 706 according to the ambient light intensity collected by the optical sensor 715.
  • the proximity sensor 716 also called a distance sensor, is usually arranged on the front panel of the computer device 700.
  • the proximity sensor 716 is used to collect the distance between the user and the front of the computer device 700.
  • when the proximity sensor 716 detects that the distance between the user and the front of the computer device 700 is gradually decreasing, the processor 701 controls the touch display screen 705 to switch from the on-screen state to the off-screen state; when the proximity sensor 716 detects that the distance between the user and the front of the computer device 700 is gradually increasing, the processor 701 controls the touch display screen 705 to switch from the off-screen state to the on-screen state.
  • the structure shown in FIG. 7 does not constitute a limitation on the computer device 700, which may include more or fewer components than shown in the figure, combine certain components, or adopt a different component arrangement.
  • FIG. 8 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device 800 may vary considerably depending on configuration or performance, and may include one or more processors (Central Processing Units, CPU) 801 and one or more memories 802, where at least one instruction is stored in the memory 802 and is loaded and executed by the processor 801 to implement the methods provided in the foregoing method embodiments.
  • the computer device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, and may also include other components for implementing device functions, which will not be repeated here.
  • a computer-readable storage medium is also provided, such as a memory including instructions, which may be executed by a processor in a computer device to complete the target group detection method in the foregoing embodiments.
  • the computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

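A target group detection method and apparatus, a computer device, and a storage medium, belonging to the technical field of network security. The method includes: grouping each feature column in data to be detected to obtain multiple feature groups; obtaining a similarity matrix according to the indicator matrices and feature association matrices corresponding to multiple feature columns; clustering according to the similarity matrix to obtain multiple user groups; and performing detection according to the multiple feature groups and the multiple user groups to determine a target group among the multiple user groups, the target group being a group having a target characteristic. Compared with grouping users by relying only on social relationships and labels, every feature dimension of the users is grouped, a similarity matrix containing the similarities between users is obtained, and the multiple user groups obtained by clustering the similarity matrix are examined to determine the group having the target characteristic, which gives relatively high accuracy and a good detection effect.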

Description

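Target group detection method and apparatus, computer device, and storage medium
This application claims priority to Chinese patent application No. 201910367835.3, filed on May 5, 2019 and entitled "Target group detection method and apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the technical field of network security, and in particular to a target group detection method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of Internet technology, the Internet has become closely tied to people's lives. While it brings great convenience, it also gives criminals opportunities to exploit. For example, cases of Internet fraud are increasing and are often committed by gangs. Because of the nature of the Internet, fraud gangs often use high-tech means to conceal the relationships among their members, and such cases feature complex and changing case types, rapidly iterating techniques, organized and large-scale operation, and large data volumes, all of which make anti-fraud work very difficult.
At present, fraud groups are usually detected by means of unsupervised learning, which judges whether the people under examination engage in fraud based on their social relationships.
However, the problem with the above technique is that it is immature, has a weak dependence on labels, and relies on social relationships, so fraud group detection performs poorly.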
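Summary
The embodiments of this application provide a target group detection method and apparatus, a computer device, and a storage medium, which can address the poor fraud group detection that results from immature techniques that have a weak dependence on labels and rely on social relationships. The technical solution is as follows:
In one aspect, a target group detection method is provided, the method including:
grouping each feature column in the data to be detected to obtain multiple feature groups, where each feature column corresponds to at least one feature group, and each feature column includes the features of different users in the same feature dimension;
obtaining a similarity matrix according to the indicator matrices and feature association matrices corresponding to multiple feature columns, where the elements of the similarity matrix represent the similarity between users among multiple users, the elements of the feature association matrix of each feature column represent the similarity between the feature groups in that feature column, and the elements of the indicator matrix of each feature column represent the feature groups to which the multiple users belong;
clustering according to the similarity matrix to obtain multiple user groups;
performing detection according to the multiple feature groups and the multiple user groups to determine a target group among the multiple user groups, the target group being a group having a target characteristic.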
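In one possible implementation, before the obtaining of the similarity matrix according to the indicator matrices and feature association matrices corresponding to multiple feature columns, the method further includes:
obtaining the indicator matrix corresponding to each feature column to obtain multiple indicator matrices;
inputting each indicator matrix into a feature association function to obtain the corresponding feature association matrix, where the feature association function is used to obtain, by means of machine learning, the corresponding feature association matrix from the elements of the indicator matrix.
In another possible implementation, the obtaining of the similarity matrix according to the indicator matrices and feature association matrices corresponding to multiple feature columns includes:
inputting the indicator matrices and feature association matrices corresponding to the multiple feature columns into a similarity calculation function to obtain the similarity matrix, where the similarity calculation function is used to obtain the similarity between users among the multiple users from the elements of the indicator matrices and the elements of the feature association matrices.
In another possible implementation, the performing of detection according to the multiple feature groups and the multiple user groups to determine the target group among the multiple user groups includes:
taking the multiple feature groups and the multiple users as nodes, and creating edges between nodes that satisfy a target condition to obtain a graph model;
performing feature extraction on the graph model according to the multiple user groups to obtain multiple group feature matrices, each user group corresponding to one group feature matrix;
obtaining multiple corresponding feature vectors according to the multiple group feature matrices;
determining the target group among the multiple user groups according to the multiple feature vectors.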
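The step from a group feature matrix to a feature vector can be sketched as follows. The detailed description states that SVD is used and that the resulting vector has dimension E × 1, which matches the first right singular vector of a T × E matrix. This is a minimal sketch under that reading, not the applicant's implementation: the function name, the sample matrix, and the sign convention are assumptions made for illustration.

```python
import numpy as np

def principal_feature_vector(group_matrix):
    """Return the first right singular vector (length E) of a T x E matrix."""
    a = np.asarray(group_matrix, dtype=float)
    # Rows of vt are the right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    v = vt[0]
    # The sign of a singular vector is arbitrary. Assumption for this sketch:
    # flip it so that the entry with the largest magnitude is positive.
    if v[np.argmax(np.abs(v))] < 0:
        v = -v
    return v

# Hypothetical 4 x 3 group feature matrix (T = 4 nodes, E = 3 features).
sample = np.array([[5.0, 4.0, 2.0],
                   [3.0, 2.0, 1.0],
                   [4.0, 3.0, 0.0],
                   [1.0, 1.0, 0.0]])
v_d = principal_feature_vector(sample)
```

Fixing the sign matters once vectors from different groups are averaged, because two groups with the same dominant direction could otherwise cancel each other out.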
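In another possible implementation, the creating of edges between nodes that satisfy the target condition to obtain the graph model includes:
creating a first edge between the node corresponding to a feature group that satisfies a first condition and the node corresponding to a user, the weight of the first edge being the membership relationship between the user and the feature group;
creating a second edge between the nodes corresponding to the feature groups that satisfy a second condition, the weight of the second edge being the similarity between the feature groups;
creating a third edge between the nodes corresponding to the users that satisfy a third condition, the weight of the third edge being the similarity between the users, to obtain the graph model.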
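The three kinds of edges can be sketched with a plain weighted adjacency map. This is an illustrative sketch, not the applicant's implementation: it assumes the variant in which the first edge has weight 1 and in which second and third edges are created whenever the similarity is non-zero. The function name, the node encoding as `(kind, id)` tuples, and the user similarity values are hypothetical; the p1/p2/p3 similarities and the membership of users 7, 8, and 9 in p3 follow the worked example in the description.

```python
from collections import defaultdict

def build_graph(membership, group_sim, user_sim, membership_weight=1.0):
    """Build an undirected weighted graph as {node: {neighbor: weight}}."""
    graph = defaultdict(dict)

    def add_edge(a, b, weight):
        graph[a][b] = weight
        graph[b][a] = weight

    # First edges: user <-> feature group the user belongs to.
    for user, groups in membership.items():
        for group in groups:
            add_edge(("user", user), ("group", group), membership_weight)
    # Second edges: feature group <-> feature group, non-zero similarity only.
    for (g1, g2), sim in group_sim.items():
        if sim != 0:
            add_edge(("group", g1), ("group", g2), sim)
    # Third edges: user <-> user, non-zero similarity only.
    for (u1, u2), sim in user_sim.items():
        if sim != 0:
            add_edge(("user", u1), ("user", u2), sim)
    return graph

membership = {7: ["p3"], 8: ["p3"], 9: ["p3"]}
group_sim = {("p1", "p2"): 0.2, ("p1", "p3"): 0.5, ("p2", "p3"): 0.2}
user_sim = {(7, 8): 0.4, (8, 9): 0.0}  # hypothetical values
g = build_graph(membership, group_sim, user_sim)
```

Swapping the `!= 0` checks for a comparison against a preset threshold gives the other variant the description mentions.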
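In another possible implementation, the performing of feature extraction on the graph model according to the multiple user groups to obtain multiple group feature matrices includes:
for each user group in the multiple user groups, obtaining the group feature map corresponding to that user group, the group feature map being a part of the graph model;
performing feature extraction on each node in each group feature map to obtain the corresponding group feature matrix, where the elements of the group feature matrix represent the features of the nodes in the group feature map.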
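The per-node feature extraction can be sketched as below, using the three Egonet features that the detailed description names: the number of neighbors, the sum of the weights of the incident edges, and the number of triangles that have the node as a vertex. The sketch assumes the graph is stored as `{node: {neighbor: weight}}`; the function names and the small sample graph are hypothetical.

```python
from itertools import combinations

def induced_subgraph(graph, nodes):
    """Keep only the given nodes and the edges between them."""
    keep = set(nodes)
    return {n: {m: w for m, w in graph.get(n, {}).items() if m in keep}
            for n in nodes}

def egonet_features(graph, node):
    """Return [neighbor count, incident weight sum, triangle count]."""
    neighbors = graph[node]
    # A triangle at `node` is a pair of its neighbors that are also adjacent.
    triangles = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return [len(neighbors), sum(neighbors.values()), triangles]

def group_feature_matrix(graph, nodes):
    """Build the T x E matrix (here E = 3) for one group feature map."""
    sub = induced_subgraph(graph, nodes)
    return [egonet_features(sub, n) for n in nodes]

# Hypothetical graph: a triangle a-b-c plus a pendant node d attached to a.
sample_graph = {
    "a": {"b": 1.0, "c": 0.5, "d": 1.0},
    "b": {"a": 1.0, "c": 0.2},
    "c": {"a": 0.5, "b": 0.2},
    "d": {"a": 1.0},
}
features_a = egonet_features(sample_graph, "a")
matrix_abc = group_feature_matrix(sample_graph, ["a", "b", "c"])
```

Restricting to the induced subgraph first means that each node's features are computed only from the group feature map, not from the full graph model.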
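In another possible implementation, the determining of the target group among the multiple user groups according to the multiple feature vectors includes:
obtaining an average feature vector according to the multiple feature vectors, the average feature vector being the average of the multiple feature vectors;
obtaining the evaluation value of each user group according to the average feature vector and the feature vector of the group feature matrix corresponding to each user group;
for each user group, determining that the user group is the target group when the evaluation value of the user group is greater than a target threshold, and determining that the user group is not the target group when the evaluation value of the user group is not greater than the target threshold.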
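The averaging and thresholding logic can be sketched as follows. The formula for the evaluation value is not reproduced in this text, so the scoring function is left as a parameter; the `cosine_distance` default is only a hypothetical stand-in chosen for illustration and should not be read as the formula used in the application. The function names, sample vectors, and threshold value are likewise assumptions.

```python
import math

def mean_vector(vectors):
    """Element-wise average of equally long vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def cosine_distance(u, v):
    # Hypothetical stand-in score: 1 - cosine similarity.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def detect_target_groups(vectors, threshold, score=cosine_distance):
    """Flag each group whose score against the average exceeds the threshold."""
    v_avg = mean_vector(list(vectors.values()))
    return {gid: score(v, v_avg) > threshold for gid, v in vectors.items()}

# Hypothetical feature vectors for three user groups.
group_vectors = {"g1": [1.0, 0.0], "g2": [1.0, 0.1], "g3": [0.0, 1.0]}
flags = detect_target_groups(group_vectors, threshold=0.3)
```

With these sample values only `g3` is flagged, because its vector points away from the average of the three.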
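In one aspect, a target group detection apparatus is provided, the apparatus including:
a grouping module, configured to group each feature column in the data to be detected to obtain multiple feature groups, where each feature column corresponds to at least one feature group, and each feature column includes the features of different users in the same feature dimension;
a first obtaining module, configured to obtain a similarity matrix according to the indicator matrices and feature association matrices corresponding to multiple feature columns, where the elements of the similarity matrix represent the similarity between users among multiple users, the elements of the feature association matrix of each feature column represent the similarity between the feature groups in that feature column, and the elements of the indicator matrix of each feature column represent the feature groups to which the multiple users belong;
a clustering module, configured to perform clustering according to the similarity matrix to obtain multiple user groups;
a detection module, configured to perform detection according to the multiple feature groups and the multiple user groups to determine a target group among the multiple user groups, the target group being a group having a target characteristic.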
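In one possible implementation, the apparatus further includes:
a second obtaining module, configured to obtain the indicator matrix corresponding to each feature column to obtain multiple indicator matrices;
an input module, configured to input each indicator matrix into a feature association function to obtain the corresponding feature association matrix, where the feature association function is used to obtain, by means of machine learning, the corresponding feature association matrix from the elements of the indicator matrix.
In another possible implementation, the first obtaining module is further configured to input the indicator matrices and feature association matrices corresponding to the multiple feature columns into a similarity calculation function to obtain the similarity matrix, where the similarity calculation function is used to obtain the similarity between users among the multiple users from the elements of the indicator matrices and the elements of the feature association matrices.
In another possible implementation, the detection module is further configured to take the multiple feature groups and the multiple users as nodes and create edges between nodes that satisfy a target condition to obtain a graph model; perform feature extraction on the graph model according to the multiple user groups to obtain multiple group feature matrices, each user group corresponding to one group feature matrix; obtain multiple corresponding feature vectors according to the multiple group feature matrices; and determine the target group among the multiple user groups according to the multiple feature vectors.
In another possible implementation, the detection module is further configured to create a first edge between the node corresponding to a feature group that satisfies a first condition and the node corresponding to a user, the weight of the first edge being the membership relationship between the user and the feature group; create a second edge between the nodes corresponding to the feature groups that satisfy a second condition, the weight of the second edge being the similarity between the feature groups; and create a third edge between the nodes corresponding to the users that satisfy a third condition, the weight of the third edge being the similarity between the users, to obtain the graph model.
In another possible implementation, the detection module is further configured to obtain, for each user group in the multiple user groups, the group feature map corresponding to that user group, the group feature map being a part of the graph model; and to perform feature extraction on each node in each group feature map to obtain the corresponding group feature matrix, where the elements of the group feature matrix represent the features of the nodes in the group feature map.
In another possible implementation, the detection module is further configured to obtain an average feature vector according to the multiple feature vectors, the average feature vector being the average of the multiple feature vectors; obtain the evaluation value of each user group according to the average feature vector and the feature vector of the group feature matrix corresponding to each user group; and, for each user group, determine that the user group is the target group when the evaluation value of the user group is greater than a target threshold, and determine that the user group is not the target group when the evaluation value of the user group is not greater than the target threshold.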
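In one aspect, a computer device is provided. The computer device includes one or more processors and one or more memories, the one or more memories store at least one instruction, and the at least one instruction is loaded and executed by the one or more processors to implement the operations performed by the target group detection method of any of the above possible implementations.
In one aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the operations performed by the target group detection method of any of the above possible implementations.
The beneficial effects of the technical solutions provided by the embodiments of this application include at least the following:
Each feature column in the data to be detected is grouped to obtain multiple feature groups; a similarity matrix is obtained according to the indicator matrices and feature association matrices corresponding to the multiple feature columns; clustering is performed according to the similarity matrix to obtain multiple user groups; and detection is performed according to the multiple feature groups and the multiple user groups to determine a target group among the multiple user groups, the target group being a group having a target characteristic. Compared with grouping users by relying only on social relationships and labels, this application groups every feature dimension of the users, obtains a similarity matrix containing the similarities between users, and examines the multiple user groups obtained by clustering the similarity matrix to determine the group having the target characteristic, which gives relatively high accuracy and a good detection effect.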
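Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of data to be detected provided by an embodiment of this application;
FIG. 2 is a flowchart of a target group detection method provided by an embodiment of this application;
FIG. 3 is a flowchart of another target group detection method provided by an embodiment of this application;
FIG. 4 is a schematic diagram of a graph model provided by an embodiment of this application;
FIG. 5 is a schematic diagram of a group feature map provided by an embodiment of this application;
FIG. 6 is a schematic diagram of a target group detection apparatus provided by an embodiment of this application;
FIG. 7 is a structural block diagram of a computer device provided by an embodiment of this application;
FIG. 8 is a schematic structural diagram of a computer device provided by an embodiment of this application.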
具体实施方式
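To make the objectives, technical solutions and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the drawings.
The embodiments of this application can be applied to scenarios in which a group having a target characteristic is selected from among users. The users may be filtered users, unfiltered users, users in a certain region, or users that are related in some way; this application does not limit this. The target characteristic may be a specific behavioral characteristic, such as fraudulent behavior, or a specific attribute characteristic. First, the data to be detected for these users is obtained. The data to be detected may be a data table in which each row holds the features of one user and each column is one feature dimension of the users; every user has the same feature dimensions.
For example, FIG. 1 is a schematic diagram of data to be detected provided by an embodiment of this application. As shown in FIG. 1, each row of data represents one user, and there are 10 users; each user has a unique identifier userid and three feature dimensions: phone, city and uuid.
FIG. 2 is a flowchart of a target group detection method provided by an embodiment of this application. Referring to FIG. 2, the embodiment includes:
In step 201, the computer device groups each feature column in the data to be detected to obtain a plurality of feature groups.
Each feature column corresponds to at least one feature group, and each feature column contains the features of different users in the same feature dimension.
In step 202, the computer device obtains a similarity matrix from the indicator matrices and feature association matrices corresponding to the plurality of feature columns.
The elements of the similarity matrix represent the similarities between users among a plurality of users; the elements of the feature association matrix of each feature column represent the similarities between the feature groups of that column; and the elements of the indicator matrix of each feature column represent the feature groups to which the users belong.
In step 203, the computer device performs clustering according to the similarity matrix to obtain a plurality of user groups.
In step 204, the computer device performs detection according to the plurality of feature groups and the plurality of user groups and determines the target groups among the user groups, a target group being a group having a target characteristic.
In the method provided by this embodiment, each feature column of the data to be detected is grouped to obtain a plurality of feature groups; a similarity matrix is obtained from the indicator matrices and feature association matrices corresponding to the feature columns; clustering on the similarity matrix yields a plurality of user groups; and detection based on the feature groups and the user groups determines the target groups among the user groups, a target group being a group having a target characteristic. Compared with grouping users only by social relationships and labels, this application groups every feature dimension of the users, obtains a similarity matrix containing the similarities between users, and performs detection on the user groups obtained by clustering that matrix, thereby identifying groups with the target characteristic with relatively high accuracy and good detection performance.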
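FIG. 3 is a flowchart of another target group detection method provided by an embodiment of this application. Referring to FIG. 3, the embodiment includes:
In step 301, the computer device groups each feature column in the data to be detected to obtain a plurality of feature groups.
Each feature column corresponds to at least one feature group, and each feature column contains the features of different users in the same feature dimension.
The data to be detected may have N+1 rows and M+1 columns. The first row is an attribute row, which may be a table header or database field names, or may be empty; this application does not specifically limit this. N is a positive integer, and each of the N rows may represent one user or any other individual to be detected. This application uses one user per row as an example, so the data to be detected contains the data of N users. The leftmost column is a unique-identifier column used to distinguish different individuals, that is, different users when the individuals are users. M is a positive integer, and each of the M columns starting from the second column represents one feature dimension, so the data to be detected contains M feature dimensions and every user has the same feature dimensions. The column corresponding to a feature dimension is called a feature column; every value stored in a given feature column is feature data of the same feature dimension.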
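The table layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_columns` and the sample values are hypothetical and are not taken from FIG. 1, and the sketch assumes the attribute row is present (the text also allows it to be empty).

```python
def split_columns(table):
    """Split an (N+1) x (M+1) table into user ids and M feature columns.

    Assumes table[0] is the attribute row and column 0 holds the unique id.
    """
    header, rows = table[0], table[1:]
    ids = [row[0] for row in rows]
    columns = {
        name: [row[k] for row in rows]
        for k, name in enumerate(header)
        if k > 0  # skip the unique-identifier column
    }
    return ids, columns


# Made-up rows for illustration only.
table = [
    ["userid", "phone", "city", "uuid"],
    [0, "1340000", "Beijing", "c0**87"],
    [1, "1700000", "Chengdu", None],
]
ids, columns = split_columns(table)
```

Each entry of `columns` is then one feature column that can be grouped independently in step 301.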
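When implementing step 301, the computer device may group each feature column according to preset rules. Each feature column can be divided into at least one feature group; that is, any given feature column may be divided into two feature groups, three feature groups, or more.
In one possible implementation, grouping each feature column may be a process of bucketing the data of that column: each feature group corresponds to one bucket, and each feature column corresponds to at least one bucket.
For example, the data to be detected in FIG. 1 has 11 rows and 4 columns, covering 10 users and 3 feature dimensions. Every user has the 3 feature dimensions phone, city and uuid, each corresponding to one feature column. The computer device may bucket each feature column as follows. For the phone column, values beginning with 134, 135 or 136 go into one bucket, values beginning with 170 or 171 go into another bucket, and the remaining values go into a third bucket, giving three buckets. For the city column, Beijing and Tianjin go into one bucket and Chongqing and Chengdu go into another, giving two buckets. For the uuid column, c0**87 goes into one bucket, NULL into another, and the remaining F6**32 into a third, giving three buckets. This yields eight buckets in total.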
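The prefix-based bucketing in the phone example can be sketched as follows. The rule table `PHONE_BUCKETS`, the helper names, and the sample phone values are hypothetical stand-ins; the actual preset rules and the values in FIG. 1 are not given in the text.

```python
from collections import defaultdict

# Hypothetical prefix rules modeled on the phone example above.
PHONE_BUCKETS = {
    "p1": ("134", "135", "136"),
    "p2": ("170", "171"),
}


def phone_bucket(phone: str) -> str:
    """Return the bucket (feature group) name for one phone value."""
    for name, prefixes in PHONE_BUCKETS.items():
        if phone.startswith(prefixes):  # startswith accepts a tuple of prefixes
            return name
    return "p3"  # everything else falls into the remaining bucket


def bucketize(column, bucket_fn):
    """Map each bucket name to the row indices whose value falls in it."""
    groups = defaultdict(list)
    for row, value in enumerate(column):
        groups[bucket_fn(value)].append(row)
    return dict(groups)


# Made-up phone values for illustration only.
phones = ["1340000", "1350000", "1700000", "1560000"]
phone_groups = bucketize(phones, phone_bucket)
```

The same `bucketize` helper could be reused for the city and uuid columns by passing a different rule function.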
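In step 302, the computer device obtains the indicator matrix corresponding to each feature column.
The indicator matrix of each feature column can equally be described as the indicator matrix of each feature dimension. For each feature column, the computer device constructs the indicator matrix according to whether each user's feature falls within each feature group of that column. The elements of the indicator matrix represent the feature groups to which the users belong, with each element referring to one user.
For example, in step 301 the computer device grouped each feature column of the data to be detected in FIG. 1 and obtained eight feature groups. The phone column was divided into three feature groups, called p1, p2 and p3 for convenience: p1 contains the values beginning with 134, 135 and 136, p2 contains the values beginning with 170 and 171, and p3 contains the values beginning with 156, 131 and 130. The features of the users with userid 0, 1, 2 and 3 fall in p1, those of the users with userid 4, 5 and 6 fall in p2, and those of the users with userid 7, 8 and 9 fall in p3. This gives the following indicator matrix A_phone:
$$A_{\mathrm{phone}} = \begin{bmatrix} 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}$$
The first row of A_phone indicates that the phone features of the first four users (userid 0, 1, 2 and 3) belong to feature group p1; the second row indicates that the phone features of the middle three users (userid 4, 5 and 6) belong to feature group p2; and the third row indicates that the phone features of the last three users (userid 7, 8 and 9) belong to feature group p3.
Correspondingly, the computer device can obtain the indicator matrix A_city of the city dimension and the indicator matrix A_uuid of the uuid dimension in the same way as A_phone:
[Formula image PCTCN2019118114-appb-000002: indicator matrix A_city]
[Formula image PCTCN2019118114-appb-000003: indicator matrix A_uuid]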
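As a minimal sketch, the indicator matrix for the phone dimension can be built from the group memberships stated in the text (userid 0 to 3 in p1, 4 to 6 in p2, 7 to 9 in p3). The function name `indicator_matrix` is hypothetical; the row-per-group, column-per-user orientation follows the description of A_phone.

```python
def indicator_matrix(memberships, group_names):
    """Build a (groups x users) 0/1 matrix.

    memberships[j] is the name of the feature group that user j belongs to
    in this feature column.
    """
    return [[1 if g == name else 0 for g in memberships] for name in group_names]


# Membership of userid 0..9 in the phone column, as described in the text.
phone_membership = ["p1"] * 4 + ["p2"] * 3 + ["p3"] * 3
A_phone = indicator_matrix(phone_membership, ["p1", "p2", "p3"])
for row in A_phone:
    print(row)
```

Because every user falls in exactly one bucket per feature column, each column of the matrix contains a single 1.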
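In step 303, the computer device inputs each indicator matrix into the feature association function to obtain the corresponding feature association matrix. The feature association function obtains, by machine learning, the feature association matrix from the elements of the indicator matrix.
Since each feature column corresponds to one indicator matrix, each feature column also corresponds to one feature association matrix. The elements of the feature association matrix of a feature column represent the similarities between the feature groups of that column.
For the feature column of the m-th of the M feature dimensions, the computer device substitutes the indicator matrix A_m of the m-th feature dimension into the feature association function f(Q_m), where m is a positive integer.
[Formula image PCTCN2019118114-appb-000004: feature association function f(Q_m)]
Here K is the number of training samples. For the k-th sample, the quantity shown in formula image PCTCN2019118114-appb-000005 and S_k are known; the quantity shown in formula image PCTCN2019118114-appb-000006 corresponds to A_phone, A_city and A_uuid above, and S_k may be a square matrix whose entries take values 0-1.
Note that the K training samples arise as follows: when the amount of data to be detected is very large, the data may be split by rows into K training samples, each containing a certain number of rows. The training samples may all contain the same number of rows (the number given in formula image PCTCN2019118114-appb-000007), or they may contain different numbers of rows.
In one possible implementation, the feature association matrix of each feature column can be set from expert experience, and the computer device obtains the feature association matrix as input. This approach suits cases where the data to be detected has few rows and columns, for example when neither exceeds 20 or neither exceeds 50; this application does not specifically limit this.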
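For example, take the data to be detected in FIG. 1 and the indicator matrices A_phone, A_city and A_uuid obtained in step 302. The feature association matrix of the feature column for the phone dimension is Q_phone, whose elements represent the similarities between feature groups, that is, among p1, p2 and p3.
$$Q_{\mathrm{phone}} = \begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \\ q_{31} & q_{32} & q_{33} \end{bmatrix}$$
Here q_12 denotes the similarity between p1 and p2, q_23 denotes the similarity between p2 and p3, and so on. The values of Q_phone can be obtained from expert experience or from the feature association function f(Q_m) above; the values given here are illustrative only. q_11, the self-similarity of p1, is set to 0.7; q_22, the self-similarity of p2, is set to 0.7; q_33, the self-similarity of p3, is set to 0.9; and the other similarities are set to q_12 = q_21 = 0.2, q_13 = q_31 = 0.5, q_23 = q_32 = 0.2. This gives the following feature association matrix Q_phone:
$$Q_{\mathrm{phone}} = \begin{bmatrix} 0.7 & 0.2 & 0.5 \\ 0.2 & 0.7 & 0.2 \\ 0.5 & 0.2 & 0.9 \end{bmatrix}$$
Correspondingly, the feature association matrices Q_city and Q_uuid can be obtained in the same way:
[Formula image PCTCN2019118114-appb-000010: feature association matrix Q_city]
[Formula image PCTCN2019118114-appb-000011: feature association matrix Q_uuid]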
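In step 304, the computer device obtains the similarity matrix from the indicator matrices and feature association matrices corresponding to the plurality of feature columns.
The computer device inputs the indicator matrices and feature association matrices of the feature columns into the similarity calculation function to obtain the similarity matrix. The similarity calculation function obtains the similarities between users from the elements of the indicator matrices and the elements of the feature association matrices; that is, the elements of the similarity matrix represent the similarities between users among the plurality of users.
For the M feature dimensions in the data to be detected, each feature dimension corresponds to one indicator matrix and one feature association matrix; the computer device thus obtains M indicator matrices and M feature association matrices from the data, in one-to-one correspondence. The computer device inputs the M indicator matrices and the M feature association matrices into the similarity calculation function, which may be posed as an optimization problem whose optimal solution is the similarity matrix.
Similarity calculation function:
[Formula image PCTCN2019118114-appb-000012]
The optimal solution of the similarity calculation function is:
[Formula image PCTCN2019118114-appb-000013]
Here S denotes the similarity matrix to be found, Q_m denotes the m-th of the M feature association matrices, and A_m denotes the m-th of the M indicator matrices.
For example, take the data to be detected in FIG. 1, the indicator matrices A_phone, A_city and A_uuid obtained in step 302, and the feature association matrices Q_phone, Q_city and Q_uuid obtained in step 303. Inputting these three indicator matrices and three feature association matrices into the similarity calculation function gives the similarity matrix S.
[Formula image PCTCN2019118114-appb-000014: similarity matrix S]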
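The similarity calculation function and its optimal solution are given only as formula images, so the exact expression is not available in this text. The sketch below assumes one plausible form, the average over the M feature columns of A_m^T Q_m A_m; this is an assumption for illustration, chosen because it is consistent with the 1/3 factor that appears in the example similarity values quoted in step 306 for M = 3. The helper names are hypothetical, and only the phone column is evaluated because A_city, A_uuid, Q_city and Q_uuid are not reproduced.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def similarity(indicators, associations):
    """Average of A_m^T Q_m A_m over all feature columns (assumed form)."""
    M = len(indicators)
    n = len(indicators[0][0])  # number of users
    S = [[0.0] * n for _ in range(n)]
    for A, Q in zip(indicators, associations):
        term = matmul(matmul(transpose(A), Q), A)
        for i in range(n):
            for j in range(n):
                S[i][j] += term[i][j] / M
    return S


A_phone = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
]
Q_phone = [
    [0.7, 0.2, 0.5],
    [0.2, 0.7, 0.2],
    [0.5, 0.2, 0.9],
]
# Only the phone column is used here, so M = 1 in this call.
S_phone = similarity([A_phone], [Q_phone])
```

Under this assumption, the phone column alone contributes q_11 for two users in p1 and q_12 for one user in p1 and one in p2; the full matrix S would add the city and uuid terms and divide by 3.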
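In step 305, the computer device performs clustering according to the similarity matrix to obtain a plurality of user groups.
Using a clustering algorithm, the computer device clusters the similarity matrix obtained in step 304 and thereby groups the users in the data to be detected into a plurality of user groups. Each user group contains at least one user, and no user belongs to more than one user group; that is, the user groups do not overlap. The number of user groups can be denoted D.
Note that the clustering algorithm may be spectral clustering, the modularity-based Girvan-Newman community detection algorithm, the Fast Newman community algorithm, or the like. The choice of clustering algorithm can be determined for the actual scenario by a clustering configuration file, which is used to configure the clustering parameters.
In one possible implementation, when a user group obtained by clustering contains fewer users than a first user-count threshold, user groups can be merged: a group with few users is merged into the group most similar to it. This avoids over-splitting, which would scatter highly similar users across different groups, so the merged groups better reflect reality. When a user group obtained by clustering contains more users than a second user-count threshold, the group can be split further. This divides users more finely, gives a finer similarity granularity, and makes the final detection result more accurate. The first user-count threshold may be 3, 5 or 8, and the second user-count threshold may be 15, 25 or 30; this application does not specifically limit this.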
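The grouping and merging steps can be illustrated with a deliberately simple stand-in. The sketch below does not implement spectral clustering, Girvan-Newman or Fast Newman; it groups users by connected components of a thresholded similarity graph, then merges groups that are smaller than a minimum size into their most similar neighbor. The function names, the threshold, and the toy matrix are all hypothetical.

```python
def threshold_groups(S, threshold):
    """Group users by connected components of the graph S[i][j] > threshold."""
    n = len(S)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if S[i][j] > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def merge_small(groups, S, min_size):
    """Merge each group smaller than min_size into its most similar group."""
    def avg_sim(g, h):
        return sum(S[i][j] for i in g for j in h) / (len(g) * len(h))

    groups = [list(g) for g in groups]
    while len(groups) > 1:
        small = [g for g in groups if len(g) < min_size]
        if not small:
            break
        g = small[0]
        others = [h for h in groups if h is not g]
        target = max(others, key=lambda h: avg_sim(g, h))
        target.extend(g)
        groups.remove(g)
    return sorted(sorted(g) for g in groups)


# Toy similarity matrix for five users (made-up values).
S = [
    [1.0, 0.9, 0.1, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8, 0.3],
    [0.1, 0.1, 0.8, 1.0, 0.3],
    [0.1, 0.1, 0.3, 0.3, 1.0],
]
raw_groups = threshold_groups(S, threshold=0.5)
user_groups = merge_small(raw_groups, S, min_size=2)
```

In this toy case user 4 starts as a singleton and is merged into the group of users 2 and 3, which it resembles most. Splitting oversized groups could be handled analogously by re-clustering a large group with a higher threshold.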
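In step 306, the computer device performs detection according to the plurality of feature groups and the plurality of user groups and determines the target groups among the user groups, a target group being a group having a target characteristic.
After obtaining the user groups, the computer device can perform detection based on the feature groups obtained in step 301. This step can be implemented through steps (1) to (4) below.
(1) The computer device takes the plurality of feature groups and the plurality of users as nodes and creates edges between nodes that satisfy a target condition, obtaining a graph model.
First, when constructing the graph model, the computer device may use both the feature groups and the users as nodes of the graph model.
Next, a first edge is created between a node corresponding to a feature group and a node corresponding to a user that satisfy a first condition; the weight of the first edge is the membership relation between the user and the feature group. Because the elements of an indicator matrix represent the membership relation between users' features and feature groups, the computer device can create the first edges from the indicator matrices obtained in the steps above. When a feature of a user falls within a feature group, a first edge is created between that user and that feature group. In one possible implementation, the element value of the indicator matrix is used as the weight of the first edge. In another possible implementation, the weights of all first edges are set to 1. In another possible implementation, the indicator matrix is used only to decide where first edges are created between feature-group nodes and user nodes, and the weight is set according to the importance of the feature dimension: a larger value for a more important feature dimension and a smaller value for a less important one.
For example, from the indicator matrix A_phone, first edges can be created between the nodes of the users with userid 0, 1, 2 and 3 and feature group p1, between the nodes of the users with userid 4, 5 and 6 and feature group p2, and between the nodes of the users with userid 7, 8 and 9 and feature group p3. According to the element values of A_phone, the weights of these first edges are set to 1. Alternatively, according to the importance of the feature dimension, the weight of the first edges may be set to any value such as 0.7, 0.5 or 1.3.
Next, a second edge is created between nodes corresponding to feature groups that satisfy a second condition; the weight of the second edge is the similarity between the feature groups. Because the elements of a feature association matrix represent the similarities between the feature groups of a feature column, the computer device can create the second edges from the feature association matrices obtained in the steps above. In one possible implementation, a second edge is created between the nodes of two feature groups when their similarity is nonzero. In another possible implementation, a second edge is created when the similarity between two feature groups is greater than a preset feature-group similarity threshold. In another possible implementation, second edges are created only between nodes of feature groups in the same feature dimension. In another possible implementation, second edges are created between the nodes of all obtained feature groups. In this embodiment, a second edge is created between the nodes of two feature groups that belong to the same feature dimension and have nonzero similarity.
For example, from the feature association matrix Q_phone, a second edge with weight 0.2 can be created between the nodes of feature groups p1 and p2, a second edge with weight 0.5 between the nodes of feature groups p1 and p3, and a second edge with weight 0.2 between the nodes of feature groups p2 and p3.
Next, a third edge is created between nodes corresponding to users that satisfy a third condition; the weight of the third edge is the similarity between the users. Because the elements of the similarity matrix are the similarities between users, the computer device can create the third edges from the similarity matrix obtained in the steps above. In one possible implementation, a third edge is created between the nodes of two users when their similarity is nonzero. In another possible implementation, a third edge is created when the similarity between two users is greater than a preset user similarity threshold.
For example, from the similarity matrix S, third edges can be created among the 10 users in the data to be detected shown in FIG. 1. The similarity between the users with userid 0 and userid 2 is (2.05×(1/3)), so a third edge with weight (2.05×(1/3)) is created between their nodes; the similarity between the users with userid 1 and userid 5 is (0.7×(1/3)), so a third edge with weight (0.7×(1/3)) is created between their nodes; and third edges are created in turn between every pair of the 10 users.
For example, FIG. 4 is a schematic diagram of a graph model provided by an embodiment of this application, constructed from the data to be detected in FIG. 1. The graph model includes 10 user nodes (node 0 to node 9) and 8 feature-group nodes: nodes p1, p2 and p3 for the feature groups of the phone column, nodes c1 and c2 for the feature groups of the city column, and nodes u1, u2 and u3 for the feature groups of the uuid column. The graph model in FIG. 4 shows the first and second edges but does not show the third edges between user nodes.
Note that the names first edge, second edge and third edge are used only to describe and distinguish the different kinds of edges. They imply no order, and the edges may be created in any order; this application does not specifically limit this.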
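The three kinds of edges can be sketched with a plain dictionary keyed by unordered node pairs. This is an illustrative data structure, not the one used in the embodiment: the function name `build_graph`, its argument layout, and the toy three-user data are hypothetical. User nodes are integers and feature-group nodes are strings. The sketch uses the indicator value as the first-edge weight, creates second edges only within one feature dimension when the similarity is nonzero, and creates third edges when the user similarity exceeds a threshold.

```python
def build_graph(indicators, associations, S, user_threshold=0.0):
    """Return {frozenset({u, v}): weight} holding all three edge types.

    indicators:   {dimension: (group_names, A)} with A as groups x users
    associations: {dimension: Q} with Q as groups x groups
    S:            users x users similarity matrix
    """
    edges = {}
    for dim, (names, A) in indicators.items():
        Q = associations[dim]
        # First edges: user -- feature group, weight taken from A.
        for g, row in enumerate(A):
            for user, value in enumerate(row):
                if value:
                    edges[frozenset((user, names[g]))] = value
        # Second edges: groups of the same dimension with nonzero similarity.
        for g in range(len(names)):
            for h in range(g + 1, len(names)):
                if Q[g][h] != 0:
                    edges[frozenset((names[g], names[h]))] = Q[g][h]
    # Third edges: user -- user, weight taken from S.
    n = len(S)
    for i in range(n):
        for j in range(i + 1, n):
            if S[i][j] > user_threshold:
                edges[frozenset((i, j))] = S[i][j]
    return edges


# Toy data: three users, one feature dimension with two groups (made up).
indicators = {"phone": (["p1", "p2"], [[1, 1, 0], [0, 0, 1]])}
associations = {"phone": [[0.7, 0.2], [0.2, 0.7]]}
S = [[0.7, 0.7, 0.2], [0.7, 0.7, 0.2], [0.2, 0.2, 0.7]]
edges = build_graph(indicators, associations, S)
```

Keying edges by `frozenset` keeps the graph undirected without storing each edge twice.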
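(2) The computer device performs feature extraction on the graph model according to the plurality of user groups to obtain a plurality of group feature matrices, each user group corresponding to one group feature matrix.
For each of the user groups, the computer device obtains from the graph model the group feature graph corresponding to that user group. The group feature graph is a part of the graph model and includes only the nodes of the users in that user group and the nodes of the feature groups; the number of nodes can be denoted T.
After obtaining the group feature graphs, the computer device performs feature extraction on each node of each group feature graph to obtain the corresponding group feature matrix, whose elements are the features of the nodes in the group feature graph. For each node, the Egonet (egocentric network) features are extracted. The Egonet features of a node include the number of its neighbor nodes, the sum of the weights of its incident edges, and the number of triangles that have the node as a vertex. The number of extracted features can be denoted E, a positive integer, for example 3, 5 or 8; this application does not specifically limit this.
After extracting the features of the nodes in each group feature graph, the computer device obtains one T×E group feature matrix per graph, and thus a plurality of group feature matrices.
For example, suppose a user group contains the three users with userid 0, 4 and 7. The group feature graph of this user group is obtained from the graph model shown in FIG. 4. FIG. 5 is a schematic diagram of such a group feature graph provided by an embodiment of this application; it includes node 0, node 4, node 7, node p1, node p2, node p3, node c1, node c2, node u1, node u2 and node u3. The Egonet features extracted for node 0 are 5 neighbor nodes, a weight sum of (4.71×(1/3)), and two triangles. This yields an 11×3 group feature matrix.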
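The three Egonet features named above (neighbor count, sum of incident edge weights, triangle count) can be computed per node as in the following sketch. It assumes the group feature graph is passed as a dictionary of undirected weighted edges keyed by `frozenset` pairs; that representation, the function names, and the toy edges are hypothetical, and the numbers do not reproduce FIG. 5.

```python
def egonet_features(edges, node):
    """Return (neighbor count, sum of incident edge weights, triangle count)."""
    neighbors = {}
    for pair, weight in edges.items():
        if node in pair:
            (other,) = pair - {node}
            neighbors[other] = weight
    names = list(neighbors)
    # A triangle through `node` exists for every pair of neighbors that are
    # themselves connected by an edge.
    triangles = sum(
        1
        for a in range(len(names))
        for b in range(a + 1, len(names))
        if frozenset((names[a], names[b])) in edges
    )
    return len(neighbors), sum(neighbors.values()), triangles


def group_feature_matrix(edges, nodes):
    """Stack the per-node features into a T x E matrix (here E = 3)."""
    return [list(egonet_features(edges, n)) for n in nodes]


# Toy group feature graph (made-up weights).
edges = {
    frozenset((0, "p1")): 1.0,
    frozenset((1, "p1")): 1.0,
    frozenset((0, 1)): 0.5,
    frozenset(("p1", "p2")): 0.2,
}
X = group_feature_matrix(edges, [0, 1, "p1", "p2"])
```

Here `X` has T = 4 rows and E = 3 columns, one row per node of the toy graph.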
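(3) The computer device obtains a corresponding plurality of feature vectors from the plurality of group feature matrices.
After obtaining the group feature matrices, the computer device computes a feature vector for each group feature matrix, giving a plurality of feature vectors. In one possible implementation, the group feature matrix is decomposed by SVD (Singular Value Decomposition) to obtain its principal feature vector, which is used as the feature vector of that group feature matrix. The feature vector may be a column vector of dimension (E×1).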
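A minimal sketch of extracting an E×1 principal vector without an external library is shown below. It assumes that "principal feature vector" means the leading right singular vector of the T×E group feature matrix, which is the dominant eigenvector of X^T X, and approximates it by power iteration rather than a full SVD. The function name and the toy matrix are hypothetical.

```python
import math


def principal_vector(X, iterations=200):
    """Approximate the leading right singular vector of X (T x E)."""
    E = len(X[0])
    # G = X^T X is an E x E symmetric positive semidefinite matrix.
    G = [[sum(row[a] * row[b] for row in X) for b in range(E)] for a in range(E)]
    # With non-negative features, the all-ones start vector is not orthogonal
    # to the dominant eigenvector, so the iteration can converge to it.
    v = [1.0] * E
    for _ in range(iterations):
        w = [sum(G[a][b] * v[b] for b in range(E)) for a in range(E)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # X is all zeros; keep the current vector
        v = [x / norm for x in w]
    return v


# Toy 2 x 2 matrix whose dominant direction is the first axis.
X = [[2.0, 0.0], [0.0, 1.0]]
v = principal_vector(X)
```

In practice a library SVD routine would normally be preferred; the singular vector is only defined up to sign, and starting from a positive vector yields the non-negative orientation for non-negative feature matrices.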
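(4) The computer device determines the target groups among the plurality of user groups according to the plurality of feature vectors, a target group being a group having a target characteristic.
After obtaining the feature vectors, the computer device can use them to determine, from among the user groups, the groups having the target characteristic. This step can be implemented through steps (4-1) to (4-3) below.
(4-1) The computer device computes the mean of the plurality of feature vectors to obtain the average feature vector, which can be denoted V_avg.
(4-2) The computer device obtains an evaluation value for each user group from the average feature vector and the feature vector of the group feature matrix corresponding to each user group.
After obtaining the average feature vector and the feature vectors of the group feature matrices, the computer device obtains the evaluation value Z of each user group. For the d-th of the D user groups, the feature vector of its group feature matrix can be denoted V_d, and the evaluation value Z_d of the d-th user group can be expressed as:
[Formula image PCTCN2019118114-appb-000015: evaluation value Z_d]
(4-3) For each user group, when the evaluation value of the user group is greater than the target threshold, the computer device determines that the user group is a target group, that is, a group having the target characteristic; when the evaluation value is not greater than the target threshold, the computer device determines that the user group is not a target group.
Note that the target threshold may be a value between 0 and 2 and can be set according to the actual application scenario; this application does not specifically limit this.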
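The formula for Z_d is given only as an image, so the sketch below substitutes an assumed score: the cosine distance 1 - cos(V_d, V_avg). This choice is an assumption for illustration; it is used here only because its range of 0 to 2 matches the stated range of the target threshold. The function names, toy vectors and threshold are hypothetical, and the sketch does not handle zero vectors.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (V_avg)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine_distance(u, v):
    """Assumed stand-in for Z_d: 1 - cos(u, v), which lies in [0, 2]."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def detect_target_groups(vectors, threshold):
    """Return indices of groups whose score exceeds the threshold, plus scores."""
    v_avg = mean_vector(vectors)
    scores = [cosine_distance(v, v_avg) for v in vectors]
    return [d for d, z in enumerate(scores) if z > threshold], scores


# Toy feature vectors for D = 4 user groups (made-up values).
vectors = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0]]
targets, scores = detect_target_groups(vectors, threshold=0.5)
```

With these toy vectors only the last group points in a markedly different direction from the average, so it is the only one flagged.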
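In this embodiment, each feature column of the data to be detected is grouped to obtain a plurality of feature groups; a similarity matrix is obtained from the indicator matrices and feature association matrices corresponding to the feature columns; clustering on the similarity matrix yields a plurality of user groups; and detection based on the feature groups and the user groups determines the target groups among the user groups, a target group being a group having a target characteristic. Compared with grouping users only by social relationships and labels, this application groups every feature dimension of the users, obtains a similarity matrix containing the similarities between users, and performs detection on the user groups obtained by clustering that matrix, thereby identifying groups with the target characteristic with relatively high accuracy and good detection performance.
All of the optional technical solutions above may be combined in any way to form optional embodiments of this application, which are not described one by one here.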
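FIG. 6 is a schematic diagram of a target group detection apparatus provided by an embodiment of this application. Referring to FIG. 6, the apparatus includes a grouping module 601, a first obtaining module 602, a clustering module 603 and a detection module 604.
The grouping module 601 is configured to group each feature column in the data to be detected to obtain a plurality of feature groups, each feature column corresponding to at least one feature group, and each feature column containing the features of different users in the same feature dimension;
the first obtaining module 602 is configured to obtain a similarity matrix from the indicator matrices and feature association matrices corresponding to the plurality of feature columns, the elements of the similarity matrix representing the similarities between users among a plurality of users, where the elements of the feature association matrix of each feature column represent the similarities between the feature groups of that column, and the elements of the indicator matrix of each feature column represent the feature groups to which the users belong;
the clustering module 603 is configured to perform clustering according to the similarity matrix to obtain a plurality of user groups;
the detection module 604 is configured to perform detection according to the plurality of feature groups and the plurality of user groups and determine the target groups among the user groups, a target group being a group having a target characteristic.
In one possible implementation, the apparatus further includes:
a second obtaining module, configured to obtain the indicator matrix corresponding to each feature column, yielding a plurality of indicator matrices;
an input module, configured to input each indicator matrix into a feature association function to obtain the corresponding feature association matrix, the feature association function being used to obtain, by machine learning, the corresponding feature association matrix from the elements of the indicator matrix.
In another possible implementation, the first obtaining module 602 is further configured to input the indicator matrices and feature association matrices corresponding to the feature columns into a similarity calculation function to obtain the similarity matrix, the similarity calculation function being used to obtain the similarities between users from the elements of the indicator matrices and the elements of the feature association matrices.
In another possible implementation, the detection module 604 is further configured to: take the plurality of feature groups and the plurality of users as nodes and create edges between nodes that satisfy a target condition, obtaining a graph model; perform feature extraction on the graph model according to the user groups to obtain a plurality of group feature matrices, each user group corresponding to one group feature matrix; obtain a corresponding plurality of feature vectors from the group feature matrices; and determine the target groups among the user groups according to the feature vectors.
In another possible implementation, the detection module 604 is further configured to: create a first edge between a node corresponding to a feature group and a node corresponding to a user that satisfy a first condition, the weight of the first edge being the membership relation between the user and the feature group; create a second edge between nodes corresponding to feature groups that satisfy a second condition, the weight of the second edge being the similarity between the feature groups; and create a third edge between nodes corresponding to users that satisfy a third condition, the weight of the third edge being the similarity between the users, thereby obtaining the graph model.
In another possible implementation, the detection module 604 is further configured to: for each of the user groups, obtain the group feature graph corresponding to that user group, the group feature graph being a part of the graph model; and perform feature extraction on each node in each group feature graph to obtain the corresponding group feature matrix, the elements of the group feature matrix representing the features of the nodes in the group feature graph.
In another possible implementation, the detection module 604 is further configured to: obtain an average feature vector from the feature vectors, the average feature vector being the mean of the feature vectors; obtain an evaluation value for each user group from the average feature vector and the feature vector of the group feature matrix corresponding to each user group; and, for each user group, determine that the user group is a target group when its evaluation value is greater than a target threshold, and determine that the user group is not a target group when its evaluation value is not greater than the target threshold.
In this embodiment, each feature column of the data to be detected is grouped to obtain a plurality of feature groups; a similarity matrix is obtained from the indicator matrices and feature association matrices corresponding to the feature columns; clustering on the similarity matrix yields a plurality of user groups; and detection based on the feature groups and the user groups determines the target groups among the user groups, a target group being a group having a target characteristic. Compared with grouping users only by social relationships and labels, this application groups every feature dimension of the users, obtains a similarity matrix containing the similarities between users, and performs detection on the user groups obtained by clustering that matrix, thereby identifying groups with the target characteristic with relatively high accuracy and good detection performance.
Note that when the target group detection apparatus of the above embodiment detects target groups, the division into the functional modules described above is only an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the computer device may be divided into different functional modules to perform all or part of the functions described above. In addition, the target group detection apparatus of the above embodiment and the target group detection method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.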
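FIG. 7 is a structural block diagram of a computer device 700 provided by an embodiment of this application. The computer device 700 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The computer device 700 may also be referred to as user equipment, a portable terminal, a laptop terminal, a desktop terminal, or by other names.
Generally, the computer device 700 includes a processor 701 and a memory 702.
The processor 701 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 701 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 701 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display. In some embodiments, the processor 701 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 702 may include one or more computer-readable storage media, which may be non-transitory. The memory 702 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 702 stores at least one instruction, which is executed by the processor 701 to implement the target group detection method provided by the method embodiments of this application.
In some embodiments, the computer device 700 optionally further includes a peripheral device interface 703 and at least one peripheral device. The processor 701, the memory 702 and the peripheral device interface 703 may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral device interface 703 by a bus, a signal line or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 704, a touch display screen 705, a camera 706, an audio circuit 707, a positioning component 708 and a power supply 709.
The peripheral device interface 703 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 701 and the memory 702. In some embodiments, the processor 701, the memory 702 and the peripheral device interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702 and the peripheral device interface 703 may be implemented on a separate chip or circuit board. This embodiment does not limit this.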
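The radio frequency circuit 704 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 704 communicates with communication networks and other communication devices through electromagnetic signals, converting electrical signals into electromagnetic signals for transmission or converting received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 704 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 704 can communicate with other terminals through at least one wireless communication protocol, including but not limited to metropolitan area networks, the successive generations of mobile communication networks (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video and any combination of them. When the display screen 705 is a touch display screen, it can also capture touch signals on or above its surface. The touch signal can be input to the processor 701 as a control signal for processing. In this case, the display screen 705 can also provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 705, arranged on the front panel of the computer device 700; in other embodiments, there may be at least two display screens 705, arranged on different surfaces of the computer device 700 or in a folding design; in still other embodiments, the display screen 705 may be a flexible display screen arranged on a curved or folding surface of the computer device 700. The display screen 705 may even be set in a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 705 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).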
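The camera component 706 is used to capture images or video. Optionally, the camera component 706 includes a front camera and a rear camera. Usually the front camera is arranged on the front panel of the computer device and the rear camera on its back. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera can be fused for background blurring, and the main camera and the wide-angle camera can be fused for panoramic shooting, VR (Virtual Reality) shooting or other fused shooting functions. In some embodiments, the camera component 706 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 707 may include a microphone and a speaker. The microphone captures sound waves from the user and the environment and converts them into electrical signals that are input to the processor 701 for processing or to the radio frequency circuit 704 for voice communication. For stereo capture or noise reduction, there may be multiple microphones arranged at different parts of the computer device 700. The microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The speaker may be a conventional membrane speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into inaudible sound waves for purposes such as distance measurement. In some embodiments, the audio circuit 707 may also include a headphone jack.
The positioning component 708 is used to determine the current geographic location of the computer device 700 for navigation or LBS (Location Based Service). The positioning component 708 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 709 supplies power to the components of the computer device 700. The power supply 709 may be alternating current, direct current, a disposable battery or a rechargeable battery. When the power supply 709 includes a rechargeable battery, the battery may support wired or wireless charging and may also support fast-charging technology.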
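In some embodiments, the computer device 700 further includes one or more sensors 710, including but not limited to an acceleration sensor 711, a gyroscope sensor 712, a pressure sensor 713, a fingerprint sensor 714, an optical sensor 715 and a proximity sensor 716.
The acceleration sensor 711 can detect the magnitude of acceleration on the three axes of a coordinate system established with the computer device 700. For example, it can detect the components of gravitational acceleration on the three axes. The processor 701 can control the touch display screen 705 to display the user interface in landscape or portrait view according to the gravitational acceleration signal collected by the acceleration sensor 711. The acceleration sensor 711 can also be used to collect motion data for games or for the user.
The gyroscope sensor 712 can detect the body orientation and rotation angle of the computer device 700 and can work with the acceleration sensor 711 to capture the user's 3D actions on the computer device 700. From the data collected by the gyroscope sensor 712, the processor 701 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control and inertial navigation.
The pressure sensor 713 may be arranged on the side frame of the computer device 700 and/or under the touch display screen 705. When arranged on the side frame, it can detect the user's grip signal on the computer device 700, and the processor 701 performs left-hand or right-hand recognition or quick operations according to the grip signal. When arranged under the touch display screen 705, the processor 701 controls the operable controls on the UI according to the user's pressure operations on the touch display screen 705. The operable controls include at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 714 is used to collect the user's fingerprint. The processor 701 identifies the user from the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 itself identifies the user from the collected fingerprint. When the user's identity is recognized as trusted, the processor 701 authorizes the user to perform sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments and changing settings. The fingerprint sensor 714 may be arranged on the front, back or side of the computer device 700. When the computer device 700 has a physical button or a manufacturer logo, the fingerprint sensor 714 may be integrated with the physical button or the logo.
The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 controls the display brightness of the touch display screen 705 according to the ambient light intensity collected by the optical sensor 715: the brightness is raised when the ambient light is strong and lowered when it is weak. In another embodiment, the processor 701 also dynamically adjusts the shooting parameters of the camera component 706 according to the ambient light intensity.
The proximity sensor 716, also called a distance sensor, is usually arranged on the front panel of the computer device 700 and is used to measure the distance between the user and the front of the computer device 700. In one embodiment, when the proximity sensor 716 detects that this distance is gradually decreasing, the processor 701 switches the touch display screen 705 from the screen-on state to the screen-off state; when the proximity sensor 716 detects that this distance is gradually increasing, the processor 701 switches the touch display screen 705 from the screen-off state to the screen-on state.
A person skilled in the art will understand that the structure shown in FIG. 7 does not limit the computer device 700, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.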
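FIG. 8 is a schematic structural diagram of a computer device provided by an embodiment of this application. The computer device 800 may vary considerably depending on configuration or performance, and may include one or more processors (Central Processing Units, CPU) 801 and one or more memories 802, where the memory 802 stores at least one instruction that is loaded and executed by the processor 801 to implement the methods provided by the method embodiments above. The computer device may also have components such as a wired or wireless network interface, a keyboard and an input/output interface for input and output, and may include other components for implementing device functions, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory containing instructions that can be executed by a processor in a computer device to perform the target group detection method of the above embodiments. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium.
The above are only optional embodiments of this application and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of this application shall fall within its scope of protection.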

Claims (10)

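  1. A target group detection method, characterized in that the method includes:
    grouping each feature column in data to be detected to obtain a plurality of feature groups, each feature column corresponding to at least one feature group, and each feature column containing the features of different users in the same feature dimension;
    obtaining a similarity matrix from the indicator matrices and feature association matrices corresponding to a plurality of feature columns, the elements of the similarity matrix representing the similarities between users among a plurality of users, where the elements of the feature association matrix of each feature column represent the similarities between the feature groups of that feature column, and the elements of the indicator matrix of each feature column represent the feature groups to which the plurality of users belong;
    performing clustering according to the similarity matrix to obtain a plurality of user groups;
    performing detection according to the plurality of feature groups and the plurality of user groups, and determining the target groups among the plurality of user groups, a target group being a group having a target characteristic.
  2. The method according to claim 1, characterized in that, before the obtaining of the similarity matrix from the indicator matrices and feature association matrices corresponding to the plurality of feature columns, the method further includes:
    obtaining the indicator matrix corresponding to each feature column, yielding a plurality of indicator matrices;
    inputting each indicator matrix into a feature association function to obtain the corresponding feature association matrix, the feature association function being used to obtain, by machine learning, the corresponding feature association matrix from the elements of the indicator matrix.
  3. The method according to claim 1, characterized in that the obtaining of the similarity matrix from the indicator matrices and feature association matrices corresponding to the plurality of feature columns includes:
    inputting the indicator matrices and feature association matrices corresponding to the plurality of feature columns into a similarity calculation function to obtain the similarity matrix, the similarity calculation function being used to obtain the similarities between users among the plurality of users from the elements of the indicator matrices and the elements of the feature association matrices.
  4. The method according to claim 1, characterized in that the performing of detection according to the plurality of feature groups and the plurality of user groups and determining of the target groups among the plurality of user groups includes:
    taking the plurality of feature groups and the plurality of users as nodes and creating edges between nodes that satisfy a target condition, obtaining a graph model;
    performing feature extraction on the graph model according to the plurality of user groups to obtain a plurality of group feature matrices, each user group corresponding to one group feature matrix;
    obtaining a corresponding plurality of feature vectors from the plurality of group feature matrices;
    determining the target groups among the plurality of user groups according to the plurality of feature vectors.
  5. The method according to claim 4, characterized in that the creating of edges between nodes that satisfy the target condition to obtain the graph model includes:
    creating a first edge between a node corresponding to a feature group and a node corresponding to a user that satisfy a first condition, the weight of the first edge being the membership relation between the user and the feature group;
    creating a second edge between nodes corresponding to feature groups that satisfy a second condition, the weight of the second edge being the similarity between the feature groups;
    creating a third edge between nodes corresponding to users that satisfy a third condition, the weight of the third edge being the similarity between the users, thereby obtaining the graph model.
  6. The method according to claim 4, characterized in that the performing of feature extraction on the graph model according to the plurality of user groups to obtain the plurality of group feature matrices includes:
    for each of the plurality of user groups, obtaining the group feature graph corresponding to that user group, the group feature graph being a part of the graph model;
    performing feature extraction on each node in each group feature graph to obtain the corresponding group feature matrix, the elements of the group feature matrix representing the features of the nodes in the group feature graph.
  7. The method according to claim 4, characterized in that the determining of the target groups among the plurality of user groups according to the plurality of feature vectors includes:
    obtaining an average feature vector from the plurality of feature vectors, the average feature vector being the mean of the plurality of feature vectors;
    obtaining an evaluation value for each user group from the average feature vector and the feature vector of the group feature matrix corresponding to each user group;
    for each user group, determining that the user group is a target group when its evaluation value is greater than a target threshold, and determining that the user group is not a target group when its evaluation value is not greater than the target threshold.
  8. A target group detection apparatus, characterized in that the apparatus includes:
    a grouping module, configured to group each feature column in data to be detected to obtain a plurality of feature groups, each feature column corresponding to at least one feature group, and each feature column containing the features of different users in the same feature dimension;
    a first obtaining module, configured to obtain a similarity matrix from the indicator matrices and feature association matrices corresponding to a plurality of feature columns, the elements of the similarity matrix representing the similarities between users among a plurality of users, where the elements of the feature association matrix of each feature column represent the similarities between the feature groups of that feature column, and the elements of the indicator matrix of each feature column represent the feature groups to which the plurality of users belong;
    a clustering module, configured to perform clustering according to the similarity matrix to obtain a plurality of user groups;
    a detection module, configured to perform detection according to the plurality of feature groups and the plurality of user groups and determine the target groups among the plurality of user groups, a target group being a group having a target characteristic.
  9. A computer device, characterized in that the computer device includes one or more processors and one or more memories, the one or more memories storing at least one instruction that is loaded and executed by the one or more processors to implement the operations performed by the target group detection method according to any one of claims 1 to 7.
  10. A non-transitory computer-readable storage medium, characterized in that the storage medium stores at least one instruction that is loaded and executed by a processor to implement the operations performed by the target group detection method according to any one of claims 1 to 7.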
PCT/CN2019/118114 2019-05-05 2019-11-13 Target group detection method and apparatus, computer device, and storage medium WO2020224222A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910367835.3 2019-05-05
CN201910367835.3A CN110083791B (zh) 2019-05-05 2019-05-05 Target group detection method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020224222A1 true WO2020224222A1 (zh) 2020-11-12

Family

ID=67418624

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118114 WO2020224222A1 (zh) 2019-05-05 2019-11-13 Target group detection method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110083791B (zh)
WO (1) WO2020224222A1 (zh)

Also Published As

Publication number Publication date
CN110083791B (zh) 2020-04-24
CN110083791A (zh) 2019-08-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19928145

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19928145

Country of ref document: EP

Kind code of ref document: A1