CN111753154B - User data processing method, device, server and computer readable storage medium


Info

Publication number
CN111753154B
CN111753154B (application CN202010574802.9A)
Authority
CN
China
Prior art keywords
identified
cluster
clusters
feature
data
Prior art date
Legal status
Active
Application number
CN202010574802.9A
Other languages
Chinese (zh)
Other versions
CN111753154A (en)
Inventor
陈振
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202010574802.9A priority Critical patent/CN111753154B/en
Publication of CN111753154A publication Critical patent/CN111753154A/en
Application granted granted Critical
Publication of CN111753154B publication Critical patent/CN111753154B/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a user data processing method, a user data processing device, a server and a computer readable storage medium, and belongs to the technical field of the Internet. The method comprises: obtaining feature data of at least one object to be identified, wherein the feature data comprises at least one of environment data, registration data, device data and historical behavior data of the object to be identified; combining the feature data of the at least one object to be identified to obtain m feature combinations meeting a reference condition; obtaining m clusters to be identified according to the feature data corresponding to the m feature combinations, wherein the m clusters to be identified correspond to the m feature combinations; and clustering the m clusters to be identified and screening target clusters meeting a preset condition. Because the feature data of the objects to be identified in the clusters to be identified are taken into account when screening the target clusters, the target clusters are determined more accurately, which improves the accuracy and reliability of user data processing.

Description

User data processing method, device, server and computer readable storage medium
Technical Field
The embodiment of the application relates to the technical field of internet, in particular to a user data processing method, a device, a server and a computer readable storage medium.
Background
In recent years, with the rapid development of internet technology, online services such as electronic commerce and third-party payment have grown explosively, and internet fraud has become increasingly serious. A user data processing method is therefore needed to identify target clusters on the internet, such as fraud clusters.
In the related art, objects to be identified that are related to a determined target object are mined based on that determined target object; a relation network is constructed from the objects to be identified and the target object, and cluster discovery is carried out on the relation network to obtain at least one cluster included in the relation network, where each cluster comprises a plurality of objects to be identified and target objects; according to the correlation between each object to be identified in a cluster and the cluster, the objects to be identified whose correlation does not meet a reference correlation are determined and removed, so as to obtain the target cluster.
However, the above-mentioned user data processing method performs cluster recognition based on the determined target object, so that the recognition of the target cluster is limited, and when the determined target object does not exist in the server, the accuracy and reliability of the recognition of the target cluster are reduced.
Disclosure of Invention
The embodiment of the application provides a user data processing method, a device, a server and a computer readable storage medium, which can be used for solving the problems in the related art. The technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a method for processing user data, where the method includes:
acquiring feature data of at least one object to be identified, wherein the feature data comprises at least one of environment data, registration data, device data and historical behavior data of the object to be identified;
combining the feature data of the at least one object to be identified to obtain m feature combinations meeting the reference condition, wherein m is an integer greater than or equal to 1;
obtaining m clusters to be identified according to the feature data corresponding to the m feature combinations, wherein the m clusters to be identified correspond to the m feature combinations;
and clustering the m clusters to be identified, and screening target clusters meeting preset conditions.
In a possible implementation manner, the combining the feature data of the at least one object to be identified to obtain m feature combinations meeting the reference condition includes:
freely combining the feature data of the at least one object to be identified to obtain n feature combinations, wherein each feature combination comprises k feature data, n is an integer greater than m, and k is an integer greater than or equal to 1;
Calculating scores of the n feature combinations based on feature scores of feature data included in the n feature combinations;
sorting according to the scores of the n feature combinations to obtain n sorted feature combinations;
among the n feature combinations after the sorting, m feature combinations satisfying the reference condition are determined.
In one possible implementation manner, the clustering the m clusters to be identified, and screening the target clusters meeting the preset condition includes:
respectively matching a label for the m clusters to be identified, wherein the label is used for identifying the clusters to be identified;
updating the label of the cluster to be identified according to the label of the neighbor cluster adjacent to the cluster to be identified, and obtaining the label after the updating of the cluster to be identified;
clustering the clusters to be identified with the same label in the labels after updating the clusters to be identified to obtain candidate clusters, wherein the candidate clusters comprise a plurality of clusters to be identified;
and screening target clusters meeting preset conditions from the candidate clusters.
In one possible implementation manner, the updating the label of the cluster to be identified according to the label of the neighboring cluster adjacent to the cluster to be identified to obtain the label after the update of the cluster to be identified includes:
Updating the label of the cluster to be identified according to the label of the neighbor cluster adjacent to the cluster to be identified according to the following formula to obtain the label after updating the cluster to be identified:
wherein argmax is the maximum argument function, i represents the i-th cluster to be identified, j represents a neighbor cluster j adjacent to the i-th cluster to be identified, W_{i,j} is the weight between the i-th cluster to be identified and the neighbor cluster j (the weight being the number of objects to be identified shared by the cluster to be identified and the neighbor cluster), N is the number of neighbor clusters, and A_N is the N-th neighbor cluster.
In a possible implementation manner, the selecting a target cluster meeting a preset condition from the candidate clusters includes:
determining a risk score corresponding to the candidate cluster based on the label of the candidate cluster;
and determining the candidate cluster as a target cluster in response to the risk score of the candidate cluster meeting a preset condition.
In a possible implementation manner, the selecting a target cluster meeting a preset condition from the candidate clusters includes:
calculating the relative entropy of the candidate cluster, wherein the relative entropy comprises discrete relative entropy and continuous relative entropy, the discrete relative entropy is used for representing the external variability of the candidate cluster, and the continuous relative entropy is used for representing the internal aggregation of the candidate cluster;
And determining the candidate cluster as a target cluster in response to the discrete relative entropy satisfying a first reference relative entropy and the continuous relative entropy satisfying a second reference relative entropy.
In one possible implementation manner, the determining, based on the label of the candidate cluster, a risk score corresponding to the candidate cluster includes:
and inputting the label of the candidate cluster into a target risk calculation model, and calculating the risk score of the candidate cluster through the target risk calculation model to obtain the risk score of the candidate cluster.
In one possible implementation, before the inputting the label of the candidate cluster into the target risk calculation model, the method further includes:
acquiring a label of at least one history cluster;
and training the initial risk calculation model according to the label of the at least one history cluster to obtain a target risk calculation model.
In one possible implementation, the environment data includes at least one of the IP address and geographic location data of the object to be identified; the registration data comprises personal information filled in by the object to be identified during registration; the device data comprises the type of the device used by the object to be identified; and the historical behavior data comprises historical behaviors of the object to be identified, such as browsing, purchasing and commenting.
In a second aspect, an embodiment of the present application provides a user data processing apparatus, where the apparatus includes:
the device comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring characteristic data of at least one object to be identified, and the characteristic data comprises at least one of environment data, registration data, equipment data and historical behavior data of the object to be identified;
the combination module is used for combining the feature data of the at least one object to be identified to obtain m feature combinations meeting the reference condition, wherein m is an integer greater than or equal to 1;
the determining module is used for obtaining m clusters to be identified according to the feature data corresponding to the m feature combinations, wherein the m clusters to be identified correspond to the m feature combinations;
and the screening module is used for clustering the m clusters to be identified and screening target clusters meeting preset conditions.
In one possible implementation manner, the combination module is configured to perform free combination on feature data of the at least one object to be identified to obtain n feature combinations, where each feature combination includes k feature data, where n is an integer greater than m, and k is an integer greater than or equal to 1;
calculating scores of the n feature combinations based on feature scores of feature data included in the n feature combinations;
Sorting according to the scores of the n feature combinations to obtain n sorted feature combinations;
among the n feature combinations after the sorting, m feature combinations satisfying the reference condition are determined.
In a possible implementation manner, the screening module is configured to match a label to each of the m clusters to be identified, where the label is used to identify the cluster to be identified;
updating the label of the cluster to be identified according to the label of the neighbor cluster adjacent to the cluster to be identified, and obtaining the label after the updating of the cluster to be identified;
clustering the clusters to be identified with the same label in the labels after updating the clusters to be identified to obtain candidate clusters, wherein the candidate clusters comprise a plurality of clusters to be identified;
and screening target clusters meeting preset conditions from the candidate clusters.
In a possible implementation manner, the filtering module is configured to update the label of the to-be-identified cluster according to the following formula according to the label of the neighbor cluster adjacent to the to-be-identified cluster, to obtain the label after updating the to-be-identified cluster:
wherein argmax is the maximum argument function, i represents the i-th cluster to be identified, j represents a neighbor cluster j adjacent to the i-th cluster to be identified, W_{i,j} is the weight between the i-th cluster to be identified and the neighbor cluster j (the weight being the number of objects to be identified shared by the cluster to be identified and the neighbor cluster), N is the number of neighbor clusters, and A_N is the N-th neighbor cluster.
In one possible implementation manner, the screening module is configured to determine a risk score corresponding to the candidate cluster based on the label of the candidate cluster;
and determining the candidate cluster as a target cluster in response to the risk score of the candidate cluster meeting a preset condition.
In one possible implementation, the filtering module is configured to calculate a relative entropy of the candidate cluster, where the relative entropy includes a discrete relative entropy and a continuous relative entropy, where the discrete relative entropy is used to represent an external variability of the candidate cluster, and the continuous relative entropy is used to represent an internal aggregation of the candidate cluster;
and determining the candidate cluster as a target cluster in response to the discrete relative entropy satisfying a first reference relative entropy and the continuous relative entropy satisfying a second reference relative entropy.
In one possible implementation manner, the filtering module is configured to input the label of the candidate cluster into a target risk calculation model, calculate a risk score of the candidate cluster through the target risk calculation model, and obtain the risk score of the candidate cluster.
In a possible implementation manner, the obtaining module is further configured to obtain a label of at least one history cluster;
the apparatus further comprises:
and the training module is used for training the initial risk calculation model according to the label of the at least one history cluster to obtain a target risk calculation model.
In one possible implementation, the environment data includes at least one of the IP address and geographic location data of the object to be identified; the registration data comprises personal information filled in by the object to be identified during registration; the device data comprises the type of the device used by the object to be identified; and the historical behavior data comprises historical behaviors of the object to be identified, such as browsing, purchasing and commenting.
In a third aspect, embodiments of the present application provide a server, where the server includes a processor and a memory, where at least one piece of program code is stored in the memory, and the at least one piece of program code is loaded and executed by the processor to implement any of the above-mentioned user data processing methods.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement any of the above-described user data processing methods.
The technical scheme provided by the embodiment of the application at least brings the following beneficial effects:
when the method provided by the embodiment of the application is used for processing the user data, the characteristic data of the object to be identified is considered, the characteristic combination is determined based on the characteristic data of the object to be identified, and the cluster to be identified is obtained based on the characteristic combination, so that the determination of the cluster to be identified is more accurate. Clustering is carried out on the clusters to be identified, and target clusters meeting preset conditions are screened, so that the determination of the target clusters is more accurate, and the accuracy and reliability of user data processing can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an implementation environment of a user data processing method according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for processing user data according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a candidate cluster provided in an embodiment of the present application;
FIG. 4 is a flowchart of a user data processing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a user data processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment of a user data processing method according to an embodiment of the present application, as shown in fig. 1, where the implementation environment includes: a server 101 and an electronic device 102.
The server 101 may be one server or may be a server cluster formed by a plurality of servers. Server 101 may be at least one of a cloud computing platform and a virtualization center, which is not limited by the embodiments of the present application. The server 101 is configured to obtain feature data of an object to be identified, and determine a feature combination according to the feature data of the object to be identified. And determining a cluster to be identified according to the feature combination, clustering the cluster to be identified, and screening target clusters meeting preset conditions. Of course, the server 101 may also include other functional servers to provide more comprehensive and diverse services.
The electronic device 102 may be at least one of a smart phone, a game console, a desktop computer, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a laptop portable computer. The electronic device 102 is connected to the server 101 via a wired network or a wireless network, and an application program for user data processing is installed and runs in the electronic device 102. The electronic device 102 may also send the identification of the object to be identified to the server 101, so that the server 101 may obtain the feature data of the object to be identified based on the identification of the object to be identified.
Based on the above implementation environment, the embodiment of the present application provides a user data processing method, taking a flowchart of the user data processing method provided in the embodiment of the present application shown in fig. 2 as an example, where the method may be executed by the server 101 in fig. 1. As shown in fig. 2, the method comprises the steps of:
in step 201, feature data of at least one object to be identified is acquired, the feature data comprising at least one of environmental data, registration data, device data and historical behavior data of the object to be identified.
In this embodiment of the present application, the server and the electronic device are in communication connection through a wired network or a wireless network, where the electronic device may send an identification request of an object to be identified to the server, where the identification request carries an object identifier of the object to be identified, where the object identifier may be a number or an account number of the object to be identified, so long as the object identifier may correspond to one object to be identified, and this embodiment of the present application does not limit the object identifier.
In one possible implementation manner, the storage space of the server stores object identifiers of all objects to be identified and corresponding user data, and when the server receives an identification request sent by the electronic device, the server analyzes the identification request to obtain the object identifiers of the objects to be identified carried in the identification request. And acquiring the user data of the object to be identified in the storage space of the server based on the object identification of the object to be identified.
In one possible implementation, the storage space of the server may store the user data of the objects to be identified, and the server divides the storage space into a target number of first storage spaces, each of which stores the user data of one object to be identified. For example, the server divides the storage space into five first storage spaces: the first stores the user data of the first object to be identified, the second stores the user data of the second object to be identified, the third stores the user data of the third object to be identified, the fourth stores the user data of the fourth object to be identified, and the fifth stores the user data of the fifth object to be identified.
In one possible implementation manner, after the server obtains the user data of the object to be identified, the feature data of the object to be identified is extracted from the user data of the object to be identified. The characteristic data of the object to be identified includes at least one of environmental data, registration data, device data, and historical behavior data of the object to be identified. The environment data comprises at least one of IP address and geographic position data of the object to be identified, the registration data comprises personal information filled in the object to be identified when registering, and the personal information comprises, but is not limited to, information such as name, telephone number, identification card number and the like of the object to be identified. The device data includes a device type used by the object to be identified. The historical behavior data comprises historical browsing, purchasing, commenting and other behaviors of the object to be identified.
For example, the server receives an identification request of an object to be identified sent by the electronic device, and the identification request carries the object identifier 0001 of the object to be identified. The server parses the identification request to obtain the object identifier 0001 carried in it, determines the first storage space corresponding to the object identifier 0001 (namely, the first of the first storage spaces), acquires the user data stored in that first storage space, and extracts the feature data of the object to be identified from the acquired user data; this is the feature data of the object to be identified acquired by the server.
It should be noted that, the process of obtaining the feature data of each object to be identified by the server is consistent with the process of obtaining the feature data of the first object to be identified, which is not described herein.
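For illustration only, the lookup described above can be sketched as follows (Python; the mapping name server_store, the field names and the identifier format are assumptions made for the example, not part of this application):

```python
def get_feature_data(server_store, object_identifier):
    """Look up the user data stored for an object identifier (e.g. "0001")
    and extract the feature data used by the method: environment data,
    registration data, device data and historical behavior data."""
    user_data = server_store[object_identifier]       # first storage space of this object
    return {
        "ip": user_data.get("ip"),                    # environment data
        "geo": user_data.get("geo"),
        "name": user_data.get("name"),                # registration data
        "phone": user_data.get("phone"),
        "id_card": user_data.get("id_card"),
        "device_type": user_data.get("device_type"),  # device data
        "history": user_data.get("history"),          # browsing / purchase / comment records
    }
```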
In step 202, feature data of at least one object to be identified is combined to obtain m feature combinations satisfying the reference condition, where m is an integer greater than or equal to 1.
In the embodiment of the present application, the process of combining the feature data of at least one object to be identified obtained in the above step 201 to obtain m feature combinations satisfying the reference condition includes the following steps 2021 to 2024.
Step 2021, performing free combination on the feature data of at least one object to be identified to obtain n feature combinations, where each feature combination includes k feature data.
In a possible implementation manner, based on the feature data of at least one object to be identified obtained in the step 201, the feature data are freely combined to obtain n feature combinations, where each feature combination includes k feature data, where n is an integer greater than m, and k is an integer greater than or equal to 1.
Illustratively, step 201 obtains the feature data of the first, second, third, fourth and fifth objects to be identified. The feature data of these five objects to be identified are freely combined; taking the number k of feature data included in each feature combination as 3 as an example, five feature combinations are obtained, namely feature combination 1 to feature combination 5. For example, the feature data included in feature combination 1 are name, identification card number and telephone number; the feature data included in feature combination 2 are name, telephone number and geographic location data; the feature data included in feature combination 3 are telephone number, geographic location data and device type; the feature data included in feature combination 4 are name, geographic location data and device type; and the feature data included in feature combination 5 are name, IP address and device type.

Based on the feature data corresponding to the five objects to be identified, the objects to be identified are added into the corresponding feature combinations, so that each feature combination also comprises objects to be identified. Feature combination 1 comprises the first and second objects to be identified; feature combination 2 comprises the first, third and fifth objects to be identified; feature combination 3 comprises the second, third and fourth objects to be identified; feature combination 4 comprises the first and fifth objects to be identified; and feature combination 5 comprises the second, fourth and fifth objects to be identified.
It should be noted that, the number of objects to be identified is only 5, the number of feature data included in the feature combination is 3, and the number of feature combinations is 5, which is described as an example, and is not intended to limit the present application. The number of objects to be identified may be more or less, the number k of feature data included in the feature combinations may be more or less, and the number of feature combinations may be more or less, which is not limited in the embodiment of the present application.
In one possible implementation, feature combinations that do not meet the requirements may also be deleted. Illustratively, the number of objects to be identified included in each feature combination is determined, and if the number of objects to be identified included in the feature combination is smaller than the target number, the corresponding feature combination is deleted. For example, taking the target number as 2 as an example, since the number of objects to be identified included in the feature combinations is equal to or greater than 2, no feature combination is deleted. Taking the target number of 3 as an example, since the number of objects to be identified included in the feature combination two and the feature combination four is 2, the feature combination two and the feature combination four can be deleted.
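As an illustrative sketch of step 2021 and the filtering just described (Python; the rule that an object joins a combination when it has a value for every feature field in it is an assumption made for the example):

```python
from itertools import combinations

FEATURE_FIELDS = ["name", "phone", "id_card", "geo", "ip", "device_type"]

def build_feature_combinations(objects, k=3, target_number=2):
    """Freely combine the feature fields into k-sized feature combinations and
    drop combinations that cover fewer than `target_number` objects.
    `objects` maps an object identifier to its feature data dict."""
    combos = []
    for fields in combinations(FEATURE_FIELDS, k):
        members = [oid for oid, feats in objects.items()
                   if all(feats.get(f) is not None for f in fields)]
        if len(members) >= target_number:
            combos.append({"fields": fields, "members": members})
    return combos
```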
Step 2022, calculating scores of the n feature combinations based on feature scores of feature data included in the n feature combinations.
In one possible implementation manner, each feature data of the object to be identified has a feature score corresponding to the feature data, and the feature scores may be represented by 0 and 1, or may be represented by other numbers, and the representation form of the feature scores is not limited in the embodiment of the present application.
In a possible implementation manner, based on the n feature combinations obtained in the step 2021, feature scores of feature data of the object to be identified included in each feature combination are determined, and scores of corresponding feature combinations are calculated according to the feature scores of the feature data of the object to be identified included in each feature combination. For example, feature scores of feature data of the object to be identified included in the feature combination may be added to obtain a score of the feature combination.
For example, feature combination 1 comprises the first object to be identified and the second object to be identified. The feature scores corresponding to the feature data of the first object to be identified are shown in Table 1 below, and the feature scores corresponding to the feature data of the second object to be identified are shown in Table 2 below.

Table 1 (first object to be identified): name = 1, telephone number = 0, identification card number = 1, geographic location data = 1, IP address = 0, device type = 1.

Table 2 (second object to be identified): name = 0, telephone number = 1, identification card number = 1, geographic location data = 0, IP address = 0, device type = 1.
Based on the feature scores in Table 1 and Table 2, the feature total value of the first object to be identified is calculated as 1+0+1+1+0+1=4, and the feature total value of the second object to be identified is calculated as 0+1+1+0+0+1=3. The score of feature combination 1 is then calculated from these two feature total values as 4+3=7, so that the score of the feature combination is obtained.
Note that, in the embodiment of the present application, the score of the feature combination is calculated only by taking the method of adding the feature scores of the feature objects as an example, and the score of the feature combination may also be calculated in other manners, which is not limited in the embodiment of the present application.
The above-mentioned calculation process of the score of the feature combination is described by taking only one feature combination as an example, and the calculation process of the score of other feature combinations is identical to the calculation process of the score of one feature combination, and will not be described in detail here.
In one possible implementation, if a certain feature data of an object to be identified has no corresponding feature score, a corresponding feature score is generated for the feature data by a random number generator, so as to prevent the feature data from affecting the score of the feature combination where the feature data is located.
In one possible implementation, if the objects to be identified included in a feature combination include a suspected object to be identified, the feature combination may first be determined as a suspected feature combination, a suspicion score of the feature combination is calculated, and the suspicion score is taken into consideration when calculating the score of the feature combination. The calculation process of the suspicion score may be as shown in the following formula (1):
Illustratively, if the first object to be identified in feature combination 1 is a suspected object to be identified, the suspicion score of the feature combination is taken into account when calculating the score of the feature combination; that is, the score of feature combination 1 = the feature total value of the first object to be identified + the feature total value of the second object to be identified + the suspicion score of the first object to be identified = 4 + 3 + 0.5 = 7.5.
Step 2023, sorting according to the scores of the n feature combinations, to obtain n sorted feature combinations.
In one possible implementation, the scores of the n feature combinations obtained in the step 2022 are ranked, where the ranking may be from high to low, or from low to high, and this embodiment of the present application is not limited.
For example, the score of the feature combination one is 7, the score of the feature combination two is 9, the score of the feature combination three is 5, the score of the feature combination four is 8, the score of the feature combination five is 10, and the feature combinations are ranked in the order from high to low based on the score of the feature combinations, so that the ranked feature combinations are the feature combination five, the feature combination two, the feature combination four, the feature combination one and the feature combination three.
Step 2024, determining m feature combinations satisfying the reference condition from the n feature combinations after sorting.
In one possible implementation, m feature combinations that satisfy the reference condition are determined among the n feature combinations based on the n feature combinations after the ranking. The m feature combinations satisfying the reference condition may be m feature combinations with scores greater than the reference score, or may be feature combinations ranked in the top m according to the score rank, which is not limited in the embodiment of the present application.
For example, based on the scores of the n feature combinations, the feature combinations with the scores of the preceding 3 are determined, that is, feature combinations five, feature combinations two, and feature combinations four are feature combinations satisfying the reference condition.
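Putting steps 2022 to 2024 together, a minimal sketch (Python; feature_scores maps object identifiers to per-field 0/1 scores, and the optional suspicion term follows the 4 + 3 + 0.5 = 7.5 example above):

```python
def score_combination(combo, feature_scores, suspicion_scores=None):
    """Score of a feature combination = sum of the feature scores of its member
    objects, plus the suspicion scores of any suspected members."""
    total = sum(sum(feature_scores[oid].values()) for oid in combo["members"])
    if suspicion_scores:
        total += sum(suspicion_scores.get(oid, 0.0) for oid in combo["members"])
    return total

def top_m_combinations(combos, feature_scores, m, suspicion_scores=None):
    """Sort combinations by score (high to low) and keep the m best ones,
    i.e. the m feature combinations satisfying the reference condition."""
    ranked = sorted(combos,
                    key=lambda c: score_combination(c, feature_scores, suspicion_scores),
                    reverse=True)
    return ranked[:m]
```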
In step 203, m clusters to be identified are obtained according to the feature data corresponding to the m feature combinations, where the m clusters to be identified correspond to the m feature combinations.
In this embodiment of the present application, according to the feature data corresponding to the m feature combinations determined in step 202, the objects to be identified included in each feature combination form one cluster to be identified, so that m clusters to be identified are obtained, where each cluster to be identified corresponds to one feature combination. For example, cluster to be identified 1 comprises the first and second objects to be identified; cluster to be identified 2 comprises the first, third and fifth objects to be identified; cluster to be identified 3 comprises the second, third and fourth objects to be identified; cluster to be identified 4 comprises the first and fifth objects to be identified; and cluster to be identified 5 comprises the second, fourth and fifth objects to be identified.
In one possible implementation, in order to make the coverage of a cluster to be identified broader, objects to be identified whose feature data are consistent with the feature data included in the feature combination may also be determined, and the objects to be identified included in the feature combination together with these consistent objects to be identified form the cluster to be identified corresponding to the feature combination, so that the cluster to be identified contains a larger number of objects to be identified.
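A sketch of step 203 (Python; the expansion rule of pulling in any object whose values on the combination's fields match an existing member is an assumption used to illustrate "objects to be identified consistent with the feature data"):

```python
def combinations_to_clusters(combos, objects):
    """Form one cluster to be identified per retained feature combination,
    optionally expanded with objects whose feature data are consistent with it."""
    clusters = []
    for combo in combos:
        members = set(combo["members"])
        # feature-value tuples already present in the combination on its fields
        keys = {tuple(objects[oid].get(f) for f in combo["fields"]) for oid in members}
        for oid, feats in objects.items():
            if tuple(feats.get(f) for f in combo["fields"]) in keys:
                members.add(oid)
        clusters.append({"fields": combo["fields"], "members": members})
    return clusters
```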
In step 204, the m clusters to be identified are clustered, and target clusters meeting preset conditions are screened.
In the embodiment of the present application, the clustering is performed on m clusters to be identified, and the screening of the target clusters meeting the preset conditions includes the following steps 2041 to 2044.
Step 2041, respectively matching a label for m clusters to be identified, where the label is used to identify the clusters to be identified.
In a possible implementation manner, the m clusters to be identified obtained in step 203 are each assigned a label, where each label identifies a corresponding cluster to be identified, and the label may be represented in a digital manner or in an alphabetical manner.
Step 2042, updating the label of the cluster to be identified according to the label of the neighbor cluster adjacent to the cluster to be identified, and obtaining the label after updating the cluster to be identified.
In one possible implementation manner, according to the label of the neighbor cluster adjacent to the cluster to be identified, the label of the cluster to be identified is updated according to the following formula (2), so as to obtain the label after the update of the cluster to be identified.
In the above formula (2), argmax is the maximum argument function, i represents the i-th cluster to be identified, j represents a neighbor cluster j adjacent to the i-th cluster to be identified, W_{i,j} is the weight between the i-th cluster to be identified and the neighbor cluster j (the weight being the number of objects to be identified shared by the cluster to be identified and the neighbor cluster), N is the number of neighbor clusters, and A_N is the N-th neighbor cluster.
For example, taking the label of cluster to be identified 1 as an example: the label of cluster to be identified 1 is A, and the clusters adjacent to cluster to be identified 1 are neighbor cluster 1, neighbor cluster 2, neighbor cluster 3 and neighbor cluster 4, whose labels are B, C, D and B respectively. Label B occurs the largest number of times among the clusters adjacent to cluster to be identified 1, so the label of cluster to be identified 1 is updated, and the updated label of cluster to be identified 1 is B.
It should be noted that the updating process of the labels of the other clusters to be identified is consistent with the updating process of the labels of the cluster to be identified 1, and will not be described herein.
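Formula (2) itself is not reproduced in this text. Read together with the variable definitions above, the weighted label-propagation update it describes is plausibly of the following form (a reconstruction from those definitions, not a quotation of the original formula):

$$L_i \leftarrow \operatorname*{arg\,max}_{L}\ \sum_{j \in \{A_1,\ldots,A_N\}} W_{i,j}\,\delta(L_j, L),$$

where $\delta(L_j, L)$ is 1 when neighbor cluster $j$ carries label $L$ and 0 otherwise. A corresponding sketch in Python (the tie-breaking behaviour is an assumption):

```python
from collections import defaultdict

def update_label(labels, weights, i, neighbors):
    """One weighted label-propagation step: cluster i takes the label whose
    neighbor clusters carry the largest total weight W[i, j], where the weight
    is the number of objects to be identified shared by the two clusters."""
    votes = defaultdict(float)
    for j in neighbors:
        votes[labels[j]] += weights[(i, j)]
    if not votes:
        return labels[i]          # no neighbors: keep the current label
    return max(votes, key=votes.get)
```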
Step 2043, clustering the clusters to be identified with the same labels in the labels after updating the clusters to be identified to obtain candidate clusters, wherein the candidate clusters comprise a plurality of clusters to be identified.
In the embodiment of the application, based on the label after the update of the cluster to be identified, the cluster to be identified with consistent labels is clustered to obtain a candidate cluster, wherein the candidate cluster comprises a plurality of clusters to be identified with consistent labels.
In one possible implementation, an object to be identified that belongs to several clusters to be identified is assigned to the cluster containing the largest number of objects to be identified. The number of objects to be identified included in each cluster to be identified may also be determined; if the number of objects to be identified included in a cluster to be identified is smaller than a reference number, that cluster to be identified may be deleted, that is, clusters to be identified whose size does not meet the reference number are filtered out. The remaining clusters to be identified are then clustered, so that the candidate clusters are obtained. This approach makes the resulting candidate clusters more accurate. Fig. 3 is a schematic diagram of a candidate cluster according to an embodiment of the present application; in Fig. 3, a black circle represents a cluster to be identified, a white circle represents a neighbor cluster adjacent to the cluster to be identified, and the cluster outlined by a dotted line is the candidate cluster.
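A sketch of steps 2042 and 2043 taken together (Python; the reference number and the assignment of shared objects to the largest cluster follow the description above):

```python
from collections import defaultdict

def build_candidate_clusters(labels, cluster_members, reference_number=2):
    """Filter out clusters to be identified smaller than `reference_number`, then
    group the remaining clusters that share the same (updated) label into
    candidate clusters."""
    kept = {c: set(m) for c, m in cluster_members.items() if len(m) >= reference_number}
    # an object appearing in several clusters stays with the largest one
    owner = {}
    for c, members in sorted(kept.items(), key=lambda kv: len(kv[1]), reverse=True):
        for oid in members:
            owner.setdefault(oid, c)
    candidates = defaultdict(set)
    for oid, c in owner.items():
        candidates[labels[c]].add(oid)
    return dict(candidates)
```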
And 2044, screening target clusters meeting preset conditions from the candidate clusters.
In the embodiment of the present application, the following two implementations may be used to screen target clusters that meet the preset condition from the candidate clusters.
According to the first implementation mode, the candidate clusters with risk scores meeting preset conditions are determined to be target clusters based on the risk scores corresponding to the candidate clusters.
In one possible implementation, a tag of at least one history cluster is obtained, and the initial risk calculation model is trained according to the tag of the at least one history cluster, so as to obtain a target risk calculation model.
In one possible implementation, the label of the candidate cluster is input into the target risk calculation model, and the risk score of the candidate cluster is calculated through the target risk calculation model, so that the risk score of the candidate cluster is obtained. If the risk score of the candidate cluster meets the preset condition, the candidate cluster is determined as a target cluster; if the risk score of the candidate cluster does not meet the preset condition, the candidate cluster is determined as a common cluster. The preset condition can be set based on experience and can be adjusted for different application scenarios; the embodiment of the present application does not limit the content of the preset condition or when it is set.
For example, if the risk score corresponding to the preset condition is 0.80 and the risk score of the candidate cluster is 0.85, the candidate cluster is determined to be the target cluster, and if the risk score of the candidate cluster is 0.75, the candidate cluster is determined to be the normal cluster.
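Implementation one can be sketched with any supervised model; the choice of scikit-learn logistic regression below, and the encoding of cluster labels as a numeric feature vector, are assumptions made for illustration:

```python
from sklearn.linear_model import LogisticRegression

def train_risk_model(history_cluster_features, history_cluster_targets):
    """Train the target risk calculation model on historical clusters
    (target = 1 for a known target cluster, 0 for a common cluster)."""
    model = LogisticRegression()
    model.fit(history_cluster_features, history_cluster_targets)
    return model

def screen_by_risk_score(model, candidate_features, threshold=0.80):
    """Risk score = predicted probability of being a target cluster, compared
    with the preset condition (0.80 here, as in the example above)."""
    risk_score = model.predict_proba([candidate_features])[0][1]
    return ("target cluster" if risk_score >= threshold else "common cluster"), risk_score
```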
And in the second implementation mode, determining whether the candidate cluster is a target cluster or not based on the relative entropy of the candidate cluster.
In one possible implementation, the relative entropy includes a discrete relative entropy and a continuous relative entropy, where the discrete relative entropy is used to represent the external variability of the candidate cluster, and the continuous relative entropy is used to represent the internal aggregation of the candidate cluster, where a common cluster may have a high internal aggregation and a low external variability. And determining the candidate cluster as a target cluster when the discrete relative entropy of the candidate cluster meets the first reference relative entropy and the continuous relative entropy meets the second reference relative entropy.
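Implementation two hinges on two relative-entropy (KL divergence) values; a minimal sketch follows (Python with SciPy; the reference distribution q and the direction of the two comparisons are assumptions made for illustration):

```python
import numpy as np
from scipy.stats import entropy

def discrete_relative_entropy(p, q):
    """D_KL(p || q) between the candidate cluster's categorical feature
    distribution p and a reference distribution q (external variability)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return entropy(p, q)  # sum(p * log(p / q))

def is_target_cluster(discrete_kl, continuous_kl, first_reference, second_reference):
    """The candidate cluster is a target cluster when the discrete relative
    entropy satisfies the first reference relative entropy and the continuous
    relative entropy satisfies the second reference relative entropy."""
    return discrete_kl >= first_reference and continuous_kl >= second_reference
```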
In one possible implementation manner, after the candidate cluster is determined as the target cluster, reasonable explanation can be made for the candidate cluster as the target cluster according to the similarity between the feature data of the objects to be identified included in the candidate cluster. For example, the IP address of 100% of the objects to be identified in the candidate cluster is "222.32.60.147", the device type of 100% of the objects to be identified is "43", and the name of 100% of the objects to be identified is "x", so that high consistency and strong relevance of all the objects to be identified in the candidate cluster can be represented, and thus reasonable explanation can be made for the candidate cluster as the target cluster.
Fig. 4 is a flowchart of a user data processing method provided in an embodiment of the present application; Fig. 4 includes a feature data module, a first-layer clustering module, a second-layer clustering module, a screening module, and a decision interpretation module. First-layer clustering is performed on the feature data of the objects to be identified to obtain at least one cluster to be identified. Second-layer clustering, which comprises weighted label propagation clustering and small-cluster filtering, is then performed on the clusters to be identified to obtain candidate clusters. Screening is performed on the candidate clusters: the cluster characteristics of a candidate cluster are determined from its KL (Kullback-Leibler) divergence, namely the discrete relative entropy and the continuous relative entropy of the candidate cluster, and target clusters are screened from the candidate clusters based on these. The risk scores of the candidate clusters may also be calculated with a supervised learning model, so that the target clusters among the candidate clusters can be determined; in fraud cluster identification, for example, the determined target clusters are the identified fraud clusters.
When the method is used for processing the user data, the feature data of the object to be identified is considered, the feature combination is determined based on the feature data of the object to be identified, and the cluster to be identified is obtained based on the feature combination, so that the determination of the cluster to be identified is more accurate. Clustering is carried out on the clusters to be identified, and target clusters meeting preset conditions are screened, so that the determination of the target clusters is more accurate, and the accuracy and reliability of user data processing can be improved.
Fig. 5 is a schematic structural diagram of a user data processing apparatus according to an embodiment of the present application, where, as shown in fig. 5, the apparatus includes:
an obtaining module 501, configured to obtain feature data of at least one object to be identified, where the feature data includes at least one of environmental data, registration data, device data, and historical behavior data of the object to be identified;
a combination module 502, configured to combine the feature data of the at least one object to be identified to obtain m feature combinations that satisfy a reference condition, where m is an integer greater than or equal to 1;
a determining module 503, configured to obtain m clusters to be identified according to feature data corresponding to the m feature combinations, where the m clusters to be identified correspond to the m feature combinations;
And the screening module 504 is configured to cluster the m clusters to be identified, and screen target clusters that meet a preset condition.
In a possible implementation manner, the combination module 502 is configured to perform free combination on feature data of the at least one object to be identified to obtain n feature combinations, where each feature combination includes k feature data, where n is an integer greater than m, and k is an integer greater than or equal to 1;
calculating scores of the n feature combinations based on feature scores of feature data included in the n feature combinations;
sorting according to the scores of the n feature combinations to obtain n sorted feature combinations;
among the n feature combinations after the sorting, m feature combinations satisfying the reference condition are determined.
In a possible implementation manner, the screening module 504 is configured to match a label to each of the m clusters to be identified, where the label is used to identify the cluster to be identified;
updating the label of the cluster to be identified according to the label of the neighbor cluster adjacent to the cluster to be identified, and obtaining the label after the updating of the cluster to be identified;
clustering the clusters to be identified with the same label in the labels after updating the clusters to be identified to obtain candidate clusters, wherein the candidate clusters comprise a plurality of clusters to be identified;
And screening target clusters meeting preset conditions from the candidate clusters.
In a possible implementation manner, the filtering module 504 is configured to update the label of the to-be-identified cluster according to the following formula according to the label of the neighbor cluster adjacent to the to-be-identified cluster, to obtain the label after updating the to-be-identified cluster:
wherein argmax is the maximum argument function, i represents the i-th cluster to be identified, j represents a neighbor cluster j adjacent to the i-th cluster to be identified, W_{i,j} is the weight between the i-th cluster to be identified and the neighbor cluster j (the weight being the number of objects to be identified shared by the cluster to be identified and the neighbor cluster), N is the number of neighbor clusters, and A_N is the N-th neighbor cluster.
In one possible implementation, the screening module 504 is configured to determine, based on the label of the candidate cluster, a risk score corresponding to the candidate cluster;
and determining the candidate cluster as a target cluster in response to the risk score of the candidate cluster meeting a preset condition.
In one possible implementation, the filtering module 504 is configured to calculate a relative entropy of the candidate cluster, where the relative entropy includes a discrete relative entropy and a continuous relative entropy, where the discrete relative entropy is used to represent an external variability of the candidate cluster, and the continuous relative entropy is used to represent an internal aggregation of the candidate cluster;
And determining the candidate cluster as a target cluster in response to the discrete relative entropy satisfying a first reference relative entropy and the continuous relative entropy satisfying a second reference relative entropy.
In a possible implementation manner, the filtering module 504 is configured to input the label of the candidate cluster into a target risk calculation model, calculate a risk score of the candidate cluster through the target risk calculation model, and obtain the risk score of the candidate cluster.
In a possible implementation manner, the obtaining module 501 is further configured to obtain a tag of at least one history cluster;
the apparatus further comprises:
and the training module is used for training the initial risk calculation model according to the label of the at least one history cluster to obtain a target risk calculation model.
In one possible implementation, the environment data includes at least one of the IP address and geographic location data of the object to be identified; the registration data comprises personal information filled in by the object to be identified during registration; the device data comprises the type of the device used by the object to be identified; and the historical behavior data comprises historical behaviors of the object to be identified, such as browsing, purchasing and commenting.
When the device processes the user data, the characteristic data of the object to be identified is considered, the characteristic combination is determined based on the characteristic data of the object to be identified, and the cluster to be identified is obtained based on the characteristic combination, so that the determination of the cluster to be identified is more accurate. Clustering is carried out on the clusters to be identified, and target clusters meeting preset conditions are screened, so that the determination of the target clusters is more accurate, and the accuracy and reliability of user data processing can be improved.
It should be noted that: in the user data processing device provided in the above embodiment, only the division of the above functional modules is used for illustration, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the user data processing device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the user data processing device and the user data processing method provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the user data processing device and the user data processing method are detailed in the method embodiments and are not repeated herein.
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application. The server 600 may include one or more processors (Central Processing Units, CPU) 601 and one or more memories 602, where the one or more memories 602 store at least one instruction that is loaded and executed by the one or more processors 601 to implement the user data processing method provided in the above method embodiments. Of course, the server 600 may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 700 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer or a desktop computer. The electronic device 700 may also be referred to by other names such as user device, portable electronic device, laptop electronic device, or desktop electronic device.
In general, the electronic device 700 includes: one or more processors 701, and one or more memories 702.
The processor 701 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 701 may integrate a GPU (Graphics Processing Unit) for rendering and drawing content to be displayed on the display screen. In some embodiments, the processor 701 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 702 may include one or more computer-readable storage media, which may be non-transitory. The memory 702 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 702 is used to store at least one instruction, which is executed by the processor 701 to implement the user data processing methods provided by the method embodiments herein.
In some embodiments, the electronic device 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 703 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 704, a display 705, a camera assembly 706, audio circuitry 707, a positioning assembly 708, and a power supply 709.
The peripheral interface 703 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 701 and the memory 702. In some embodiments, the processor 701, the memory 702, and the peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 704 is configured to receive and transmit RF (Radio Frequency) signals, also referred to as electromagnetic signals. The radio frequency circuit 704 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 704 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 704 may communicate with other electronic devices via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuitry, which is not limited in this application.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 705 is a touch display, the display 705 also has the ability to collect touch signals at or above its surface. The touch signal may be input to the processor 701 as a control signal for processing. In this case, the display 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 705, provided on the front panel of the electronic device 700; in other embodiments, there may be at least two displays 705, respectively disposed on different surfaces of the electronic device 700 or in a folded design; in some embodiments, the display 705 may be a flexible display disposed on a curved surface or a folded surface of the electronic device 700. The display 705 may even be arranged in a non-rectangular irregular pattern, that is, an irregularly shaped screen. The display 705 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 706 is used to capture images or video. Optionally, the camera assembly 706 includes a front camera and a rear camera. In general, the front camera is disposed on the front panel of the electronic device, and the rear camera is disposed on the rear surface of the electronic device. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize a background blurring function by fusing the main camera and the depth-of-field camera, and to realize panoramic shooting and Virtual Reality (VR) shooting functions or other fused shooting functions by fusing the main camera and the wide-angle camera. In some embodiments, the camera assembly 706 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuit 707 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert them into electrical signals, and input the electrical signals to the processor 701 for processing or to the radio frequency circuit 704 for voice communication. For stereo acquisition or noise reduction, there may be multiple microphones disposed at different locations of the electronic device 700. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, the electrical signal can be converted not only into sound waves audible to humans but also into sound waves inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 707 may also include a headphone jack.
The positioning component 708 is used to locate the current geographic location of the electronic device 700 for navigation or LBS (Location Based Service). The positioning component 708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 709 is used to power the various components in the electronic device 700. The power supply 709 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 709 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the electronic device 700 further includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: an acceleration sensor 711, a gyroscope sensor 712, a pressure sensor 713, a fingerprint sensor 714, an optical sensor 715, and a proximity sensor 716.
The acceleration sensor 711 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the electronic device 700. For example, the acceleration sensor 711 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 701 may control the display screen 705 to display a user interface in a landscape view or a portrait view based on the gravitational acceleration signal acquired by the acceleration sensor 711. The acceleration sensor 711 may also be used for the acquisition of motion data of a game or a user.
The gyro sensor 712 may detect a body direction and a rotation angle of the electronic device 700, and the gyro sensor 712 may collect a 3D motion of the user on the electronic device 700 in cooperation with the acceleration sensor 711. The processor 701 may implement the following functions based on the data collected by the gyro sensor 712: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 713 may be disposed on a side frame of the electronic device 700 and/or at a lower layer of the display screen 705. When the pressure sensor 713 is disposed on a side frame of the electronic device 700, a grip signal of the user on the electronic device 700 may be detected, and the processor 701 performs left-right hand recognition or quick operations according to the grip signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at the lower layer of the display screen 705, the processor 701 controls the operability controls on the UI interface according to the pressure operation of the user on the display screen 705. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 714 is used to collect a fingerprint of the user, and the processor 701 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 identifies the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 701 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 714 may be provided on the front, back, or side of the electronic device 700. When a physical key or vendor logo is provided on the electronic device 700, the fingerprint sensor 714 may be integrated with the physical key or vendor logo.
The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the display screen 705 based on the ambient light intensity collected by the optical sensor 715. Specifically, when the intensity of the ambient light is high, the display brightness of the display screen 705 is turned up; when the ambient light intensity is low, the display brightness of the display screen 705 is turned down. In another embodiment, the processor 701 may also dynamically adjust the shooting parameters of the camera assembly 706 based on the ambient light intensity collected by the optical sensor 715.
A proximity sensor 716, also referred to as a distance sensor, is typically provided on the front panel of the electronic device 700. The proximity sensor 716 is used to capture the distance between the user and the front of the electronic device 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front of the electronic device 700 gradually decreases, the processor 701 controls the display 705 to switch from the bright screen state to the off screen state; when the proximity sensor 716 detects that the distance between the user and the front surface of the electronic device 700 gradually increases, the processor 701 controls the display screen 705 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 7 is not limiting of the electronic device 700 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
In an exemplary embodiment, there is also provided a computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement any of the above-described user data processing methods.
Alternatively, the above-mentioned computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
It should be understood that references herein to "a plurality" mean two or more. The term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that A exists alone, that both A and B exist, or that B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship.
The foregoing embodiment numbers of the present application are merely for description and do not represent the advantages or disadvantages of the embodiments.
The foregoing is merely illustrative of the present application and is not intended to limit it; any modifications, equivalent replacements, or improvements made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (9)

1. A method of user data processing, the method comprising:
acquiring feature data of at least one object to be identified, wherein the feature data comprises at least one of environment data, registration data, device data and historical behavior data of the object to be identified;
combining the feature data of the at least one object to be identified to obtain m feature combinations meeting a reference condition, wherein m is an integer greater than or equal to 1;
obtaining m clusters to be identified according to the feature data corresponding to the m feature combinations, wherein the m clusters to be identified correspond to the m feature combinations;
clustering the m clusters to be identified, and screening target clusters meeting preset conditions;
wherein the clustering of the m clusters to be identified and the screening of target clusters meeting preset conditions comprise:
matching a label to each of the m clusters to be identified, wherein the label is used for identifying the cluster to be identified;
updating the label of a cluster to be identified according to the labels of the neighbor clusters adjacent to the cluster to be identified, so as to obtain the updated label of the cluster to be identified;
clustering the clusters to be identified that have the same updated label to obtain candidate clusters, wherein each candidate cluster comprises a plurality of clusters to be identified;
screening target clusters meeting preset conditions from the candidate clusters;
wherein the updating the label of the cluster to be identified according to the labels of the neighbor clusters adjacent to the cluster to be identified, to obtain the updated label of the cluster to be identified, comprises:
updating the label of the cluster to be identified according to the labels of the neighbor clusters adjacent to the cluster to be identified by the following formula, to obtain the updated label of the cluster to be identified:
wherein argmax is the maximum argument function, i represents the ith cluster to be identified, j represents a neighbor cluster j adjacent to the ith cluster to be identified, W_{i,j} is the weight between the ith cluster to be identified and the neighbor cluster j, namely the number of common objects to be identified included in both the cluster to be identified and the neighbor cluster, N is the number of neighbor clusters, and A_N is the Nth neighbor cluster.
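Read together with the symbol definitions above, the update can be understood as a weighted label-propagation step: the ith cluster to be identified adopts the label that receives the largest total weight W_{i,j} from its neighbor clusters. The following Python sketch is a minimal, non-limiting illustration of such an update (the function and variable names are assumptions of this example, not the literal formula of the claim):

    from collections import defaultdict

    def update_label(i, labels, neighbors, weight):
        """One label-propagation update for the ith cluster to be identified.

        labels:    dict mapping cluster id -> current label
        neighbors: dict mapping cluster id -> list of adjacent cluster ids
        weight:    function (i, j) -> W_{i,j}, the number of objects to be
                   identified shared by clusters i and j
        """
        votes = defaultdict(float)
        for j in neighbors[i]:
            votes[labels[j]] += weight(i, j)   # accumulate neighbor labels weighted by W_{i,j}
        if not votes:
            return labels[i]                   # an isolated cluster keeps its own label
        return max(votes, key=votes.get)       # argmax: label with the largest total weight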
2. The method according to claim 1, wherein combining the feature data of the at least one object to be identified to obtain m feature combinations satisfying a reference condition comprises:
freely combining the feature data of the at least one object to be identified to obtain n feature combinations, wherein each feature combination comprises k feature data, n is an integer greater than m, and k is an integer greater than or equal to 1;
calculating scores of the n feature combinations based on feature scores of the feature data included in the n feature combinations;
sorting according to the scores of the n feature combinations to obtain n sorted feature combinations;
and determining m feature combinations meeting a reference condition from the n sorted feature combinations.
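A minimal sketch of the combination step in claim 2, assuming for illustration only that a combination's score is the sum of the feature scores of the feature data it contains (the actual scoring rule may differ):

    from itertools import combinations

    def top_feature_combinations(feature_scores, k, m):
        """Freely combine features into k-feature combinations, score each
        combination from its feature scores, sort, and keep the best m.

        feature_scores: dict mapping feature name -> feature score
        """
        combos = list(combinations(sorted(feature_scores), k))             # n free combinations
        scored = [(sum(feature_scores[f] for f in c), c) for c in combos]  # combination scores
        scored.sort(reverse=True)                                          # sort by score, high to low
        return [c for _, c in scored[:m]]                                  # m combinations meeting the reference condition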
3. The method of claim 1, wherein the screening target clusters that meet a preset condition from the candidate clusters includes:
determining a risk score corresponding to the candidate cluster based on the label of the candidate cluster;
and determining the candidate cluster as a target cluster in response to the risk score of the candidate cluster meeting a preset condition.
4. The method of claim 1, wherein the screening target clusters that meet a preset condition from the candidate clusters includes:
calculating the relative entropy of the candidate cluster, wherein the relative entropy comprises discrete relative entropy and continuous relative entropy, the discrete relative entropy is used for representing the external difference of the candidate cluster, and the continuous relative entropy is used for representing the internal aggregation of the candidate cluster;
and determining the candidate cluster as a target cluster in response to the discrete relative entropy satisfying a first reference relative entropy and the continuous relative entropy satisfying a second reference relative entropy.
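A minimal sketch of the screening in claim 4, assuming the Kullback-Leibler form of relative entropy over discrete distributions, and assuming that a larger discrete relative entropy (greater external difference) and a smaller continuous relative entropy (tighter internal aggregation) are what the first and second reference relative entropies test; both assumptions are for illustration only:

    import math

    def kl_divergence(p, q, eps=1e-12):
        """Relative entropy D(p || q) between two discrete distributions."""
        return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

    def is_target_cluster(dist_cluster, dist_others, dist_members, dist_centroid,
                          first_reference, second_reference):
        """Screen a candidate cluster by its discrete and continuous relative entropies."""
        discrete_re = kl_divergence(dist_cluster, dist_others)      # external difference
        continuous_re = kl_divergence(dist_members, dist_centroid)  # internal aggregation
        return discrete_re >= first_reference and continuous_re <= second_reference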
5. The method of claim 3, wherein the determining the risk score corresponding to the candidate cluster based on the label of the candidate cluster comprises:
and inputting the label of the candidate cluster into a target risk calculation model, and calculating the risk score of the candidate cluster through the target risk calculation model to obtain the risk score of the candidate cluster.
6. The method of claim 5, wherein prior to inputting the label of the candidate cluster into the target risk calculation model, the method further comprises:
acquiring a label of at least one history cluster;
and training the initial risk calculation model according to the label of the at least one history cluster to obtain a target risk calculation model.
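Claims 5 and 6 leave the form of the risk calculation model open. As a placeholder only, the following sketch trains a scikit-learn logistic regression on labelled historical clusters and scores a candidate cluster by its predicted probability of the risky class; the model family and the numeric encoding of the cluster label information are assumptions of this example:

    from sklearn.linear_model import LogisticRegression

    def train_risk_model(history_features, history_labels):
        """Train the initial risk calculation model on historical clusters
        to obtain the target risk calculation model."""
        model = LogisticRegression()
        model.fit(history_features, history_labels)   # e.g. 1 = risky cluster, 0 = normal
        return model

    def risk_score(model, candidate_features):
        """Risk score of a candidate cluster: predicted probability of the risky class."""
        return model.predict_proba([candidate_features])[0][1]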
7. The method according to any one of claims 1-6, wherein the environmental data comprises at least one of an IP address and geographic location data of the object to be identified; the registration data comprises personal information filled in by the object to be identified during registration; the device data comprises a device type used by the object to be identified; and the historical behavior data comprises historical browsing, purchasing and commenting behaviors of the object to be identified.
8. A server comprising a processor and a memory, wherein the memory has stored therein at least one program code that is loaded and executed by the processor to implement the user data processing method of any of claims 1 to 7.
9. A computer readable storage medium having stored therein at least one program code, the at least one program code being loaded and executed by a processor to implement the user data processing method of any of claims 1 to 7.
CN202010574802.9A 2020-06-22 2020-06-22 User data processing method, device, server and computer readable storage medium Active CN111753154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010574802.9A CN111753154B (en) 2020-06-22 2020-06-22 User data processing method, device, server and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010574802.9A CN111753154B (en) 2020-06-22 2020-06-22 User data processing method, device, server and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111753154A CN111753154A (en) 2020-10-09
CN111753154B true CN111753154B (en) 2024-03-19

Family

ID=72675580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010574802.9A Active CN111753154B (en) 2020-06-22 2020-06-22 User data processing method, device, server and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111753154B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107529656A (en) * 2016-06-22 2018-01-02 腾讯科技(深圳)有限公司 The division methods and server of a kind of myspace
CN109919781A (en) * 2019-01-24 2019-06-21 平安科技(深圳)有限公司 Case recognition methods, electronic device and computer readable storage medium are cheated by clique
CN110083791A (en) * 2019-05-05 2019-08-02 北京三快在线科技有限公司 Target group detection method, device, computer equipment and storage medium
CN110503565A (en) * 2019-07-05 2019-11-26 中国平安人寿保险股份有限公司 Behaviorist risk recognition methods, system, equipment and readable storage medium storing program for executing
CN110648195A (en) * 2019-08-28 2020-01-03 苏宁云计算有限公司 User identification method and device and computer equipment
CN110738577A (en) * 2019-09-06 2020-01-31 平安科技(深圳)有限公司 Community discovery method, device, computer equipment and storage medium
US10552735B1 (en) * 2015-10-14 2020-02-04 Trading Technologies International, Inc. Applied artificial intelligence technology for processing trade data to detect patterns indicative of potential trade spoofing
CN111245815A (en) * 2020-01-07 2020-06-05 同盾控股有限公司 Data processing method, data processing device, storage medium and electronic equipment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10552735B1 (en) * 2015-10-14 2020-02-04 Trading Technologies International, Inc. Applied artificial intelligence technology for processing trade data to detect patterns indicative of potential trade spoofing
CN107529656A (en) * 2016-06-22 2018-01-02 腾讯科技(深圳)有限公司 The division methods and server of a kind of myspace
CN109919781A (en) * 2019-01-24 2019-06-21 平安科技(深圳)有限公司 Case recognition methods, electronic device and computer readable storage medium are cheated by clique
CN110083791A (en) * 2019-05-05 2019-08-02 北京三快在线科技有限公司 Target group detection method, device, computer equipment and storage medium
CN110503565A (en) * 2019-07-05 2019-11-26 中国平安人寿保险股份有限公司 Behaviorist risk recognition methods, system, equipment and readable storage medium storing program for executing
CN110648195A (en) * 2019-08-28 2020-01-03 苏宁云计算有限公司 User identification method and device and computer equipment
CN110738577A (en) * 2019-09-06 2020-01-31 平安科技(深圳)有限公司 Community discovery method, device, computer equipment and storage medium
CN111245815A (en) * 2020-01-07 2020-06-05 同盾控股有限公司 Data processing method, data processing device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN111753154A (en) 2020-10-09

Similar Documents

Publication Publication Date Title
CN110222789B (en) Image recognition method and storage medium
CN111127509B (en) Target tracking method, apparatus and computer readable storage medium
CN111462742B (en) Text display method and device based on voice, electronic equipment and storage medium
CN111027490B (en) Face attribute identification method and device and storage medium
CN111897996A (en) Topic label recommendation method, device, equipment and storage medium
CN111078521A (en) Abnormal event analysis method, device, equipment, system and storage medium
WO2021218634A1 (en) Content pushing
CN112001442B (en) Feature detection method, device, computer equipment and storage medium
CN111159551B (en) User-generated content display method and device and computer equipment
CN111611414B (en) Vehicle searching method, device and storage medium
CN111782950A (en) Sample data set acquisition method, device, equipment and storage medium
CN113408809B (en) Design scheme evaluation method and device for automobile and computer storage medium
CN113343709B (en) Method for training intention recognition model, method, device and equipment for intention recognition
CN112214115B (en) Input mode identification method and device, electronic equipment and storage medium
CN112989198B (en) Push content determination method, device, equipment and computer-readable storage medium
CN113222771B (en) Method and device for determining target group based on knowledge graph and electronic equipment
CN110928913B (en) User display method, device, computer equipment and computer readable storage medium
CN112365088B (en) Method, device and equipment for determining travel key points and readable storage medium
CN111753154B (en) User data processing method, device, server and computer readable storage medium
CN112990424B (en) Neural network model training method and device
CN111159168B (en) Data processing method and device
CN111984738B (en) Data association method, device, equipment and storage medium
CN111858983A (en) Picture type determining method and device, electronic equipment and storage medium
CN111581481B (en) Search term recommendation method and device, electronic equipment and storage medium
CN112100528B (en) Method, device, equipment and medium for training search result scoring model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant