CN113553369A - Visual user classification method, service method, system, device and storage medium

Info

Publication number
CN113553369A
Authority
CN
China
Prior art keywords
user
behavior
distribution
visual
user behavior
Prior art date
Legal status
Granted
Application number
CN202010339657.6A
Other languages
Chinese (zh)
Other versions
CN113553369B (en)
Inventor
孙娇
李茵
陈天佳
李智慧
刘昕
黄铃
时磊
徐葳
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202010339657.6A
Priority claimed from CN202010339657.6A
Publication of CN113553369A
Application granted
Publication of CN113553369B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The visual user classification method, service method, system, device, and storage medium obtain visualization data from input information and the user group data of a user group. The input information sets the fraud detection algorithm, the algorithm parameters, and at least one user behavior feature set used to process the user group data. The visualization data for display comprises a distribution view in which the user group is laid out according to behavior similarity on one or more user behavior features in the at least one user behavior feature set; the degree of behavior similarity between users is inversely related to the spacing between those users as mapped in the distribution view, and a visual output is produced accordingly. The scheme of the application can intuitively and accurately show how synchronized users are across different behaviors according to their behavior similarity, which facilitates fast and accurate analysis of fraudulent behavior and evaluation of fraud-detection quality.

Description

Visual user classification method, service method, system, device and storage medium
Technical Field
The present application relates to the field of graphical display technologies, and in particular, to a visual user classification method, a service method, a system, an apparatus, and a storage medium.
Background
Today, many online services are flooded with fraudulent activity, such as fake accounts on forums and video websites and bots on social networks. Fraudulent activity erodes the commercial value of online services.
Accordingly, online fraud detection techniques are needed to address fraud in online services. Over the years, researchers have proposed many fraud detection algorithms, particularly unsupervised learning methods, that detect fraud from logs of users' online behavior. However, designing and evaluating these algorithms is challenging: 1) the log records contain many dimensions describing user behavior, and it is difficult to select the dimensions most relevant to fraudulent behavior; 2) the choice of user data and algorithms depends largely on the domain and scenario; 3) few or no fraud labels are available for training or evaluation, and fraud can often be confirmed only after users have been harmed by it for a long time.
The elimination of false positives is critical to the success of the fraud detection process. Excluding false positives often requires the involvement of analysts, and thus visualization is an essential component of any successful fraud detection system.
However, fraud analysis has many facets, such as whether a fraud group exists, how the fraud group is distributed, and the relationships between individuals within the group, so the visualization of the fraud detection process needs to reflect at least some of these characteristics.
In addition, the visualization of the fraud detection process may also need to reflect the effectiveness of fraud detection and expose clues to false positives, which places still higher demands on it.
Therefore, how to provide a visualization scheme for fraud detection results to improve the accuracy of fraud detection has become an urgent technical problem in the industry.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, it is an object of the present application to provide a visual user classification method, a service method, a system, an apparatus, and a storage medium, which overcome various deficiencies of the prior art.
To achieve the above and other related objects, a first aspect of the present application provides a visual user classification method, including: acquiring input information; and acquiring visualization data obtained from the input information and the user group data of a user group; wherein the input information is used to set a fraud detection algorithm, algorithm parameters, and at least one user behavior feature set for processing the user group data to determine suspected fraudulent users; wherein the visualization data for display comprises: a distribution view reflecting how the user group is laid out according to behavior similarity on one or more user behavior features in the at least one user behavior feature set; wherein the degree of behavior similarity among users is inversely related to the spacing between those users as mapped in the distribution view; and performing visual output according to the visualization data.
In certain embodiments of the first aspect of the present application, the distribution view comprises any one or more of: 1) a first user distribution view reflecting the user group mapped according to behavior similarity on the user behavior feature set; 2) a second user distribution view reflecting the suspected fraudulent users in the user group mapped according to similarity on at least part of the user behavior features in the user behavior feature set; 3) a third user distribution view formed by grouping and mapping the suspected fraudulent users according to behavior similarity on at least one user behavior feature in the user behavior feature set, wherein each suspected fraudulent user group is displayed distinguishably; and 4) a fourth user distribution view reflecting the members of a suspected fraudulent user group mapped according to behavior similarity on the original values of the user behavior feature set.
In certain embodiments of the first aspect of the present application, the behavioral similarity is measured based on a weighted result of behavioral similarities of a plurality of user behavioral features in the set of user behavioral features.
In certain embodiments of the first aspect of the present application, the behavior similarity of any two users on a given user behavior feature is related to: the relative entropy between a first probability distribution, namely the distribution of values of that user behavior feature estimated from the user group data, and a second probability distribution formed when the two users collide on one value of that user behavior feature; or a relative-entropy sum over the relative entropies corresponding to a plurality of such value collisions; wherein the larger the relative entropy or the relative-entropy sum, the lower the behavior similarity between the two users.
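The claim does not pin down the exact distributions or the mapping from relative entropy to similarity. The following Python sketch is one hedged reading: the first distribution is estimated from population counts, the second distribution at a value collision is taken to be a point mass at the collided value (an assumption), the relative entropies over collisions are summed, and a larger sum is mapped to a lower similarity via exp(-x); the weighted combination over the feature set from the preceding embodiment is also shown. All function names and the exp(-x) mapping are illustrative, not from the patent.

```python
import math
from collections import Counter

def value_distribution(values):
    """First distribution: empirical probabilities of a feature's values,
    estimated from the whole user group's data."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def relative_entropy(p, q, eps=1e-12):
    """D_KL(p || q) for discrete distributions represented as dicts."""
    return sum(pv * math.log(pv / max(q.get(k, 0.0), eps))
               for k, pv in p.items() if pv > 0)

def collision_relative_entropy(background, collided_value):
    """Relative entropy contributed by one value collision, taking the
    'second distribution' to be a point mass at the collided value
    (one possible reading of the claim)."""
    return relative_entropy({collided_value: 1.0}, background)

def feature_similarity(background, values_a, values_b):
    """Behavior similarity of two users on one feature: sum the relative
    entropies over their value collisions, then map a larger sum to a
    lower similarity, matching the claimed monotonicity."""
    collisions = set(values_a) & set(values_b)
    total = sum(collision_relative_entropy(background, v) for v in collisions)
    return math.exp(-total)  # larger relative-entropy sum -> lower similarity

def weighted_similarity(per_feature_sims, weights):
    """Overall similarity as a weighted combination over the feature set."""
    z = sum(weights.values())
    return sum(weights[f] * per_feature_sims[f] for f in per_feature_sims) / z
```

For example, if half the population shares value 'a' on a feature, a collision of two users on 'a' contributes a relative entropy of log 2, giving a per-feature similarity of 0.5 under this mapping.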
In certain embodiments of the first aspect of the present application, the mapping distance is a mapping result of a collision distance; the size of the collision distance is inversely related to the degree of the behavior similarity.
In certain embodiments of the first aspect of the present application, the collision distance is scaled up as the behavior similarity decreases; that is, the two are negatively correlated.
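The patent does not fix the scaling function. As one hedged sketch, a negative-log map sends a similarity of 1 to distance 0 and scales the collision distance up without bound as similarity falls:

```python
import math

def collision_distance(similarity, scale=1.0, eps=1e-12):
    """Map a behavior similarity in (0, 1] to a collision distance that
    grows as the similarity falls; the -log form and the scale factor
    are illustrative assumptions, not mandated by the patent."""
    return scale * -math.log(max(similarity, eps))
```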
In certain embodiments of the first aspect of the present application, the suspected fraudulent users in the second distribution view have the same display characteristics.
In certain embodiments of the first aspect of the present application, the third distribution view is distinguished from different groups of suspected fraudulent users by different display characteristics.
In certain embodiments of the first aspect of the present application, the display features comprise: one or more combinations of size, color, texture, gray scale, brightness, and numbering.
In certain embodiments of the first aspect of the present application, the display characteristic corresponding to each suspected fraudulent user group is determined by the display characteristic shared by the majority of its members.
In certain embodiments of the first aspect of the present application, the second and third distribution views are presented on graphic pages that can be switched between; and/or the third distribution views formed for different third user behavior feature sets are presented on graphic pages that can be switched between.
In certain embodiments of the first aspect of the present application, the visualization data is derived from low-dimensional data obtained by dimension reduction processing of the user population data.
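The patent does not name a specific dimension-reduction method. As one hedged sketch, classical multidimensional scaling lays users out in 2-D so that the pairwise layout distances approximate the pairwise collision distances; the method choice is an assumption for illustration.

```python
import numpy as np

def classical_mds(distances, out_dim=2):
    """Embed users in `out_dim` dimensions from a symmetric pairwise
    distance matrix via classical multidimensional scaling."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    b = -0.5 * j @ (d ** 2) @ j               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)            # ascending eigenvalues
    order = np.argsort(vals)[::-1][:out_dim]  # keep the largest ones
    scales = np.sqrt(np.clip(vals[order], 0, None))
    return vecs[:, order] * scales            # n x out_dim coordinates
```

For three users at mutual collision distances 1, 1, and 2 (i.e. collinear), the embedding reproduces those distances exactly.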
In certain embodiments of the first aspect of the present application, the user distribution of at least one of the first, second, and third distribution views follows an estimated distribution; and the estimated distribution is obtained by performing kernel density estimation according to the original user distribution obtained by the behavior similarity.
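A minimal sketch of the kernel density estimation step, assuming a 2-D Gaussian kernel over the users' mapped positions (the kernel and bandwidth choices are assumptions; the patent only requires that the displayed distribution be a density estimate of the original one):

```python
import math

def gaussian_kde_2d(points, query, bandwidth=1.0):
    """Kernel density estimate at `query` from the users' 2-D mapped
    positions, as used to draw a smoothed distribution view."""
    qx, qy = query
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    return norm * sum(
        math.exp(-((x - qx) ** 2 + (y - qy) ** 2) / (2 * bandwidth ** 2))
        for x, y in points)
```

Evaluating this estimate on a grid and rendering it as a heat map yields the estimated distribution; density is highest where mapped users cluster.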
In some embodiments of the first aspect of the present application, each user behavior feature set and/or a selected subset thereof is obtained according to feature importance.
In certain embodiments of the first aspect of the present application, the importance of each user behavior feature is determined by: the average information entropy of that feature within each suspected fraudulent user group; and/or the average, over the suspected fraudulent user groups, of the relative entropy between the value distribution of that feature in the whole user group and its value distribution in each suspected fraudulent user group; wherein the lower the average information entropy, or the higher the average relative entropy, the higher the importance.
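The two importance signals can be sketched as follows; the direction of the relative entropy (population versus group) and the absence of smoothing are assumptions, since the claim leaves them open. A low average entropy means group members are synchronized on the feature; a high average relative entropy means the groups' value distributions stand out from the population's.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of a feature's values within one group."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def distribution(values):
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def kl(p, q, eps=1e-12):
    """D_KL(p || q) for discrete distributions represented as dicts."""
    return sum(pv * math.log(pv / max(q.get(k, 0.0), eps))
               for k, pv in p.items() if pv > 0)

def feature_importance(group_values, population_values):
    """Return (average entropy within groups, average relative entropy of
    the population's value distribution versus each group's) for one
    feature; lower entropy or higher relative entropy -> more important."""
    avg_entropy = sum(entropy(g) for g in group_values) / len(group_values)
    pop = distribution(population_values)
    avg_rel_entropy = sum(kl(pop, distribution(g))
                          for g in group_values) / len(group_values)
    return avg_entropy, avg_rel_entropy
```

For example, a group whose members all share one IP address has zero entropy on the IP feature, flagging that feature as highly important for that group.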
In certain embodiments of the first aspect of the present application, the visual output is used as a reference for adjusting one or more of the fraud detection algorithm, algorithm parameters and at least one set of user behavior characteristics.
In certain embodiments of the first aspect of the present application, the visual output is used as a reference for adjusting one or more of the fraud detection algorithm, the algorithm parameters, and the at least one user behavior feature set, in any one or more of the following ways: 1) the difference between the second distribution view and the first distribution view is used as a basis for indicating how to adjust, according to importance, the user behavior features in the user behavior feature set; 2) the overall mixing among different suspected fraudulent user groups, represented by different display characteristics in the third distribution view, is used as a basis for indicating whether to remove user behavior features of lower importance from the user behavior feature set or to reduce their weights in the algorithm parameters; 3) the number of different suspected fraudulent user groups represented by different display characteristics in the third distribution view is used as a basis for indicating whether to add user behavior features of higher importance to the user behavior feature set; 4) the mixing among different suspected fraudulent user groups represented by different display characteristics in at least one local area of the third distribution view is used as a basis for indicating whether to adjust a member threshold condition for screening the displayed suspected fraudulent user groups, or an edge threshold condition for delimiting the edges of a suspected fraudulent user group according to the strength of the relationships among the suspected fraudulent users; 5) the density of the user distribution shown in the fourth distribution view is used as a basis for evaluating the quality of the suspected fraudulent user group and for indicating whether to adjust the user behavior features in the user behavior feature set.
In certain embodiments of the first aspect of the present application, the visual user classification method includes: displaying distinguishably, in the fourth distribution view, each suspected fraudulent user that falls within the distribution but does not belong to the suspected fraudulent user group, for analysis.
In certain embodiments of the first aspect of the present application, the set of user behavior characteristics comprises a plurality of categories of user behavior characteristics.
In certain embodiments of the first aspect of the present application, the suspected fraudulent user in the user group is detected by the fraud detection algorithm according to the at least part of the user behavior characteristics.
In certain embodiments of the first aspect of the present application, the user population data is about an e-commerce website, and the classification of the user behavior characteristics comprises: one or more of time-related, IP address-related, and phone number-related.
In certain embodiments of the first aspect of the present application, the user group data pertains to social networking sites, and the classification of the user behavior characteristics comprises: one or more of time-related, IP address-related, source user-related, target user-related, and event-related.
To achieve the above and other related objects, a second aspect of the present application provides a visual user classification system, comprising: an input module for acquiring input information; a processing module for acquiring visualization data obtained from the input information and the user group data of a user group; wherein the input information is used to set a fraud detection algorithm, algorithm parameters, and at least one user behavior feature set for processing the user group data to determine suspected fraudulent users; wherein the visualization data for display comprises: a distribution view reflecting how the user group is laid out according to behavior similarity on one or more user behavior features in the at least one user behavior feature set; wherein the degree of behavior similarity among users is inversely related to the spacing between those users as mapped in the distribution view; and an output module for performing visual output according to the visualization data.
In certain embodiments of the second aspect of the present application, the distribution view comprises any one or more of: 1) a first user distribution view reflecting the user group mapped according to behavior similarity on the user behavior feature set; 2) a second user distribution view reflecting the suspected fraudulent users in the user group mapped according to similarity on at least part of the user behavior features in the user behavior feature set; 3) a third user distribution view formed by grouping and mapping the suspected fraudulent users according to behavior similarity on at least one user behavior feature in the user behavior feature set, wherein each suspected fraudulent user group is displayed distinguishably; and 4) a fourth user distribution view reflecting the members of a suspected fraudulent user group mapped according to behavior similarity on the original values of the user behavior feature set.
In certain embodiments of the second aspect of the present application, the behavioral similarity is measured based on a weighted result of the behavioral similarities of the plurality of user behavioral features in the set of user behavioral features.
In some embodiments of the second aspect of the present application, the behavior similarity of any two users on a given user behavior feature is related to: the relative entropy between a first probability distribution, namely the distribution of values of that user behavior feature estimated from the user group data, and a second probability distribution formed when the two users collide on one value of that user behavior feature; or a relative-entropy sum over the relative entropies corresponding to a plurality of such value collisions; wherein the larger the relative entropy or the relative-entropy sum, the lower the behavior similarity between the two users.
In certain embodiments of the second aspect of the present application, the mapping distance is a mapping result of a collision distance; the size of the collision distance is inversely related to the degree of the behavior similarity.
In certain embodiments of the second aspect of the present application, the collision distance is scaled up as the behavior similarity decreases; that is, the two are negatively correlated.
In certain embodiments of the second aspect of the present application, the suspected fraudulent user in the second distribution view has the same display characteristics.
In certain embodiments of the second aspect of the present application, the third distribution view is distinguished from different groups of suspected fraudulent users by different display characteristics.
In certain embodiments of the second aspect of the present application, the display features comprise: one or more combinations of size, color, texture, gray scale, brightness, and numbering.
In certain embodiments of the second aspect of the present application, the display characteristic associated with each suspected fraudulent user group is determined by the display characteristic shared by the majority of its members.
In some embodiments of the second aspect of the present application, the second and third distribution views are presented on graphic pages that can be switched between; and/or the third distribution views formed for different third user behavior feature sets are presented on graphic pages that can be switched between.
In certain embodiments of the second aspect of the present application, the visualization data is derived from low-dimensional data obtained by dimension reduction processing of the user population data.
In certain embodiments of the second aspect of the present application, the user distribution of at least one of the first, second, and third distribution views follows an estimated distribution; and the estimated distribution is obtained by performing kernel density estimation according to the original user distribution obtained by the behavior similarity.
In some embodiments of the second aspect of the present application, each user behavior feature set and/or a selected subset thereof is obtained according to feature importance.
In certain embodiments of the second aspect of the present application, the importance of each user behavior feature is given by: the average information entropy of that feature within each suspected fraudulent user group; and/or the average, over the suspected fraudulent user groups, of the relative entropy between the value distribution of that feature in the whole user group and its value distribution in each suspected fraudulent user group; wherein the lower the average information entropy, or the higher the average relative entropy, the higher the importance.
In certain embodiments of the second aspect of the present application, the visual output is used as a reference for adjusting one or more of the fraud detection algorithm, algorithm parameters and at least one set of user behavior characteristics.
In certain embodiments of the second aspect of the present application, the visual output is used as a reference for adjusting one or more of the fraud detection algorithm, the algorithm parameters, and the at least one user behavior feature set, in any one or more of the following ways: 1) the difference between the second distribution view and the first distribution view is used as a basis for indicating how to adjust, according to importance, the user behavior features in the user behavior feature set; 2) the overall mixing among different suspected fraudulent user groups, represented by different display characteristics in the third distribution view, is used as a basis for indicating whether to remove user behavior features of lower importance from the user behavior feature set or to reduce their weights in the algorithm parameters; 3) the number of different suspected fraudulent user groups represented by different display characteristics in the third distribution view is used as a basis for indicating whether to add user behavior features of higher importance to the user behavior feature set; 4) the mixing among different suspected fraudulent user groups represented by different display characteristics in at least one local area of the third distribution view is used as a basis for indicating whether to adjust a member threshold condition for screening the displayed suspected fraudulent user groups, or an edge threshold condition for delimiting the edges of a suspected fraudulent user group according to the strength of the relationships among the suspected fraudulent users; 5) the density of the user distribution shown in the fourth distribution view is used as a basis for evaluating the quality of the suspected fraudulent user group and for indicating whether to adjust the user behavior features in the user behavior feature set.
In certain embodiments of the second aspect of the present application, the visual user classification system is configured to display distinguishably, in the fourth distribution view, each suspected fraudulent user that falls within the distribution but does not belong to the suspected fraudulent user group, for analysis.
In certain embodiments of the second aspect of the present application, the set of user behavior characteristics comprises a plurality of classifications of user behavior characteristics.
In certain embodiments of the second aspect of the present application, the user population data relates to e-commerce websites, and the classification of the user behavior characteristics comprises: one or more of time-related, IP address-related, and phone number-related.
In certain embodiments of the second aspect of the present application, the user group data pertains to social networking sites, and the classification of the user behavior characteristics comprises: one or more of time-related, IP address-related, source user-related, target user-related, and event-related.
To achieve the above and other related objects, a third aspect of the present application provides a visualization data service method, including: acquiring a fraud detection algorithm, algorithm parameters, and at least one user behavior feature set determined according to input information; processing the user group data of a user group according to the fraud detection algorithm, the algorithm parameters, and the at least one user behavior feature set to generate a user classification result or visualization data; wherein the visualization data for display comprises: a distribution view reflecting how the user group is laid out according to behavior similarity on one or more user behavior features in the at least one user behavior feature set; wherein the degree of behavior similarity among users is inversely related to the spacing between those users as mapped in the distribution view; and outputting the user classification result or the visualization data; wherein an outputted user classification result is used for externally generating the visualization data.
To achieve the above and other related objects, a fourth aspect of the present application provides a visualization data service system, comprising: a setting module for acquiring a fraud detection algorithm, algorithm parameters, and at least one user behavior feature set determined according to input information; a data processing module for processing the user group data of a user group according to the fraud detection algorithm, the algorithm parameters, and the at least one user behavior feature set to generate a user classification result or visualization data; wherein the visualization data for display comprises: a distribution view reflecting how the user group is laid out according to behavior similarity on one or more user behavior features in the at least one user behavior feature set; wherein the degree of behavior similarity among users is inversely related to the spacing between those users as mapped in the distribution view; and an output module for outputting the user classification result or the visualization data; wherein an outputted user classification result is used for externally generating the visualization data.
To achieve the above and other related objects, a fifth aspect of the present application provides a computer apparatus comprising: a storage device storing at least one computer program; and the processing device is used for running the computer program to execute and realize the visual user classification method.
To achieve the above and other related objects, a sixth aspect of the present application provides a service apparatus comprising: communication means for communicating with the outside; a storage device storing at least one computer program; and the processing device is used for running the computer program to execute and realize the visual data service method.
To achieve the above and other related objects, a seventh aspect of the present application provides a computer-readable storage medium storing at least one computer program, which executes and implements the visual user classification method or the visual data service method when being invoked.
As described above, the visual user classification method, service method, system, device, and storage medium of the present application acquire input information and obtain visualization data from the input information and the user group data of a user group; wherein the input information is used to set a fraud detection algorithm, algorithm parameters, and at least one user behavior feature set for processing the user group data to determine suspected fraudulent users; wherein the visualization data for display comprises: a distribution view reflecting how the user group is laid out according to behavior similarity on one or more user behavior features in the at least one user behavior feature set; wherein the degree of behavior similarity among users is inversely related to the spacing between those users as mapped in the distribution view; and visual output is performed according to the visualization data. With this scheme, how synchronized users are across different behaviors can be shown intuitively and accurately according to their behavior similarity, which facilitates fast and accurate analysis of fraudulent behavior and evaluation of fraud-detection quality.
Drawings
Fig. 1 is a flowchart illustrating a visual user classification method according to an embodiment of the present application.
Fig. 2 is a display diagram illustrating a first distribution view in an embodiment of the present application.
Fig. 3 is a display diagram illustrating a second distribution view according to an embodiment of the present application.
Fig. 4 is a display diagram illustrating a third distribution view in the embodiment of the present application.
Fig. 5 is a display diagram showing a fourth distribution view in the embodiment of the present application.
Fig. 6 is a display diagram of a human-computer interaction interface in an embodiment of the present application.
Fig. 7A is a schematic comparison of the third distribution view and the fourth distribution view when grouping quality is poor, in an embodiment of the present application.
Fig. 7B is a schematic comparison of the third distribution view and the fourth distribution view when grouping quality is good, in an embodiment of the present application.
Fig. 8 is a schematic structural diagram of a communication system in an embodiment of the present application.
Fig. 9 is a flowchart illustrating a visualization data service method according to an embodiment of the present application.
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Fig. 11 is a schematic structural diagram of a service apparatus in an embodiment of the present application.
Fig. 12 is a schematic block diagram illustrating a system for visually classifying users in an embodiment of the present application.
Fig. 13 is a schematic block diagram of a visualization data service system in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is provided for illustrative purposes, and other advantages and capabilities of the present application will become apparent to those skilled in the art from the present disclosure.
In the following description, reference is made to the accompanying drawings that describe several embodiments of the application. It is to be understood that other embodiments may be utilized and that changes in the module or unit composition, electrical, and operation may be made without departing from the spirit and scope of the present disclosure. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
Although the terms first, second, etc. may be used herein to describe various elements, information, or parameters in some instances, these elements or parameters should not be limited by these terms. These terms are only used to distinguish one element or parameter from another element or parameter. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the various described embodiments. The first element and the second element are both elements, but they are not the same element unless the context clearly dictates otherwise. Depending on the context, the word "if" as used herein may, for example, be interpreted as "when" or "upon".
Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition occurs only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
Those of ordinary skill in the art will appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Online fraud, such as telecom fraud and phishing websites, is often financially motivated. It is typically conducted through accounts amassed on the target website; these accounts are usually discarded once the fraud is complete, and may also be referred to as disposable accounts. Fraudsters use a large number of low-cost mobile devices fitted with phone cards (SIM cards), such as cell phones, to attack websites, for example by registering accounts with which to conduct fraud. Compared with legitimate accounts, these one-time accounts may exhibit unusually similar behavior in some respects, such as repeatedly used phone numbers, similar call durations, and highly repetitive IP segments and activity frequencies.
Therefore, by analyzing the similarity of user behaviors across accounts, fraudulent users, and even the fraud groups hiding among these accounts, can be effectively discovered. However, classifying these user behaviors is not trivial.
Some unsupervised classification algorithms exist in the prior art and may be applied to this scenario to classify user behavior.
However, fraud detection scenarios are more complex than most everyday scenarios, in which various machine learning models can simply be tried until one solves the problem. In addition, existing algorithms usually do not provide analysts with an end-to-end visual interactive interface, which makes them harder to deploy and prevents analysts from adjusting the algorithms, parameters, behavior features, and the like used for analysis.
Algorithms for visual interaction exist in the prior art: Google Vizier uses parallel coordinates to analyze the searched models; ATMSeer uses visualization to assist the automated machine learning process; Google Facets visualizes machine learning datasets to help understand and analyze them; and AutoAIViz visualizes the AI model generation process to improve interpretability. However, these cannot be applied directly to fraud detection, because fraud detection involves high-dimensional user behavior data that is difficult to visualize directly, and a naive visualization may not even be conducive to fraud detection analysis.
The present application provides a visualization scheme designed for fraud detection. The detection tasks the design is intended to support are as follows:
1. Fraud detection overview: What is the distribution of fraudulent users in the high-dimensional feature space? Is the distribution of fraudulent users similar to the distribution of all users? This will help the analyst form an overall picture of the fraud detection results.
2. Quality evaluation: Are the fraud detection results correct? Is the grouping of fraudulent users accurate? This will help the analyst better understand the quality of the fraud detection algorithm.
3. User behavior feature importance analysis: Which features of user behavior contribute most to the fraud detection results? This helps the analyst select more discriminative user behavior features for fraud identification.
4. Detailed information on suspected fraudulent user groups: How many users are in each suspected fraudulent user group? How are their user behavior feature values distributed? What values do specific fraudulent user behavior features take? This will help the analyst drill down into the information of each individual suspected fraudulent user.
5. Common user behavior characteristics of suspected fraudulent user groups: Do the users in a suspected fraudulent user group share a common fraud pattern? How is this pattern characterized? This will help analysts form hypotheses about the collective behavior of a suspected fraudulent user group. The analyst can verify such a hypothesis and retain it as domain knowledge, so as to better understand the strengths and weaknesses of fraud detection algorithms.
6. Visual interaction: How do the fraud detection results appear given the chosen user behavior features? How does adjusting a parameter change them? This will help the analyst adjust the inputs to obtain a visualization of the desired result.
7. Error elimination: Has the fraud detection algorithm misjudged a particular user? This will help the analyst focus on a particular user and flag it before the fraud detection results are output.
Based on the foregoing problems, embodiments of the present application provide a visual user classification method, a service method, a system, an apparatus, and a storage medium, and each related embodiment will be shown in the following with reference to the accompanying drawings.
As shown in fig. 1, a flowchart of a visual user classification method in the embodiment of the present application is shown. The method comprises the following steps:
step S101: input information is acquired.
In some embodiments, the input information may be input to the electronic device by a user through an input device, such as a keyboard, a mouse, a microphone, a touch screen, etc., to convert human operation into input information in the form of electrical signals recognizable by the electronic device.
In one example, information input may be achieved by displaying a human-computer interaction graphical interface (GUI) on a display. The GUI includes graphical controls that can receive input information, and the user operates on the GUI through an input device such as a keyboard, a mouse, a microphone, or a touch screen that the display may have; such operations include, but are not limited to, directly entering information, selecting information, and the like.
The display function is enabled by a graphics module in the electronic device together with a controller of the display; the graphics module includes various known software components for rendering and displaying graphics on the touch screen. Note that the term "graphic" includes any object that may be displayed to a user, including but not limited to text, web pages, icons (e.g., user interface objects including soft keys), digital images, videos, animations, and the like. The display screen is, for example, a touch screen, providing both an output interface and an input interface between the device and the user. The touch screen controller receives/sends electrical signals from/to the touch screen, and the touch screen displays visual output to the user. This visual output may include text, graphics, video, and any combination thereof.
In one example, the electronic device receives input information in the form of user speech through a sound pick-up (e.g., a microphone), and the information is converted into an electrical signal capable of being recognized by a machine through a speech recognition computer program to complete information input, while the electronic device can also convert output information into speech through a speech conversion computer program and play the speech to the user through a sound player.
In the above embodiments, the electronic device is, for example, an electronic device loaded with an APP application computer program or having a web/website access capability, and includes components such as a memory, a memory controller, one or more processing units (CPUs), a peripheral interface, an RF circuit, an audio circuit, a speaker, a microphone, an input/output (I/O) subsystem, a display screen, other output or control devices, and an external port, which communicate via one or more communication buses or signal lines. The electronic device includes, but is not limited to, personal computers such as desktop computers, notebook computers, tablet computers, smart phones, smart televisions, and the like. The electronic device can also be an electronic device consisting of a host with a plurality of virtual machines and a human-computer interaction device (such as a touch display screen, a keyboard and a mouse) corresponding to each virtual machine.
The input information is used to set a fraud detection algorithm, algorithm parameters, and at least one user behavior feature set for processing the user group data to determine suspected fraudulent users. The user group data includes each user's data in different dimensions, such as IP address, telephone number, and the like; accordingly, the value of each user behavior feature in the user behavior feature set can be extracted from the user group data, for example an IP address of 123.xxx.xxx.1 or a telephone number of 021-. It should be noted that the fraud detection principle mentioned in the embodiments of the present application is based on the synchronized and similar characteristics that suspected fraud groups exhibit in certain user behaviors; the fraud detection algorithm therefore classifies users based on user behavior features of different dimensions, so as to single out suspected fraudulent users.
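As a minimal, hypothetical sketch (not the patent's implementation) of this extraction step, the value of each behavior feature in a chosen feature set can be pulled out of one user's multi-dimensional record in the user group data; the field names below are illustrative assumptions:

```python
# A sketch of extracting behavior feature values from one user's record in
# the user group data. Field names ("ip_address", etc.) are assumptions.
def extract_feature_values(user_record, feature_set):
    """Pull the value of each behavior feature in the set out of the record."""
    return {feature: user_record.get(feature) for feature in feature_set}

record = {
    "user_id": "u001",
    "ip_address": "123.0.0.1",      # IP-address-related dimension
    "phone_number": "021-0000000",  # phone-number-related dimension
    "register_time": 1588000000,    # time-related dimension
}
values = extract_feature_values(record, ["ip_address", "phone_number"])
assert values == {"ip_address": "123.0.0.1", "phone_number": "021-0000000"}
```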
In some embodiments, the fraud detection algorithm is, for example, CrossSpot or D-Spot; of course, other classification algorithms may also be used in the present application, which is not limited to this example. In some embodiments, the algorithm parameters include, for example, a weight for each user behavior feature.
In some embodiments, each user behavior feature set may include one or more user behavior features. User behavior features that are strongly correlated with the characteristics of fraudulent conduct can locate suspected fraudulent users more accurately; those characteristics depend on the online fraud application scenario, such as an e-commerce website or a social networking website.
In some embodiments, the user population data may be about an e-commerce website, such as based on a domestic e-commerce website: taobao, Tianmao mall, Jingdong mall, Suningyi, Wei-Hui, etc.; importing a cross-border e-commerce website: world cat international, global purchase in the Jingdong, cyberkia, world specials of the Wei-Shi Hui, etc.; exporting the cross-border e-commerce website: fast selling, export e-commerce platform under the ali flag, cross border traffic (easy to buy around the world), proud e-commerce and the like; foreign e-commerce websites: amazon, eBay, groupon, paytm, newegg, other e-commerce websites, or phishing websites constructed based on these websites, etc.
The classifications of user behavior features more relevant to e-commerce website fraud include one or more of: time-related, IP-address-related, and phone-number-related.
For example, the time-dependent user behavior characteristics include: time stamps of various account-related operations, such as one or more of account registration time, login time, logout time, and operation time; the user behavior characteristics related to the IP address comprise: IP address, IP location, etc.; the user behavior characteristics related to the telephone number comprise: telephone number, area where the phone is located (which may be obtained from area code).
In some embodiments, the user group data may be about social networking sites. Domestic social networking sites include, for example: multi-purpose popular socializing: Baidu Tieba; socializing based on various living interests: Douban; travel sharing, group communication, and hostel information: Qyer; professional networking: Tianji, Wealink, and Ushi; communication and sharing among enterprise users: the Qifang enterprise community; resource downloading, paper retrieval, concept research, and activity events: academic networks; popular socializing: QQ Zone (Qzone); active and practical networking: Renren; entertainment-oriented communication for white-collar and student users: Kaixin001 and Renren; emotional communication based on online co-residence: guest-competition networks; matchmaking for unmarried men and women: Century Jiayuan, Baihe, and Zhenai; localized communication: Nanjing-area communities; friend-making for young users: 51.com; original articles: blog platforms such as Sina Blog; fast information sharing: Weibo; tag-based social sharing sites; and social question-and-answer websites. Foreign social networking sites include, for example: Facebook, Twitter, LinkedIn, Pinterest, Google+, Tumblr, Instagram, VK, Flickr, MySpace, Tagged, Ask.fm, Meetup, MeetMe, Classmates, Snapchat, other social networking sites, or phishing websites constructed based on these sites.
The classifications of user behavior features more relevant to social networking site fraud include one or more of: time-related, IP-address-related, source-user-related, target-user-related, and event-related.
For example, the time-dependent user behavior characteristics include: time stamps of various account-related operations, such as one or more of account registration time, login time, logout time, and operation time; the user behavior characteristics related to the IP address comprise: IP address, IP location, etc.; the user behavior characteristics related to the telephone number comprise: telephone number, area where the telephone is located (can be obtained according to area code); the user behavior characteristics associated with the source user include: source IP of transmission information, region of source user, address of source user, etc.; the user behavior characteristics related to the target user include: target IP, target user area, target user address, etc. of the transmission information; the user behavior characteristics related to the event include: social events among users, such as access, add friends, talk, comments, etc.
The user behavior feature set may be a set of some or all of the user behavior features described above, for example one or more combinations of user behavior features from multiple classifications or a single classification, a combination of user behavior features extracted across classifications, or even a single user behavior feature. In some embodiments, each user behavior feature may be assigned an importance level indicating its relevance to fraud, for reference; when fraudulent users are to be highlighted, the features with the highest or higher importance may be selected where possible. The specific composition will be described in detail later.
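The importance-level idea above can be sketched as follows, where the levels and feature names are illustrative assumptions rather than values from the patent:

```python
# Illustrative importance levels; higher means assumed more relevant to fraud.
FEATURE_IMPORTANCE = {
    "ip_address": 3,
    "phone_number": 3,
    "register_time": 2,
    "comment_text": 1,
}

def select_feature_set(importance, min_level):
    """Form a user behavior feature set from features at or above min_level."""
    return sorted(f for f, level in importance.items() if level >= min_level)

# When fraudulent users are to be highlighted, prefer the top-level features.
assert select_feature_set(FEATURE_IMPORTANCE, 3) == ["ip_address", "phone_number"]
```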
Step S102: and acquiring visual data obtained according to the input information and the user group data of the user group.
Step S103: and performing visual output according to the visual data.
The visualization data for display includes: a distribution view reflecting how the user group is laid out based on behavior similarity over one or more user behavior features in at least one user behavior feature set, wherein the degree of behavior similarity between users is inversely related to the mapping distance between those users in the distribution view.
In some embodiments, each user may be represented as a tile in the view occupying one or more pixels (or one or more cells of a grid obtained by dividing the view crosswise according to its size). The behavior similarity between users is expressed as the mapping distance between tiles in the view: the higher the behavior similarity between users, the smaller the mapping distance, and the lower the similarity, the larger the distance. Thus, tiles of users with similar behaviors "gather" in the view, while tiles of users with dissimilar behaviors stay "far apart". For fraudulent users, and especially for fraud groups, the similarity of user behaviors (for example IP address, phone number, and region) is very high, whereas the behaviors of normal legitimate users are usually scattered.
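The inverse relation between behavior similarity and mapping distance can be sketched as follows, using a simple match-fraction similarity as an illustrative assumption (an actual layout would apply dimension reduction to real feature vectors):

```python
# A sketch of the similarity-to-distance rule: similarity is the fraction of
# matching feature values between two users, and the layout distance is taken
# as 1 - similarity, so higher similarity means a smaller mapping distance.
def behavior_similarity(a, b):
    keys = set(a) | set(b)
    return sum(1 for k in keys if a.get(k) == b.get(k)) / len(keys)

def mapping_distance(a, b):
    return 1.0 - behavior_similarity(a, b)

u1 = {"ip": "1.2.3.4", "phone": "021-1", "region": "SH"}
u2 = {"ip": "1.2.3.4", "phone": "021-1", "region": "SH"}  # near-identical behavior
u3 = {"ip": "9.9.9.9", "phone": "010-7", "region": "BJ"}  # dissimilar behavior

# Tiles of behaviorally similar users "gather" (small distance);
# dissimilar users end up "far apart" (large distance).
assert mapping_distance(u1, u2) < mapping_distance(u1, u3)
```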
In some embodiments, the distribution view includes any one or more of:
1) a first user distribution view reflecting how the user group is mapped based on behavior similarity over the user behavior feature set;
2) a second user distribution view reflecting how the suspected fraudulent users in the user group are mapped based on similarity over at least part of the user behavior features in the user behavior feature set;
3) a third user distribution view formed by grouping each suspected fraudulent user according to behavior similarity on at least one user behavior feature in the user behavior feature set and mapping the result, wherein the suspected fraudulent user groups are displayed so as to be distinguishable from one another;
4) a fourth user distribution view reflecting how the members of a suspected fraudulent user group are mapped based on behavior similarity over the original values of the user behavior feature set.
In some embodiments, as shown for example in fig. 2, an interface schematic of a first distribution view 200 in an embodiment of the present application is shown. In this embodiment, the first distribution view 200 classifies the entire user group, according to the similarity of user behaviors over the user behavior feature set among all users, into user clusters that correspond to the groups 201 presented on the first distribution view 200. In fact, each group 201 consists of tiles, each representing one user, that are drawn close to one another because of similar user behaviors, so that the tiles are displayed in aggregate as a group 201. In one possible implementation, the user behavior features in the first user behavior feature set may be, for example, one or more user behavior features from a single classification or multiple classifications, and the first user distribution view is used for observing the overall behavior similarity of all users (including both legitimate users and fraudulent users).
The first distribution view displays user clusters gathered due to similar behaviors, including both legitimate users and suspected fraudulent users; the display features of the user clusters can convey user information, for example the gray level can indicate the number of users in each user cluster.
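The "gray level indicates cluster size" encoding can be sketched as a simple mapping from user count to a gray value; the linear mapping below is an illustrative assumption, not the patent's formula:

```python
# A sketch of mapping cluster size to a gray shade: larger clusters get
# darker values. 0 is black (largest cluster), 255 is white (empty).
def cluster_shade(n_users, n_max):
    return round(255 * (1 - n_users / n_max))

# The biggest cluster is darkest; smaller clusters are lighter.
assert cluster_shade(100, 100) == 0
assert cluster_shade(10, 100) > cluster_shade(50, 100)
```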
In some embodiments, similarly to the formation of the first distribution view 200, as shown for example in fig. 3, the second distribution view 300 is mapped based on the similarity of suspected fraudulent users over at least some of the features in the user behavior feature set.
Each suspected fraudulent user is determined from the user group by the fraud detection algorithm, such as CrossSpot or D-Spot. Clusters of suspected fraudulent users are shown aggregated in the second distribution view 300, with tile spacing corresponding to the similarity between these suspected fraudulent users. The tiles of the suspected fraudulent users can adopt display features that distinguish them from the tiles in the first distribution view, for example gray tiles in the first distribution view and red tiles in the second distribution view; the shade of a group 301 indicates how many users are aggregated there.
Thus, it can be appreciated that the second distribution view can reflect the distribution of suspected fraudulent users. It should be noted that the suspected fraudulent users referred to in the embodiments are merely user groups gathered because of behavior similarity; this does not necessarily mean that a suspected fraudulent user cluster is a fraud group, nor that each suspected fraudulent user is in fact a fraudulent user, but only that these users are suspicious because they are similar in some behaviors.
For example, as shown in fig. 3, a graphical schematic of a second distribution view 300 in an embodiment of the present application is shown. In fig. 3, each cluster of suspected fraudulent users is represented as an aggregated pattern, each suspected fraudulent user can be represented as a tile of one or more pixels, and each cluster of suspected fraudulent users can be represented as a group 301. In some embodiments, the display characteristics of the tile corresponding to each suspected fraudulent user can show the identity of the suspected fraudulent user, and optionally, each suspected fraudulent user in the second distribution view 300 can be represented by the same display characteristics, including: one or more combinations of size, color, texture, gray scale, brightness, and numbering.
For example, in the second distribution view 300 illustrated in fig. 3, each suspected fraudulent user may be colored the same, and optionally, may be colored a more vibrant color, such as red; optionally, the groups 301 corresponding to each suspected fraudulent user cluster are gathered together, so that a darker color can be presented; accordingly, the larger the number of suspected fraudulent user clusters, the darker the color of the corresponding group 301 visually observed by the naked human eye, and the lighter the number of groups.
In other embodiments, the number of pixels per suspected fraudulent user may be greater or lesser, and thus greater or lesser in size; alternatively, the group 301 corresponding to each suspected fraudulent user cluster may be represented as a pattern with the same specific texture (e.g., a group of horizontal stripes, star stripes, etc.); or, the group 301 corresponding to each suspected fraudulent user cluster may be represented by adopting the same specific gray scale and brightness; alternatively, each suspected fraudulent user cluster may be numbered, and optionally, the number may be displayed on the corresponding group 301 of each suspected fraudulent user cluster.
The third distribution view displays each suspected fraudulent user group generated by grouping the suspected fraudulent users in the second distribution view on at least one user behavior feature in the user behavior feature set, wherein different suspected fraudulent user groups are differentiated using different display features, the display features including one or more combinations of size, color, texture, gray scale, brightness, and numbering.
For example, as shown in fig. 4, a graphical illustration of a third distribution view 400 in an embodiment of the present application is shown. In this fig. 4, each suspected fraudulent user may be presented as a tile of one or more pixels (or grid), and each suspected fraudulent user group may be presented as one or more groups 401. The display characteristics of one or more groups 401 corresponding to each suspected fraudulent user group can be shown to be different from other suspected fraudulent user groups, and optionally, the suspected fraudulent users in each suspected fraudulent user group in the third distribution view 400 can be represented by the same display characteristics, where the display characteristics include: one or more combinations of size, color, texture, gray scale, brightness, and numbering.
For example, in the third distribution view 400 shown in fig. 4, suspected fraudulent users belonging to different suspected fraudulent user groups may be colored differently: if the pixels corresponding to the suspected fraudulent users in suspected fraudulent user group 1 are colored green, the three groups A, B, C corresponding to suspected fraudulent user group 1 are displayed in green; if the pixels corresponding to the suspected fraudulent users in suspected fraudulent user group 2 are colored orange, the two groups D, E corresponding to suspected fraudulent user group 2 are displayed in orange. Optionally, each group 401 corresponding to a suspected fraudulent user group may appear in a darker color because its members are clustered together; correspondingly, the more suspected fraudulent users gather at a group 401, the darker its color appears to the naked eye, and the fewer, the lighter. Thus, the shade of a group 401 within each suspected fraudulent user group corresponds to the number of users gathered there.
Optionally, the display features of each group 401 corresponding to a suspected fraudulent user group are the display features corresponding to the dominant part of the suspected fraudulent users in that group. For example, the display feature may be a coloring: after calculating that the number of suspected fraudulent user groups is 7, seven colors (red, orange, yellow, green, blue, indigo, and purple, selected for example according to RGB color values) are assigned. For a certain suspected fraudulent user group with several corresponding groups 401 (for example A, B, C), most users in a group may belong to suspected fraudulent user group 1 and be red, some to suspected fraudulent user group 2 and be orange, and some to suspected fraudulent user group 5 and be blue; the red suspected fraudulent users are the most numerous and thus the dominant part, so the coloring of groups A, B, C is kept consistent with the coloring of suspected fraudulent user group 1, namely red. The shade of group A can indicate how many users are in the dominant part.
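The dominant-part coloring rule described above can be sketched as a majority vote over the group memberships of a cluster's members; the group-to-color table is an illustrative assumption:

```python
from collections import Counter

# A sketch of the majority-vote coloring: a cluster takes the color of the
# suspected-fraud group that most of its members belong to.
GROUP_COLORS = {1: "red", 2: "orange", 5: "blue"}  # illustrative palette

def cluster_color(member_groups):
    dominant_group, _ = Counter(member_groups).most_common(1)[0]
    return GROUP_COLORS.get(dominant_group, "gray")

# Cluster A: most members in group 1 (red), a few in groups 2 and 5.
assert cluster_color([1, 1, 1, 1, 2, 2, 5]) == "red"
```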
In other embodiments, the number of pixels that represent each suspected fraudulent user of different groups of suspected fraudulent users may be greater or lesser, and thus greater or lesser in size, by differing display characteristics; alternatively, each group 401 corresponding to the suspected rogue user group may be represented as a pattern with a specific texture (e.g., a circular pattern of horizontal stripes, star stripes, etc.); or, each group 401 corresponding to the suspected fraudulent user group may adopt a specific gray scale and brightness; alternatively, each suspected fraudulent user group may be numbered, and optionally, the number may be displayed on each of the groups 401 corresponding to each suspected user group.
It can be seen that the second distribution view and/or the third distribution view demonstrate well the distribution of suspected fraudulent users over the one or more user behavior features. This visualization helps the analyst understand the distribution intuitively and observe the synchronicity of suspected fraudulent users on those features, meeting at least the requirements of the aforementioned tasks 1 and 4.
In some embodiments, as shown for example in fig. 5, an interface schematic of a fourth distribution view 500 in an embodiment of the present application is shown. In this embodiment, the fourth distribution view 500 may be obtained by mapping the behavior similarity, represented by the similarity of original feature values over the user behavior feature set, of the suspected fraudulent users in a certain suspected fraudulent user group in the third distribution view. The original feature values are the original values without dimension reduction (for example, the raw data of dimensions such as geographic location, telephone number, and IP address), so the distribution of the tiles 501 corresponding to the members of the group can be closer to the actual similarity situation, which facilitates observation and analysis. It should be noted that, in some embodiments, the user behavior feature values used in the fourth distribution view may also be conversion results obtained through a predetermined processing method, rather than the original values.
By comparing the fourth distribution view 500 with the third distribution view, the overall similarity of the suspected fraudulent users in the same suspected fraudulent user group across the various user behaviors can be observed, which can be used to judge the quality of the fraud detection.
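One hedged way to quantify such a quality judgment is to check that members of a well-detected group remain mutually similar on their original feature values; the match-fraction similarity below is an illustrative stand-in for the actual comparison:

```python
# A sketch of a grouping-quality check: members of a well-detected suspected
# fraud group should stay mutually similar on their original feature values.
def pair_similarity(a, b):
    keys = set(a) | set(b)
    return sum(1 for k in keys if a.get(k) == b.get(k)) / len(keys)

def mean_pairwise_similarity(members):
    pairs = [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
    return sum(pair_similarity(a, b) for a, b in pairs) / len(pairs)

tight_group = [{"ip": "1.1.1.1", "phone": "021"}, {"ip": "1.1.1.1", "phone": "021"}]
loose_group = [{"ip": "1.1.1.1", "phone": "021"}, {"ip": "9.9.9.9", "phone": "010"}]

# Higher internal similarity suggests better grouping quality.
assert mean_pairwise_similarity(tight_group) > mean_pairwise_similarity(loose_group)
```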
In some embodiments, corresponding to the fourth distribution view 500, specific information related to values of the suspected fraudulent users on the user behavior characteristics may be further displayed nearby, for example, the values and the value distribution of the suspected fraudulent users on the user behavior characteristics, which is beneficial for an analyst to further analyze the value characteristics of the suspected fraudulent users on the user behavior characteristics.
In some embodiments, part or all of the second distribution view, the third distribution view, the first distribution view and the fourth distribution view can be displayed through a human-computer interaction graphical interface. Optionally, some or all of the views may be displayed in parallel or switched in the interface.
For example, as shown in fig. 6, a schematic diagram of a human-computer interaction graphical interface in the embodiment of the present application is shown.
In this embodiment, the area shown as area A on the left side of the interface provides options for performing corresponding settings according to user input information, for example, setting a corresponding fraud detection algorithm (e.g., the D-Spot algorithm in the drawing) and algorithm parameters (e.g., the weight setting box in the drawing for inputting a weight value). Optionally, the results of the fraud detection algorithm may be displayed, such as the number of users, the number of fraud groups, the number of suspected fraudulent users, and the number of legitimate users; the precision and recall of the fraud detection algorithm can be calculated from these results, so that the analyst can adjust the fraud detection algorithm. Optionally, area A may also display, for example, the number of fraudulent user groups selected by the user in area B, their IDs, and the number of suspected fraudulent users contained in each group.
The upper area B in the interface currently shows a third distribution view. Optionally, the second distribution view and the third distribution view are presented on graphic pages that can be switched, i.e., displayed alternately via the bar above area B. Optionally, the first distribution view may also be displayed by switching with the other views. For example, upon receiving a user operation, a "Fraud" information bar option in the figure may trigger the second distribution view to be displayed in area B, a "Group" information bar option may trigger the third distribution view to be displayed in area B, and an "All" information bar option may trigger the first distribution view to be displayed in area B; by operating these information bar options, the user can switch among the second, third, and first distribution views.
It is to be understood that, although not shown in the drawings, the third distribution views formed for different third user behavior feature sets may alternatively be presented on respective graphic pages that can be switched. For example, different information bar options, such as "IP address" and "phone number", may be added to area B in fig. 6, for triggering display in area B of the third distribution view corresponding to the user behavior feature "phone number" or of the third distribution view corresponding to the user behavior feature "IP address", respectively.
In this embodiment, area C below area B may display a fourth distribution view, and area D on the right side of the fourth distribution view may display specific information of each suspected fraudulent user shown in the fourth distribution view, such as the user's feature values and value distribution, for example, the proportion of the suspected fraudulent users shown in the fourth distribution view whose IP addresses share a certain same value, or the proportion belonging to the same province.
In this embodiment, optionally, a "user behavior feature list" may also be displayed in area E to list the respective user behavior features, and a feature selection option may be provided, for example, the selection boxes in the illustration, for the user to select user behavior features with a check mark ("√") so as to generate the corresponding second and third user distribution views. Further optionally, each user behavior feature may have a Weight (which may be user-settable in this figure) and an importance, where the importance includes an average information entropy (Entropy) and an average relative entropy (KL), the calculation of which will be described in detail in the following embodiments.
Optionally, a thumbnail corresponding to each user behavior feature may be displayed, where a black line shows the value distribution of the user behavior feature over the whole user group data, and a gray line shows the value distribution of the user behavior feature over the detected portion of the user group data, i.e., all suspected fraudulent users. By comparing the two value distributions, an analyst can get a preliminary sense of how the user behavior feature acts in each distribution view, i.e., its importance for separating out suspected fraudulent users.
Optionally, in other embodiments, an option for selecting user group data, etc. may also be provided in the human-computer interaction interface, which is not illustrated here.
It should be noted that the layout in fig. 6 is only an example, and in other embodiments, the regions may be added, deleted or combined according to requirements, and is not limited to fig. 6.
In step S103, the visual output refers to graphical output on a display (e.g., LCD, LED, OLED, etc.), showing electronic patterns such as those shown in figs. 2 to 6.
In some embodiments, the behavior similarity is measured based on a weighted result of the behavior similarities on the plurality of user behavior features in the user behavior feature set. For example, the first distribution view involves all the user behavior features in the user behavior feature set; when comparing the behavior similarities of user 1 and user 2 on the user behavior feature set, the behavior similarities on the individual user behavior features are weighted and summed (the weight of each user behavior feature may come from the set algorithm parameters). In some examples, the weights may be set by the user or set by default; the weights of the user behavior features in the set may be all the same, partially the same, or all different. For another example, suppose the second distribution view relates to three user behavior features, namely telephone number, IP address, and device; the behavior similarities on these three features are weighted and summed, and if the weights of the three features are equal, the weighted sum is the average value, i.e., the sum of the three behavior similarities divided by 3.
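As an illustrative sketch (not the patented implementation), the weighted aggregation described above can be expressed in Python; the feature names and weight values below are hypothetical:

```python
def weighted_similarity(per_feature_sims, weights):
    """Aggregate per-feature behavior similarities into one score
    by a weighted sum (equal weights reduce this to an average)."""
    assert per_feature_sims.keys() == weights.keys()
    return sum(per_feature_sims[k] * weights[k] for k in per_feature_sims)

# Hypothetical similarities of user 1 vs. user 2 on three features.
sims = {"phone_number": 0.9, "ip_address": 0.6, "device": 0.3}
equal = {k: 1 / 3 for k in sims}  # equal weights -> average
print(round(weighted_similarity(sims, equal), 6))  # (0.9 + 0.6 + 0.3) / 3 = 0.6
```

Per-feature weights could equally come from the algorithm-parameter settings in area A of the interface.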
Existing methods for calculating behavior similarity, such as Euclidean distance and cosine distance, can be applied to the behavior similarity calculation in the present application, but may have limitations in a fraud detection scenario: because the user behavior features are categorical data, calculating behavior similarity via one-hot encoding is not meaningful, and one-hot encoding also makes the calculation more complicated, which is impractical for the huge user populations in fraud detection scenarios.
Thus, in the embodiments of the present application, an improved metric of behavior similarity is provided, namely the "collision distance" (ColDis), whose magnitude is negatively correlated with behavior similarity. The mapping distance between users in each distribution view is the mapping result of the collision distance; for example, if the ratio of the collision distances between three users A and B, B and C, and A and C is 1:2:3, then after conversion to the distribution view the ratio of the mapping distances is also 1:2:3, e.g., x pixels : 2x pixels : 3x pixels, or equally 3x pixels : 6x pixels : 9x pixels, which may be determined according to the actual size of the distribution view to be rendered.
In some embodiments, the "collision distance" represents behavior similarity through the similarity between users' values on the user behavior features, and the behavior similarity can be measured through the amount of information, i.e., the "entropy", contained in the coincidence of those values.
In some embodiments, the behavior similarity of each pair of users on each user behavior feature is related to: a first probability distribution of the values of that user behavior feature, obtained by statistics over the user group data, and the relative entropy between the first probability distribution and a second probability distribution formed when the two users collide on a value of that user behavior feature.
For example, the behavior similarity between user u_i and user u_j on the k-th user behavior feature can be expressed as the following formula (1):

s_k(u_i, u_j) = Σ_{v ∈ V_k(u_i, u_j)} −log(p_k(v))    (1)

where p_k represents the value distribution on the k-th user behavior feature, and p_k(v) represents the probability that the value of the k-th user behavior feature is v; each v in V_k(u_i, u_j) represents a value collision of user u_i and user u_j on the k-th user behavior feature, i.e., the two values are the same, such as using the same IP address or the same address. The term −log(p_k(v)) can be derived from the relative entropy between two probability distributions, i.e., the KL divergence (Kullback-Leibler divergence): the smaller the relative entropy, the smaller the difference and the higher the behavior similarity. Let p_k denote the first probability distribution and F the second probability distribution; the KL divergence is then written KL(F || p_k). Here, p_k can be obtained by counting the values of the k-th user behavior feature of each user in the user group data, and can be represented by a first probability distribution function p_k(x), where p_k(x) is the probability that the value of the k-th user behavior feature is x; F can be represented by a second probability distribution function F(x), i.e., the distribution of the colliding value x of users u_i and u_j on the k-th user behavior feature.

In this embodiment, when users u_i and u_j collide on a value v of the k-th user behavior feature, F(x) = 1 when x = v and F(x) = 0 when x ≠ v; that is, F(x) can be simplified to a binary indicator function. Therefore, for the k-th user behavior feature with colliding value v, the KL divergence of the second probability distribution F relative to the first probability distribution p_k can be expressed as:

KL(F || p_k) = F(x=v) log(F(x=v) / p_k(v)) = 1 · log(1 / p_k(v)) = −log(p_k(v)),

yielding the formula KL(F || p_k) = −log(p_k(v)).
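A minimal sketch of formula (1), under the derivation above: the behavior similarity on one feature is the sum of −log(p_k(v)) over the values v on which two users collide, with p_k estimated from value frequencies over the user group data. The IP-address data here are hypothetical:

```python
import math
from collections import Counter

def feature_distribution(values):
    """First probability distribution p_k: frequency of each value of
    the k-th feature over the whole user population."""
    total = len(values)
    return {v: c / total for v, c in Counter(values).items()}

def collision_similarity(p_k, values_i, values_j):
    """Formula (1): sum of -log(p_k(v)) over values on which the two
    users collide; rarer shared values contribute more similarity."""
    collisions = set(values_i) & set(values_j)
    return sum(-math.log(p_k[v]) for v in collisions)

# Hypothetical population of IP addresses observed for 10 users.
population = ["1.2.3.4"] * 8 + ["9.9.9.9"] * 2
p_k = feature_distribution(population)
# Colliding on the rare IP yields higher similarity than on the common one.
rare = collision_similarity(p_k, ["9.9.9.9"], ["9.9.9.9"])
common = collision_similarity(p_k, ["1.2.3.4"], ["1.2.3.4"])
print(rare > common)  # True: -log(0.2) > -log(0.8)
```

This captures the intuition stated above: sharing a value that is rare in the whole population (e.g., the same uncommon IP address) is stronger evidence of related behavior than sharing a common one.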
It should be noted that formula (1) also accounts for the case where two users have value collisions on the same user behavior feature multiple times. For example, the IP addresses of user A and user B were both C yesterday and are both D today; in this case, the behavior similarity is related to the sum of the relative entropies corresponding to the multiple collisions, i.e., the sum of relative entropy 1 for the collision on value C and relative entropy 2 for the collision on value D, on the IP-address user behavior feature of users A and B.
Measuring the behavior similarity with formula (1) gives users with higher behavior similarity a better chance of being classified into the same suspected fraudulent user group.
In some embodiments, if the user behavior feature set has K user behavior features, the behavior similarity on the user behavior feature set as a whole may be a weighted sum of the behavior similarities on the K user behavior features (for example, their average), which may be expressed as formula (2):

S(u_i, u_j) = Σ_{k=1}^{K} w_k · s_k(u_i, u_j)    (2)

where w_k is the weight of the k-th user behavior feature; when all w_k = 1/K, the weighted sum is the average.
As mentioned earlier, in the first to fourth distribution views of figs. 2 to 5, each user is represented by one or more pixels, and the collision distance between users is negatively correlated with the behavior similarity; that is, the higher the behavior similarity between two users, the smaller the distance between the two corresponding groups of pixels. The distribution views are thereby formed, e.g., the distributions of the suspected fraudulent user clusters and the suspected fraudulent user groups gathered in the second and third distribution views.
In the distribution view, the mapping distance between two image blocks representing two users is the mapping result of the collision distance, and the magnitude of the collision distance is inversely related to the behavior similarity.
In some embodiments, the collision distance is scaled up in negative correlation with the behavior similarity. In some examples, this scaling means that the similarity between users that do have behavior similarity to each other is mapped to a first collision distance, while users without any behavior similarity to each other are assigned a second collision distance much larger than the first collision distance, to indicate that there is no similarity between them. That is, dissimilar users are placed as far apart as possible in the distribution view, so that the mapping distances between legitimate users and suspected fraudulent users, and between fraudulent users belonging to different fraud groups, are as large as possible. In this way, each cluster of suspected fraudulent users and each suspected fraudulent user group is displayed clearly and prominently in each distribution view, avoiding the situation where, with a large number of users, dissimilar users look very close to each other in the distribution view.
For example, in conjunction with equation (2), the collision distance corresponding to the behavior similarity can be expressed as equation (3) below:
Figure BDA0002468100080000192
wherein, in user uiAnd user ujUnder the condition of having behavior similarity on K user behavior characteristics, the operation is carried out by an operator-1Converted to distances on the distribution view. DmaxRepresenting the maximum non-zero distance between all pairs of users. SmaxIs a parameter that controls the degree of user classification in the mapping. By this parameter, the behavior characteristics of the user with suspicious value collisions between two users are more favored. According to the prediction of the CollIS, a fraudulent user and an ordinary user can be well separated. For user behavior characteristics with values as values, the similarity definition in equation (4) can also be used, where the closeness between values is used to express the degree of value collision, equation(4) Expressed as:
Figure BDA0002468100080000201
similarly, the first distribution view, the second distribution view, the third distribution view and the fourth distribution view can be calculated and mapped based on the collision distance calculation principle.
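One plausible reading of the collision-distance definition, offered as an assumption rather than the definitive published form: similar pairs get the inverse of their aggregated similarity, and pairs with no similarity get the much larger value S_max · D_max. A sketch:

```python
def collision_distance(similarity, d_max, s_max):
    """Map aggregated behavior similarity S to a display distance:
    similar users get 1/S (small when S is large); users with no
    similarity get S_max * d_max, far larger than any non-zero-
    similarity pair, pushing dissimilar users apart in the view."""
    if similarity > 0:
        return 1.0 / similarity
    return s_max * d_max

# Hypothetical aggregated similarities S(u_i, u_j) for three user pairs.
sims = {("A", "B"): 4.0, ("B", "C"): 2.0, ("A", "C"): 0.0}
d_max = max(1.0 / s for s in sims.values() if s > 0)  # max non-zero distance
dists = {pair: collision_distance(s, d_max, s_max=10.0) for pair, s in sims.items()}
print(dists)  # {('A', 'B'): 0.25, ('B', 'C'): 0.5, ('A', 'C'): 5.0}
```

The dissimilar pair (A, C) ends up ten times farther away than the farthest similar pair, matching the "first collision distance / much larger second collision distance" behavior described above.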
The collision distance (ColDis) metric helps tolerate noise in the original feature values and focuses on the information that matters for distinguishing ordinary users from suspected fraudulent users. Given the user behavior feature set, the distribution view mapped according to ColDis is always the same, which allows the mapped distribution view to be pre-computed. Different fraud detection algorithms may change the classification results for normal and suspected fraudulent users, but not the visual layout of the corresponding mapped distribution views. This makes it easier to evaluate different fraud detection algorithms from the ColDis-based visualization: the fraud detection effects of these algorithms can be compared through a visual interface such as that shown in fig. 6.
In some embodiments, since the user group may be huge and the original values of the user behavior features may be high-dimensional, generating the visualization data directly from the original user group data may yield a poor user-classification display in the distribution view, even with the help of the collision distance. Therefore, the collision distance may be calculated from low-dimensional data obtained by dimension-reduction processing of the user group data (the low-dimensional data includes a low-dimensional feature vector corresponding to each user), thereby forming the distribution view.
For example, high-dimensional user group data (which can be presented in the form of data in a user profile) can be mapped into a 2D space using, for example, the t-SNE (t-distributed stochastic neighbor embedding) dimension reduction algorithm, an improved SNE-based nonlinear dimension reduction algorithm that is very suitable for reducing high-dimensional data to 2 or 3 dimensions for visualization. t-SNE is only an example and not a limitation; other dimension reduction algorithms, such as Principal Component Analysis (PCA), can be used in other embodiments.
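As a hedged sketch of the PCA alternative mentioned above (t-SNE itself is more involved and is typically run through a library such as scikit-learn), a minimal 2D PCA projection with NumPy; the user feature matrix is synthetic:

```python
import numpy as np

def pca_2d(X):
    """Project high-dimensional user feature vectors to 2D by PCA:
    center the data, then project onto the two leading eigenvectors
    of the covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top2 = eigvecs[:, np.argsort(eigvals)[::-1][:2]]
    return Xc @ top2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))  # 100 users, 8 hypothetical feature dimensions
Y = pca_2d(X)
print(Y.shape)  # (100, 2)
```

Each row of `Y` is the low-dimensional feature vector for one user, over which the collision distance and distribution-view layout can then be computed.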
In some embodiments, when the user data volume becomes large, severe group overlap may occur in the distribution view to be generated (e.g., the original distribution view constructed based on the collision distance), making it difficult to accurately observe the original user distribution by eye. Therefore, the original user distribution can be estimated using Kernel Density Estimation (KDE), and the estimated distribution used in place of the actual distribution, so that users with similar user behaviors are concentrated in the distribution view, the distance between adjacent groups is larger, and overlap in the distribution view is reduced. For example, the user distribution in at least one of the first, second, and third distribution views depends on the calculated estimated distribution, and the estimated distribution is obtained by performing kernel density estimation on the original user distribution obtained from the behavior similarity.
The estimated distribution of the kernel density estimation can be expressed as the following formula (5):

f̂(x) = (1 / (N·h)) Σ_{i=1}^{N} K((x − x_i) / h)    (5)

where N is the number of all users, x_i is the original mapping position of the i-th user u_i in the distribution view determined from the original user distribution, K is the kernel function, and h is the bandwidth of the KDE. In this embodiment, a Gaussian kernel may be used, and h may be initialized by the approximation h = 1.06σN^{−0.2}, where σ is the standard deviation of all user positions.
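A minimal one-dimensional sketch of formula (5) with a Gaussian kernel and the stated bandwidth initialization; the user positions below are hypothetical:

```python
import math

def kde(positions, query, h=None):
    """Formula (5): Gaussian kernel density estimate over 1-D user
    positions, with bandwidth initialized as h = 1.06 * sigma * N**-0.2."""
    n = len(positions)
    if h is None:
        mean = sum(positions) / n
        sigma = math.sqrt(sum((x - mean) ** 2 for x in positions) / n)
        h = 1.06 * sigma * n ** -0.2
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((query - x) / h) ** 2) for x in positions)

# Two hypothetical clusters of user positions on one display axis.
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
print(kde(positions, 0.1) > kde(positions, 2.5))  # True: denser inside a cluster
```

Evaluating the estimate on a grid and re-placing users according to it concentrates each cluster and widens the gaps between clusters, which is exactly the de-overlapping effect described above.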
In some embodiments, the first distribution view may be a distribution view reconstructed from a USER-KDE estimated distribution: by replacing the original user distribution of the whole user population with the USER-KDE estimated distribution, the finally rendered first distribution view is formed, reducing overlap in the original distribution view.
Taking the second distribution view as an example: the second distribution view mainly shows the original user distribution of suspected fraudulent users, formed on the basis of selecting, from the user behavior features used to construct the first distribution view, the part more relevant to fraudulent behavior. Similarly to the calculation of the USER-KDE estimated distribution, a Fraud-KDE estimated distribution can be calculated to replace the original user distribution of the suspected fraudulent users, thereby forming the second distribution view. To highlight the clusters of suspected fraudulent users in the second distribution view, the clusters may be colored with a striking color, such as red.
Similarly, the third distribution view may also optimize the display based on the above-described method of kernel density estimation.
In order to provide a selection basis for the user behavior features closer to fraudulent behavior, in some embodiments each user behavior feature may have an importance, and the user behavior features used to construct the second distribution view may be the one or more of highest importance in the user behavior feature set. For example, compared with constructing the first distribution view, the user behavior features selected in constructing the second distribution view may be the most important ones (e.g., the top 2, 3, or 4) of the user behavior feature set used in constructing the first distribution view.
Optionally, the importance may be expressed by two evaluation indicators, including: the average information entropy of the user behavior feature within each suspected fraud user group, and/or the average of the relative entropies of the value distribution of the whole user group on the user behavior feature relative to the value distribution of each suspected fraud user group on that feature.
Regarding the average information entropy of a user behavior feature within each suspected fraud user group: for example, with n users in total whose values on the user behavior feature are x_1, x_2, …, x_n, according to the calculation formula of information entropy:

H(X) = −Σ_x p(x) log p(x)

the probability p(x) of each value is obtained by counting the occurrences of x_1, x_2, …, x_n, the information entropy H(X) is then calculated accordingly, and dividing by n gives the average information entropy.
Regarding the average relative entropy of the value distribution of the user group on the user behavior feature relative to the value distributions of the suspected fraud user groups on that feature: for example, with n users in total whose values on a certain user behavior feature k are x_1, x_2, …, x_n, the corresponding value distribution p(x) is calculated; the value distribution of each suspected fraud user group on feature k can likewise be calculated, e.g., for M suspected fraud user groups the corresponding value distributions are q_1(x), …, q_M(x). The KL divergences of p(x) with respect to each of q_1(x), …, q_M(x), i.e., the respective relative entropies, are computed (in the manner of the KL divergence calculation described above) and then averaged.
The lower the average information entropy, or the higher the average relative entropy, the higher the importance. Simply put, lower information entropy means the users' values on the user behavior feature are more similar, i.e., behavior similarity is high; higher relative entropy means larger differences between users, such as between legitimate users and suspected fraudulent users. User behavior features of high importance can thus locate fraudulent users more accurately. In an embodiment such as that of fig. 6, in the region for selecting user behavior features (e.g., area E), the importance of each user behavior feature may be listed to facilitate selection by the analyst, where "Entropy" and "KL" respectively represent the calculated average information entropy and average relative entropy.
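The two importance indicators can be sketched as follows, following the calculations described above; the value distributions used in the example are hypothetical:

```python
import math
from collections import Counter

def average_entropy(values):
    """H(X) = -sum p(x) log p(x) over one group's values on a feature,
    divided by the number of users n (lower -> more similar values)."""
    n = len(values)
    h = -sum((c / n) * math.log(c / n) for c in Counter(values).values())
    return h / n

def average_relative_entropy(p, group_dists):
    """Average KL divergence of the whole population's value
    distribution p against each suspected-fraud group's q_m."""
    def kl(p, q):
        return sum(p[v] * math.log(p[v] / q[v]) for v in p if p[v] > 0)
    return sum(kl(p, q) for q in group_dists) / len(group_dists)

# A group sharing a single IP has zero (minimal) entropy -> high importance.
print(average_entropy(["1.2.3.4"] * 5) == 0.0)  # True
# Group distributions that differ from the population's -> high importance.
p = {"a": 0.5, "b": 0.5}                                # whole user group
groups = [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}]   # two fraud groups
print(average_relative_entropy(p, groups) > 0)  # True
```

Scores like these could populate the "Entropy" and "KL" columns of the feature list in area E.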
In some embodiments, the distribution views shown in figs. 2 to 5 provide visual information to the analyst. In particular, in the interface shown in fig. 6, area A is provided for the analyst to set/adjust the fraud detection algorithm, algorithm parameters, etc., and area E is used to select one or more user behavior features in the user behavior feature set; the reference basis for these adjustments may be the information corresponding to the visual output of each distribution view.
For example, the references include any one or more of the following in combination:
in some embodiments, the difference between the second distribution view and the first distribution view may be used as a reference for indicating to adjust the user behavior characteristics in the user behavior characteristic set according to the importance.
For example, if the second distribution view is the same as, or close to, the first distribution view, it indicates that the correlation between the selected user behavior features and fraudulent behavior is not high enough to effectively separate the suspected fraudulent user clusters from the legitimate user clusters. This situation can prompt the analyst to adjust the user behavior features in the second user behavior feature set, for example by selecting the user behavior features of highest importance (e.g., 2, 3, 4, or more of them) from the first user behavior feature set.
In some embodiments, the overall mixed situation caused by different suspected fraud user groups represented by different display features in the third distribution view may be used as a reference for indicating whether to reduce the user behavior features with lower importance in the user behavior feature set or reduce the weight values corresponding to the user behavior features with lower importance in the algorithm parameters.
For example, many groups may appear as follows overall: each group contains members of multiple suspected fraudulent user groups and exhibits various colors, such as red, yellow, green, and orange, so that the distribution of no single suspected fraudulent user group can be studied. Such a situation means that the user behavior features in the currently selected third user behavior feature set form too many groups and cannot well reflect concentrated fraudulent behavior; the user behavior features of lower importance can then be deleted, or their corresponding weight values in the algorithm parameters reduced, so as to avoid the overall mixed situation.
In some embodiments, the number of different suspected fraud user groups represented by different display characteristics in the third distribution view may be used as a reference for indicating whether to add a user behavior characteristic with higher importance to the user behavior characteristic set.
For example, assuming that the minimum threshold for the number of suspected fraudulent user groups appearing in the third distribution view is required to be 10, if the number of suspected fraudulent user groups appearing in the displayed third distribution view (e.g. represented by colors) is lower than 10, for example, only 8 colors, then the user behavior feature with higher importance may be added. Wherein the user behavior feature with higher importance may be one or more of the remaining unselected user behavior features with the highest importance.
In some embodiments, the mixed condition among different suspected fraudulent user groups represented by different display features in at least one local area of the third distribution view may be used as a reference for indicating whether to adjust the member threshold condition for screening the displayed suspected fraudulent user groups, or the edge threshold condition for dividing the edges of a suspected fraudulent user group according to the relationship strength between suspected fraudulent users.
For example, in a few local areas of the third distribution view, colors corresponding to different suspected fraudulent user groups may be mixed: say suspected fraudulent user group 1 corresponds to 7 groups, each colored green, and within the distribution area of those 7 groups an 8th group belonging to suspected fraudulent user group 2 (colored red) is mixed in, or a 9th group belonging to suspected fraudulent user group 3 (colored purple), and so on. The local areas concerned may be a predetermined percentage of the whole third distribution view, such as 1%, 2%, 3%, 4%, or 5%, or a predetermined number of local areas, such as 1, 2, 3, 4, or 5. If this situation occurs, it indicates that the distribution space of suspected fraudulent user group 1 may be too small, so that some of its users are assigned to other groups; the group can then be expanded, for example by changing the member threshold condition so that the users of the red group are included in suspected fraudulent user group 1, or by screening the users that can be added to each suspected fraudulent user group according to the edge threshold condition, i.e., determining the edges of the suspected fraudulent user group. The relationship strength may be determined according to the overall situation of collisions occurring on multiple user behavior features between suspected fraudulent users. For example, if users A and B collide on features 2, 3, and 4 among user behavior features 1 to 10, the relationship strength between A and B may be computed from the weights of user behavior features 2, 3, and 4. By setting an edge threshold condition based on a relationship strength threshold: if the relationship strength between A and B is lower than the threshold, A and B do not belong to the same group; if it is higher than the threshold, A and B belong to the same group.
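The relationship-strength check in this example can be sketched as follows; the feature indices, weights, and threshold are all hypothetical:

```python
def relationship_strength(collided_features, weights):
    """Combine the collisions two users have across features into one
    relationship-strength score via the features' weights."""
    return sum(weights[f] for f in collided_features)

def same_group(strength, threshold):
    """Edge threshold condition: an edge (same-group membership) exists
    only if the relationship strength reaches the threshold."""
    return strength >= threshold

weights = {1: 0.1, 2: 0.3, 3: 0.3, 4: 0.2, 5: 0.1}  # hypothetical feature weights
s_ab = relationship_strength([2, 3, 4], weights)    # A and B collide on 2, 3, 4
print(same_group(s_ab, threshold=0.5))  # True: 0.3 + 0.3 + 0.2 >= 0.5
```

Raising or lowering the threshold is one way to realize the edge-threshold adjustment described above, shrinking or growing a suspected fraudulent user group's membership.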
In some embodiments, the density of the user distribution shown in the fourth distribution view is used as a quality evaluation basis for the suspected fraudulent user group, to indicate whether to adjust the user behavior features in the user behavior feature set.
For example, if the suspected fraudulent users grouped into the same suspected fraudulent user group in the third distribution view truly belong together, they should have overall behavior similarity on the user behavior feature set and should present at least partial aggregation in the fourth distribution view. If they do not, for example, as shown in fig. 7A, where the distribution in the fourth distribution view of the suspected fraudulent user group 701A from the third distribution view is very scattered, this indicates that the grouping quality of the suspected fraudulent user group is poor and more relevant user behavior features (e.g., of higher importance) need to be selected. Conversely, as shown in fig. 7B, the users in suspected fraudulent user group 702A aggregate on the whole user behavior feature set into the pixel "cloud" 702B marked by the dotted outline, and this overall behavior similarity demonstrates that the grouping quality is good.
In certain embodiments of the first aspect of the present application, the visual user classification method includes: differentially displaying, in the fourth distribution view, each suspected fraudulent user that does not belong to the suspected fraudulent user group but appears within its distribution, for analysis.
For example, as mentioned in the previous embodiments, the display feature of a group may be set to that of the dominant part of the corresponding suspected fraudulent user group; for example, if the dominant part corresponds to green, the entire group may be visually set to green. In this way, a suspected fraudulent user group in the third distribution view may in fact contain some user members that do not belong to the group. These user members can be observed when the suspected fraudulent user group is expanded in the fourth distribution view, and if they also aggregate, this indicates they may form a fraudulent group, so fraud analysis is also required for these suspected fraudulent users.
In the above embodiment, the summary requirement of task 1 can be met through the aforementioned second distribution view and/or third distribution view; the quality evaluation requirement of task 2 can be met by comparing two or more of the first, second, third, and fourth distribution views; and the detailed-information requirement of task 4 can be met through the fourth distribution view (which may also be supplemented with corresponding detailed information). Through the interface of fig. 6, the analyst can, for example, adjust the selected user behavior features in area E and observe the resulting change in the graphical output, so that fraud detection analysis can be performed more flexibly and intuitively to meet the requirements of tasks 3 and 6. Moreover, the analyst can use the distribution views to meet the requirement of task 5, check false positives to meet the requirement of task 7, and finally extract the learned fraud detection rules.
Therefore, the visual user classification method can effectively assist fraud detection analysis and remedy various defects of the prior art.
In some embodiments, the user classification method may be performed entirely on a local electronic device, such as one loaded with an application (APP) computer program or having web/website access capabilities. Such a device includes components such as a memory, a memory controller, one or more processing units (CPUs), a peripheral interface, RF circuitry, audio circuitry, a speaker, a microphone, an input/output (I/O) subsystem, a display screen, other output or control devices, and external ports, which communicate via one or more communication buses or signal lines. The electronic device includes, but is not limited to, personal computers such as desktop computers, notebook computers, tablet computers, smart phones, and smart televisions. The electronic device may also consist of a host running multiple virtual machines together with a human-computer interaction device (such as a touch display screen, keyboard, and mouse) corresponding to each virtual machine.
In some embodiments, part or all of the work in step S102 of the user classification method may be implemented on other electronic devices that communicate with the local device, or implemented locally in cooperation with other electronic devices.
The other electronic devices may belong to the device categories exemplified above, or may be a service system communicating with the local electronic device through a network. The network may be the internet, a mobile network, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), one or more intranets, or a suitable combination thereof. The server system may be arranged on one or more physical servers according to factors such as function and load; when distributed over multiple physical servers, it may be composed of servers based on a cloud architecture. Cloud-based server systems include public cloud (Public Cloud) and private cloud (Private Cloud) server systems, which may provide Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), Infrastructure-as-a-Service (IaaS), and the like. Examples of such cloud computing service platforms include the Meituan, Alibaba Cloud (Aliyun), Amazon, Baidu, and Tencent cloud computing platforms. The server system may also be a distributed or centralized server cluster composed of at least one physical server, where each physical server hosts multiple virtual servers, each virtual server runs at least one functional module of the server system, and the virtual servers communicate with each other through a network.
Fig. 8 shows an example communication system with a connection between a local electronic device 801 and a service system 802. A human-computer interaction graphical interface (e.g., as shown in fig. 6) may be provided at the local electronic device 801, where a user may enter information (e.g., by typing or selecting) to set the fraud detection algorithm, user behavior features, algorithm parameters (e.g., weights of the respective user behavior features), and so on. These data are sent to the service system 802, which runs the algorithm to perform user classification and feeds the classification result back to the local electronic device 801. The local electronic device 801 then combines the user classification result with the collision distance, dimension reduction algorithm, kernel density estimation, and the like of the foregoing embodiments to generate the visualization data, and displays the corresponding distribution views on the human-computer interaction graphical interface. Of course, in some embodiments the visualization data may instead be generated directly by the service system 802 and fed back to the local electronic device 801; the local device then only needs to send the user-set information and display the received visualization data, which reduces the requirements on the local electronic device 801.
In fig. 9, a flowchart of a visual data service method performed by a service system in an embodiment is provided, where the visual data service method includes:
step S901: acquiring a fraud detection algorithm, algorithm parameters, and at least one user behavior feature set determined according to the input information.
In some embodiments, the input information is entered by a user at the electronic device and transmitted to the service system via a network. The input information is used to configure the fraud detection algorithm, algorithm parameters, and at least one user behavior feature set for processing the user population data to determine suspected fraudulent users. It should be noted that the fraud detection principle in the embodiments of the present application relies on the synchronization and similarity of a suspected fraud group on certain user behaviors; the fraud detection algorithm therefore classifies users based on user behavior features of different dimensions in order to distinguish suspected fraudulent users.
In some embodiments, the fraud detection algorithm is, for example, CrossSpot or D-Spot; of course, other classification algorithms may be used in this application, and the algorithm is not limited to these examples. In some embodiments, the algorithm parameters include the weight of each user behavior feature, and so on.
In some embodiments, each of the user behavior feature sets may include one or more user behavior features. User behavior features that are strongly correlated with the characteristics of fraudulent conduct can locate suspected fraudulent users more accurately, and those characteristics depend on the online-fraud application scenario, such as e-commerce websites or social networking websites.
In some embodiments, the user population data may concern an e-commerce website, for example: domestic e-commerce websites such as Taobao, Tmall, JD.com, Suning.com, and Vipshop; import cross-border e-commerce websites such as Tmall Global, JD Worldwide, NetEase Kaola, and Vipshop's global-specials channel; export cross-border e-commerce websites such as AliExpress (Alibaba's export e-commerce platform), "easy to buy around the world", and "proud e-commerce"; foreign e-commerce websites such as Amazon, eBay, Groupon, Paytm, and Newegg; other e-commerce websites; or phishing websites constructed to imitate these websites.
The classification of the user behavior characteristics that are more relevant to fraud at e-commerce websites includes: one or more of time-related, IP address-related, and phone number-related.
For example, the time-dependent user behavior characteristics include: time stamps of various account-related operations, such as one or more of account registration time, login time, logout time, and operation time; the user behavior characteristics related to the IP address comprise: IP address, IP location, etc.; the user behavior characteristics related to the telephone number comprise: telephone number, area where the phone is located (which may be obtained from area code).
In some embodiments, the user group data may concern social networking sites. Domestic social networking sites include, for example: general-purpose popular forums such as Baidu Tieba; interest-based communities such as Douban; travel-sharing, group-communication, and hostel-information sites; profession-oriented networks such as Tianji and Ushi; communities for enterprise-user communication and sharing; academic sites for resource downloads, paper retrieval, and research events; general social platforms such as Qzone; practical social networks such as Renren; white-collar and student entertainment communities such as Kaixin001; emotion-exchange and cohabitation-themed communities; matchmaking sites for unmarried users such as Jiayuan.com, Baihe.com, and Zhenai.com; localized community sites; youth friend-making sites such as 51.com; blogging platforms for original articles; microblogging services such as Sina Weibo; tag-based social-sharing sites; and social question-and-answer sites. Foreign social networking sites include, for example: Facebook, Twitter, LinkedIn, Pinterest, Google+, Tumblr, Instagram, VK, Flickr, MySpace, Tagged, Ask.fm, Meetup, MeetMe, Classmates, Snapchat, other social networking sites, or phishing websites constructed to imitate these sites.
The classification of the user behavior features that are more relevant to social networking site fraud includes: one or more of time-related, IP address-related, source user-related, target user-related, and event-related.
For example, the time-dependent user behavior characteristics include: time stamps of various account-related operations, such as one or more of account registration time, login time, logout time, and operation time; the user behavior characteristics related to the IP address comprise: IP address, IP location, etc.; the user behavior characteristics related to the telephone number comprise: telephone number, area where the telephone is located (can be obtained according to area code); the user behavior characteristics associated with the source user include: source IP of transmission information, region of source user, address of source user, etc.; the user behavior characteristics related to the target user include: target IP, target user area, target user address, etc. of the transmission information; the user behavior characteristics related to the event include: social events among users, such as access, add friends, talk, comments, etc.
The user behavior feature set may be a set of some or all of the user behavior features described above: for example, a combination of user behavior features drawn from multiple classifications or from a single classification, or even a single user behavior feature. In some embodiments, each user behavior feature may be assigned an importance level indicating its relevance to fraud, for reference; when fraudulent users are to be highlighted, the features of highest or higher importance may be preferred. The specific composition of the importance will be described in detail later.
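As an illustration of how such a feature set might be assembled, the following sketch extracts a few time-related, IP-address-related, and phone-number-related features from a single hypothetical event record. All field names and formats here are assumptions made for illustration, not part of the described scheme.

```python
from datetime import datetime

# Hypothetical raw event record; every field name and format is an assumption.
event = {
    "user_id": "u123",
    "register_ts": "2020-04-01 09:15:00",
    "login_ts": "2020-04-01 09:16:02",
    "ip": "203.0.113.7",
    "phone": "+86 025 8888 0000",
}

def extract_features(ev):
    """Build a small user behavior feature set from one event record."""
    reg = datetime.strptime(ev["register_ts"], "%Y-%m-%d %H:%M:%S")
    login = datetime.strptime(ev["login_ts"], "%Y-%m-%d %H:%M:%S")
    return {
        "register_hour": reg.hour,                        # time-related
        "register_to_login_s": (login - reg).total_seconds(),
        "ip_prefix": ".".join(ev["ip"].split(".")[:3]),   # IP-address-related (/24 bucket)
        "phone_area": ev["phone"].split()[1],             # phone-number-related (area code)
    }

features = extract_features(event)
```

Bucketing the IP into a /24 prefix and keeping only the area code are examples of reducing raw values to features on which fraudulent accounts tend to collide.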
Step S902: and processing user group data of the user group according to the fraud detection algorithm, the algorithm parameters and the at least one user behavior characteristic set to generate a user classification result or visual data.
The fraud detection algorithm is, for example, CrossSpot or D-Spot. The visualization data for display includes a distribution view reflecting how the user group maps based on behavior similarity over one or more user behavior features in the at least one user behavior feature set, where the degree of behavior similarity between users is inversely related to their mapping distance in the distribution view.
In some embodiments, each user may be represented as a tile in the view occupying one or more pixels (or one or more cells of a grid formed by dividing the view into a criss-cross mesh according to its size). The behavior similarity between users is expressed as the mapping distance between tiles: the higher the similarity, the smaller the distance, and the lower the similarity, the larger the distance. Thus, tiles of users with similar behaviors "gather" in the view while tiles of users with dissimilar behaviors stay "far away" from each other. Since fraudulent users, and especially fraud groups, tend to have highly similar behaviors (for example, the same IP address, telephone number, or region), while the behaviors of normal legitimate users are usually discrete, this aggregation pattern exposes suspected fraud.
In some embodiments, the distribution view includes any one or more of:
1) a first user distribution view reflecting that the user group is mapped based on behavior similarity on the user behavior feature set;
2) Reflecting a second user distribution view mapped by suspected fraudulent users in the user group based on the similarity of one or more user behavior characteristics in the user behavior characteristic set, wherein each suspected fraudulent user cluster is presented in the second user distribution view;
3) a third user distribution view formed by grouping and mapping the behavior similarity of each suspected fraudulent user on at least one user behavior feature in the user behavior feature set, wherein each suspected fraudulent user group is displayed in a distinguishing way;
4) a fourth user distribution view reflecting that each member of the suspected fraud user group is mapped based on behavior similarity on the original values of the user behavior feature set.
For example, the first distribution view may refer to the aforementioned embodiment of fig. 2, the second distribution view may refer to the aforementioned embodiment of fig. 3, the third distribution view may refer to the aforementioned embodiment of fig. 4, and the fourth distribution view may refer to the aforementioned embodiment of fig. 5; a human-computer interaction graphical interface may also be provided, for example, as shown in the embodiment of fig. 6, to combine and display the distribution views, and to provide an area for setting information for changing the distribution view result, such as fraud detection algorithm, algorithm parameters, user behavior characteristics, and the like, which is not repeated here.
step S903: outputting the user classification result or visualization data, where an outputted user classification result is used for externally generating the visualization data.
In some embodiments, the output may be to send the user classification result or visualization data to the outside, for example, in the embodiment of fig. 8, the service system may send the user classification result or visualization data to a local electronic device through a network.
Fig. 10 is a schematic structural diagram of a computer device provided in the embodiment of the present application.
The computer device 1000 may be used to implement the electronic device on the side of the analyst in the foregoing embodiment, and may execute the visual user classification method in the embodiment of fig. 1, for example, to perform any one or more graphic displays in fig. 2 to fig. 6.
The computer device 1000 comprises:
the storage device 1001 stores at least one computer program. In some embodiments, the storage 1001 includes at least one memory for storing at least one computer program; in embodiments, the memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid state storage devices. In certain embodiments, the memory may also include memory that is remote from the one or more processors, such as network attached memory that is accessed via RF circuitry or external ports and a communications network, which may be the internet, one or more intranets, local area networks, wide area networks, storage area networks, and the like, or suitable combinations thereof. The memory controller may control access to the memory by other components of the device, such as the CPU and peripheral interfaces.
Processing means 1002 for running said computer program to perform and implement the visual user classification method of fig. 1, for example, to perform any one or more of the graphical displays of fig. 2-6. In some embodiments, the processing device 1002 comprises at least one processor, which is connected to the at least one memory, and is configured to execute and implement at least one embodiment described in the above visual user classification method, such as the embodiment described in fig. 1, when the at least one computer program is run. In an embodiment, the processor is operatively coupled with a memory and/or a non-volatile storage device. More specifically, the processor may execute instructions stored in the memory and/or the non-volatile storage device to perform operations in the computing device, such as generating image data and/or transmitting image data to an electronic display. As such, at least one of the processors may comprise one or more general purpose microprocessors, one or more special purpose processors, one or more field programmable logic arrays, or any combination thereof.
In some embodiments, the computer device 1000 may also be used to implement the local electronic device in fig. 8, and may include a communication device 1003 for communicating with the outside; for example, the communication device includes one or more wired or wireless communication circuits, such as a wired Ethernet card or USB, and wireless communication circuits such as a wireless network card (WiFi), a 2G/3G/4G/5G mobile communication module, Bluetooth, or infrared. It should be noted that when the computer device 1000 implements the visual user classification method entirely locally without external communication, the communication device 1003 may be omitted; it is therefore indicated by a dotted line in fig. 10.
Fig. 11 is a schematic structural diagram of a service device provided in the embodiment of the present application.
The service apparatus 1100 can be used to implement a service system such as that in fig. 8, the hardware architecture of the service apparatus 1100 is similar to that of the computer apparatus of fig. 10, except that the service apparatus 1100 needs to have a communication capability to provide services to the outside, and there are differences in the running computer programs based on implementing different functions.
The service apparatus 1100 includes:
communication means 1103 for communicating with the outside, for example, with the local electronic device in fig. 8. In some embodiments, the communication device 1103 includes one or more wired or wireless communication circuits, including, for example, a wired Ethernet card, USB, etc., and wireless communication circuits, including, for example, a wireless network card (WiFi), a 2G/3G/4G/5G mobile communication module, Bluetooth, infrared, etc.
The storage device 1101 stores at least one computer program. In some embodiments, the storage 1101 comprises at least one memory for storing at least one computer program; in embodiments, the memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid state storage devices. In certain embodiments, the memory may also include memory that is remote from the one or more processors, such as network attached memory that is accessed via RF circuitry or external ports and a communications network, which may be the internet, one or more intranets, local area networks, wide area networks, storage area networks, and the like, or suitable combinations thereof. The memory controller may control access to the memory by other components of the device, such as the CPU and peripheral interfaces.
A processing device 1102, configured to run the computer program to execute and implement the visualization data service method. In some embodiments, the processing device 1102 comprises at least one processor connected to the at least one memory and configured, when the at least one computer program is run, to execute and implement at least one embodiment of the visualization data service method described above, such as the embodiment of fig. 9. In an embodiment, the processor is operatively coupled with the memory and/or a non-volatile storage device. More specifically, the processor may execute instructions stored in the memory and/or the non-volatile storage device to perform operations of the computing device, such as generating image data and/or transmitting image data to an electronic display. As such, at least one of the processors may comprise one or more general purpose microprocessors, one or more special purpose processors, one or more field programmable logic arrays, or any combination thereof.
Fig. 12 is a schematic block diagram illustrating a visual user classification system according to an embodiment of the present application.
As shown, the visual user classification system includes: an input module 1201, configured to obtain input information; a processing module 1202, configured to obtain visualization data derived from the input information and the user group data of the user group, wherein the input information is used to set a fraud detection algorithm, algorithm parameters, and at least one user behavior feature set for processing the user group data to determine suspected fraudulent users, and the visualization data for display includes a distribution view reflecting how the user group maps based on behavior similarity over one or more user behavior features in the at least one user behavior feature set, the degree of behavior similarity between users being inversely related to their mapping distance in the distribution view; and an output module 1203, configured to perform visual output according to the visualization data.
In certain embodiments, the distribution view includes any one or more of: 1) a first user distribution view reflecting that the user group is mapped based on behavior similarity on the user behavior feature set; 2) a second user distribution view reflecting that suspected fraudulent users in the user group are mapped based on similarity on at least part of the user behavior features in the user behavior feature set; 3) a third user distribution view formed by grouping and mapping the behavior similarity of each suspected fraudulent user on at least one user behavior feature in the user behavior feature set, wherein each suspected fraudulent user group is displayed distinguishably; 4) a fourth user distribution view reflecting that each member of the suspected fraud user group is mapped based on behavior similarity on the original values of the user behavior feature set.
In some embodiments, the behavioral similarity is measured based on a weighted result of the behavioral similarities of the plurality of user behavioral features in the set of user behavioral features.
In some embodiments, the behavior similarity of two users on a user behavior feature is related to: the relative entropy between a first probability distribution of the feature's values, computed from the user group data, and a second probability distribution arising when the two users collide on one value of that feature; or a relative-entropy sum over the multiple relative entropies corresponding to multiple such value collisions. The larger the relative entropy or the relative-entropy sum, the lower the behavior similarity between the two users.
In some embodiments, the mapping distance is a mapping result of a collision distance; the size of the collision distance is inversely related to the degree of the behavior similarity.
In certain embodiments, the collision distance is scaled so as to be negatively correlated with the behavior similarity.
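The relative-entropy machinery described above might be sketched as follows. Since the construction of the second (post-collision) distribution is not fully specified here, the sketch simply concentrates the probability mass on the shared value as an assumption; what it follows faithfully are the stated monotonic relations, i.e. a larger relative-entropy sum means lower similarity, and the collision distance is negatively correlated with the similarity.

```python
import math
from collections import Counter

def value_distribution(values):
    """First probability distribution: frequency of each value of a user
    behavior feature, computed over the whole user group."""
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}

def relative_entropy(p, q, eps=1e-12):
    """D(P || Q) summed over the support of P."""
    return sum(pi * math.log(pi / (q.get(k, 0.0) + eps))
               for k, pi in p.items() if pi > 0)

def collision_entropy(shared_value, p):
    """Second distribution when two users collide on one value; as an
    assumption, all probability mass is concentrated on the shared value."""
    q = {k: (1.0 if k == shared_value else 0.0) for k in p}
    return relative_entropy(q, p)

def similarity(kl_sum):
    """Per the scheme: the larger the relative-entropy sum over the value
    collisions, the lower the behavior similarity between the two users."""
    return 1.0 / (1.0 + kl_sum)

def collision_distance(sim, scale=1.0):
    """Mapping distance, negatively correlated with behavior similarity."""
    return scale * (1.0 - sim)

# Illustrative population values of one feature (e.g. an IP /24 prefix).
population = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
p = value_distribution(population)
```

With this data, colliding on the rare value "c" yields a larger relative entropy than colliding on the common value "a", hence a lower similarity and a larger collision distance under the stated relations.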
In some embodiments, the suspected fraudulent user in the second distribution view has the same display characteristics.
In some embodiments, the third distribution view is distinguished by different display characteristics to represent different groups of suspected fraudulent users.
In some embodiments, the display features include: one or more combinations of size, color, texture, gray scale, brightness, and numbering.
In some embodiments, the display characteristics corresponding to each suspected fraudulent user group are determined based on the display characteristics of the predominant number of its members.
In some embodiments, the second distribution view and the third distribution view are respectively presented on a graphic page which can be mutually switched to display; and/or, the third distribution views formed corresponding to different third user behavior feature sets are respectively presented on the graphic pages which can be mutually switched and displayed.
In some embodiments, the visualization data is derived from low-dimensional data obtained by dimension reduction processing of the user population data.
In certain embodiments, a user distribution of at least one of the first, second, and third distribution views follows an estimated distribution; and the estimated distribution is obtained by performing kernel density estimation according to the original user distribution obtained by the behavior similarity.
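A minimal sketch of the dimension-reduction plus kernel-density-estimation pipeline behind the distribution views might look as follows. PCA stands in for whichever dimension-reduction algorithm an embodiment actually uses (t-SNE, MDS, etc.), and all data and the bandwidth value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy high-dimensional behavior vectors for 20 users (illustrative data).
X = rng.normal(size=(20, 8))

# Dimension reduction to 2-D via PCA: center the data, take the top two
# right singular vectors, and project onto them.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
low = Xc @ vt[:2].T                      # (20, 2) view coordinates

def kde(points, grid, bandwidth=0.5):
    """Gaussian kernel density estimate: smooths the raw 2-D user layout
    into the estimated distribution that the views display."""
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    norm = len(points) * 2 * np.pi * bandwidth ** 2
    return np.exp(-d2 / (2 * bandwidth ** 2)).sum(axis=1) / norm

gx, gy = np.meshgrid(np.linspace(-3, 3, 30), np.linspace(-3, 3, 30))
grid = np.column_stack([gx.ravel(), gy.ravel()])
density = kde(low, grid)                 # density surface behind the view
```

The density surface can then be rendered (e.g. as a heat map) so that the displayed user distribution follows the estimated distribution rather than the raw scatter.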
In some embodiments, each user behavior feature has an importance, and the user behavior feature set and/or the selected subset is obtained according to the importance.
In some embodiments, the importance of each user behavior feature is determined by: the average information entropy of the user behavior feature within each suspected fraud user group, and/or the average of the relative entropies between the value distribution of the user behavior feature over the user group and its value distribution within each suspected fraud user group. The lower the average information entropy, or the higher the average relative entropy, the higher the importance.
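The two importance signals just described, low average information entropy within groups and high average relative entropy of each group against the population, can be sketched as follows. Combining them by simple subtraction is an assumption made only for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Information entropy of a feature's values within one group."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def rel_entropy(values, pop_values, eps=1e-12):
    """Relative entropy of a group's value distribution vs. the population's."""
    n, m = len(values), len(pop_values)
    p = {k: c / n for k, c in Counter(values).items()}
    q = {k: c / m for k, c in Counter(pop_values).items()}
    return sum(pi * math.log(pi / q.get(k, eps)) for k, pi in p.items())

def importance(groups, population):
    """Lower average within-group entropy and higher average relative entropy
    both indicate higher importance; subtracting one from the other is an
    assumed way to combine them into a single score."""
    avg_h = sum(entropy(g) for g in groups) / len(groups)
    avg_kl = sum(rel_entropy(g, population) for g in groups) / len(groups)
    return avg_kl - avg_h

# Illustrative data: groups locked onto a single value score higher than
# groups whose value mix resembles the population at large.
population = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
concentrated = [["c"] * 10, ["a"] * 10]
scattered = [["a", "b", "c", "a", "b"], ["a", "b", "c", "c", "b"]]
```

A feature on which suspected fraud groups are concentrated but the population is diverse thus gets a high importance score, matching the selection rule stated above.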
In some embodiments, the visual output is used as a reference for adjusting one or more of the fraud detection algorithm, algorithm parameters, and at least one set of user behavior characteristics.
In some embodiments, the visual output is used as a reference for adjusting one or more of the fraud detection algorithm, the algorithm parameters, and the at least one user behavior feature set, including any one or more of the following: 1) the difference between the second distribution view and the first distribution view is used as a reference basis indicating that the user behavior features in the user behavior feature set should be adjusted according to importance; 2) the overall mixing of different suspected fraud user groups, represented by different display characteristics in the third distribution view, is used as a reference basis indicating whether to remove lower-importance user behavior features from the user behavior feature set or to reduce their weights in the algorithm parameters; 3) the number of different suspected fraud user groups represented by different display characteristics in the third distribution view is used as a reference basis indicating whether to add higher-importance user behavior features to the user behavior feature set; 4) the mixing among different suspected fraud user groups represented by different display characteristics in at least one local area of the third distribution view is used as a reference basis indicating whether to adjust the member threshold condition for screening the displayed suspected fraud user groups or the edge threshold condition for dividing group edges according to the strength of the relationships among suspected fraudulent users; 5) the density of the user distribution shown in the fourth distribution view is used as a basis for evaluating the quality of the suspected fraud user group, indicating whether to adjust the user behavior features in the user behavior feature set.
In some embodiments, each suspected fraudulent user that does not belong to the suspected fraudulent user group but falls within its aggregation is differentially displayed in the fourth distribution view for analysis.
In some embodiments, the set of user behavior features includes a plurality of categories of user behavior features.
In some embodiments, the user population data relates to e-commerce websites, and the classification of the user behavior characteristics includes: one or more of time-related, IP address-related, and phone number-related.
In some embodiments, the user group data pertains to a social networking site, and the classification of the user behavior characteristics comprises: one or more of time-related, IP address-related, source user-related, target user-related, and event-related.
It should be noted that the principle of the visual user classification system in this embodiment is similar to that of the foregoing visual user classification method embodiments (for example, fig. 1), so the technical details of those method embodiments apply here and are not repeated. The functional modules in the visual user classification system may be implemented by computer software, electronic hardware, or a combination of both, for example by the computer device in fig. 10 running a computer software program.
As shown in fig. 13, a visualization data service system in an embodiment of the present application includes: a setting module 1301, configured to obtain a fraud detection algorithm, algorithm parameters, and at least one user behavior feature set determined according to input information; a data processing module 1302, configured to process the user group data of the user group according to the fraud detection algorithm, the algorithm parameters, and the at least one user behavior feature set to generate a user classification result or visualization data, where the visualization data for display includes a distribution view reflecting how the user group maps based on behavior similarity over one or more user behavior features in the at least one user behavior feature set, the degree of behavior similarity between users being inversely related to their mapping distance in the distribution view; and an output module 1303, configured to output the user classification result or the visualization data, where an outputted user classification result is used for externally generating the visualization data.
It should be noted that the principle of the visual data service system in this embodiment is similar to that of the foregoing visual data service method embodiment (for example, fig. 9), so the technical details of the method embodiment apply here and the detailed description is not repeated. It should also be noted that the functional modules of the visual data service system may be implemented by computer software, electronic hardware, or a combination of software and hardware, for example by the service device in fig. 11 running a computer software program.
An embodiment of the present application also provides a computer-readable storage medium storing at least one computer program which, when invoked, executes and implements at least one embodiment of the visual user classification method (for example, fig. 1) or the visual data service method (for example, fig. 9).
If implemented in the form of software functional units and sold or used as independent products, these computer programs may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the present application that substantially contributes over the prior art may be embodied as a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods in the embodiments of the present application.
In the embodiments provided herein, the computer-readable and writable storage medium may comprise read-only memory, random-access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, a USB flash drive, a removable hard disk, or any other medium that can store the desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or those wireless technologies are included in the definition of medium. It should be understood, however, that computer-readable and writable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead non-transitory, tangible storage media. Disk and disc, as used in this application, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
In one or more exemplary aspects, the functions described in connection with the methods of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. The steps of a method or algorithm disclosed herein may be embodied in a processor-executable software module located on a tangible, non-transitory computer-readable and writable storage medium, which may be any available medium that can be accessed by a computer.
The flowcharts and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code comprising one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of such blocks, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In summary, the visual user classification method, service method, system, devices, and storage medium of the present application obtain input information; obtain visualization data derived from the input information and user group data of a user group, wherein the input information is used to set a fraud detection algorithm, algorithm parameters, and at least one user behavior feature set for processing the user group data to determine suspected fraudulent users, the visualization data for display comprises a distribution view reflecting the user group mapped based on behavior similarity on one or more user behavior features in the at least one user behavior feature set, and the degree of the behavior similarity between users is inversely related to the mapping distance between the users in the distribution view; and perform visual output according to the visualization data. This scheme intuitively and accurately shows the synchrony of users across different behaviors according to the behavior similarity of different user behaviors, which facilitates quickly and accurately analyzing fraudulent behavior or evaluating the quality of fraud detection.
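For illustration only, the similarity-to-distance scheme summarized above (and elaborated in claims 4-6) might be sketched as follows. The exact form of the second probability distribution, the treatment of pairs that never collide, and the exponential similarity-to-distance mapping are all interpretive assumptions, and every function name is hypothetical:

```python
import math
from collections import Counter

def value_distribution(values):
    """Empirical probability distribution of a feature's values
    over the whole user group (the 'first probability distribution')."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def relative_entropy(p, q, eps=1e-12):
    """KL divergence D(p || q) over the union of supports."""
    support = set(p) | set(q)
    return sum(p.get(v, eps) * math.log(p.get(v, eps) / q.get(v, eps))
               for v in support)

def collision_relative_entropy(group_values, collided_value):
    """Relative entropy between a distribution concentrated on the value
    the two users collide on and the group-wide distribution -- one
    plausible reading; colliding on a rarer value yields a larger entropy."""
    p = value_distribution(group_values)
    q = {collided_value: 1.0}
    return relative_entropy(q, p)

def pairwise_collision_distance(user_a, user_b, group_data, weights=None):
    """Sum the (optionally weighted) collision relative entropies over all
    features on which the two users collide; per the claims, a larger
    relative-entropy sum means lower behavior similarity, which is mapped
    here to a larger collision distance for the distribution view.
    Pairs with no collisions get distance 0 in this sketch; that choice
    is purely illustrative."""
    weights = weights or {f: 1.0 for f in group_data}
    total = 0.0
    for feat, w in weights.items():
        if user_a[feat] == user_b[feat]:
            total += w * collision_relative_entropy(group_data[feat], user_a[feat])
    similarity = math.exp(-total)  # decreases as the entropy sum grows
    return 1.0 - similarity
```

A distance matrix built from `pairwise_collision_distance` over all user pairs could then be fed to any distance-preserving layout (e.g. multidimensional scaling) to produce the distribution view.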
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (27)

1. A visual user classification method is characterized by comprising the following steps:
acquiring input information;
acquiring visual data obtained according to the input information and user group data of a user group; wherein the input information is used for setting a fraud detection algorithm, algorithm parameters and at least one user behavior feature set for processing the user group data to determine suspected fraudulent users; wherein the visualization data for display comprises: a distribution view reflecting the user group mapped based on behavior similarity on one or more user behavior features in the at least one user behavior feature set; wherein the degree of the behavior similarity between users is inversely related to the mapping distance between the users in the distribution view;
and performing visual output according to the visual data.
2. The visual user classification method according to claim 1, characterized in that the distribution view comprises any one or more of the following: 1) a first user distribution view reflecting that the user group is mapped based on the behavior similarity on the user behavior feature set; 2) a second user distribution view reflecting that suspected fraudulent users in the user group are mapped based on the similarity on at least part of the user behavior features in the user behavior feature set; 3) a third user distribution view formed by grouping and mapping the suspected fraudulent users according to their behavior similarity on at least one user behavior feature in the user behavior feature set, wherein each suspected fraudulent user group is displayed distinguishably; 4) a fourth user distribution view reflecting that each member of a suspected fraudulent user group is mapped based on the behavior similarity on the original values of the user behavior feature set.
3. The visual user classification method according to claim 1, characterised in that the behaviour similarity is measured based on a weighted result of the behaviour similarities of a plurality of user behaviour features in a set of user behaviour features.
4. The visual user classification method according to claim 1, characterized in that the behavior similarity of every two users on each user behavior feature is related to: a relative entropy between a first probability distribution, being the probability distribution of the values of the user behavior feature obtained statistically from the user group data, and a second probability distribution formed when the two users collide on one value of the user behavior feature; or a relative-entropy sum of a plurality of relative entropies corresponding to a plurality of said value collisions; wherein the larger the relative entropy or the relative-entropy sum, the lower the behavior similarity between the two users.
5. The visual user classification method according to claim 1 or 4, characterized in that the mapping distance is a mapping result of collision distance; the size of the collision distance is inversely related to the degree of the behavior similarity.
6. The visual user classification method according to claim 5, characterized in that the collision distance is scaled up in negative correlation with the behavior similarity.
7. The visual user classification method according to claim 2, characterized in that suspected fraudulent users in the second distribution view have the same display characteristics.
8. The visual user classification method according to claim 2, characterized in that the third distribution view is distinguished by different display characteristics to represent different groups of suspected fraud users.
9. The visual user classification method according to claim 7 or 8, characterized in that the display features comprise: one or more combinations of size, color, texture, gray scale, brightness, and numbering.
10. The visual user classification method of claim 8, wherein the display characteristic corresponding to each suspected fraudulent user group is determined based on the display characteristic of the majority of its members.
11. The visual user classification method according to claim 2, characterized in that the second and third distribution views are respectively presented on graphical pages that can be mutually switched to be displayed; and/or, the third distribution views formed corresponding to different third user behavior feature sets are respectively presented on the graphic pages which can be mutually switched and displayed.
12. The method of claim 1, wherein the visualization data is derived from low-dimensional data obtained by dimension reduction of the user population data.
13. The visual user classification method according to claim 2, characterized in that the user distribution of at least one of the first, second and third distribution views follows an estimated distribution; and the estimated distribution is obtained by performing kernel density estimation according to the original user distribution obtained by the behavior similarity.
14. The visual user classification method according to claim 2, characterised in that each user behaviour feature correspondence has an importance.
15. The visual user classification method according to claim 14, characterized in that the importance of each user behavior feature is given by: the average information entropy of the user behavior feature within the suspected fraudulent user groups, and/or the average of the relative entropies between the value distribution of the user behavior feature over the user group and the value distribution of the user behavior feature within each suspected fraudulent user group; wherein the lower the average information entropy or the higher the average relative entropy, the higher the importance.
16. The visual user classification method according to claim 1, characterized in that the visual output is used as a reference for adjusting one or more of the fraud detection algorithm, algorithm parameters and at least one user behavior feature set.
17. The visual user classification method according to claim 2, wherein the visual output is used as a reference for adjusting one or more of fraud detection algorithm, algorithm parameters and at least one user behavior feature set, and comprises any one or more of the following combinations:
1) taking the difference between the second distribution view and the first distribution view as a reference basis for indicating that the user behavior features in the user behavior feature set be adjusted according to their importance;
2) taking the overall degree of mixing among the different suspected fraud user groups represented by different display characteristics in the third distribution view as a reference basis for indicating whether to remove user behavior features of lower importance from the user behavior feature set, or to reduce the weights of the user behavior features of lower importance in the algorithm parameters;
3) taking the number of different suspected fraud user groups represented by different display characteristics in the third distribution view as a reference basis for indicating whether to add user behavior features of higher importance to the user behavior feature set;
4) taking the degree of mixing among different suspected fraudulent user groups represented by different display characteristics in at least one local area of the third distribution view as a reference basis for indicating whether to adjust a member threshold condition for screening the displayed suspected fraudulent user groups, or an edge threshold condition for dividing the edges of a suspected fraudulent user group according to the strength of the relationships among the suspected fraudulent users;
5) taking the density of the user distribution shown in the fourth distribution view as a quality evaluation basis for the suspected fraud user group, for indicating whether to adjust the user behavior features in the user behavior feature set.
18. The visual user classification method according to claim 2, comprising:
displaying distinguishably, in the fourth distribution view, each suspected fraudulent user that appears in the distribution but does not belong to the suspected fraudulent user group, for analysis.
19. The visual user classification method according to claim 1, characterised in that the set of user behavior characteristics comprises a plurality of classes of user behavior characteristics.
20. The visual user classification method of claim 19, wherein the user population data pertains to e-commerce web sites, and the classification of the user behavior characteristics comprises: one or more of time-related, IP address-related, and phone number-related.
21. The visual user classification method of claim 19, where the user group data pertains to social networking sites, and the classification of the user behavior characteristics comprises: one or more of time-related, IP address-related, source user-related, target user-related, and event-related.
22. A visual user classification system, comprising:
the input module is used for acquiring input information;
the processing module is used for acquiring visual data obtained according to the input information and user group data of the user group; wherein the input information is used for setting a fraud detection algorithm, algorithm parameters and at least one user behavior feature set for processing the user group data to determine suspected fraudulent users; wherein the visualization data for display comprises: a distribution view reflecting the user group mapped based on behavior similarity on one or more user behavior features in the at least one user behavior feature set; wherein the degree of the behavior similarity between users is inversely related to the mapping distance between the users in the distribution view;
and the output module is used for carrying out visual output according to the visual data.
23. A visual data service method, comprising:
acquiring a fraud detection algorithm, algorithm parameters and at least one user behavior feature set determined according to input information;
processing user group data of a user group according to the fraud detection algorithm, the algorithm parameters and the at least one user behavior feature set to generate a user classification result or visualization data; wherein the visualization data for display comprises: a distribution view reflecting the user group mapped based on behavior similarity on one or more user behavior features in the at least one user behavior feature set; wherein the degree of the behavior similarity between users is inversely related to the mapping distance between the users in the distribution view;
outputting the user classification result or visualization data; wherein the outputted user classification result is used for externally generating the visualization data.
24. A visual data service system, comprising:
the setting module is used for acquiring a fraud detection algorithm, algorithm parameters and at least one user behavior characteristic set which are determined according to input information;
the data processing module is used for processing user group data of a user group according to the fraud detection algorithm, the algorithm parameters and the at least one user behavior feature set to generate a user classification result or visualization data; wherein the visualization data for display comprises: a distribution view reflecting the user group mapped based on behavior similarity on one or more user behavior features in the at least one user behavior feature set; wherein the degree of the behavior similarity between users is inversely related to the mapping distance between the users in the distribution view;
the output module is used for outputting the user classification result or the visualized data; wherein the outputted user classification result is used for externally generating the visualization data.
25. A computer device, comprising:
a storage device storing at least one computer program;
processing means for running said computer program to perform and implement the visual user classification method of any one of claims 1 to 21.
26. A service device, comprising:
communication means for communicating with the outside;
a storage device storing at least one computer program;
processing means for running said computer program to perform and implement the visual data service method of claim 23.
27. A computer-readable storage medium, characterized in that at least one computer program is stored which, when being invoked, executes and implements the visual user classification method according to any one of claims 1 to 21 or the visual data service method according to claim 23.
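For illustration only, the feature-importance measure of claims 14-15 might be sketched as follows; the direction of the relative entropy and the rule combining the two averages are not fixed by the claims, so both are assumptions here, and all function names are hypothetical:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in nats) of a feature's empirical value distribution."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def kl(p_values, q_values, eps=1e-12):
    """Relative entropy D(P || Q) between two empirical distributions;
    taking the group's distribution as P and the whole population's as Q
    is one plausible reading of claim 15."""
    p, q = Counter(p_values), Counter(q_values)
    n_p, n_q = sum(p.values()), sum(q.values())
    return sum((c / n_p) * math.log((c / n_p) / max(q.get(v, 0) / n_q, eps))
               for v, c in p.items())

def feature_importance(feature_by_group, all_values):
    """Per claim 15: low average within-group entropy (members of a
    suspect group take similar values) and/or high average relative
    entropy versus the whole population both raise importance. The
    combination rule below (difference of the two averages) is a
    hypothetical choice; the claim leaves it open."""
    groups = list(feature_by_group.values())
    avg_h = sum(entropy(g) for g in groups) / len(groups)
    avg_kl = sum(kl(g, all_values) for g in groups) / len(groups)
    return avg_kl - avg_h
```

On this sketch, a feature whose suspect-group values are uniform and rare in the overall population scores high, matching the claim's intent that such features separate suspect groups well.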
CN202010339657.6A 2020-04-26 Visual user classification method, service method, system, device and storage medium Active CN113553369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010339657.6A CN113553369B (en) 2020-04-26 Visual user classification method, service method, system, device and storage medium


Publications (2)

Publication Number Publication Date
CN113553369A (en) 2021-10-26
CN113553369B (en) 2024-06-25


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140081652A1 (en) * 2012-09-14 2014-03-20 Risk Management Solutions Llc Automated Healthcare Risk Management System Utilizing Real-time Predictive Models, Risk Adjusted Provider Cost Index, Edit Analytics, Strategy Management, Managed Learning Environment, Contact Management, Forensic GUI, Case Management And Reporting System For Preventing And Detecting Healthcare Fraud, Abuse, Waste And Errors
CN107292424A (en) * 2017-06-01 2017-10-24 四川新网银行股份有限公司 A kind of anti-fraud and credit risk forecast method based on complicated social networks
US20180300572A1 (en) * 2017-04-17 2018-10-18 Splunk Inc. Fraud detection based on user behavior biometrics
CN109670933A (en) * 2018-09-26 2019-04-23 深圳壹账通智能科技有限公司 Identify method, user equipment, storage medium and the device of user role
CN109922032A (en) * 2017-12-13 2019-06-21 百度在线网络技术(北京)有限公司 Method and apparatus for determining the risk of logon account

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
严承希;王军;: "高校学生网络行为时序特征的可视化分析", 情报学报, no. 09, 24 September 2018 (2018-09-24) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant