CN108268624B - User data visualization method and system - Google Patents

Publication number
CN108268624B
Authority
CN
China
Prior art keywords
group
data
user
event
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810022133.7A
Other languages
Chinese (zh)
Other versions
CN108268624A (en)
Inventor
徐葳
孙娇
姚期智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huakong Tsingjiao Information Technology Beijing Co Ltd
Original Assignee
Huakong Tsingjiao Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huakong Tsingjiao Information Technology Beijing Co Ltd
Priority to CN201810022133.7A
Publication of CN108268624A
Application granted
Publication of CN108268624B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification

Abstract

The application provides a user data visualization method and system. The method comprises the following steps: acquiring a data set of a group, wherein the data characteristics of the data set comprise user information, an IP address, an event type, an event origin, an event responder and an event occurrence time, and wherein the data characteristics of the data set are assigned different decision priorities; and displaying a decision tree graph to characterize the attribute testing process of all users in the group, which includes: displaying the first-priority data characteristic of the root node of the decision tree graph and the decision value range of that data characteristic; displaying the final attribute of at least one user characterized by each leaf node of the decision tree graph; displaying the current attribute, the data characteristic of the current priority and the decision value range of that data characteristic for the plurality of users represented by each non-leaf node of the decision tree graph; and displaying the decision paths corresponding to the root node or each non-leaf node in the decision tree graph, wherein the decision paths are represented by lines with different colors, shapes or thicknesses.

Description

User data visualization method and system
Technical Field
The present application relates to the field of computer processing technologies, and in particular, to a user data visualization method and system.
Background
Online fraud, now widely recognized as a dark side of the internet, causes immeasurable losses worldwide each year. In 2015, the Internet Crime Complaint Center received millions of complaints about fraud worldwide. Online fraud causes billions in economic losses worldwide every year, and fraudulent users are generally rewarded for helping to promote a specific commodity or to distribute spam. In internet finance, fraudulent users apply for loans under false identities, purchase goods with stolen credit cards, and even carry out illegal activities such as money laundering. In internet business scenarios, the need for suitable anti-fraud algorithms is therefore becoming increasingly critical.
Although many methods exist today for identifying fraud on the internet, the limitations of existing fraud event detection systems mean that the data screened out for suspected fraudsters requires a large amount of subsequent human verification before it can be trusted; for example, the platform supervisor has to check and verify the records one by one. As a result, revising algorithm parameters, designing data feature priorities, selecting algorithm models and similar tasks in the fraud event detection system require not only software design by algorithm experts but also the participation of domain experts. Improving the transparency of the fraud identification algorithm can therefore effectively improve the accuracy of fraud event detection, which makes data visualization an urgent problem to be solved in the field.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, the present application aims to provide a method and a system for visualizing user data, which are used to solve the problem of visualization of fraud identification algorithms in the prior art.
To achieve the above and other related objects, a first aspect of the present application provides a user data visualization method applied in a fraud event detection system, the visualization method including the steps of: acquiring a data set of a group, wherein the data characteristics of the data set comprise user information, an IP address, an event type, an event origin, an event responder and an event occurrence time, and wherein the data characteristics of the data set are assigned different decision priorities; and displaying a decision tree graph to characterize the attribute testing process of all users in the group, which includes: displaying the first-priority data characteristic of the root node of the decision tree graph and the decision value range of that data characteristic; displaying the final attribute of at least one user characterized by each leaf node of the decision tree graph; displaying the current attribute, the data characteristic of the current priority and the decision value range of that data characteristic for the plurality of users represented by each non-leaf node of the decision tree graph; and displaying the decision paths corresponding to the root node or each non-leaf node in the decision tree graph, wherein the decision paths are represented by lines with different colors, shapes or thicknesses.
A second aspect of the present application provides a computer device comprising: a processor; and a presentation engine executing on the processor, the presentation engine being configured to perform the user data visualization method described in any one of the above.
A third aspect of the present application provides a user data visualization system, comprising: an acquisition module configured to acquire a data set of a group, wherein the data characteristics of the data set comprise user information, an IP address, an event type, an event origin, an event responder and an event occurrence time, and wherein the data characteristics of the data set are assigned different decision priorities; and a display module configured to display a decision tree graph to characterize the attribute testing process of all users in the group, which includes: displaying the first-priority data characteristic of the root node of the decision tree graph and the decision value range of that data characteristic; displaying the final attribute of at least one user characterized by each leaf node of the decision tree graph; displaying the current attribute, the data characteristic of the current priority and the decision value range of that data characteristic for the plurality of users represented by each non-leaf node of the decision tree graph; and displaying the decision paths corresponding to the root node or each non-leaf node in the decision tree graph, wherein the decision paths are represented by lines with different colors, shapes or thicknesses.
In a fourth aspect, the present application provides a client connected to a server via a network, wherein the client logs in to the server by sending a request, so as to execute the steps of the user data visualization method described in any one of the above.
In a fifth aspect, the present application provides a server connected to a client via a network, wherein, in response to an operation requested by the client, the server sends the process of the user data visualization method described in any one of the above to the client, and the client displays the execution result.
In a sixth aspect, the present application provides a browser connected to a server through a network, wherein the browser logs in to the server by sending a request, so as to execute the steps of the user data visualization method described in any one of the above.
In a seventh aspect, the present application provides a computer-readable storage medium storing a data visualization computer program, wherein the data visualization computer program is configured to implement the steps of the user data visualization method according to any one of the above-mentioned items when executed.
As described above, the user data visualization method and system of the present application have the following beneficial effects: by presenting the grouping process of the group users determined during fraud event detection, the data characteristic distribution, the classification list and other views, the groupings made during fraud event detection are displayed in a variety of relational interfaces, so that domain experts and algorithm experts can evaluate and revise the detection algorithm of the fraud event detection system.
Drawings
Fig. 1 is a flowchart illustrating a method for visualizing user data according to an embodiment of the present application.
FIG. 2 shows a flow chart for obtaining a group data set according to an embodiment of the present disclosure.
FIG. 3 illustrates an interface including a plurality of groups according to an embodiment of the present application.
FIG. 4 is a diagram illustrating a group user decision tree graph according to an embodiment of the present invention.
FIG. 5 shows a display interface, in an embodiment of the present application, in which the decision tree graph further includes the number of users classified into each node.
Fig. 6 is an interface diagram showing, in an embodiment of the present application, an operation log of a target user on a time axis on the left side and a group decision tree graph on the right side.
FIG. 7 is a diagram illustrating a list interface for the data set of a group in an embodiment of the present application.
FIG. 8 is an interface diagram illustrating the feature distribution of the information entropy of the registration time dimension in a cluster of network users, according to an embodiment of the present application.
FIG. 9 is a flow chart illustrating the display of a feature distribution interface for the data set of the group according to an embodiment of the present application.
FIG. 10 is a flowchart illustrating the steps of displaying the distribution of a plurality of groups in a cluster, in one embodiment of the present application.
FIG. 11 illustrates an interface showing the distribution of groups in a cluster, in one embodiment of the present application.
FIG. 12 is a block diagram of a computer device according to an embodiment of the present invention.
Fig. 13 is a schematic block diagram of a user data visualization system provided in the present application.
Detailed Description
The following description of the embodiments of the present application is provided for illustrative purposes, and other advantages and capabilities of the present application will become apparent to those skilled in the art from the present disclosure.
In the following description, reference is made to the accompanying drawings that describe several embodiments of the application. It is to be understood that other embodiments may be utilized and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present application. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition occurs only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
In fraud detection technology, domain experts contribute their experience of data classification and their requirements for the accuracy of classification results to the core fraud identification technology, but they are not familiar with the algorithm architecture itself or with the parameters in the algorithm. A domain expert cannot see how data is classified during detection, and when a detection result is obtained from the fraud event detection system, the domain expert has no way to judge its accuracy other than by verifying the result. To improve the accuracy of the fraud event detection system, the application provides a user data visualization method applied to the fraud event detection system, in which the groups obtained by classification in the fraud event detection system, and their data sets, are displayed visually to algorithm experts and domain experts. Different domain experts or algorithm experts can then explore various fraud behaviors through a range of interactive means, and the fraud detection algorithm can be modified flexibly according to the characteristics of the fraud.
The user data visualization method is mainly executed by a computer device. The computer device may be any suitable computer device, such as a handheld computer device, a tablet computer device, a notebook computer, a desktop computer or a server. The computer device includes a display, input means, input/output (I/O) ports, one or more processors, memory, non-volatile storage, network interfaces, a power supply, and the like. The various components described may include hardware elements (e.g., chips and circuits), software elements (e.g., a tangible, non-transitory computer-readable medium storing instructions), or a combination of hardware and software elements. The various components may also be combined into fewer components or separated into additional components; for example, the memory and the non-volatile storage device may be included in a single component. The computer device can execute the visualization method alone or in cooperation with other computer devices. In some embodiments, a single computer device performs the visualization method and presents the corresponding visualization interface. For example, the computer device includes a processor and a display, wherein a rendering engine (or display engine) is executed on the processor; the rendering engine executes the user data visualization method and displays the user data through the display. Here, the rendering engine includes, but is not limited to, software and hardware capable of parsing and displaying an interface developed in a programming or markup language, such as XML, HTML or the C language. In still other embodiments, one computer device performs the visualization method and provides the corresponding visualization interface to another computer device for presentation. For example, the client initiates a request to the server based on a request operation of the user and logs in to the server; the server executes the visualization method to form the corresponding interface data and feeds the interface data back to the client; and a browser or a customized application program on the client displays the corresponding diagram according to the interface data.
The visualization method is applicable to a fraud event detection system. The fraud event detection system may include software and hardware in one or more computer devices. To show the domain expert why a group is regarded as a fraudulent group, and to answer the algorithm expert's question of "whether users of the same group all have the same behavioral habits", the application provides a visualization method for the grouping process of group users. Please refer to fig. 1, which is a flowchart illustrating a method for visualizing user data according to an embodiment of the present application. As shown in the figure, the user data visualization method includes the following steps:
In step S11, a data set of one group is acquired. The data characteristics of the data set include at least user information, an IP address, an event type, an event origin, an event responder, and an event occurrence time. The user information refers to information capable of characterizing the user identity, such as a user ID, a unique user nickname, a certificate number, and the like. The user information further includes: a mobile phone number, an email address, an ID number, a gender, the number of the user equipment used by the user, the registration time and the like. The IP address is the IP address, IP address segment or IP address group of the computer device used when the same user information generates an event in the network. The event type is a type recorded in the network operation log that represents a user behavior event, and includes but is not limited to at least one of: social behaviors between network users, such as following, liking, commenting and gift-giving, and operation behaviors of network users, such as login, logout, status update, registration and information modification. The same user information may correspond to at least one event type, and each event type corresponds to an event origin, an event responder, and an event occurrence time. For example, the same user information may correspond to a plurality of event types, each with its own event origin, event responder and event occurrence time. The event origin refers to the user information and the like of the party initiating the event type. The event responder includes the user information and the like of the target of the initiated event type.
Here, the groups are determined from the data characteristics of the collected cluster of members. A detection algorithm (such as an unsupervised detection algorithm) for detecting group members who take part in fraud events is preset in the fraud event detection system. To classify the group members accurately, the detection algorithm classifies all members step by step according to the decision priorities of the data features of the collected members. Different fraud events correspond to detection algorithms with different decision priorities.
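As a minimal sketch of the data set described in step S11, the six data features can be held in one record per logged event. The class name `EventRecord`, the field names, and the choice of a single ID string for the user information are assumptions made for illustration; the patent does not prescribe a storage layout.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EventRecord:
    """One operation-log entry carrying the six data features of step S11."""
    user_id: str            # user information (assumed here to be one ID string)
    ip_address: str         # IP address in use when the event occurred
    event_type: str         # e.g. "login", "follow", "comment"
    event_origin: str       # user who initiated the event
    event_responder: str    # target user of the event
    occurred_at: datetime   # event occurrence time


def events_by_user(records):
    """Group records by user, since one user may map to several event types."""
    grouped = {}
    for record in records:
        grouped.setdefault(record.user_id, []).append(record)
    return grouped


# Hypothetical sample data for illustration only.
records = [
    EventRecord("u1", "10.0.0.1", "login", "u1", "u1", datetime(2018, 1, 1, 9, 0)),
    EventRecord("u1", "10.0.0.1", "follow", "u1", "u2", datetime(2018, 1, 1, 9, 5)),
    EventRecord("u2", "10.0.0.2", "comment", "u2", "u1", datetime(2018, 1, 1, 9, 7)),
]
grouped = events_by_user(records)
```

Keeping one record per event, rather than one per user, mirrors the statement that each event type has its own origin, responder and occurrence time.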
In some embodiments, the detection algorithm performs decision classification based on similarity of data features of all members. Specifically, referring to fig. 2, which is a flowchart illustrating an embodiment of the present application for obtaining a group data set, as shown in the figure, the step S11 further includes:
Step S111, obtaining an operation log of a cluster formed by a plurality of network users; in different embodiments, the cluster is formed by all network users that can be acquired, and the network users in the cluster are from the same website or different websites, or from different network channels, such as the internet, one or more intranets, local area networks (LANs), wide area networks (WANs), storage area networks (SANs), and the like, or a suitable combination thereof, or a mobile communication network of a mobile phone, and the like.
Step S112, determining at least one data characteristic from the operation logs of the plurality of network users, and analyzing the similarity of at least one group of data characteristics in the operation logs to determine the group; in a specific embodiment, aiming at the characteristic that the network fraud behavior inevitably leaves user use data in the network, the fraud event detection system collects operation logs of a plurality of network users from at least one website, and groups the users generating the corresponding operation logs by analyzing the similarity of at least one data feature in the operation logs to obtain groups and data sets of the groups in the operation logs.
In some embodiments, the data set of a group includes, but is not limited to, at least two of the following data characteristics: user information, IP address, event type, event origin, event responder and event occurrence time. The user information includes a mobile phone number, an email address, an ID number, an identification number, a gender, the number of the user equipment used by the user, the registration time and the like. The same user information may correspond to at least one event type, and each event type corresponds to an event origin, an event responder and an event occurrence time. The event types include, but are not limited to, at least one of: social behaviors between network users, such as following, liking, commenting and gift-giving (also called gift sending), and operation behaviors of network users, such as login, logout, status update, registration and information modification. For example, the same user information may correspond to a plurality of event types, each with its own event origin, event responder and event occurrence time.
Step S113, a data set of the group is acquired. In some embodiments, the data set may be obtained from a database storing the groups and their data sets, for example, on a remote storage server or in a storage device in a local computer device, and the obtained data set of one group may be obtained by extracting it from the database based on an input operation of a user. For example, the fraud detection system obtains a plurality of groups by using an unsupervised detection algorithm, and the user selects one of the groups through the selection interface to obtain the data set of the corresponding group.
Specifically, the fraud event detection system calculates, for each type of data feature, the similarity of all data in the operation log, where the similarity can be measured using information entropy. For example, the fraud event detection system uses the user information to calculate the information entropy of the IP usage dimension or of the maximum IP usage dimension, uses the event type to calculate the information entropy of the operation type dimension, and uses the registration time or the operation time to calculate the information entropy of the registration time dimension or of the operation time difference dimension. The information entropies obtained in this way are then processed by unsupervised detection and divided into a plurality of groups. The unsupervised detection mode includes, for example, a dense-subgraph-based algorithm or a vector-space-based algorithm. The groups presented by the visualization method provided by the application reflect the shared resources, user relationships and the like used in the fraud event, so that a user of the fraud event detection system can determine more clearly whether the classification strategy in the unsupervised detection algorithm is reasonable. The shared resources include, but are not limited to, shared IP addresses and email addresses, and the user relationships include, but are not limited to, following and interaction between users.
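The entropy measure mentioned above can be illustrated with the standard Shannon entropy of the observed values in one dimension. This is only a sketch: the patent does not state which entropy formula or logarithm base is used, so base 2 and the function name `shannon_entropy` are assumptions, and the IP lists are made-up sample data.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # -p * log2(p) is written as p * log2(1 / p) with p = c / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical IP dimension for two sets of four events each.
shared_ip = ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.1"]
spread_ip = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

entropy_shared = shannon_entropy(shared_ip)   # all events share one IP
entropy_spread = shannon_entropy(spread_ip)   # every event uses a different IP
```

A low value means the observations in that dimension are concentrated on a few values, which is one way to read "similar" in the passage above; how the system turns these values into groups is left to the unsupervised detection step.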
In one embodiment, the visualization method further comprises the step of displaying an interface of at least one group, in which the size of each group is characterized by the size of the displayed geometric figure. Refer to FIG. 3, which shows an interface including a plurality of groups in an embodiment of the present application. As shown in the figure, 11 groups are displayed in the interface, and the geometric figure used to represent each group is a circle. The 11 groups are all located within the largest dotted circle, which represents, for example, a cluster composed of N network users. The group labeled 0 is a normal group, and the 10 groups of different sizes labeled 1-10, which are abnormal groups, are located in a smaller dotted circle. The size of each circle is proportional to the number of members of the group, that is, a larger circle represents a group with more members and a smaller circle represents a group with fewer members. In different embodiments, the geometric figures of the groups may be of any shape. The colors of the geometric figures may be set randomly, or may be related to the number of groups or to the number of members of a group. For example, N colors are preset, and the fraud event detection system randomly assigns different colors to the geometric figures representing the groups. For another example, the fraud event detection system assigns colors from a preset color sequence to the geometric figures representing the groups in order of increasing member count. When a user selects a geometric figure by operating the display interface, the fraud event detection system obtains the data set of the corresponding group.
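One way to realize a circle whose size tracks the member count is sketched below. The text does not say whether "size" means radius or area; this sketch assumes area, so the radius grows with the square root of the member count. The function name, the 60-pixel maximum radius and the group sizes are all hypothetical.

```python
import math


def circle_radius(member_count, largest_count, max_radius=60.0):
    """Radius chosen so that circle area scales linearly with member count."""
    if largest_count <= 0 or member_count < 0:
        raise ValueError("counts must be non-negative and largest_count positive")
    return max_radius * math.sqrt(member_count / largest_count)


# Hypothetical member counts for three abnormal groups.
group_sizes = {1: 400, 2: 100, 3: 25}
largest = max(group_sizes.values())
radii = {gid: circle_radius(n, largest) for gid, n in group_sizes.items()}
```

With these sample counts, a group with a quarter of the members gets half the radius, so its drawn area is a quarter of the largest circle.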
In a preferred embodiment, the interface displaying at least one group may further include an information bar for displaying group information. When a user selects a group in the group interface, basic information about the group is displayed in a window or text box on one side of the interface, for example: the group code, the number of members, the highest-priority data characteristic used to determine the group, and the group attribute (such as normal group or abnormal group).
To show the grouping decision process, the fraud event detection system executes step S12 after grouping and displays, in the form of a tree structure, how the group users are classified according to the decision priorities of the corresponding detection algorithm, so that domain experts and/or algorithm experts can address the deficiencies and defects of that detection algorithm through a visual interface.
In step S12, a decision tree graph is displayed to characterize the attribute testing process of all users in the group. The attributes of a user may include normal user (Normal) and abnormal user (Abnormal), or normal user (Normal), fraudulent role A (Abnormal A), fraudulent role B (Abnormal B), and so on. In the display interface, this step uses a tree structure, running from the root node through decision paths and non-leaf nodes to each leaf node, to show how the detection algorithm of the fraud event detection system classifies the attributes of users in the same group step by step, from the highest priority to the lowest. The following are displayed in the display interface: the first-priority data characteristic of the root node of the decision tree graph and its decision value range; the final attribute of at least one user characterized by each leaf node of the decision tree graph; the current attribute of the plurality of users represented by each non-leaf node of the decision tree graph, together with the data characteristic of the current priority and its decision value range; and the decision paths corresponding to the root node or each non-leaf node in the decision tree graph, which are characterized by lines with different colors, shapes or thicknesses. As the decision tree graph shows, the users classified into a leaf node have received their final attribute, normal user or abnormal user, whereas the users classified into a non-leaf node must be classified further until they are allocated to a leaf node that determines their final attribute.
The decision result of the decision tree is obtained by step-by-step analysis, classifying each user according to the relation between the original value of each data characteristic and the corresponding decision threshold. For example, after the decision tree has been pruned, the fraud event detection system makes the grouping decision for all users of a selected group using, in order of priority from high to low, the maximum IP usage amount, the out_degree of the user in the social network, and the IP usage amount calculated from the user information.
Please refer to fig. 4, which is a diagram illustrating a group user decision tree graph. As shown in the figure, the root node of the decision tree graph displays the highest priority data characteristic as the maximum IP usage, and measures the attribute classification of the group users by using the maximum IP usage, that is, when the maximum IP usage (max _ IP _ used _ be _ used _ amount) corresponding to one user is less than or equal to 80.5, the phase-applied user is classified to the first non-leaf node along the "blue" color decision path, otherwise, the phase-applied user is classified to the first leaf node along the "yellow" color decision path. And the first non-leaf node continuously classifies and judges the acquired users according to the data characteristic that the event origin is the second priority, namely, the event origin is used for measuring the attribute classification of the currently acquired users, when the out _ degree of the event origin corresponding to one user in the social network is less than or equal to 711.0, the corresponding user is classified to the second leaf node along a 'blue' color decision path, otherwise, the corresponding user is classified to the second non-leaf node along a 'yellow' color decision path. And the second non-leaf node continuously classifies and judges the acquired users according to the data characteristics of which the IP usage is the third priority, namely, the IP usage is used for measuring the attribute classification of the currently acquired users, when the IP usage (IP _ used _ amount) corresponding to one user is less than or equal to 870.0, the phase-applied user is classified to a third leaf node along a 'blue' color decision path, otherwise, the phase-applied user is classified to a fourth leaf node along a 'yellow' color decision path. 
Here, "blue" represents the decision path along which users are classified as "normal users" at the non-leaf node of the current priority, and "yellow" represents the decision path along which users are classified as "abnormal users" at the current priority. The values against which the user's information is measured at each non-leaf node, such as 80.5, 711.0 and 870.0 shown in the figure, are the boundaries of the decision value ranges corresponding to the data feature of the current priority.
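The classification walk described for fig. 4 can be sketched as follows. This is only an illustrative sketch: the feature keys `max_ip_usage`, `out_degree` and `ip_usage`, the class names, and the leaf labels (inferred from the colors of the decision paths leading to each leaf) are assumptions made for the example and are not taken from the detection system itself.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    name: str
    label: str  # final attribute, e.g. "normal" or "abnormal"


@dataclass
class Split:
    feature: str       # data feature tested at this non-leaf node
    threshold: float   # decision threshold shown in the node
    blue: Union["Split", Leaf]    # followed when value <= threshold
    yellow: Union["Split", Leaf]  # followed when value > threshold


# Thresholds 80.5, 711.0 and 870.0 are the ones quoted for fig. 4.
FIG4_TREE = Split(
    "max_ip_usage", 80.5,
    blue=Split(
        "out_degree", 711.0,
        blue=Leaf("leaf_2", "normal"),
        yellow=Split(
            "ip_usage", 870.0,
            blue=Leaf("leaf_3", "normal"),
            yellow=Leaf("leaf_4", "abnormal"),
        ),
    ),
    yellow=Leaf("leaf_1", "abnormal"),
)


def classify(user, node=FIG4_TREE):
    """Return (final label, list of (feature, path color)) for one user."""
    path = []
    while isinstance(node, Split):
        color = "blue" if user[node.feature] <= node.threshold else "yellow"
        path.append((node.feature, color))
        node = node.blue if color == "blue" else node.yellow
    return node.label, path
```

The returned path lists the color of each decision path taken, which is the information the graph encodes with colored lines.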
In different embodiments, the decision paths may also be distinguished during display by lines of different shapes or thicknesses. For example, a solid line indicates that the user attribute is determined to be a normal user and a dashed line indicates an abnormal user; or a straight line indicates a normal user and a curved line indicates an abnormal user; or a thin line indicates a normal user and a thick line indicates an abnormal user.
In order to show more clearly the number of users received by each non-leaf node and leaf node, when the decision tree graph is displayed to characterize the attribute testing process of all users in the group, the root node of the decision tree graph also displays the number of users in the group (i.e. the sample size given to the root node), and each non-leaf node of the decision tree graph displays the number of users having the current attribute (i.e. the sample size received by that non-leaf node). Referring to fig. 5, the decision tree graph further displays the number of users classified into each node: the sample_size displayed in the root node is the sample size given to the root node, i.e. the total number of group members; the sample_size displayed in each other non-leaf node is the sample size received by that non-leaf node; and the sample_size displayed in a leaf node is the number of users classified into that leaf node by the previous level.
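As an illustration of how the sample_size of each node could be derived, the sketch below pushes a list of users down a small tree and records how many users reach every node. The tuple-based tree encoding, the feature keys, and the node names are hypothetical; only the thresholds come from the description of fig. 4.

```python
# A non-leaf node is (feature, threshold, low_child, high_child);
# a leaf is just its name. Feature keys and leaf names are illustrative.
TREE = (
    "max_ip_usage", 80.5,
    ("out_degree", 711.0,
     "leaf_2",
     ("ip_usage", 870.0, "leaf_3", "leaf_4")),
    "leaf_1",
)


def sample_sizes(tree, users):
    """Map each node (feature name or leaf name) to the number of users reaching it."""
    sizes = {}

    def visit(node, members):
        if isinstance(node, str):          # leaf: record and stop
            sizes[node] = len(members)
            return
        feature, threshold, low, high = node
        sizes[feature] = len(members)      # sample size received by this node
        visit(low, [u for u in members if u[feature] <= threshold])
        visit(high, [u for u in members if u[feature] > threshold])

    visit(tree, users)
    return sizes
```

The value recorded for the root equals the total number of group members, and the values of the two children of any non-leaf node add up to the value of that node.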
It should be noted that, depending on the type of the fraud event and the design of the unsupervised detection algorithm, the priority of each data feature, the decision value range of each priority, the relationship between adjacent priorities, the decision paths of each level, and the like may differ when the group decision classification is detected in the operation log. Furthermore, in order to obtain the group decision result of each user in the operation log more quickly, the unsupervised detection algorithm may truncate the selected data features according to the convergence condition during training; that is, when the trained detection algorithm reaches the convergence condition, the remaining data features are pruned, and the pruned data features are not displayed in the display interface of the decision tree graph. Alternatively, if the attributes of all users in the obtained group have already been determined at the earlier levels of the detection algorithm, the remaining data features are pruned, and the display module displays only the decision tree graph including the decision paths actually used and the nodes connected by them. When the visualization method is used to display the classification decision process of a group, domain experts and algorithm experts can evaluate the accuracy of the detection algorithm more easily.
In a display interface of the decision tree graph or in another display interface that jumps based on the obtained operation instruction, the visualization method further includes: determining one user in the group as a target user; and displaying a time axis on one side of the decision tree graph to present the operation log of the target user on the time axis.
When a domain expert or an algorithm expert clicks a leaf node and selects a user link in the pop-up window of the leaf node, the operation log of the corresponding user is displayed on a time axis beside the decision tree graph. Please refer to fig. 6, which is an interface diagram showing the operation log of the target user on the time axis on the left side and the group decision tree graph on the right side. As shown in the figure, the time sequence nodes of the operation log are marked from top to bottom in chronological order, and the following items of the operation log at the corresponding time point are displayed beside each time sequence node: the event type (e.g., event_type), the event occurrence time (e.g., timestamp), the user information (e.g., user_id), the IP address (e.g., a complete IP address or an IP segment), the event responder (e.g., target_user), the event content (e.g., comment_id, comment_length, amount, object_id, target_video, etc.), and the like. By displaying the operation history of each user in the group on the time axis, the domain experts and the algorithm experts can check in detail the accuracy of the detected user attributes in the same group and the relationship between normal users and abnormal users in the same group, so that the defects and shortcomings of the detection algorithm can be confirmed.
In other embodiments, domain experts and algorithm experts are concerned not only with the member attribute classification process of a group, but also with whether the assigned groups are reasonable, which requires them to be able to view the detailed data features in each group and to examine, from another dimension, the priority order of the data features used for classifying the groups. To this end, the visualization method may include a step of displaying an interface of the data set of one group. The data set is displayed as a list, thereby presenting the detailed information of the data features in the same group to the user. To make the classification of the group data set easier to verify, the list may display the data features of a group in columns ordered by the classification priority used by the fraud event detection system. For example, please refer to fig. 7, which shows a list interface diagram of the data set of a group displayed in an embodiment of the present application. In the list interface, the rows of the displayed data set are sorted from high to low by the similarity of the data features, taken in priority order: when the data features of the first priority are the same, the rows are sorted by the data features of the second priority, and so on. In the embodiment shown in fig. 7, the priority order from high to low is: IP address, event origin (source), event responder (target), event type (event_type), and event occurrence time (timestamp). In this embodiment, the header of the table encodes the importance of the different columns: the more concentrated the values of a feature, the more important the feature. In one embodiment provided herein, the fraud event detection system represents this property by computing the information entropy of each feature.
The lower the information entropy, the higher the consistency. The fraud event detection system therefore sorts the features in ascending order of information entropy and moves the column headers with low information entropy forward to draw the user's attention. Of course, in different implementations, the column headers in the displayed table may also be color-rendered; for example, the column header with the lowest information entropy is rendered in the deepest color to prompt the user to pay the most attention to the data feature represented by that column, and the headers of the other columns are rendered in progressively lighter colors, yielding the data set list interface shown in the figure. The list interface may be displayed after the step of displaying a plurality of group interfaces, before or after step S12, or in response to the user's operation of selecting the list interface.
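The ordering of columns by information entropy can be illustrated with a short sketch. It assumes that the data set of a group is available as a list of dictionaries sharing the same keys and uses the Shannon entropy with a base-2 logarithm; the function names and the sample rows are invented for the example.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    total = len(values)
    counts = Counter(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def rank_columns(rows):
    """Return (column, entropy) pairs sorted by ascending entropy.

    Columns whose values are most concentrated (lowest entropy) come first,
    which corresponds to moving their headers to the front of the table.
    """
    columns = list(rows[0].keys())
    entropies = {c: shannon_entropy([r[c] for r in rows]) for c in columns}
    return sorted(entropies.items(), key=lambda item: item[1])


# Invented rows: every member shares one IP, timestamps are all different.
rows = [
    {"timestamp": 1, "event_type": "follow", "ip": "10.0.0.1"},
    {"timestamp": 2, "event_type": "follow", "ip": "10.0.0.1"},
    {"timestamp": 3, "event_type": "comment", "ip": "10.0.0.1"},
    {"timestamp": 4, "event_type": "comment", "ip": "10.0.0.1"},
]
ranked = rank_columns(rows)
```

With these rows the shared IP column has zero entropy and is ranked first, while the timestamp column, whose values never repeat, is ranked last.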
In some embodiments, to further characterize whether the acquired data set of the group reflects the characteristics of a fraud event, the data may need to be presented from other dimensions. For example, the accuracy of the detected fraud events is further confirmed by comparing the network operation data of normal users with the group data set. To this end, the visualization method further comprises a step of displaying an interface of the feature distribution of the data set of the group. The feature distribution interface may show the distribution of each data type in the whole network. The whole network is a relative notion; for example, it may be a cluster formed by a plurality of network users, and the interface may display the distribution of a given data feature for a given group in the cluster. Please refer to fig. 3, in which the largest dotted circle represents a cluster formed by a plurality of network users; there are 11 groups in the cluster, numbered 0-10, and one of the 11 groups is selected for information display.
In some embodiments, the types of data that the feature distribution interface may present are, for example: the information entropy of the average operation time interval dimension (average operation interval entropy), the information entropy of the IP address usage dimension (IPused_amount_entropy), the information entropy of the gender dimension (sex_amount), the information entropy of the email dimension (email_amount), the information entropy of the registration time dimension (reg_time_entry), the information entropy of the operation time dimension (operation time_entry), the information entropy of the device number dimension (device_amount_entry), the information entropy of the operation type dimension (operation type_entry), the maximum number of other users sharing an IP address used by the user (maxIP used_be_amount), and the like. In the embodiment shown in fig. 8, the information entropy of the registration time dimension is taken as the example data feature; that is, fig. 8 shows the feature distribution of the information entropy of the registration time (registration period) dimension for one group in the network cluster. In order to compare effectively the feature distribution of the obtained group data set with that of the network operation data of normal users, please refer to fig. 9, which is a flowchart of displaying the interface of the feature distribution of the group data set. As shown in the figure, the flow includes the following steps:
In step S211, one of the groups is selected and at least one data feature is determined from the data set of the group. In one embodiment, for example, the group numbered 2 in fig. 3 is selected, and a data feature belonging to the user information, for example the registration time, is determined from the data set of the group numbered 2.
In step S212, the feature distribution of the determined at least one data feature in the group and cluster is counted. In this embodiment, the feature distribution of the data feature at the registration time in the group is counted, and the feature distribution of the data feature at the registration time in the whole cluster is counted.
In step S213, a histogram of the feature distribution in the group, a histogram of the feature distribution in the entire cluster, and a distribution contrast graph of the two are displayed. In this embodiment, based on the encoding of the data feature, a histogram of the feature distribution of the registration time in the group and a histogram of the feature distribution of the registration time in the entire cluster are displayed. As shown in fig. 8, in interface D, graph (a) is a thumbnail of the feature distribution of the registration time in the selected group numbered 2, and graph (d) at the bottom of interface D is the enlarged view of this thumbnail. As can be seen from the enlarged view, within the month from August 1 to August 31, the registration operations of the group members are concentrated on a few days, such as August 5, August 6, August 11, August 12 and August 16. Graph (c) in interface D is a histogram of the distribution of the registration time of the users registered in August in the entire cluster; as can be seen from graph (c), the registrations in August follow a certain regularity. Graph (b) in interface D superimposes graph (d) and graph (c) to show the difference between the distribution of the registration time in the entire cluster and in the selected group. In order to enable the user to understand the differences and connections among different features, in the embodiment provided by the application, the histograms are presented in three layers, and after the user clicks one of the thumbnails, the page scrolls to the normalized distribution contrast graph. Of course, in a particular application, there may be multiple thumbnails, each representing a different data feature.
In some embodiments, the histogram may be further color-rendered to distinguish or emphasize the distribution of features of a data feature in the group and the entire cluster, or dynamically displayed (e.g., blinking) to distinguish or emphasize the distribution of features of a data feature in the group and the entire cluster.
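A minimal sketch of the normalized contrast between the group histogram and the cluster histogram is given below. It assumes that registration times have already been reduced to a day of the month; the function names and the bin choice are assumptions for illustration, and no plotting is performed.

```python
from collections import Counter


def normalized_histogram(values, bins):
    """Share of `values` falling into each bin (shares sum to 1 when non-empty)."""
    counts = Counter(values)
    total = sum(counts[b] for b in bins)
    return {b: (counts[b] / total if total else 0.0) for b in bins}


def distribution_contrast(group_values, cluster_values, bins):
    """Rows of (bin, group share, cluster share) for an overlaid comparison."""
    group = normalized_histogram(group_values, bins)
    cluster = normalized_histogram(cluster_values, bins)
    return [(b, group[b], cluster[b]) for b in bins]


def concentrated_bins(contrast, factor=2.0):
    """Bins where the group share exceeds `factor` times the cluster share."""
    return [b for b, g, c in contrast if g > factor * c]
```

Normalizing both histograms to shares makes a small group comparable with the much larger cluster, and bins returned by `concentrated_bins` correspond to the days on which the group's registrations stand out in the overlay.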
In some embodiments, in order to further analyze the differences between a plurality of groups in one network cluster, the user data visualization method further includes a step of displaying an interface of the feature distribution of the data sets of the plurality of groups. Please refer to fig. 10 and fig. 11: fig. 10 is a flowchart of displaying the distribution of a plurality of groups in the cluster in one embodiment of the present application, and fig. 11 is a schematic diagram of the interface showing the distribution of a plurality of groups in the cluster in one embodiment of the present application. As shown in the figures, the steps include:
In step S311, a plurality of groups are determined in a cluster composed of a plurality of network users, and the groups are characterized by different shapes, icons, labels and/or colors, respectively. In one embodiment, for example, the 3 groups numbered 0, 1 and 2 in fig. 3 are selected, wherein the group numbered 0 is represented by "green", the group numbered 1 is represented by "red", and the group numbered 2 is represented by "blue".
In step S312, at least one data feature is determined from the plurality of groups of data sets; in this embodiment, a data characteristic, such as an IP address, is determined from the 3 groups of data sets.
In step S313, based on the at least one data feature, the relative entropy between every two network users in each group is analyzed as a measure of the similarity between the two network users. In the present embodiment, the relative entropy (in the IP usage amount dimension) between every two network users in the 3 groups numbered 0, 1 and 2 is analyzed based on the IP address as a measure of the similarity between the two network users. For example, the dimension reduction method t-SNE (t-distributed stochastic neighbor embedding) is adopted, and the relative entropy between two users is used as the index for measuring the distance between the network users.
In step S314, a display interface is output, in which the network users are characterized by shapes, icons and/or labels, the different groups are characterized by different colors, and the similarity between two network users in each group is characterized by the displayed distance. In this embodiment, as shown in fig. 11, in interface E a dot represents a network user, "green" represents the group numbered 0, "red" represents the group numbered 1, and "blue" represents the group numbered 2. The users of the "blue" group numbered 2 are close to one another and the group forms a dense cluster; the users of the "red" group numbered 1 are likewise close to one another and form a dense cluster; the "green" dots represent randomly sampled normal users, who are farther apart and more dispersed. The more densely a group is clustered, the greater the probability that it is a fraudulent group. For example, in the embodiment shown in fig. 11, the group represented by "green" is distributed dispersedly, so the "green" group is a normal group and the users represented by the "green" dots are normal users. In contrast, the group represented by "red" (i.e. the group numbered 1) and the group represented by "blue" (i.e. the group numbered 2) each form a dense cluster, which means that the "red" and "blue" groups are abnormal groups and the users represented by the "red" dots and the "blue" dots are abnormal users. In one embodiment, a user of the visualization system can interactively view the specific information and feature values of the users in each group by hovering the mouse over them.
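The pairwise measure of step S313 can be sketched as follows. The sketch makes several assumptions that are not stated in the description: each user is summarized by the list of IP addresses appearing in that user's log, a small smoothing constant avoids division by zero, and the relative entropy is symmetrized by averaging both directions so that the resulting matrix can serve as a distance. The matrix could then be handed to a t-SNE implementation that accepts precomputed distances; that step is omitted here.

```python
import math
from collections import Counter


def distribution(values, support, eps=1e-9):
    """Smoothed empirical distribution of `values` over a shared `support`."""
    counts = Counter(values)
    total = len(values) + eps * len(support)
    return [(counts[s] + eps) / total for s in support]


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def pairwise_distance(user_values):
    """Return (user names, symmetric matrix of averaged relative entropies)."""
    support = sorted({v for vals in user_values.values() for v in vals})
    names = list(user_values)
    dists = {u: distribution(user_values[u], support) for u in names}
    matrix = [
        [0.5 * (relative_entropy(dists[a], dists[b])
                + relative_entropy(dists[b], dists[a]))
         for b in names]
        for a in names
    ]
    return names, matrix


# Invented example: u1 and u2 share an IP, u3 uses a different one.
names, matrix = pairwise_distance({
    "u1": ["10.0.0.1", "10.0.0.1", "10.0.0.1"],
    "u2": ["10.0.0.1", "10.0.0.1", "10.0.0.1"],
    "u3": ["10.0.0.9", "10.0.0.9", "10.0.0.9"],
})
```

Users with identical IP usage end up at distance zero, while users with disjoint IP usage are far apart, which is what makes fraudulent groups appear as dense clusters after dimension reduction.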
In other embodiments, in the output interface, a network user may also be represented by a shape, an icon and/or a label. For example, the shape is a geometric figure such as a triangle or a rectangle; the icon is a smiling face, a crying face, a skull avatar, a pirate avatar, or the like; and the label is a clearly distinguishable character or symbol.
The user data visualization method realizes the display of the grouped users in the fraud detection period in various relation interfaces by presenting the grouped users determined in the fraud detection process, data characteristic distribution, classification lists and other modes, and is beneficial to field experts and algorithm experts to evaluate and revise the detection algorithm of the fraud detection system.
The present application also provides a computer device that may be a suitable computer device such as a handheld computer device, a tablet computer device, a notebook computer, a desktop computer, a server, or the like. The computer device includes a display, input means, input/output (I/O) ports, one or more processors, memory, non-volatile storage, network interfaces, and power supplies, among others. The various components described may include hardware elements (e.g., chips and circuits), software elements (e.g., a tangible, non-transitory computer-readable medium storing instructions), or a combination of hardware and software elements. Further, it is noted that the various components may be combined into fewer components or separated into additional components. For example, the memory and the non-volatile storage device may be included in a single component. The computer device can execute the visualization method alone or in cooperation with other computer devices.
Please refer to fig. 12, which is a schematic block diagram of a computer device according to an embodiment of the present application. As shown in the figure, in this embodiment, the computer device 1 includes one or more processors 11 and a presentation engine 12 executed on the processors 11, so as to execute the visualization method and display the corresponding visualization interface. For example, the computer device includes a processor 11, a display, and a presentation engine 12 executed on the processor 11, wherein the presentation engine (or display engine) is configured to execute the user data visualization method described in the above embodiments and display the result through the display; for the implementation process of the user data visualization method, refer to the description of fig. 1 to fig. 11. In a specific implementation, the presentation engine is stored, for example, in a memory of the local computer device or on a remote storage server, and includes but is not limited to software and hardware capable of parsing and displaying interfaces developed in programming languages such as XML, HTML, scripting languages, or the C language. In still other embodiments, one computer device performs the visualization method and provides the corresponding visualization interface to another computer device for presentation. For example, the client logs in to the server and initiates a request based on a request operation of the user, the server executes the visualization method to form the corresponding interface data and feeds the interface data back to the client, and a browser or a customized application program of the client displays the corresponding diagram according to the interface data.
The present application further provides a client, which is connected to a server through a network. In this embodiment, the client is, for example, a web client, and the server is, for example, a web server. By sending a web service request and logging in to the web server, the web client executes the user data visualization method described in the above embodiments and displays the result through a display; for the implementation process of the user data visualization method, refer to the description of fig. 1 to fig. 11.
The present application further provides a server, which is connected to a client through a network. In this embodiment, the client is, for example, a web client, and the server is, for example, a web server. Based on the operation requested by the web client, the web server executes the user data visualization method described in the foregoing embodiments and sends the result to the client to be displayed through a display; for the implementation process of the user data visualization method, refer to the description of fig. 1 to fig. 11.
The present application further provides a browser, which is connected to a server through a network. By sending a request and logging in to the server, the browser executes the user data visualization method described in the foregoing embodiments and displays the result through a display; for the implementation process of the user data visualization method, refer to the description of fig. 1 to fig. 11. In the present embodiment, the browser is, for example, a web browser, including but not limited to the QQ browser, the Internet Explorer browser, the Firefox browser, the Safari browser, the Opera browser, the Google Chrome browser, the Baidu browser, the Sogou browser, the Cheetah (Liebao) browser, the 360 browser, the UC browser, the Maxthon browser, the TheWorld browser, and the like.
The present application further provides a user data visualization system, which may include software and hardware in one or more computer devices and which visualizes the data set of a group detected by the fraud event detection system. In order to answer the domain experts' question of why a group is judged to be a fraudulent group, and the algorithm experts' question of "whether users of the same group all have the same behavior habits", the application provides a user data visualization system that starts from the group member relationship. Please refer to fig. 13, which is a schematic diagram of the module structure of the user data visualization system provided in the present application. As shown, the user data visualization system 3 comprises an acquisition module 31 and a display module 32.
The acquisition module 31 is configured to acquire the data set of a group. The data features of the data set include at least user information, an IP address, an event type, an event origin, an event responder, and an event occurrence time. The user information refers to information capable of characterizing the user identity, such as a user ID, a unique user nickname, a certificate number, and the like. The user information further includes: mobile phone number, mailbox, ID number, gender, number of the user equipment used by the user, registration time, and the like. The IP address is the IP address, IP address segment or IP address packet of the computer device used when an event is generated in the network under the same user information. The event type is a type recorded in the network operation log that represents a user behavior event, and includes but is not limited to at least one of: social behaviors between network users, such as following, liking, commenting and gift giving; and operation behaviors of network users, such as login, logout, status update, registration and information modification. The same user information may correspond to at least one event type, and each event type corresponds to an event origin, an event responder and an event occurrence time. For example, the same user information may correspond to a plurality of event types, each with its own event origin, event responder and event occurrence time. The event origin refers to the user information and the like of the user who initiates an event of the event type. The event responder includes the information of the target user of the initiated event, and the like.
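One possible in-memory representation of such a data set is sketched below. The record type, its field names, and the use of an integer timestamp are assumptions made for illustration; the description only fixes which data features must be present.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    user_id: str      # user information identifying the user
    ip: str           # IP address (or IP segment) used for the event
    event_type: str   # e.g. "follow", "comment", "login"
    source: str       # event origin: user who initiates the event
    target: str       # event responder: target user of the event
    timestamp: int    # event occurrence time (assumed: epoch seconds)


def events_by_user(records):
    """Group records per user, each user's events ordered by occurrence time."""
    by_user = defaultdict(list)
    for record in sorted(records, key=lambda r: r.timestamp):
        by_user[record.user_id].append(record)
    return dict(by_user)
```

Grouping the records per user in time order reflects the statement that one user may correspond to several event types, each with its own origin, responder and occurrence time, and it is also the ordering needed to place a user's operation log on a time axis.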
Here, the grouping of the collected users is determined from the data features of the collected member cluster. A detection algorithm (such as an unsupervised detection algorithm) for detecting the group members participating in fraud events is preset in the fraud event detection system. In order to classify the group members accurately, the detection algorithm classifies all members step by step according to the decision priorities of the data features of the collected members. Different fraud events correspond to unsupervised detection algorithms with different decision priorities.
In some embodiments, the detection algorithm performs decision classification based on similarity of data features of all members. Specifically, referring to fig. 2, a flow chart of an embodiment of the present application for obtaining a group data set is shown, as shown in the figure, the obtaining module may obtain a group data set from a plurality of group data sets obtained based on the following steps:
Step S111, obtaining an operation log of a cluster formed by a plurality of network users. In different embodiments, the cluster is formed by all network users whose data can be acquired, and the network users in the cluster come from the same website or different websites, or from different network channels, such as the internet, one or more intranets, local area networks (LANs), wide area networks (WANs), storage area networks (SANs), or a suitable combination thereof, or a mobile communication network of mobile phones, and the like.
Step S112, determining at least one data characteristic from the operation logs of the plurality of network users, and analyzing the similarity of at least one group of data characteristics in the operation logs to determine the group; in a specific embodiment, aiming at the characteristic that the network fraud behavior inevitably leaves user use data in the network, the fraud event detection system collects operation logs of a plurality of network users from at least one website, and groups the users generating the corresponding operation logs by analyzing the similarity of at least one data feature in the operation logs to obtain groups and data sets of the groups in the operation logs.
In some embodiments, the data set of a group includes, but is not limited to, at least two of the following data features: user information, IP address, event type, event origin, event responder, and event occurrence time. The user information includes a mobile phone number, a mailbox, an ID number, an identification number, gender, the number of the user equipment used by the user, registration time, and the like. The same user information may correspond to at least one event type, and each event type corresponds to an event origin, an event responder and an event occurrence time. The event types include, but are not limited to, at least one of: social behaviors between network users, such as following, liking, commenting and gift giving; or operation behaviors of network users, such as login, logout, status update, registration and information modification. For example, the same user information may correspond to a plurality of event types, each with its own event origin, event responder and event occurrence time.
Step S113, a data set of the group is acquired. In some embodiments, the data set may be obtained from a database storing the groups and their data sets, for example, on a remote storage server or in a storage device in a local computer device, and the obtained data set of one group may be obtained by extracting it from the database based on an input operation of a user. For example, the fraud detection system obtains a plurality of groups by using an unsupervised detection algorithm, and the user selects one of the groups through the selection interface to obtain the data set of the corresponding group.
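To make steps S111 to S113 concrete, the sketch below groups users that are linked through shared IP addresses. It is deliberately a simple stand-in, namely connected components computed with a union-find structure, and is not the dense subgraph-based or vector space-based algorithm mentioned in the description; the input format of (user_id, ip) pairs and the function name are assumptions.

```python
from collections import defaultdict


def group_by_shared_ip(log):
    """Group users connected (directly or transitively) through a shared IP.

    `log` is an iterable of (user_id, ip) pairs taken from an operation log.
    Returns a list of sorted user lists, largest group first.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_user_for_ip = {}
    for user_id, ip in log:
        find(user_id)  # register the user even if the IP is never shared
        if ip in first_user_for_ip:
            parent[find(user_id)] = find(first_user_for_ip[ip])
        else:
            first_user_for_ip[ip] = user_id

    groups = defaultdict(set)
    for user in list(parent):
        groups[find(user)].add(user)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))
```

A production system would weigh several data features and apply the configured detection algorithm; the sketch only shows how a shared resource such as an IP address can tie users into one group whose data set is then retrieved for display.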
Specifically, the fraud event detection system calculates the similarity of all data in the operation log within the same type of data feature, wherein the similarity can be measured by information entropy. For example, the fraud event detection system calculates the information entropy of the IP usage amount dimension or the maximum IP usage amount dimension by using the user information, calculates the information entropy of the operation type dimension by using the event type, and calculates the information entropy of the registration time dimension or the operation time difference dimension by using the registration time or the operation time. Based on the information entropies obtained in this way, the users are divided into a plurality of groups in an unsupervised detection manner. The unsupervised detection manner includes, for example, a dense subgraph-based algorithm or a vector space-based algorithm. The groups presented by the visualization method provided by the application reflect the shared resources, user relationships and the like used in the fraud event, so that a user of the fraud event detection system can determine more clearly whether the classification strategy of the unsupervised detection algorithm is reasonable. The shared resources include but are not limited to shared IP addresses, mailboxes, etc., and the user relationships include but are not limited to following, interaction, etc. between users.
In one embodiment, the display module 32 in the user data visualization system may display at least one group interface, in which the size of each group is characterized by the size of the displayed geometric figure. Please refer to fig. 3, which shows an interface including a plurality of groups in an embodiment of the present application. As shown in the figure, 11 groups are shown in the interface, and the geometric figure used to represent each group is a circle. All 11 groups are located inside the largest dotted circle, which represents, for example, a cluster composed of N network users. The group numbered 0 is a normal group; the 10 groups of different sizes numbered 1-10 are located inside a smaller dotted circle and are abnormal groups. The size of each circle is proportional to the number of members of the group; that is, a large circle represents a group with more members and a small circle represents a group with fewer members. In different embodiments, the geometric figures of the groups may be of arbitrary shape. The colors of the geometric figures may be set randomly or may be related to the number of groups or the number of members of a group. For example, N colors are preset, and the fraud event detection system randomly assigns different colors to the geometric figures representing the groups. For another example, the fraud event detection system assigns colors from a preset color sequence to the geometric figures representing the groups in order of member count from small to large. When a user selects a geometric figure by operating the display interface, the fraud event detection system obtains the data set of the corresponding group.
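The sizing and coloring rules can be sketched as follows. Two choices in the sketch are assumptions rather than statements of the description: the circle area (not the radius) is made proportional to the member count, and colors are taken cyclically from a preset sequence in order of member count from small to large. The function name and parameters are hypothetical.

```python
import math


def circle_layout(group_sizes, palette, max_radius=50.0):
    """Compute a radius and a color for each group's circle.

    group_sizes: mapping of group number -> number of members.
    palette:     preset color sequence, assigned from the smallest group up.
    """
    largest = max(group_sizes.values())
    ordered = sorted(group_sizes, key=lambda g: (group_sizes[g], g))
    layout = {}
    for rank, group in enumerate(ordered):
        # Area proportional to member count => radius scales with sqrt.
        radius = max_radius * math.sqrt(group_sizes[group] / largest)
        layout[group] = {"radius": radius,
                         "color": palette[rank % len(palette)]}
    return layout
```

Scaling the area rather than the radius keeps a group with four times as many members from looking sixteen times as large; if the radius itself is meant to be proportional, the square root can simply be dropped.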
In a preferred embodiment, the display of at least one group interface may further include an information bar for displaying group information, and when a user selects one group in the group interface, basic information of the group is displayed in a form of a window or a text box on one side of the interface, where the basic information is, for example: group encoding, number of members, data characteristics for determining the most preferred group, group attributes (such as normal group or abnormal group), etc.
To show the grouping decision process, after the fraud event detection system has formed the groups, the display module 32 displays, in the form of a tree structure, the process by which the users of a group are classified according to the decision-priority data features of the corresponding detection algorithm, so that domain experts and/or algorithm experts can address the defects and shortcomings of that detection algorithm through a visual interface.
The display module 32 is used for displaying a decision tree graph to characterize the attribute testing process of all users in the group. The attributes of a user may include normal user (Normal) and abnormal user (Abnormal), or may include normal user (Normal), fraudulent role a (Abnormal a), fraudulent role b (Abnormal b), and so on. In the display interface, the display module 32 uses a tree structure to represent the process by which the detection algorithm of the fraud detection system classifies the attributes of the users of one group, from the highest priority to the lowest priority, starting at the root node of the tree and passing along decision paths through the non-leaf nodes to the leaf nodes. The following are displayed graphically in the display interface: the first-priority data characteristic of the root node of the decision tree graph and the decision value range of that data characteristic; the final attribute of at least one user characterized by each leaf node of the decision tree graph; the current attributes of a plurality of users represented by each non-leaf node of the decision tree graph, together with the data characteristic of the current priority and its decision value range; and the decision paths corresponding to the root node or to each non-leaf node of the decision tree graph, which are characterized by lines of different colors, shapes or thicknesses. As can be seen from the decision tree graph, the users classified into a leaf node have been assigned a final attribute of normal user or abnormal user, while the users classified into a non-leaf node must continue to be classified until they reach a leaf node, which determines their final attribute (namely, normal user or abnormal user).
The decision result of the decision tree is obtained by step-by-step classification according to the relationship between the original value of each data characteristic of a user and the corresponding decision threshold. For example, after pruning the decision tree, the fraud detection system makes the grouping decision for all users using, in order of priority from high to low, the maximum IP usage amount, the out-degree (out_degree) of the user in the social network, and the IP usage amount calculated from the user information.
Please refer to fig. 4, which shows a decision tree graph for the users of a group. As shown in the figure, the root node of the decision tree graph displays the highest-priority data characteristic, the maximum IP usage amount, and uses it to classify the attributes of the group users: when the maximum IP usage amount (max_IP_used_be_used_amount) corresponding to a user is less than or equal to 80.5, the corresponding user is classified to the first non-leaf node along the "blue" decision path; otherwise, the corresponding user is classified to the first leaf node along the "yellow" decision path. The first non-leaf node continues to classify the users it receives according to the second-priority data characteristic, the event origin: when the out-degree (out_degree) in the social network of the event origin corresponding to a user is less than or equal to 711.0, the corresponding user is classified to the second leaf node along the "blue" decision path; otherwise, the corresponding user is classified to the second non-leaf node along the "yellow" decision path. The second non-leaf node continues to classify the users it receives according to the third-priority data characteristic, the IP usage amount: when the IP usage amount (IP_used_amount) corresponding to a user is less than or equal to 870.0, the corresponding user is classified to the third leaf node along the "blue" decision path; otherwise, the corresponding user is classified to the fourth leaf node along the "yellow" decision path.
Here, "blue" represents a decision path classified as "normal user" at a currently prioritized non-leaf node, and "yellow" represents a decision path classified as "abnormal user" at a current priority. Wherein, the value ranges of the corresponding information measured by the user in each non-leaf node, such as 80.5, 711.0, 870.0 and the like shown in the figure, are decision value ranges corresponding to the current priority data characteristics.
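The three-level decision process of fig. 4 can be sketched as a short function. The thresholds 80.5, 711.0 and 870.0 and the feature names come from the description; the dictionary layout of a user record and the final attribute attached to each leaf are assumptions made for illustration, inferred from the statement that "blue" paths lead toward "normal user" and "yellow" paths toward "abnormal user".

```python
def classify_user(user):
    """Return (final_attribute, path_colors) for one user record (a dict).

    Assumption: a leaf reached by a "blue" path is labeled "normal" and a
    leaf reached by a "yellow" path is labeled "abnormal".
    """
    path = []
    # Root node: first priority, maximum IP usage amount.
    if user["max_IP_used_be_used_amount"] > 80.5:
        path.append("yellow")
        return "abnormal", path            # first leaf node
    path.append("blue")
    # First non-leaf node: second priority, out-degree of the event origin.
    if user["out_degree"] <= 711.0:
        path.append("blue")
        return "normal", path              # second leaf node
    path.append("yellow")
    # Second non-leaf node: third priority, IP usage amount.
    if user["IP_used_amount"] <= 870.0:
        path.append("blue")
        return "normal", path              # third leaf node
    path.append("yellow")
    return "abnormal", path                # fourth leaf node

# Hypothetical user record.
example = {"max_IP_used_be_used_amount": 12, "out_degree": 900, "IP_used_amount": 950}
result = classify_user(example)
```

Returning the list of path colors alongside the attribute mirrors what the interface draws: one colored line per decision taken.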
In different embodiments, lines of different shapes may also be used to distinguish decision paths during display: for example, a solid line indicates that the determined user attribute is normal user and a dashed line indicates abnormal user, or a straight line indicates normal user and a curved line indicates abnormal user. Lines of different thickness may likewise be used: for example, a thin line indicates that the determined user attribute is normal user and a thick line indicates abnormal user.
To show more clearly the number of users received by each non-leaf node and leaf node, when a decision tree graph is displayed to characterize the attribute testing process of all users in the group, the root node of the decision tree graph also displays the number of users in the group (i.e., the sample size given to the root node), and each non-leaf node of the decision tree graph also displays the number of users with the current attribute (i.e., the sample size received by that non-leaf node). Referring to fig. 5, the decision tree graph interface further displays the number of users classified into each node: the sample_size displayed in the root node is the sample size given to the root node, i.e., the total number of group members; the sample_size displayed in each other non-leaf node is the sample size received by that non-leaf node; and the sample_size displayed in a leaf node is the number of users classified into that node by the previous level.
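The per-node sample_size values can be obtained by routing every user of the group through the tree and counting arrivals at each node. The sketch below assumes the three-level tree of fig. 4; the node names (`root`, `non_leaf_1`, `leaf_1`, ...) and the table layout are hypothetical.

```python
from collections import Counter

# node -> (feature, threshold, child if value <= threshold, child otherwise)
TREE = {
    "root": ("max_IP_used_be_used_amount", 80.5, "non_leaf_1", "leaf_1"),
    "non_leaf_1": ("out_degree", 711.0, "leaf_2", "non_leaf_2"),
    "non_leaf_2": ("IP_used_amount", 870.0, "leaf_3", "leaf_4"),
}

def sample_sizes(users):
    """Count how many users reach each node of TREE."""
    sizes = Counter()
    for user in users:
        node = "root"
        sizes[node] += 1
        while node in TREE:                      # stop once a leaf is reached
            feature, threshold, le_child, gt_child = TREE[node]
            node = le_child if user[feature] <= threshold else gt_child
            sizes[node] += 1
    return dict(sizes)

# Hypothetical group of four users, one per leaf.
group = [
    {"max_IP_used_be_used_amount": 100, "out_degree": 5, "IP_used_amount": 10},
    {"max_IP_used_be_used_amount": 10, "out_degree": 5, "IP_used_amount": 10},
    {"max_IP_used_be_used_amount": 10, "out_degree": 1000, "IP_used_amount": 100},
    {"max_IP_used_be_used_amount": 10, "out_degree": 1000, "IP_used_amount": 900},
]
sizes = sample_sizes(group)
```

The count at the root equals the total number of group members, and the count at every other node equals the number of users passed down to it by the previous level.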
It should be noted that, depending on the type of the fraud event and the design of the unsupervised detection algorithm, the priority of each data feature, the decision value range of each priority, the relationship between adjacent higher and lower priorities, the decision paths at each level, and the like may differ when the decision classification of each group in the operation log is detected. Moreover, to obtain the group decision result of each user in the operation log more quickly, the unsupervised detection algorithm may truncate the selected data features according to the convergence condition during training: when the trained detection algorithm reaches the convergence condition, the remaining data features are pruned, and the pruned data features are not displayed on the display interface of the decision tree graph. Alternatively, if all the users in the obtained group have already been classified within the first several levels of the detection algorithm, the remaining data features are pruned, and the display module 32 displays only the decision tree graph including all the decision paths and the nodes connected to them. When the user data visualization system is used to display the classification decision process of a group, it is easier for domain experts and algorithm experts to evaluate the accuracy of the detection algorithm.
In the display interface of the decision tree graph or in another display interface that jumps based on the obtained operation instruction, the display module 32 is further configured to determine one user in the group as a target user; and displaying a time axis on one side of the decision tree graph to present the operation log of the target user on the time axis.
When a domain expert or an algorithm expert clicks a leaf node and selects one of the user links in the pop-up window of that leaf node, the operation log of the corresponding user is displayed on a time axis beside the decision tree graph. Please refer to fig. 6, which shows an interface with the operation log of the target user on the time axis on the left side and the group decision tree graph on the right side. As shown in the figure, the time sequence nodes of the operation log are marked from top to bottom in chronological order, and beside each time sequence node are displayed the items of the operation log corresponding to that time point, such as the event type (e.g., event_type), the event occurrence time (e.g., timestamp), the user information (e.g., user_id), the IP address (e.g., a complete IP address or an IP segment), the event responder (e.g., target_user), and the event content (e.g., comment_id, comment_length, amount, object_id, target_video, etc.). By displaying the operation history of each user in the group on the time axis, domain experts and algorithm experts can check in detail the accuracy of the user attributes detected within the same group and the commonalities between normal users and abnormal users in the same group, and thereby confirm the defects and shortcomings of the detection algorithm.
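Preparing the time axis for a target user amounts to filtering the operation log by user and ordering the records by occurrence time. The sketch below is illustrative only: the record layout is assumed, the field names `user_id`, `timestamp` and `event_type` follow the examples in the description, and the integer timestamps are made up.

```python
def timeline_for(user_id, operation_log):
    """Return the target user's log records in chronological order."""
    events = [e for e in operation_log if e["user_id"] == user_id]
    return sorted(events, key=lambda e: e["timestamp"])

# Hypothetical operation log with integer timestamps.
operation_log = [
    {"user_id": "u1", "timestamp": 30, "event_type": "comment"},
    {"user_id": "u2", "timestamp": 10, "event_type": "login"},
    {"user_id": "u1", "timestamp": 5, "event_type": "registration"},
    {"user_id": "u1", "timestamp": 20, "event_type": "login"},
]
timeline = timeline_for("u1", operation_log)
```

Each returned record would correspond to one time sequence node, drawn from top to bottom.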
In other embodiments, domain experts and algorithm experts are concerned not only with the process of classifying the attributes of group members, but also with whether the assigned groups are reasonable. This requires them to be able to view the detailed data features in each group and to examine, from another dimension, the priority order of the data features used to form the groups. The visualization method is therefore also used for displaying an interface of the data set of a group. The data set is displayed as a list, so that detailed information on the data features within the same group is shown to the user. To improve the accuracy of the classification of the group data sets, the list displayed in the interface may present the data features of a group in columns ordered by the classification priority that the fraud detection system used for grouping. For example, please refer to fig. 7, which shows a schematic diagram of a list interface for the data set of a group in an embodiment of the present application. In this list interface, the displayed data set of a group is sorted with the similarity of the data features, from high to low, as the priority. When entries have the same value for the first-priority data feature, they are sorted by the second-priority data feature, and so on. In the embodiment shown in fig. 7, the priority order from high to low is: IP address (a segment or grouping of IP addresses), event origin (source), event responder (target), event type (event_type), and event occurrence time (timestamp). In this embodiment, the header of the table encodes the importance of the different columns: the more concentrated the values of a feature are, the more important the feature is. In one embodiment provided herein, the fraud detection system represents this property by computing the information entropy of each feature.
The lower the information entropy, the higher the consistency. The fraud event detection system then sorts the features in ascending order of information entropy and draws the user's attention by moving the column headers with low information entropy toward the front. Of course, in different implementations, the column headers of the displayed table may also be color-rendered: for example, the column header with the lowest information entropy is rendered in the deepest color to prompt the user to pay the most attention to the data feature represented by that column, and the column headers of the other data features are rendered in the same manner, yielding the data set list interface shown in the figure. The list interface may be displayed after the step of displaying a plurality of group interfaces, or before or after step S12, or in response to the user selecting the list interface.
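The column ordering described above can be sketched by computing the information entropy of each column and sorting the columns in ascending order, then sorting the rows by the columns in that order so that identical values sit together. The helper names and the sample rows are hypothetical; the sketch does not reproduce the color rendering of the headers.

```python
import math
from collections import Counter

def information_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    total = len(values)
    counts = Counter(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values()) if total else 0.0

def order_columns_by_entropy(rows, columns):
    """Columns with the most concentrated values (lowest entropy) come first."""
    return sorted(columns, key=lambda c: information_entropy([r[c] for r in rows]))

def sort_rows(rows, ordered_columns):
    """Sort rows by the first-priority column, then the second, and so on."""
    return sorted(rows, key=lambda r: tuple(r[c] for c in ordered_columns))

# Hypothetical data set of one group.
rows = [
    {"ip": "10.0.0.1", "event_type": "follow", "timestamp": 3},
    {"ip": "10.0.0.1", "event_type": "gift", "timestamp": 1},
    {"ip": "10.0.0.1", "event_type": "follow", "timestamp": 4},
    {"ip": "10.0.0.1", "event_type": "gift", "timestamp": 2},
]
ordered = order_columns_by_entropy(rows, ["timestamp", "event_type", "ip"])
table = sort_rows(rows, ordered)
```

In this made-up group every record shares one IP, so the IP column has the lowest entropy and is moved to the front.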
In some embodiments, to further characterize whether the acquired data set of a group reflects the characteristics of a fraudulent event, the data may need to be presented from other dimensions. For example, the accuracy of the detected fraud events can be further confirmed by comparing the network operation data of normal users with the group data set. To this end, the display module 32 is further configured to display an interface of the feature distribution of the data set of a group. The feature distribution interface may show the distribution of each data type over the whole network, where "whole network" is a relative notion: for example, for a cluster formed by a plurality of network users, the interface may display the distribution of a given data feature of a given group within that cluster. Referring to fig. 3, for example, the largest dotted circle in fig. 3 represents a cluster formed by a plurality of network users; there are 11 groups in the cluster, numbered 0-10, and one of the 11 groups is selected for information display.
In some embodiments, the types of data that the feature distribution interface may present are, for example: the information entropy of the average operation time interval dimension (average operation interval entropy), the information entropy of the IP address usage amount dimension (IPused_amount_entropy), the information entropy of the gender dimension (sex_amount), the information entropy of the email dimension (email_amount), the information entropy of the registration time dimension (reg_time_entry), the information entropy of the operation time dimension (operation time_entry), the information entropy of the device number dimension (device_amount_entry), the information entropy of the operation type dimension (operation type_entry), the information entropy of the maximum usage by others of an IP used by the user (maxIP used_be_amount), and the like. In the embodiment shown in fig. 8, the information entropy of the registration time dimension is taken as the example data feature; that is, fig. 8 shows the feature distribution of the information entropy of the registration time (registration period) of one group in the network cluster. To compare effectively the feature distribution of the obtained group data set with that of the network operation data of normal users, please refer to fig. 9, which is a flowchart of displaying an interface of the feature distribution of the group data set. As shown in the figure, the user data visualization system performs the following steps so that the display module 32 displays the generated diagrams on the corresponding interfaces:
In step S211, one of the groups is selected and at least one data feature is determined from the data set of the group. In one embodiment, for example, the group labeled 2 in fig. 3 is selected, and a data feature belonging to the user information, for example the registration time, is determined from the data set of the group labeled 2.
In step S212, the feature distribution of the determined at least one data feature in the group and cluster is counted. In this embodiment, the feature distribution of the data feature at the registration time in the group is counted, and the feature distribution of the data feature at the registration time in the whole cluster is counted.
In step S213, a histogram of the feature distribution is displayed together with a distribution contrast map that compares this histogram with the histogram of the whole cluster. In this embodiment, based on the encoding of the data feature, a histogram of the feature distribution of the registration time in the group is displayed, and a histogram of the feature distribution of the registration time in the entire cluster is displayed. As shown in fig. 8, in interface D, graph (a) is a thumbnail of the feature distribution of the registration time in the selected group labeled 2, and graph (d), at the bottom of interface D, is the enlarged view of that thumbnail. As can be seen from the enlarged view, within the month from August 1 to August 31, the registration operations of the group members are concentrated on five days: August 5, August 6, August 11, August 12, and August 16. Graph (c) in interface D is a histogram of the distribution of the registration times of the users registered in August; as can be seen from graph (c), the registrations of the users registered in August follow a certain regularity. Graph (b) in interface D superimposes graph (d) and graph (c) to show the difference between the registration time data feature in the whole cluster and in the selected group. To help the user understand the differences and connections among different features, in the embodiment provided by the application the histograms are presented in three layers, and after the user clicks one of the thumbnails, the page scrolls to the normalized distribution contrast map. Of course, in a particular application, there may be multiple thumbnails, each representing a different data feature.
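Steps S211 to S213 can be sketched as computing a normalized histogram of one data feature for the selected group and for the whole cluster over the same bins, then pairing the two for a contrast view. The function names are hypothetical, and the day-of-month values below are made-up sample data rather than the data of fig. 8.

```python
from collections import Counter

def normalized_histogram(values, bins):
    """Share of `values` falling in each bin (bins are exact values here)."""
    counts = Counter(values)
    total = len(values)
    return [counts.get(b, 0) / total if total else 0.0 for b in bins]

def distribution_contrast(group_values, cluster_values, bins):
    """Pair group and cluster shares per bin for a superimposed display."""
    group_hist = normalized_histogram(group_values, bins)
    cluster_hist = normalized_histogram(cluster_values, bins)
    return list(zip(bins, group_hist, cluster_hist))

# Hypothetical registration days (day of month) for a group and for the cluster.
bins = [5, 6, 11, 12]
group_days = [5, 5, 6, 11]
cluster_days = [5, 6, 11, 12]
contrast = distribution_contrast(group_days, cluster_days, bins)
```

Normalizing both histograms lets a small group and a much larger cluster be compared on the same vertical scale.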
In some embodiments, the histogram may be further color-rendered to distinguish or emphasize the distribution of features of a data feature in the group and the entire cluster, or dynamically displayed (e.g., blinking) to distinguish or emphasize the distribution of features of a data feature in the group and the entire cluster.
In some embodiments, to further analyze the differences between a plurality of groups in a network cluster, the display module 32 is further configured to display an interface of the feature distribution of the data sets of the plurality of groups. Please refer to fig. 10 and fig. 11: fig. 10 is a flowchart of the steps by which the display module 32 displays the distribution of the plurality of groups in the cluster in one embodiment, and fig. 11 is a schematic diagram of an interface displaying the distribution of the plurality of groups in the cluster in one embodiment of the present application. As shown in the figures, the steps include:
In step S311, a plurality of groups are determined in a cluster composed of a plurality of network users, and the groups are characterized by different shapes, icons, labels and/or colors, respectively. In one embodiment, for example, the 3 groups numbered 0, 1 and 2 in fig. 3 are selected, where the group numbered 0 is represented by "green", the group numbered 1 by "red", and the group numbered 2 by "blue".
In step S312, at least one data feature is determined from the plurality of groups of data sets; in this embodiment, a data characteristic, such as an IP address, is determined from the 3 groups of data sets.
In step S313, based on the at least one data feature, the relative entropy between every two network users in each group is analyzed as a measure of the similarity between those two network users. In the present embodiment, the relative entropy (in the IP usage amount dimension) between every two network users in the 3 groups labeled 0, 1, and 2 is analyzed based on the IP address as a measure of the degree of similarity between those two network users. For example, the data dimension reduction method t-SNE (t-distributed stochastic neighbor embedding) is adopted, and the relative entropy between two users is used as the index measuring the distance between network users.
In step S314, a display interface is output in which the network users are characterized by shapes, icons, and/or labels, the differences between the plurality of groups are characterized by different colors, and the degree of similarity between two network users in each group is characterized by the displayed distance. In this embodiment, as shown in fig. 11, in interface E a dot represents a network user, "green" represents the group labeled 0, "red" represents the group labeled 1, and "blue" represents the group labeled 2. The "blue" dots show that the users of the group labeled 2 are close to one another and form a cluster; the "red" dots show that the users of the group labeled 1 are likewise close to one another and form a cluster; the "green" dots show randomly sampled normal users, which are farther apart and more dispersed. It can be considered that the denser the cluster formed by a group, the greater the probability that the group is a fraudulent group. For example, in the embodiment shown in fig. 11, the group represented by "green" is distributed more dispersedly, so the "green" group is a normal group and the users represented by the "green" dots are normal users. In contrast, the group represented by "red" (i.e., the group labeled 1) and the group represented by "blue" (i.e., the group labeled 2) are each distributed as a cluster, which means that the "red" and "blue" groups are abnormal groups and the users represented by the "red" and "blue" dots are abnormal users. In one embodiment, a user of the visualization system can interactively view the specific information and feature values of the users in each group by hovering the mouse over them.
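The distance measure used in step S313 can be sketched as a pairwise relative entropy (Kullback-Leibler divergence) matrix. The description does not say how each user's distribution is built or how the asymmetry of relative entropy is handled, so the following are assumptions made for illustration: each user is represented by a discrete probability vector over the same categories (for example, shares of events per IP), the two directions are averaged to obtain a symmetric distance, and a small `eps` avoids division by zero.

```python
import math

def relative_entropy(p, q, eps=1e-12):
    """D(p || q) in bits for two discrete distributions of equal length."""
    return sum(pi * math.log2((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def pairwise_distances(distributions):
    """Symmetrized relative entropy between every two users (assumption)."""
    n = len(distributions)
    return [
        [
            0.5 * (relative_entropy(distributions[i], distributions[j])
                   + relative_entropy(distributions[j], distributions[i]))
            for j in range(n)
        ]
        for i in range(n)
    ]

# Hypothetical per-user distributions over three IP segments.
users = [
    [0.5, 0.5, 0.0],   # user 0
    [0.5, 0.5, 0.0],   # user 1, same behavior as user 0
    [0.0, 0.0, 1.0],   # user 2, different behavior
]
distances = pairwise_distances(users)
```

A matrix of this kind could then be handed to a dimension reduction routine such as t-SNE to obtain two-dimensional dot positions; scikit-learn's `TSNE`, for instance, accepts `metric="precomputed"`, although the application does not name a particular library.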
In other embodiments, in the output interface, a network user may also be represented by a shape, an icon, and/or a label: for example, the shape is a geometric figure such as a triangle or a rectangle; the icon is a smiling face, a crying face, a skeleton avatar, a pirate avatar, or the like; and the label is a clearly distinguishable character or symbol.
The user data visualization system displays the grouping process of the group users determined in the fraudulent event detection process, data characteristic distribution, classification lists and the like, so that the grouping in the fraudulent event detection period is displayed in a plurality of relation interfaces, and the detection algorithm of the fraudulent event detection system can be evaluated and revised by field experts and algorithm experts.
It should be noted that all modules of the user data visualization system may be configured on a single computer device. Alternatively, the modules of the user data visualization system are distributed between a client on the user side and a server on the network side, and the client is connected with the server through a network. For example, the acquisition module of the user data visualization system is installed on the server and the display module is installed on the client; the client logs in to the server by sending a request, and the server, in response to the request, runs the user data visualization system for the client, and the corresponding interface is displayed through the client. The client includes but is not limited to: an interface of a browser or of dedicated client software provided in the user terminal, and hardware for executing a display interface program.
It should also be noted that, through the above description of the embodiments, those skilled in the art can clearly understand that part or all of the present application can be implemented by software and combined with necessary general hardware platform. With this understanding in mind, the technical solutions of the present application and/or portions thereof that contribute to the prior art may be embodied in the form of a software product that may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may cause the one or more machines to perform operations in accordance with embodiments of the present application. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disc-read only memories), magneto-optical disks, ROMs (read only memories), RAMs (random access memories), EPROMs (erasable programmable read only memories), EEPROMs (electrically erasable programmable read only memories), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It should be noted that, as will be understood by those skilled in the art, some of the above-mentioned components may be programmable logic devices, including one or more of Programmable Array Logic (PAL), Generic Array Logic (GAL), Field-Programmable Gate Array (FPGA), and Complex Programmable Logic Device (CPLD), which is not limited in this application.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (25)

1. A user data visualization method, applied to a fraud event detection system, characterized by comprising the following steps:
acquiring a group of data sets, wherein the data characteristics of the data sets comprise user information, IP addresses, event types, event origin, event responders and event occurrence time; wherein the data characteristics of the data sets are determined as different decision priorities;
displaying a decision tree graph to characterize the attribute testing process of all users in the group, wherein:
displaying the data characteristics of the first priority of the root node of the decision tree graph and a decision value range of the data characteristics;
displaying a final attribute of at least one user characterized by each leaf node of the decision tree graph;
displaying the current attributes, the data characteristics of the current priority and the decision value range of the data characteristics of a plurality of users represented by each non-leaf node of the decision tree graph; and
displaying the decision paths corresponding to the root node or each non-leaf node in the decision tree graph, wherein the decision paths are represented by lines with different colors, shapes or thicknesses.
2. The method of claim 1, wherein, in the process of displaying a decision tree graph to characterize the attribute testing process of all users in the group, the root node of the decision tree graph further displays the number of users in the group, and each non-leaf node of the decision tree graph further displays the number of users having the current attribute.
3. A method for visualizing user data according to claim 1 or 2, wherein said displayed decision tree graph is a pruned decision tree graph.
4. A method for visualizing user data according to claim 1, further comprising the steps of:
determining one user in the group as a target user;
and displaying a time axis on one side of the decision tree graph to present the operation log of the target user on the time axis.
5. The method of claim 1, wherein the step of obtaining a group of datasets comprises:
acquiring an operation log of a cluster formed by a plurality of network users;
determining at least one data feature from the operation logs of the plurality of network users, and analyzing the similarity of at least one group of data features in the operation logs to determine the group; and
a data set for the group is obtained.
6. A method for visualizing user data according to claim 1 or 5, further comprising the step of displaying at least one group interface, the group size in said group interface being characterized by the displayed geometrical size.
7. The method according to claim 1 or 5, further comprising the step of displaying an interface of a group of data sets, wherein the data characteristics of the group of data sets comprise at least two data characteristics of user information, IP address, event type, event origin, event responder, and event occurrence time, and the group data sets are displayed in an ordered manner after being grouped in the interface of the group data sets.
8. A method for visualization of user data according to claim 1 or 5, further comprising the step of displaying an interface of the feature distribution of the data sets of the group:
selecting one of said groups and determining at least one data characteristic from the data set of said group;
counting a feature distribution of the determined at least one data feature in the group and in the cluster; and
displaying a histogram of the feature distribution and a distribution contrast map of the histogram within the histogram of the whole cluster.
9. A method for visualization of user data according to claim 1 or 5, further comprising the step of displaying an interface of the feature distribution of the data sets of the plurality of groups:
determining a plurality of groups in a cluster consisting of a plurality of network users, and respectively representing the difference of the groups by different shapes, icons, labels and/or colors;
determining at least one data feature from the plurality of groups of data sets;
analyzing relative information entropy between every two network users in each group based on the at least one data characteristic to serve as a measure of the similarity between every two network users; and
outputting a display interface in which the network users are characterized by shapes, icons, and/or labels, the plurality of groups are characterized by different colors, and the degree of similarity between two network users in each group is characterized by the distance displayed.
10. The method of claim 1, wherein the event type comprises at least one of a network user's attention, likes, comments, gifts, login, logout, update status, registration, and modification information.
11. A computer device, comprising:
a processor;
a presentation engine executing on the processor, the presentation engine being configured to perform the user data visualization method of any of claims 1-10.
12. A user data visualization system, comprising:
an acquisition module, configured to acquire a data set of a group, wherein the data characteristics of the data set comprise user information, an IP address, an event type, an event origin, an event responder and event occurrence time, and wherein the data characteristics of the data set are assigned different decision priorities; and
a display module, configured to display a decision tree graph to characterize the attribute testing process of all users in the group, wherein the display module displays, at the root node of the decision tree graph, the data characteristic of the first priority and the decision value range of that data characteristic; displays the final attribute of at least one user characterized by each leaf node of the decision tree graph; displays, for each non-leaf node of the decision tree graph, the current attributes of the plurality of users it characterizes, the data characteristic of the current priority, and the decision value range of that data characteristic; and displays the decision paths corresponding to the root node or each non-leaf node in the decision tree graph, wherein the decision paths are represented by lines with different colors, shapes or thicknesses.
13. The system of claim 12, wherein the display module is further configured to display the number of users in the group at the root node of the decision tree graph, and the number of users having the current attribute at each non-leaf node of the decision tree graph.
14. The system according to claim 12 or 13, wherein the decision tree graph displayed by the display module is a pruned decision tree graph.
15. The system of claim 12 or 13, wherein the display module is further configured to display a timeline on a side of the decision tree graph for presenting a log of operations of a target user on the timeline, the target user being determined by an input operation.
16. The system of claim 12, wherein the group is determined by analyzing the similarity of at least one group of data features in the operation logs of a plurality of network users acquired by the acquisition module.
17. The system of claim 12 or 16, wherein the display module is further configured to display at least one group interface, wherein the group size in the group interface is characterized by the displayed geometric size.
18. The system of claim 12 or 16, wherein the display module is further configured to display an interface of a group of data sets, the data characteristics of the group of data sets include at least two data characteristics of user information, IP address, event type, event origin, event responder, and event occurrence time, and the group data sets are displayed in an ordered manner after being grouped in the interface.
19. The system of claim 12 or 16, wherein the display module is further configured to display an interface of the feature distribution of the data sets of the group, the interface showing a histogram of the feature distribution and a distribution contrast map of the histogram within the histogram of the whole cluster.
20. A user data visualization system as in claim 12 or 16 wherein the display module is further configured to display an interface that characterizes the network users by shapes, icons, and/or labels, characterizes the differences of the plurality of groups by different colors, and characterizes the degree of similarity between two network users in each group by a displayed distance.
21. The user data visualization system of claim 12, wherein the event type includes at least one of a network user's attention, likes, comments, gifts, login, logout, update status, registration, and modification information.
22. A client connected to a server via a network, wherein the client performs the steps of the user data visualization method according to any one of claims 1 to 10 based on sending a request to log into the server.
23. A server connected to a client via a network, wherein the server transmits the process of the user data visualization method according to any one of claims 1 to 10 to the client and displays the execution result through the client based on the operation of the client to execute the request.
24. A browser, connected to a server via a network, wherein the browser executes the steps of the user data visualization method according to any one of claims 1 to 10 based on sending a request to log in to the server.
25. A computer-readable storage medium storing a data visualization computer program, characterized in that the data visualization computer program when executed implements the steps of the user data visualization method according to any of claims 1 to 10.
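Claim 9 uses the relative information entropy between two network users, computed over at least one data characteristic, as the measure of their similarity, and shows that similarity as a displayed distance. The following is a minimal sketch of one way such a measure might be computed, taking the event type as the data characteristic. The function names, the additive smoothing, the symmetrization of the divergence, and the list of event types are assumptions made for illustration; the claims do not specify them.

```python
from collections import Counter
from math import log

# Illustrative event types, loosely following claim 10 (assumed list).
EVENT_TYPES = ["attention", "like", "comment", "gift", "login", "logout"]


def event_distribution(events, categories, smoothing=1.0):
    """Return the smoothed probability of each category in one user's events.

    Assumes every event is one of `categories`. Additive smoothing (an
    assumption, not part of the claims) keeps every probability non-zero
    so that the relative entropy below stays finite.
    """
    counts = Counter(events)
    total = len(events) + smoothing * len(categories)
    return [(counts[c] + smoothing) / total for c in categories]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q), in nats."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def user_distance(events_a, events_b, categories=EVENT_TYPES):
    """Symmetrized relative entropy between two users' event-type distributions.

    A smaller value means the two users behave more alike, so the value
    could be used directly as a display distance.
    """
    p = event_distribution(events_a, categories)
    q = event_distribution(events_b, categories)
    return 0.5 * (relative_entropy(p, q) + relative_entropy(q, p))


if __name__ == "__main__":
    user_a = ["login", "like", "like", "logout"]
    user_b = ["login", "gift", "gift", "gift"]
    print(user_distance(user_a, user_a))  # identical behavior
    print(user_distance(user_a, user_b))  # differing behavior
```

Relative entropy is not symmetric, so the sketch averages both directions to obtain a value that can serve as a distance between two points on a display; a layout algorithm would then be needed to place the users, which is outside the scope of this sketch.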
CN201810022133.7A 2018-01-10 2018-01-10 User data visualization method and system Active CN108268624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810022133.7A CN108268624B (en) 2018-01-10 2018-01-10 User data visualization method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810022133.7A CN108268624B (en) 2018-01-10 2018-01-10 User data visualization method and system

Publications (2)

Publication Number Publication Date
CN108268624A (en) 2018-07-10
CN108268624B (en) 2020-04-24

Family

ID=62773340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810022133.7A Active CN108268624B (en) 2018-01-10 2018-01-10 User data visualization method and system

Country Status (1)

Country Link
CN (1) CN108268624B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109213904B (en) * 2018-08-02 2021-09-28 陶雷 System and method for processing presentation data based on structured scheme
CN109063131B (en) * 2018-08-02 2021-09-28 陶雷 System and method for outputting content based on structured data processing
CN109767269B (en) * 2019-01-15 2022-02-22 网易(杭州)网络有限公司 Game data processing method and device
CN111125658B (en) * 2019-12-31 2024-03-22 深圳市分期乐网络科技有限公司 Method, apparatus, server and storage medium for identifying fraudulent user
CN113806594A (en) * 2020-12-30 2021-12-17 京东科技控股股份有限公司 Business data processing method, device, equipment and storage medium based on decision tree

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6278464B1 (en) * 1997-03-07 2001-08-21 Silicon Graphics, Inc. Method, system, and computer program product for visualizing a decision-tree classifier
CN105408894A (en) * 2014-06-25 2016-03-16 华为技术有限公司 Method and device for determining user identity category
CN105956122A (en) * 2016-05-03 2016-09-21 无锡雅座在线科技发展有限公司 Object attribute determining method and device
CN106295702A (en) * 2016-08-15 2017-01-04 西北工业大学 A kind of social platform user classification method analyzed based on individual affective behavior
CN107438050A (en) * 2016-05-26 2017-12-05 北京京东尚科信息技术有限公司 Identify the method and system of the potential malicious user of website

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
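Research on Identifying Internet Financial Fraud Based on Big Data (基于大数据的互联网金融欺诈行为识别研究); Ding Shuangsi (丁爽斯); China Master's Theses Full-text Database, Economics and Management Sciences; 20170215 (No. 02); J157-371 *
Malicious Code Classification and Visualization Based on Behavior Analysis (基于行为分析的恶意代码分类与可视化); Wang Bo (王博); China Master's Theses Full-text Database, Information Science and Technology; 20150615 (No. 06); I138-66 *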

Also Published As

Publication number Publication date
CN108268624A (en) 2018-07-10

Similar Documents

Publication Publication Date Title
CN108268624B (en) User data visualization method and system
CN110399925B (en) Account risk identification method, device and storage medium
Sapiezynski et al. Quantifying the impact of user attention on fair group representation in ranked lists
CN108170830B (en) Group event data visualization method and system
JP6771751B2 (en) Risk assessment method and system
Lin et al. Voices of victory: A computational focus group framework for tracking opinion shift in real time
CN108280644B (en) Group membership data visualization method and system
CN107330731B (en) Method and device for identifying click abnormity of advertisement space
US8412712B2 (en) Grouping methods for best-value determination from values for an attribute type of specific entity
CN111614690A (en) Abnormal behavior detection method and device
CN110442712B (en) Risk determination method, risk determination device, server and text examination system
CN112884092B (en) AI model generation method, electronic device, and storage medium
US20210112101A1 (en) Data set and algorithm validation, bias characterization, and valuation
CN103793484A (en) Fraudulent conduct identification system based on machine learning in classified information website
Le Merrer et al. Setting the record straighter on shadow banning
Chen et al. A comprehensive empirical study of bias mitigation methods for machine learning classifiers
Duval Explainable artificial intelligence (XAI)
US20180285432A1 (en) Extracting and labeling custom information from log messages
CN110738527A (en) feature importance ranking method, device, equipment and storage medium
CN104202291A (en) Anti-phishing method based on multi-factor comprehensive assessment method
WO2020048056A1 (en) Risk decision method and apparatus
CN111754241A (en) User behavior perception method, device, equipment and medium
Saleem et al. Personalized decision-strategy based web service selection using a learning-to-rank algorithm
CN115545103A (en) Abnormal data identification method, label identification method and abnormal data identification device
CN109478219A (en) For showing the user interface of network analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20181024

Address after: 100084 10 floor 1009-1, 3 building, 1 Zhongguancun East Road, Haidian District, Beijing.

Applicant after: Huakong Tsingjiao Information Technology (Beijing) Co., Ltd.

Address before: 100084 Tsinghua Yuan, Beijing, Haidian District

Applicant before: Tsinghua University

GR01 Patent grant