CN108170830B - Group event data visualization method and system - Google Patents

Publication number
CN108170830B
Authority
CN
China
Prior art keywords
group
data
time
shape
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810022368.6A
Other languages
Chinese (zh)
Other versions
CN108170830A (en)
Inventor
徐葳
孙娇
姚期智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huakong Tsingjiao Information Technology Beijing Co Ltd
Original Assignee
Huakong Tsingjiao Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huakong Tsingjiao Information Technology Beijing Co Ltd
Priority to CN201810022368.6A
Publication of CN108170830A
Application granted
Publication of CN108170830B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/954 Navigation, e.g. using categorised browsing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1044 Group management mechanisms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Computer Hardware Design (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a group event data visualization method and a group event data visualization system, which are applied to a fraud event detection system, wherein the method comprises the following steps: acquiring a data set of a group, wherein the data characteristics in the data set at least comprise event types and time information associated with the event types; creating a first time axis and a second time axis; displaying a first time axis with a first shape as a node based on the encoding of the data features to characterize the type and number of events that occur in the group within each time granularity of the first time axis; displaying a second shape to characterize a total number of each event type occurring within the time interval of the second timeline; displaying a second time axis, associating the event types characterized in the second shape with the time granularities of the event types on the second time axis, and distributing the event types characterized by a third shape on the second time axis; and displaying a fourth shape to characterize the type and number of events that occur within each time granularity of the second timeline for the group.

Description

Group event data visualization method and system
Technical Field
The application relates to the technical field of computer processing, in particular to a group event data visualization method and system.
Background
Online fraud, now widely recognized as a dark side of the internet, causes immeasurable losses worldwide each year. In 2015, the Internet Crime Complaint Center received millions of fraud complaints from around the world. Online fraud causes billions in economic losses worldwide every year, and fraudulent users are typically paid for helping to promote specific commodities or distribute spam. In internet finance, fraudulent users apply for loans under false identities, purchase goods with stolen credit cards, and even engage in illegal activities such as money laundering. Therefore, in internet business scenarios, the need for suitable anti-fraud algorithms is increasingly critical.
Although many methods exist for identifying fraud on the internet, the limitations of existing fraud event detection systems mean that the data screened out as belonging to suspected fraudsters still requires extensive manual verification; for example, a platform supervisor must check and verify the records one by one. As a result, revising algorithm parameters, prioritizing data features, and selecting algorithm models in a fraud event detection system require not only software design by algorithm experts but also the participation of domain experts. Improving the transparency of the fraud identification algorithm can therefore effectively improve fraud detection accuracy, and how to realize such data visualization is an urgent problem in the field.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, the present application aims to provide a group event data visualization method and system, which are used to solve the problem of visualization of fraud identification algorithms in the prior art.
To achieve the above and other related objects, a first aspect of the present application provides a group data visualization method applied in a fraud event detection system, including the following steps: acquiring a data set of a group, wherein the data features in the data set at least comprise event types and time information associated with the event types; creating a first time axis and a second time axis; based on the encoding of the data features, displaying the first time axis with a first shape as a node to characterize the type and number of events that occurred in the group within each time granularity of the first time axis; displaying a second shape to characterize the total number of each event type occurring within the time interval of the second time axis; displaying the second time axis, associating the event types characterized in the second shape with the time granularities of the event types on the second time axis, and displaying, by a third shape, the distribution of the event types on the second time axis; and displaying a fourth shape to characterize the type and number of events that occurred in the group within each time granularity of the second time axis.
A second aspect of the present application provides a computer device comprising: one or more processors; and a presentation engine executing on the one or more processors, the presentation engine to perform the group data visualization method as described in the first aspect of the application.
A third aspect of the present application provides a group data visualization system, including: an acquisition module that acquires a data set of a group through a network, wherein the data features in the data set at least comprise event types and time information associated with the event types; a processing module that creates a first time axis and a second time axis and encodes the data features; and a display module that, through a display device, displays the first time axis and the second time axis together with a first shape, a second shape, a third shape and a fourth shape in one interface, wherein the first shape serves as a node of the first time axis to characterize the type and number of events that occurred in the group within each time granularity of the first time axis; the second shape characterizes the total number of each event type occurring within the time interval of the second time axis; the third shape characterizes the distribution on the second time axis of the event types characterized in the second shape; and the fourth shape characterizes the type and number of events that occurred in the group within each time granularity of the second time axis.
A fourth aspect of the present application provides a client, connected to a server via a network, where the client logs in to the server based on a sent request to execute the steps of the group data visualization method of the first aspect of the present application.
A fifth aspect of the present application provides a server, which is connected to a client through a network, and based on an operation of a request executed by the client, the server sends the process of the group data visualization method according to the first aspect of the present application to the client, and displays an execution result through the client.
A sixth aspect of the present application provides a browser, which is connected to a server through a network, and the browser logs in the server based on a sending request to execute the steps of the group data visualization method according to the first aspect of the present application.
A seventh aspect of the present application provides a computer readable storage medium storing a data visualization computer program, characterized in that the data visualization computer program, when executed, implements the steps of the group data visualization method according to the first aspect of the present application.
As described above, the group data visualization method and system of the present application present the data set of the group determined in the fraudulent event detection process based on the time axis, type distribution, classification list, etc., so as to display the data characteristics of the group grouped during the detection of the fraudulent event in a variety of relationship interfaces, which is beneficial for the field experts and the algorithm experts to evaluate and revise the detection algorithm of the fraudulent event detection system.
Drawings
FIG. 1 is a flowchart illustrating a group data visualization method according to an embodiment of the present application.
FIG. 2 is a flowchart illustrating the step of acquiring a group data set in an embodiment of the present application.
FIG. 3 illustrates an interface including a plurality of groups according to an embodiment of the present application.
FIG. 4 is a schematic diagram of a display interface for group data visualization according to an embodiment of the present application.
FIG. 5 is a schematic diagram of a display interface for group data visualization according to another embodiment of the present application.
FIGS. 6a-6d are schematic interface diagrams respectively showing several states displayed using the visualization method of the present application.
FIG. 7 is a diagram illustrating a list interface for the data set of a group in an embodiment of the present application.
FIG. 8 is a flow chart illustrating an interface for feature distribution of a cluster data set according to an embodiment of the present application.
FIG. 9 is an interface showing a histogram and a contrast chart of the feature distribution of registration time in a group in an embodiment of the present application.
FIG. 10 is a flowchart illustrating the step of displaying the distribution of a plurality of groups in a cluster in an embodiment of the present application.
FIG. 11 is a schematic diagram illustrating an example of a cluster distribution interface for displaying a plurality of groups according to the present application.
FIG. 12 is a block diagram of a computer device provided in an embodiment of the present application.
FIG. 13 is a schematic block diagram illustrating a group data visualization system provided in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is provided for illustrative purposes, and other advantages and capabilities of the present application will become apparent to those skilled in the art from the present disclosure.
In the following description, reference is made to the accompanying drawings that describe several embodiments of the application. It is to be understood that other embodiments may be utilized and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present application. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition will occur only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
In fraud detection technology, domain experts contribute their experience of data classification and their requirements for the accuracy of classification results to the core technology of fraud identification, but they are not familiar with the algorithm architecture itself or with the parameters of the algorithm. The domain expert cannot see how data is classified during detection, and when the fraud event detection system produces a detection result, the domain expert can only verify the result and has no other way of judging its accuracy. To improve the accuracy of the fraud event detection system, the present application provides a group data visualization method applied to the fraud event detection system. The groups obtained by classification in the fraud event detection system, together with their data sets, are displayed visually to algorithm experts and domain experts, so that different users (such as domain experts or algorithm experts) can explore various fraudulent behaviors through a variety of interactive means and to a depth that suits their own needs.
The various components may include hardware elements (e.g., chips and circuits), software elements (e.g., tangible, non-transitory computer readable media storing instructions), or a combination of hardware and software elements.
The visualization method is mainly performed by a fraud event detection system, which may include software and hardware in one or more computer devices. To show the user the behavior of a fraudulent group over different time periods, and thereby to answer the domain expert's question of "what a group does as a fraudulent group" and the algorithm expert's question of "whether the users of the same group all have the same behavior habits", the present application provides a timeline-based visualization method. Referring to FIG. 1, a flowchart of a group data visualization method according to an embodiment of the present application is shown. As shown in the figure, the group data visualization method includes the following steps:
In step S11, a data set of one group is acquired. The data features in the data set include at least an event type and time information associated with the event type. In some embodiments, the group is determined as follows. Referring to FIG. 2, which shows a flowchart for acquiring a group data set according to an embodiment of the present application, step S11 further includes:
In different embodiments, the cluster consists of all network users that can be obtained; the network users in the cluster come from the same website or from different websites, or from different network channels such as the internet, one or more intranets, local area networks (LAN), wide area networks (WAN), storage area networks (SAN), or a suitable combination thereof, or a mobile communication network for mobile phones.
Step S112, determining at least one data characteristic from the operation logs of the plurality of network users, and analyzing the similarity of at least one group of data characteristics in the operation logs to determine the group; in a specific embodiment, aiming at the characteristic that the network fraud behavior inevitably leaves user use data in the network, the fraud event detection system collects operation logs of a plurality of network users from at least one website, and groups the users generating the corresponding operation logs by analyzing the similarity of at least one data feature in the operation logs to obtain groups and data sets of the groups in the operation logs.
In some embodiments, the data set of a group includes, but is not limited to, at least two of the following data features: user information, IP address, event type, event origin, event responder, and event occurrence time. The user information includes a mobile phone number, mailbox, ID number, identification number, gender, number of the user equipment used by the user, registration time, and the like. The event types include, but are not limited to, at least one of social behaviors among network users, such as following, liking, commenting, and gifting (also referred to as gift sending), and operation behaviors such as login, logout, status update, registration, and information modification. The same user information may correspond to one or more event types, each of which corresponds to its own event origin, event responder, and event occurrence time.
Step S113, a data set of the group is acquired. In some embodiments, the data set is obtained from a database that stores the groups and their data sets, located for example on a remote storage server or in a storage device of a local computer device, and the data set of one group is extracted from the database based on an input operation of a user. For example, the fraud event detection system obtains a plurality of groups by using an unsupervised detection algorithm, and the user selects one of the groups through a selection interface to obtain the data set of the corresponding group.
Specifically, the fraud event detection system calculates, for each type of data feature, the similarity of all data in the operation logs, where the similarity can be measured using information entropy. For example, the fraud event detection system uses the user information to calculate the information entropy of the IP usage or maximum IP usage dimension, uses the event type to calculate the information entropy of the operation type dimension, and uses the registration time or operation time to calculate the information entropy of the registration time dimension or the operation time difference dimension. The information entropies obtained in this way are then processed by unsupervised detection and divided into a plurality of groups. The unsupervised detection uses, for example, a dense-subgraph-based algorithm or a vector-space-based algorithm. The groups presented by the visualization method of the present application reflect the shared resources, user relationships, and the like involved in fraud events, so that a user of the fraud event detection system can determine more clearly whether the classification strategy of the unsupervised detection algorithm is reasonable. The shared resources include, but are not limited to, shared IP addresses and mailboxes; the user relationships include, but are not limited to, following and interaction between users.
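As a minimal illustration of the entropy measure mentioned above, the following Python sketch computes the Shannon entropy of one feature dimension (here, the IP addresses used by a set of users). The application only names "information entropy" and does not give a formula, so the function name, the choice of base-2 logarithm, and the sample values are assumptions made for illustration.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical IP addresses observed for two sets of users.
shared_ips = ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.1"]
varied_ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

entropy_shared = shannon_entropy(shared_ips)  # every user shares one IP
entropy_varied = shannon_entropy(varied_ips)  # every user has a distinct IP
```

Under this sketch, a set of users sharing a single IP address yields zero entropy, while fully distinct addresses yield the maximum for that sample size, which is the kind of contrast an unsupervised grouping step could exploit.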
In one embodiment, the visualization method further comprises the step of displaying at least one group interface, in which the size of each group is characterized by the size of the displayed geometric figure. Referring to FIG. 3, which shows an interface including a plurality of groups in an embodiment of the present application, 11 groups are shown in the interface and the geometric figure used to represent each group is a circle. All 11 groups are located in the largest dotted circle, which represents, for example, a cluster composed of N network users. The group labeled 0 is, for example, a normal group. The 10 groups of different sizes labeled 1-10 are located in a smaller dotted circle and are, for example, abnormal groups. The size of each circle is proportional to the number of members of the group: a large circle represents a group with more members, and a small circle a group with fewer members. In different embodiments, the geometric figures of the groups may be of any shape. The colors of the geometric figures may be set randomly or related to the number of groups or the number of members of a group. For example, N colors are preset, and the fraud event detection system randomly assigns different colors to the geometric figures representing the groups. As another example, the fraud event detection system assigns colors from a preset color sequence to the geometric figures representing the groups in order of increasing member count. When a user selects a geometric figure by operating the display interface, the fraud event detection system obtains the data set of the corresponding group.
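The size and color encoding described above can be sketched as follows. This is only an illustration: the text says the circle size is proportional to the member count without stating whether radius or area is meant, so the sketch assumes area; the function names, the palette, and the group sizes are hypothetical.

```python
import math

def circle_radius(member_count, area_per_member=1.0):
    """Radius of a circle whose area is proportional to the member count."""
    return math.sqrt(member_count * area_per_member / math.pi)

def assign_colors(group_sizes, palette):
    """Map group labels to colors, smallest group first, cycling the palette."""
    ordered = sorted(group_sizes, key=lambda label: group_sizes[label])
    return {label: palette[i % len(palette)] for i, label in enumerate(ordered)}

group_sizes = {1: 40, 2: 10, 3: 160}  # hypothetical label -> member count
palette = ["red", "yellow", "blue"]   # hypothetical preset color sequence

radii = {label: circle_radius(n) for label, n in group_sizes.items()}
colors = assign_colors(group_sizes, palette)
```

Scaling area rather than radius keeps a group with four times the members from appearing sixteen times larger on screen.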
In a preferred embodiment, the display of at least one group interface may further include an information bar for displaying group information, and when a user selects one group in the group interface, basic information of the group is displayed in a form of a window or a text box on one side of the interface, where the basic information is, for example: group encoding, number of members, data characteristics for determining the most preferred group, group attributes (such as normal group or abnormal group), etc.
In step S12, a first time axis and a second time axis are created. The first time axis and the second time axis are created according to time information in a data set, for example, if a time span in a plurality of time information in the data set is 10 days at most, a maximum time interval of the first time axis or the second time axis is 10 days. In one embodiment, a first timeline and a second timeline are created in the same time interval and time granularity pair; in another embodiment, the first timeline and the second timeline are created in different time interval and time granularity pairs, as described in detail below.
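A minimal sketch of creating the two time axes from the time information in a data set is given below, assuming the embodiment in which the axes use different granularities. The function name, the sample timestamps (including the year), and the choice of one-day and five-day granularities are illustrative assumptions, not details from the application.

```python
from datetime import datetime, timedelta

def create_time_axis(timestamps, granularity):
    """Return the start time of each granularity-sized node spanning the data."""
    first = min(timestamps).replace(hour=0, minute=0, second=0, microsecond=0)
    last = max(timestamps)
    ticks = []
    tick = first
    while tick <= last:
        ticks.append(tick)
        tick += granularity
    return ticks

# Hypothetical event times spanning 10 days.
event_times = [
    datetime(2017, 8, 1, 9, 0),
    datetime(2017, 8, 4, 12, 30),
    datetime(2017, 8, 10, 23, 0),
]

first_axis = create_time_axis(event_times, timedelta(days=1))   # daily nodes
second_axis = create_time_axis(event_times, timedelta(days=5))  # coarser nodes
```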
In step S13, based on the encoding of the data features, a first time axis with a first shape as a node is displayed to characterize the type and number of events that occurred in the group within each time granularity of the first time axis. The fraud event detection system counts the events of each type in the data set according to the time granularity of the first time axis, encodes the counted event types into a preset graph of the first shape, and presents the encoded first shapes in chronological order as nodes of the first time axis. By viewing the nodes on the first time axis, a domain expert can clearly see how the distribution or number of event types changes over time. The first shape includes, but is not limited to, a pie shape or a columnar shape. In some implementation examples, the fraud event detection system encodes the percentage of each event type within a time granularity as a graph of the first shape and displays it on the first time axis, where the same event type is shown in the same color. Referring to FIG. 4, which is a schematic view of a display interface for group data visualization in an embodiment of the present application, the first time axis T1 is located in the lower area of the display interface and covers the time interval from August 1 to August 10 with a time granularity of one day. The percentage distribution of the event types counted each day is encoded into a pie chart and displayed as a node on the first time axis T1. Colors in the pie chart represent event types: for example, "yellow" represents follow (attention) events, "red" represents gift events, and "blue" represents like events. For example, the pie chart displayed as the node for August 7 on the first time axis T1 shows that follow events account for the largest proportion, gift events for a smaller proportion, and like events for the smallest.
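The per-node pie encoding described above amounts to counting events by day and type and converting the counts into fractions. The sketch below illustrates this under stated assumptions: the labels "follow", "gift" and "like" are illustrative names for the attention, gift and like events, and the records and function name are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

def pie_nodes(records):
    """Map each day to the fraction of events of each type on that day."""
    per_day = defaultdict(Counter)
    for day, event_type in records:
        per_day[day][event_type] += 1
    nodes = {}
    for day, counts in sorted(per_day.items()):
        total = sum(counts.values())
        nodes[day] = {t: n / total for t, n in counts.items()}
    return nodes

# Hypothetical (day, event type) records for one group.
aug7, aug8 = date(2017, 8, 7), date(2017, 8, 8)
records = (
    [(aug7, "follow")] * 3 + [(aug7, "gift")] * 2 + [(aug7, "like")]
    + [(aug8, "follow"), (aug8, "like")]
)

nodes = pie_nodes(records)
```

Each value in `nodes[day]` can then be drawn as the angular share of the corresponding color in that day's pie chart.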
In step S13, based on the encoding of the data features, a second shape is displayed to characterize the total number of each event type occurring within the time interval of the second time axis. The fraud event detection system sums the events of each type in the data set over the time interval of the second time axis, encodes each accumulated event type into a preset graph of the second shape, and displays the total number of each event type within the time interval of the second time axis. The second shape includes, but is not limited to, a histogram, a line graph, and the like. The displayed totals allow the numbers of the various event types within the same time interval, namely the time interval of the created second time axis, to be compared. For example, when the time interval of the second time axis represents one day or one week, the user can compare the totals of the three event types by the lengths of the "red", "yellow", and "blue" bars. The displayed bars may also convey the comparison of the three totals through thickness, transparency, and the like. Referring to FIG. 4, a horizontal histogram is displayed on the side (the right side in the drawing) adjacent to the first time axis T1. Three bars, "red", "yellow", and "blue", are displayed in the histogram from top to bottom, and the length of each bar represents the total number of events of that type generated in the time interval of the second time axis. It can be seen from the second shape that, among the event types generated in the time interval of the second time axis, the follow events represented by the "yellow" bar are the most numerous, the gift events represented by the "red" bar are next, and the like events represented by the "blue" bar are the fewest.
By displaying the total number of each event type occurring within the time interval of the second time axis, the domain expert can see from another viewpoint how the number of events of each type changes over time. In order to show the association between the first time axis and the second time axis more clearly, in step S13, based on the encoding of the data features, the second time axis is displayed, the event types characterized in the second shape are associated with the time granularities of the event types on the second time axis, and the distribution of the event types on the second time axis is displayed by a third shape. The second time axis is presented as an axis whose nodes correspond to its time granularity, and the third shape associates the event types distributed at each node with the second shape, so that the user can clearly see the association between the second shape and each time granularity of the second time axis. The third shape may be a line whose color is determined by the color of the corresponding event type in the second shape, so that the user can clearly identify the same event type.
Referring back to FIG. 4, the second shape is associated with the second time axis by a third shape, such as an arc, which fans out to the nodes of the second time axis at each time granularity based on the time information of each event type in the data set. For example, in the figure, the dotted line (the first dotted line) represents the association between the gift events represented by the "red" bar and the corresponding time nodes (time granularities) on the second time axis, the continuous line represents the association between the follow events represented by the "yellow" bar and the corresponding time nodes on the second time axis, and the line formed by dots and line segments (the second dotted line) represents the association between the like events represented by the "blue" bar and the corresponding time nodes on the second time axis. In various embodiments, the third shape uses line thickness or transparency to describe the number of events of a type generated within the respective time granularity, which helps to reveal high-frequency periods or regularities in event occurrence.
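The bar-length encoding of the second shape can be sketched as follows. This is an illustrative assumption of one possible scaling (the longest bar fills a fixed maximum length); the event labels, counts, and function name are hypothetical.

```python
from collections import Counter

def bar_lengths(event_types, max_length=100.0):
    """Total per event type, scaled so the most frequent type fills max_length."""
    totals = Counter(event_types)
    largest = max(totals.values())
    return {t: max_length * n / largest for t, n in totals.items()}

# Hypothetical event types observed within the interval of the second time axis.
observed = ["follow"] * 8 + ["gift"] * 4 + ["like"] * 2

lengths = bar_lengths(observed)
```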
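One way to derive the arcs of the third shape, with line thickness encoding the event count per time node, is sketched below. The data layout, the node labels, the maximum width, and the function name are assumptions for illustration only.

```python
from collections import Counter

def build_arcs(records, max_width=6.0):
    """One arc per (time node, event type); width scales with the event count."""
    counts = Counter(records)  # records are (node, event_type) tuples
    largest = max(counts.values())
    return [
        {"node": node, "event_type": t, "count": n, "width": max_width * n / largest}
        for (node, t), n in sorted(counts.items())
    ]

# Hypothetical (time node, event type) records.
records = (
    [("08-01", "follow")] * 4 + [("08-01", "gift")] * 2 + [("08-02", "follow")]
)

arcs = build_arcs(records)
```

Thicker arcs then point to the time nodes where an event type occurs most often, which is how high-frequency periods would stand out.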
To display more intuitively the type and number of events occurring within each time granularity interval on the second time axis, in step S13 a fourth shape is displayed to characterize the type and number of the group's events occurring within each time granularity of the second time axis. The fraud event detection system sums, or computes distribution statistics for, the number of events of each type in the data set according to the time interval of the second time axis, encodes the accumulated counts or the distribution into a graph of a preset fourth shape, and presents the encoded fourth shapes in chronological order as the nodes of the second time axis. Each fourth shape is displayed at the position pointed to by the third shape, according to the time granularity of the created second time axis. Through the display of each node on the second time axis, the user can clearly see, from another viewpoint, how the number of events of each type changes over time. The fourth shape includes, but is not limited to, a pie shape or a column shape, and is selected to differ from the first shape. In some implementations, the fraud event detection system may encode the number of events of each type within each time granularity of the second time axis as a separate graph of the fourth shape and display it on the second time axis, where the accumulated count of a given event type is shown in the same color as the corresponding third shape and second shape.
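The counting that feeds the second and fourth shapes can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names `bucket_key` and `count_by_bucket`, the `(event_type, timestamp)` tuple layout, and the string bucket labels are all assumptions introduced here.

```python
from collections import Counter, defaultdict
from datetime import datetime


def bucket_key(ts: datetime, granularity: str) -> str:
    """Map a timestamp to the label of its time-granularity bucket (hypothetical labels)."""
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    if granularity == "hour":
        return ts.strftime("%Y-%m-%d %H:00")
    raise ValueError(f"unsupported granularity: {granularity}")


def count_by_bucket(events, granularity):
    """Count events per type.

    events: iterable of (event_type, timestamp) pairs (assumed layout).
    Returns (per_bucket, totals):
      per_bucket[bucket][event_type] -> count, one entry per node (fourth shape)
      totals[event_type]             -> count over the whole interval (second shape)
    """
    per_bucket = defaultdict(Counter)
    totals = Counter()
    for event_type, ts in events:
        per_bucket[bucket_key(ts, granularity)][event_type] += 1
        totals[event_type] += 1
    return dict(per_bucket), totals
```

Each bucket's counter could then be drawn as one node of the second time axis, and the totals as the per-type bars from which the connecting lines originate.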
The timeline is one of the ways to present group data because understanding the collective behavior of users over a period of time is critical for both domain experts and algorithm experts. For this reason, step S13 is executed to describe such collective behavior by combining the first time axis and the second time axis.
Referring to fig. 3, each pie chart on the first time axis T1 shows the proportion of different event types (e.g., a user receives attention or is sent a gift) at each time granularity (e.g., each day). Each event type is encoded as a different color. The number of events of each type within a unit time granularity of the first time axis T1 is encoded as the area ratio of each region of a pie chart, forming one pie chart. The number of events of each type within the time interval of the second time axis T2 is encoded as the length of a bar, forming a histogram (i.e., a second shape) corresponding to each event type. The number of events of each type within a unit time granularity of the second time axis T2 is encoded as the length of a bar, forming a separate histogram (i.e., a fourth shape). When the user selects a pie chart on the first time axis T1, arcs (i.e., third shapes) in the color of each event type extend from the second shape of that event type to the fourth shapes at the corresponding time granularities on the second time axis T2, thereby clearly presenting to the user the timeline relationship of the event types in a group's data set.
In one embodiment, the first time axis and the second time axis are created with the same time interval and time granularity. For example, the fraud event detection system preloads a first time axis and a second time axis with the same time granularity, and maps each event type onto each time axis according to the time information in the data set and the time granularity, thereby obtaining at least one time interval for each time axis. As another example, the fraud event detection system determines the time intervals of the first time axis and the second time axis from the ordering of the time information in the data set, and maps each event type onto each time axis according to the time information in the data set and the time granularity. Please refer to fig. 3, which shows an interface including a first time axis T1 and a second time axis T2. The T1 and T2 time axes each have a time granularity of days and a time interval of 10 days, and by performing the above steps the fraud event detection system may display the data features in the data set on the first time axis T1 and the second time axis T2 according to the time information in the data set. For example, since the second time axis T2 shown in fig. 3 has a time granularity of days, the fraud event detection system may encode the total of each event type counted each day as a histogram and display the histograms as nodes on the second time axis T2.
In another example, the first time axis and the second time axis are created with different time intervals and time granularities, and the time interval of the second time axis equals the time granularity of the first time axis. For example, the time granularities of the two time axes are preset to be different, the correspondence between them is also preset, and the fraud event detection system maps each event type onto each time axis according to the time information in the data set. Please refer to fig. 5, which shows an interface including a first time axis T1 and a second time axis T2. The T1 time axis has a time interval of 10 days and a time granularity of days, and the T2 time axis has a time interval of one day and a time granularity of hours; by performing the subsequent steps, the fraud event detection system may display the data features in the data set on the first time axis T1 and the second time axis T2 according to the time information in the data set. For example, in the interface C2 shown in fig. 5, the second time axis T2 has a time granularity of hours, and the fraud event detection system may encode the total of each event type counted per hour as a bar graph and display the bar graphs as nodes on the second time axis T2.
The number of events of each type within a time granularity of the first time axis T1 is encoded as the area fraction of the regions of a pie chart, forming one pie chart. When the user selects a pie chart on the first time axis T1, the number of events of each type within the time interval of the second time axis T2 (for the event types of the selected pie chart) is encoded as the length of a bar, forming a histogram (i.e., a second shape) corresponding to the event types; the number of events of each type within a unit time granularity of the second time axis T2 is encoded as the length of a bar, forming a separate histogram (i.e., a fourth shape); and arcs (i.e., third shapes) in the color of each event type extend from the second shape of that event type to the fourth shapes at the corresponding time granularities on the second time axis T2, thereby clearly presenting to the user the timeline relationship of the event types in a group's data set.
As shown in interface C1 of FIG. 4, each pie chart on the first time axis T1 shows the percentage of different event types (e.g., a user receives attention or is sent a gift) at each time granularity (e.g., daily). The event types are encoded as different colors. For example, in the pie charts in the figure, the region marked "yellow" represents attention events, the region marked "red" represents gift events, and the region marked "blue" represents like events. For example, on August 7, displayed as a pie chart node on the first time axis T1, attention events account for the largest share of the events generated, gift events account for a smaller share, and like events account for the smallest share. When the user selects the August 7 node on the first time axis T1, the event types occurring in each hour of the 24 hours of August 7, and the number of events of each type, are displayed on the second time axis T2.
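The day-to-hour drill-down described here can be illustrated with a small sketch. It assumes events are available as `(event_type, timestamp)` pairs; the function names `pie_shares` and `hourly_counts` are hypothetical and are not taken from the patent.

```python
from collections import Counter
from datetime import date, datetime


def pie_shares(events, day: date):
    """Share of each event type on one day (the pie chart node on T1)."""
    counts = Counter(t for t, ts in events if ts.date() == day)
    total = sum(counts.values())
    # An empty day yields an empty pie rather than a division by zero.
    return {t: n / total for t, n in counts.items()} if total else {}


def hourly_counts(events, day: date):
    """Per-hour counts of each event type for the selected day (the 24 nodes on T2)."""
    hours = [Counter() for _ in range(24)]
    for event_type, ts in events:
        if ts.date() == day:
            hours[ts.hour][event_type] += 1
    return hours
```

When a pie chart is selected, `hourly_counts` for that day would supply the bar lengths of the 24 hourly nodes, while `pie_shares` supplies the area fractions of the selected pie.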
It should be noted that the time intervals and time granularities of the first time axis and the second time axis in the above embodiments are not limited to the illustrated cases. In different embodiments, the user can set the time intervals and time granularities of the two time axes according to actual needs, for example using weeks, months, quarters, or even years as the time unit.
Using the presentation process and the presented statistics, the user can examine the groups classified by the fraud event detection system, and the visual interface allows domain experts to discover or correct deficiencies in the detection algorithm. In addition, to show the association between the two time axes more clearly, the visualization method further comprises, when a first shape is selected, displaying dynamically, with highlighting, or both, via the third shape, the distribution on the second time axis of the event types occurring in the time granularity represented by that first shape. For example, in the interface C1 shown in fig. 4, when the user selects one pie chart on the first time axis T1, each third shape connected to the bar charts on the second time axis T2 that correspond to the selected pie chart blinks for several seconds or longer and may also be highlighted. When the user then selects another pie chart on the first time axis T1, the previously blinking and highlighted third shapes return to their original shape and color, and each third shape connected to the bar charts corresponding to the newly selected pie chart blinks for several seconds and is highlighted.
In some embodiments, when the user selects a first shape on the first time axis, the visualization method may further perform the step of displaying the selected first shape enlarged, so that the user can more clearly compare the numbers of the event types characterized by that first shape. In one specific example, the first shape, when selected, is displayed enlarged on one side of the first time axis, as shown in interface C3 of FIG. 6a. In another specific example, the first shape, when selected, is displayed enlarged in place on the first time axis, keeping the same center, as shown in interface C4 of FIG. 6b.
In another specific example, when the user selects a pie chart on the first time axis T1, the selected first shape is displayed enlarged on one side of the first time axis T1, and at the same time the third shapes connected to the bar charts on the second time axis T2 that correspond to the selected pie chart blink for several seconds or longer. As shown in interface C5 of fig. 6c, when the user selects the pie chart representing August 7 on the first time axis T1, that pie chart is displayed enlarged on one side of the first time axis T1, and the lines connected to the bar charts corresponding to August 7 on the second time axis T2 blink for several seconds or longer. As a further example, in the interface C6 shown in fig. 6d, when the user selects the pie chart representing August 10 on the first time axis T1, that pie chart is displayed enlarged on one side of the first time axis T1, and the lines connected to the bar charts corresponding to August 10 on the second time axis T2 are highlighted.
In some embodiments, the user is concerned not only with the variation of event types in the group data set as presented by the timeline, but even more with whether the assigned groups are reasonable. This requires that the user be able to view the detailed data features in each group and the priority order of the data features used to classify the groups. The visualization method may therefore include the step of displaying an interface of the data set of one group. The data set is displayed as a list, showing the user the detailed information of the data features within the same group. To help assess the accuracy of the group classification, the list may display the data features of a group in columns ordered by the classification priority used by the fraud event detection system. For example, please refer to fig. 7, which shows a schematic diagram of a list interface displaying the data set of a group in an embodiment of the present application. In this list interface, the displayed data set of the group is sorted from high to low priority, where priority is given by the similarity of the data features. When entries have the same value for the first-priority data feature, they are sorted by the second-priority data feature. In the embodiment shown in fig. 7, the priority from high to low is: IP address, source of the event (source), responder of the event (target), type of the event (event_type), and time of the event (timestamp). In this embodiment, the header of the table encodes the importance of the different columns: the more concentrated the values of a feature, the more important the feature. In one embodiment provided herein, the fraud event detection system represents this property by computing the information entropy of each feature. The lower the information entropy, the higher the consistency.
Then, the fraud event detection system sorts the features in ascending order of information entropy and draws the user's attention by moving the column headers with low information entropy forward. Of course, in different implementations the column headers in the displayed table can also be color-rendered; for example, the header with the lowest information entropy is rendered in the deepest color to prompt the user to pay the most attention to the data feature in that column, the headers of the other columns are rendered correspondingly, and the data set list interface shown in the figure is thereby obtained. The list interface may be displayed after the step of displaying a plurality of group interfaces or after step S13, or in response to the user's operation of selecting the list interface.
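The column ordering by information entropy can be sketched as below. This is a minimal illustration using the standard Shannon entropy, H = -Σ p·log2(p), computed over the observed values of each column; the function names and the representation of rows as dictionaries are assumptions made for the example, not details from the patent.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def order_columns_by_entropy(rows, columns):
    """Return the columns sorted by ascending entropy, plus the entropies.

    rows: list of dicts keyed by column name (assumed layout).
    Lower entropy means more concentrated values, so those columns come first.
    """
    entropies = {col: shannon_entropy([row[col] for row in rows]) for col in columns}
    return sorted(columns, key=lambda col: entropies[col]), entropies
```

A column in which every row shares one value has entropy 0 and would be placed first, while a column of all-distinct values has the maximum entropy and would be placed last. The same entropies could drive the header color depth.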
In some embodiments, to further determine whether the acquired data set of the group reflects the characteristics of a fraudulent event, the data may need to be presented from other dimensions. For example, the accuracy of the detected fraud events is further confirmed by comparing the network operation data of normal users with the group data set. To this end, the visualization method further comprises a step of displaying an interface of the feature distribution of the data set of the group. The feature distribution interface may show the distribution of each data type across the whole network, where "the whole network" is a relative notion: for example, a plurality of network users form a cluster, and the interface may display the distribution of a certain data feature of a certain group within that cluster. Referring to fig. 2, for example, the largest dotted circle in fig. 2 represents a cluster formed by a plurality of network users; the cluster contains 11 groups, numbered 0 to 10, and one of the 11 groups is selected for display.
In some embodiments, the types of data that the feature distribution interface may present include, for example: the information entropy of the average operation time interval dimension (average operation interval entropy), the information entropy of the IP address usage dimension (IPused_amount_entropy), the information entropy of the gender dimension (sex_amount), the information entropy of the email dimension (email_amount), the information entropy of the registration time dimension (reg_time_entry), the information entropy of the operation time dimension (operation time_entry), the information entropy of the device number dimension (device_amount_entry), the information entropy of the operation type dimension (operation type_entry), the maximum information entropy of an IP that is also used by others (maxIP used_be_amount), and the like. In the embodiment shown in fig. 7, the information entropy of the registration time dimension is taken as the example data feature; that is, fig. 7 shows the feature distribution of the information entropy of the registration time (registration period) dimension for one group in the network cluster. To compare effectively the feature distribution of the obtained group data set with that of the network operation data of normal users, please refer to fig. 8, which shows a flowchart of displaying an interface of the feature distribution of the group data set. As shown in the figure, the method includes the following steps:
In step S211, one of the groups is selected and at least one data feature is determined from the data set of the group. In one embodiment, for example, the group labeled 2 in fig. 3 is selected, and a data feature belonging to the user information, for example the registration time, is determined from the data set of the group labeled 2.
In step S212, the feature distribution of the determined at least one data feature is counted both within the group and within the cluster. In this embodiment, the feature distribution of the registration time within the group is counted, and the feature distribution of the registration time within the whole cluster is counted.
In step S213, a histogram of the feature distribution is displayed, together with a distribution contrast map comparing it with the corresponding histogram for the entire cluster. In this embodiment, based on the encoding of the data feature, a histogram of the feature distribution of the registration time within the group is displayed, and a histogram of the feature distribution of the registration time within the entire cluster is displayed. Please refer to fig. 9, which shows an interface with a histogram and a contrast chart of the feature distribution of the registration time in a group according to an embodiment of the present application. As shown, interface D shows a thumbnail of the feature distribution of the registration time in the selected group labeled 2, together with its enlargement, which is graph (d) at the bottom of interface D. As can be seen from the enlargement, in the month from August 1 to August 31 the registration operations of the group members are concentrated on August 5, August 6, August 11, August 12, and August 16. Graph (c) in interface D is a histogram of the time distribution of registration operations by the group users in August, and as can be seen from graph (c), the registration distribution of the group users in August shows a certain regularity. Graph (b) in interface D overlays graph (d) and graph (c) to show the difference between the registration time distribution in the whole cluster and that in the selected group. To help the user understand the differences and connections among different features, in the embodiment provided by the application the histogram is presented in three layers, and after the user clicks one of the thumbnails, the page scrolls to the normalized distribution contrast chart.
Of course, in a particular application, there may be multiple thumbnails of the data features, each representing a different data feature.
In some embodiments, the histogram may be further color-rendered to distinguish or emphasize the distribution of features of a data feature in the group and the entire cluster, or dynamically displayed (e.g., blinking) to distinguish or emphasize the distribution of features of a data feature in the group and the entire cluster.
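The group-versus-cluster comparison in steps S211 to S213 can be sketched as a pair of normalized histograms over the same bins. The sketch below is illustrative only: the function names, the use of explicit bin labels (for example days of a month), and the tuple layout of the result are assumptions, and the patent does not specify how the normalization is performed.

```python
from collections import Counter


def normalized_histogram(values, bins):
    """Fraction of values falling in each bin; bins are explicit labels (assumed)."""
    counts = Counter(values)
    total = sum(counts[b] for b in bins)
    return [counts[b] / total if total else 0.0 for b in bins]


def distribution_contrast(group_values, cluster_values, bins):
    """Per-bin (bin, group_share, cluster_share, difference) rows for an overlay chart."""
    group = normalized_histogram(group_values, bins)
    cluster = normalized_histogram(cluster_values, bins)
    return [(b, g, c, g - c) for b, g, c in zip(bins, group, cluster)]
```

Normalizing both histograms makes a small group comparable with the much larger cluster; bins where the difference is strongly positive are where the group's registrations are unusually concentrated.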
In some embodiments, in order to further analyze the differences among a plurality of groups in one network cluster, the group data visualization method further includes a step of displaying an interface of the feature distribution of the data sets of the plurality of groups. Please refer to fig. 10 and fig. 11: fig. 10 is a flowchart of the steps for displaying the distribution of a plurality of groups in the cluster in one embodiment of the present application, and fig. 11 shows the interface displaying the distribution of the plurality of groups in the cluster in one embodiment. As shown in the figures, the steps include:
In step S311, a plurality of groups are determined in a cluster composed of a plurality of network users, and the groups are characterized by different shapes, icons, labels and/or colors, respectively. In one embodiment, for example, the 3 groups numbered 0, 1 and 2 in fig. 3 are selected, where the group numbered 0 is represented by "green", the group numbered 1 by "red", and the group numbered 2 by "blue".
In step S312, at least one data feature is determined from the data sets of the plurality of groups. In this embodiment, a data feature, such as the IP address, is determined from the data sets of the 3 groups.
In step S313, based on the at least one data feature, the relative entropy between every two network users in each group is analyzed as a measure of the similarity between those two network users. In the present embodiment, the relative entropy (in the IP usage dimension) between every two network users in the 3 groups numbered 0, 1, and 2 is analyzed based on the IP address as a measure of the similarity between the two network users. For example, the dimensionality reduction method t-SNE (t-distributed stochastic neighbor embedding) is adopted, with the relative entropy between two users used as the measure of the distance between those network users.
In step S314, a display interface is output in which the network users are characterized by shapes, icons, and/or labels, the different groups are distinguished by different colors, and the degree of similarity between two network users in each group is characterized by the displayed distance. In this embodiment, as shown in fig. 11, in interface E a dot represents a network user, "green" represents the group numbered 0, "red" represents the group numbered 1, and "blue" represents the group numbered 2. The users in the "blue" group numbered 2 are relatively close to one another and are distributed as a cluster; the users in the "red" group numbered 1 are likewise relatively close to one another and distributed as a cluster; and the "green" dots show the distribution of randomly sampled normal users, who are relatively far from one another and more dispersed. It can be considered that the more densely a group clusters, the greater the probability that it is a fraudulent group. For example, in the embodiment shown in fig. 11, the group represented by "green" is distributed more dispersedly, so the "green" group is a normal group and the users represented by the "green" dots are normal users. Conversely, the group represented by "red" (i.e., the group numbered 1) and the group represented by "blue" (i.e., the group numbered 2) are distributed as clusters, which indicates that the "red" and "blue" groups are abnormal groups and that the users represented by the "red" and "blue" dots are abnormal users. In one embodiment, a user of the visualization system can interactively view the specific information and feature values of the users in each group by hovering the mouse over them.
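The pairwise relative entropy used as a user-to-user distance in step S313 can be sketched as follows. Several choices here are assumptions that the patent does not state: each user is represented by the list of IP addresses observed in that user's log, additive (Laplace) smoothing with a hypothetical `alpha` parameter keeps the divergence finite, and the two directions of the Kullback-Leibler divergence are summed because relative entropy on its own is not symmetric.

```python
import math
from collections import Counter


def smoothed_distribution(values, support, alpha=1.0):
    """Empirical distribution over support with additive smoothing (assumption)."""
    counts = Counter(values)
    total = len(values) + alpha * len(support)
    return [(counts[v] + alpha) / total for v in support]


def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats; p and q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetrized relative entropy, usable as a dissimilarity."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def distance_matrix(users):
    """users: dict mapping user id -> list of observed values (e.g. IPs used)."""
    support = sorted({v for values in users.values() for v in values})
    names = sorted(users)
    dists = {n: smoothed_distribution(users[n], support) for n in names}
    matrix = [[symmetric_kl(dists[a], dists[b]) for b in names] for a in names]
    return names, matrix
```

The resulting matrix could then be handed to a t-SNE implementation that accepts precomputed distances (for instance, scikit-learn's `TSNE` offers a `metric="precomputed"` option) to obtain the two-dimensional layout in which dense clusters suggest suspicious groups.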
In other embodiments, in the output interface, the network users may also be represented by a shape, an icon, and/or a label. For example, the shape may be a geometric figure such as a triangle or rectangle; the icon may be a smiling face, a crying face, a skull avatar, a pirate avatar, or the like; and the label may be a clearly distinguishable character or symbol.
According to the group data visualization method, the data set of a group determined during fraud event detection is presented by means of time axes, type distributions, classification lists, and the like, so that the data features of the group are displayed in a variety of relationship interfaces, which helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud event detection system.
The present application also provides a computer device that may be a suitable computer device such as a handheld computer device, a tablet computer device, a notebook computer, a desktop computer, a server, or the like. The computer device includes a display, input means, input/output (I/O) ports, one or more processors, memory, non-volatile storage, network interfaces, and power supplies, among others. The various components described may include hardware elements (e.g., chips and circuits), software elements (e.g., a tangible, non-transitory computer-readable medium storing instructions), or a combination of hardware and software elements. Further, it is noted that the various components may be combined into fewer components or separated into additional components. For example, the memory and the non-volatile storage device may be included in a single component. The computer device can execute the visualization method alone or in cooperation with other computer devices.
Referring to fig. 12, which is a schematic diagram of an embodiment of a computer device according to the present application, in the present embodiment the computer device 1 includes one or more processors and a rendering engine executed on the processors, for executing the above visualization method and presenting the corresponding visualization interface. For example, a computer device includes a processor, a display, and a rendering engine (or display engine) executed on the processor, where the rendering engine is configured to execute the group data visualization method described in the above embodiments and display the result through the display. For the implementation process of the group data visualization method, refer to the description of fig. 1 to fig. 11.
The present application further provides a client connected to a server through a network. In this embodiment, the client is, for example, a web client, and the server is, for example, a web server. The web client sends a web service request to log in to the web server, executes the group data visualization method described in the above embodiments, and displays the result through a display. For the implementation process of the group data visualization method, refer to the description of fig. 1 to fig. 11.
The present application further provides a server connected to a client through a network. In this embodiment, the client is, for example, a web client, and the server is, for example, a web server. In response to a request issued by the web client, the web server sends the group data visualization method described in the above embodiments to the client, where it is displayed through a display. For the implementation process of the group data visualization method, refer to the description of fig. 1 to fig. 11.
The present application further provides a browser connected to a server through a network. Based on a request it sends, the browser logs in to the server to execute the group data visualization method described in the above embodiments and displays the result through a display. For the implementation process of the group data visualization method, refer to the description of fig. 1 to fig. 11. In the present embodiment, the browser is, for example, a web browser, including but not limited to the QQ browser, Internet Explorer, Firefox, Safari, Opera, Google Chrome, the Baidu browser, the Sogou browser, the Cheetah browser, the 360 browser, the UC browser, the Maxthon browser, TheWorld browser, and the like.
The present application also provides a group data visualization system that may include software and hardware in one or more computer devices. The system presents the behavior of a fraudulent group over different time periods in order to answer the question posed by domain experts, "what does a group do as a fraudulent group", and the question posed by algorithm experts, "do the users of the same group all have the same behavior habits". To that end, the application provides a group data visualization system based on a timeline. Please refer to fig. 13, which is a schematic diagram of the module structure of a group data visualization system according to the present application. As shown, the group data visualization system 3 includes an obtaining module 31, a processing module 32, and a display module 33.
The obtaining module 31 is configured to obtain the data set of a group. The data features in the data set include at least an event type and time information associated with the event type.
In some embodiments, the obtaining module 31 obtains the operation logs of a cluster formed by a plurality of network users. In different embodiments, the cluster consists of all network users that can be obtained, and the network users in the cluster come from the same website or different websites, or from different network channels, such as the internet, one or more intranets, local area networks (LAN), wide area networks (WAN), storage area networks (SAN), or a suitable combination thereof, or a mobile communication network for mobile phones.
The obtaining module 31 delivers the obtained operation log to the processing module 32, and the processing module 32 determines at least one data feature from the operation logs of the plurality of network users and analyzes the similarity of at least one set of data features in the operation log to determine the group. In a specific embodiment, aiming at the characteristic that the network fraud behavior inevitably leaves user usage data in the network, the operation logs of a plurality of network users from at least one website are collected in the group data visualization system, and the processing module 32 groups the users generating the corresponding operation logs by analyzing the similarity of at least one data feature in the operation logs to obtain the group and the data set of the group in the operation logs.
In some embodiments, the data set of a group includes, but is not limited to, at least two of the following data features: user information, IP address, event type, source of the event, event responder, and event occurrence time. The user information includes, for example, a mobile phone number, an email address, an ID number, an identity card number, gender, the number of the user equipment used by the user, and the registration time. The same user information may correspond to at least one event type, and each event type corresponds to an event source, an event responder, and an event occurrence time. The event types include, but are not limited to, at least one of: social behaviors among network users such as attention, praise, comment, and giving (also referred to as gift sending); and operation behaviors of network users such as login, logout, status update, registration, and information modification. For example, the same user information may correspond to a plurality of event types, each with its own event source, event responder, and event occurrence time.
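One possible in-memory shape for such a data set is sketched below. The field names follow the column labels mentioned for the list interface (source, target, event_type, timestamp) plus an `ip` field; the class name `EventRecord` and the helper `events_by_source` are hypothetical and serve only to illustrate how one user can map to several events.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EventRecord:
    """One operation-log entry of a group data set (illustrative layout)."""
    ip: str              # IP address the event came from
    source: str          # user who triggered the event
    target: str          # user who received (responded to) the event
    event_type: str      # e.g. "attention", "gift", "login"
    timestamp: datetime  # time the event occurred


def events_by_source(records):
    """Group records by originating user, so one user maps to many events."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.source].append(record)
    return dict(grouped)
```

Keeping the records immutable (`frozen=True`) is a design choice for the sketch: log entries are facts about past events, so downstream counting and entropy steps can share them without defensive copies.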
The processing module 32 may store the resulting data sets for each group in a database. In some embodiments, the data set may be obtained from a database storing the groups and the data set thereof, for example, the database may be configured on a remote storage server or in a storage device in a local computer device, and the obtaining module 31 may obtain the data set by extracting the data set from the database based on an input operation of a user. For example, the processing module 32 obtains a plurality of groups by using an unsupervised detection algorithm, and the user selects one of the groups through the selection interface to obtain the data set of the corresponding group.
Specifically, the processing module 32 calculates, for each type of data feature, the similarity of all data in the operation log, where the similarity can be measured by information entropy. For example, the processing module 32 calculates the information entropy of the IP usage dimension or the maximum IP usage dimension by using the user information, calculates the information entropy of the operation type dimension by using the event type, calculates the information entropy of the bad operation dimension by using the registration time or the operation time, and so on. After the above calculation, the processing module 32 applies an unsupervised detection method to the obtained information entropies to divide the users into a plurality of groups. The unsupervised detection method includes, for example, a dense-subgraph-based algorithm or a vector-space-based algorithm. The groups presented by the visualization method provided by the application reflect the shared resources, user relationships, and the like used in a fraud event, so that a user of the group data visualization system 3 can more clearly determine whether the classification strategy of the unsupervised detection algorithm is reasonable. The shared resources include, but are not limited to, shared IP addresses, email addresses, etc., and the user relationships include, but are not limited to, users following one another, interacting with one another, etc.
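The information-entropy measure described above can be illustrated with a minimal sketch. It assumes the standard Shannon entropy of the empirical value distribution; the function name `shannon_entropy` and the sample IP addresses are hypothetical and are not taken from the application.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`.

    A low entropy means the values are concentrated (e.g. many users
    sharing one IP address); a high entropy means they are spread out.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical IP addresses used by the members of two candidate groups.
shared_ips = ["10.0.0.1"] * 4                                   # one shared IP
mixed_ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]    # all distinct

low = shannon_entropy(shared_ips)    # fully concentrated values
high = shannon_entropy(mixed_ips)    # evenly spread values
```

In this sketch the group whose members all share one IP address has the lower entropy, which matches the idea that concentrated feature values indicate similar users.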
In one embodiment, the display module 33 in the group data visualization system 3 displays at least one group interface, in which the size of each group is characterized by the size of the displayed geometric figure. Referring to fig. 3, which shows an interface including a plurality of groups, 11 groups are displayed in the interface, and the geometric figure representing each group is a circle. The 11 groups are all located within the largest dotted circle, which represents, for example, a cluster composed of N network users. The group numbered 0 is, for example, a normal group, while the 10 groups of different sizes numbered 1-10 are located within a smaller dotted circle and are, for example, abnormal groups. The size of each circle is proportional to the number of members of the group, that is, a large circle represents a group with more members and a small circle a group with fewer members. In different embodiments, the geometric figures of the groups may be of arbitrary shape. The colors of the geometric figures may be set randomly or related to the number of groups or to the number of members of a group. For example, N colors are preset, and the processing module 32 randomly assigns different colors to the geometric figures representing the groups and displays the geometric figures on the display device through the display module 33. For another example, the processing module 32 assigns colors to the geometric figures representing the groups in a preset color sequence, following the order of the groups from the smallest to the largest number of members, and displays the geometric figures on the display device through the display module 33. When a user operates the display interface to select a geometric figure, the obtaining module 31 obtains the data set of the corresponding group.
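The size and color encoding of the group circles could be sketched as follows. This is an illustration under stated assumptions only: the text says the circle size is proportional to the member count, and the sketch interprets that as circle area (radius proportional to the square root of the count); the palette, the function name `encode_groups`, and the `max_radius` parameter are hypothetical.

```python
import math

# Hypothetical preset color sequence (not specified in the application).
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def encode_groups(member_counts, max_radius=100.0):
    """Map {group_id: member_count} to {group_id: (radius, color)}.

    Assumption: circle *area* is proportional to the member count, so the
    radius scales with sqrt(count). Colors are taken from PALETTE in order
    of ascending member count, cycling if there are more groups than colors.
    """
    if not member_counts:
        return {}
    largest = max(member_counts.values())
    # Sort by member count; break ties by group id for a stable result.
    ordered = sorted(member_counts, key=lambda g: (member_counts[g], g))
    encoded = {}
    for rank, group_id in enumerate(ordered):
        radius = max_radius * math.sqrt(member_counts[group_id] / largest)
        encoded[group_id] = (radius, PALETTE[rank % len(PALETTE)])
    return encoded


# Hypothetical member counts for three groups.
circles = encode_groups({1: 25, 2: 100, 3: 4})
```

Scaling the radius directly with the member count would be an equally valid reading of the text; the square-root form is chosen here only so that visually perceived area tracks group size.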
In a preferred embodiment, the display module 33 displays an information bar that may further include group information in at least one group interface, and when a user selects one group in the group interface, basic information of the group is displayed in a form of a window or a text box on one side of the interface, where the basic information is, for example: group encoding, number of members, data characteristics for determining the most preferred group, group attributes (such as normal group or abnormal group), etc. The display module includes, for example, a display.
In order to describe the analysis results of the data set of the acquired group by the group data visualization system 3 in a time axis manner, the processing module 32 is used for creating a first time axis and a second time axis and encoding the data characteristics. The display module 33 displays a first time axis and a second time axis and displays a first shape, a second shape, a third shape and a fourth shape in one interface through a display device, wherein the first shape is used as a node of the first time axis to represent the type and the number of events occurring in the group in each time granularity of the first time axis; the second shape characterizes a total number of each event type occurring within a time interval of the second timeline; the third shape characterizes a distribution of event types characterized in the second shape on the second timeline; the fourth shape characterizes a type and number of events that occur for the group within each time granularity of the second timeline. The display device can be a display screen externally connected or integrated with the computer device, a driver of the display screen, and a presentation engine specially configured for processing display data; the presentation engine includes, but is not limited to: an image processing chip, a display program running in the image processing chip, and the like.
The first time axis and the second time axis are created according to time information in a data set, for example, if a time span in a plurality of time information in the data set is 10 days at most, a maximum time interval of the first time axis or the second time axis is 10 days. In one embodiment, a first timeline and a second timeline are created in the same time interval and time granularity pair; in another embodiment, the first timeline and the second timeline are created in different time interval and time granularity pairs, as described in detail below.
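As a minimal sketch of deriving a time axis from the time information in a data set, the following assumes that timestamps are `datetime` objects and that the axis nodes start at midnight of the earliest day; the function name `build_timeline` and the example dates (including the year) are hypothetical.

```python
from datetime import datetime, timedelta


def build_timeline(timestamps, granularity=timedelta(days=1)):
    """Return the start time of every bucket spanning the given timestamps.

    The first node is the midnight preceding the earliest timestamp, and
    nodes are added one granularity step at a time until the latest
    timestamp is covered.
    """
    if not timestamps:
        return []
    start = min(timestamps).replace(hour=0, minute=0, second=0, microsecond=0)
    end = max(timestamps)
    nodes = []
    current = start
    while current <= end:
        nodes.append(current)
        current += granularity
    return nodes


# Hypothetical data set whose events span ten calendar days.
nodes = build_timeline([
    datetime(2017, 8, 1, 9, 30),
    datetime(2017, 8, 10, 18, 0),
])
```

With day granularity, the ten-day span in this example yields ten nodes, in line with the 10-day maximum time interval mentioned in the text.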
The processing module 32 graphically encodes all data to be presented, such as data features, event types, and the number of events of each type, so that the presented interface is attractive and clear. Here, the processing module 32 counts the events of each type in the data set according to the time granularity of the first time axis and encodes the counts into graphs of a preset first shape, and the display module 33 presents each encoded first shape on the first time axis, in chronological order, as a node of the first time axis. By displaying each node on the first time axis, a domain expert can clearly see how the distribution or number of the event types changes over time. The first shape includes, but is not limited to, a pie shape or a column shape. In some implementation examples, the processing module 32 can encode the percentage of each event type within a time granularity into a graph of the first shape, which the display module 33 displays on the first time axis, wherein the same event type is shown in the same color in every graph.
Referring to fig. 4, which is a schematic diagram of an interface of the group data visualization system in an embodiment, the first time axis T1 is located in the lower region of the display interface and covers a time interval from August 1 to August 10. With days as the time granularity, the percentage distribution of the event types counted each day is encoded into a pie chart and displayed as a node on the first time axis T1. The colors in the pie chart represent the event types: for example, "yellow" represents a follow (attention) event, "red" represents a gift event, and "blue" represents a like event. For example, on August 7, which is displayed with a pie chart as a node on the first time axis T1, follow events account for the largest share of the generated events, gift events for a smaller share, and like events for the smallest share.
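The per-day percentage encoding behind the pie-chart nodes can be sketched as follows. The sketch assumes events are given as `(timestamp, event_type)` pairs; the function name `pie_fractions_per_day`, the event-type labels, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date, datetime


def pie_fractions_per_day(events):
    """Group (timestamp, event_type) pairs by calendar day.

    Returns {day: {event_type: fraction}}, where the fractions of one day
    sum to 1 and can be used as the area shares of that day's pie chart.
    """
    counts = defaultdict(Counter)
    for timestamp, event_type in events:
        counts[timestamp.date()][event_type] += 1

    fractions = {}
    for day in sorted(counts):
        total = sum(counts[day].values())
        fractions[day] = {etype: n / total for etype, n in counts[day].items()}
    return fractions


# Hypothetical events on one day: three follows, two gifts, one like.
sample_events = [
    (datetime(2017, 8, 7, 1), "follow"),
    (datetime(2017, 8, 7, 2), "follow"),
    (datetime(2017, 8, 7, 3), "follow"),
    (datetime(2017, 8, 7, 4), "gift"),
    (datetime(2017, 8, 7, 5), "gift"),
    (datetime(2017, 8, 7, 6), "like"),
]
shares = pie_fractions_per_day(sample_events)[date(2017, 8, 7)]
```

In this example the follow events take the largest share of the pie, the gift events a smaller share, and the like events the smallest, mirroring the August 7 node described in the text.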
In addition, the processing module 32 sums the events of each type in the data set over the time interval of the second time axis, encodes the total for each event type into a graph of a preset second shape, and displays the totals for the time interval of the second time axis through the display module 33. The second shape includes, but is not limited to, a histogram, a line graph, etc. The displayed totals allow the numbers of the various event types within the same time interval of the created second time axis to be compared. When the time interval of the second time axis represents one day or one week, the user can compare the totals of the three event types according to the lengths of the "red", "yellow", and "blue" bars corresponding to those totals. The displayed bars can also convey the comparison of the three totals through thickness, transparency, and the like. Referring to fig. 3, a horizontal histogram is displayed on the side adjacent to the first time axis T1 (the right side in the drawing). Three bars, "red", "yellow", and "blue", are displayed in the histogram from top to bottom, and the length of each bar represents the total number of events of that type generated in the time interval of the second time axis. It can be seen from the second shape that, among the events generated in that time interval, follow (attention) events, marked by the "yellow" bar, are the most numerous, gift events, marked by the "red" bar, are the next most numerous, and like events, marked by the "blue" bar, are the fewest.
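The totals behind the second shape could be computed as in the following sketch, which assumes a half-open interval `[start, end)` and scales bar lengths linearly to the largest total; the function names, the `max_length` parameter, and the sample events are hypothetical.

```python
from collections import Counter
from datetime import datetime


def totals_by_type(events, start, end):
    """Count events of each type whose timestamp lies in [start, end)."""
    return Counter(etype for ts, etype in events if start <= ts < end)


def bar_lengths(totals, max_length=200.0):
    """Scale totals linearly so the most frequent type gets max_length."""
    largest = max(totals.values(), default=0)
    if largest == 0:
        return {etype: 0.0 for etype in totals}
    return {etype: max_length * n / largest for etype, n in totals.items()}


# Hypothetical events; the last one falls outside the chosen interval.
interval_events = (
    [(datetime(2017, 8, 7, h), "follow") for h in range(4)]
    + [(datetime(2017, 8, 7, h), "gift") for h in range(2)]
    + [(datetime(2017, 8, 7, 12), "like")]
    + [(datetime(2017, 8, 8, 1), "like")]
)
totals = totals_by_type(
    interval_events, datetime(2017, 8, 7), datetime(2017, 8, 8)
)
lengths = bar_lengths(totals)
```

Thickness or transparency, which the text mentions as alternatives to length, could be derived from the same totals with a different scaling function.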
By displaying the total number of events of each type occurring within the time interval of the second time axis, the domain expert can clearly see, from another viewpoint, how the number of events of each type changes over time. In order to more clearly display the association relationship between the first time axis and the second time axis, the processing module 32 causes the display module 33 to display the second time axis based on the coding of the data features; the processing module 32 further associates each event type represented in the second shape with the time granularities of the second time axis in which that event type occurs, and the display module 33 displays, by the third shape, the distribution of each event type on the second time axis. The second time axis is presented as an axis whose nodes correspond to its time granularities, and the third shape associates the event types distributed at each node with the second shape, so that the user clearly sees the association between the second shape and each time granularity of the second time axis. The third shape may be a line, and its color may be determined according to the color of the second shape for the corresponding event type, so that the user can clearly identify the same event type.
Referring back to FIG. 3, the second shape is associated with the second timeline by a third shape, such as an arc, which is distributed over the nodes of the second timeline at each time granularity based on the time information of each event type in the data set. For example, in the figure, the dashed line (the first dashed line) represents the association between the gift events represented by the "red" bar and the corresponding time nodes (time granularities) on the second time axis, the solid line represents the association between the follow (attention) events represented by the "yellow" bar and the corresponding time nodes, and the dash-dot line (the second dashed line) represents the association between the like events represented by the "blue" bar and the corresponding time nodes. In various embodiments, the third shape uses line thickness or transparency to describe the number of events of each type generated within the respective time granularity, thereby helping to reveal high-frequency periods or regularities in event occurrence.
To more intuitively display the type and number of events occurring within each time granularity interval on the second timeline, the display module 33 also displays a fourth shape under the control of the processing module 32 to characterize the type and number of events occurring within each time granularity of the second timeline for the group. The processing module 32 performs addition or distribution statistics on the number of event types in the data set according to the time interval of the second time axis, encodes the accumulated event types or distribution conditions into a preset graph with a fourth shape, and the display module 33 presents the encoded fourth shapes on the second time axis as nodes of the second time axis according to a time sequence. Wherein the processing module 32 controls the display module 33 to display the corresponding fourth shape under the guidance of the third shape according to the time granularity of the created second time axis. Through the display of each node on the second time axis, the user can clearly obtain the change process of the event types counted according to the time in quantity from another view angle. Wherein the fourth shape includes, but is not limited to: a pie shape, or a column shape, and a shape different from the first shape is selected. In some implementation examples, the processing module 32 may encode the number of accumulated sums of each event type within the time granularity of the second time axis into a graph of a fourth shape and display the graph on the second time axis by the display module 33, respectively, wherein the accumulated sums of the same event types are in the same color as the third shape and the second shape.
The timeline is one of the ways to present group data because understanding how concentrated the users' behavior is over a period of time is critical to both domain experts and algorithm experts. For this reason, the first time axis and the second time axis are combined to describe this concentrated behavior.
Referring to fig. 3, each pie chart on the first time axis T1 shows the proportion of the different event types (e.g., a user being followed or a gift being sent to a user) at each time granularity (e.g., each day). The processing module 32 encodes each event type as a different color; encodes the number of events of each type within a unit time granularity of the first time axis T1 as the area ratio of the corresponding region of a pie chart, forming one pie chart; encodes the number of events of each type within the time interval of the second time axis T2 as the length of a bar, forming a histogram (i.e., the second shape) with one bar per event type; and encodes the number of events of each type within a unit time granularity of the second time axis T2 as the length of a bar, forming a separate histogram (i.e., a fourth shape). When the user selects a pie chart on the first time axis T1, arcs (i.e., third shapes) in the color of each event type extend from the second shape corresponding to that event type to the fourth shapes at the corresponding time granularities on the second time axis T2, thereby clearly presenting to the user the timeline relationship of the event types in a group data set.
In one embodiment, the first timeline and the second timeline are created in the same time interval and time granularity pair. For example, the processing module 32 pre-loads a first time axis and a second time axis with the same time granularity, so that the display module 33 corresponds each event type to each time axis according to the time information and the time granularity in the data set, so as to obtain at least one time interval of each time axis. For another example, the processing module 32 determines the preset time intervals of the first time axis and the second time axis according to the sorting of the time information in the data set, and the display module 33 corresponds each event type to each time axis according to the time information and the time granularity in the data set. Please refer to fig. 3, which shows an interface including a first time axis T1 and a second time axis T2. The T1 and T2 time axes both use days as time granularity, and both use 10 days as time intervals, and the processing module 32 displays data features in the data sets on the first time axis T1 and the second time axis T2 according to time information in the data sets. For example, as shown in fig. 3, the second time axis T2 takes days as time granularity, and the processing module 32 may encode the sum of each event type counted each day into a bar graph and display the bar graph as nodes on the second time axis T2 by the display module 33.
In another example, the first timeline and the second timeline are created in different time interval and time granularity pairs. And the time interval of the second time axis is the time granularity of the first time axis. For example, the processing module 32 presets that the time granularities of the first time axis and the second time axis are different, and presets the corresponding relationship between the time granularities of the two time axes, and the display module 33 corresponds each event type to each time axis according to the time information in the data set. Please refer to fig. 5, which shows an interface including a first time axis T1 and a second time axis T2. Wherein, the T1 time axis takes 10 days as a time interval and takes days as time granularity, and the T2 time axis takes days as a time interval and takes hours as time granularity; the display module 33 displays the data features in the data set on the first time axis T1 and the second time axis T2 according to the time information in the data set. For example, with the interface C2 shown in fig. 5, the second time axis T2 takes hours as time granularity, and the processing module 32 may encode the sum of the counted event types per hour into a bar graph and display the bar graph as a node on the second time axis T2 by the display module 33.
The processing module 32 encodes the number of event types within the time granularity of the first time axis T1 as the area fraction of the regions in the pie chart to form a pie chart; when a user selects one pie chart on the first time axis T1, the number of event types (corresponding to the event types corresponding to the selected pie chart) in the time interval of the second time axis T2 is encoded as the length of a histogram to form a histogram (i.e., a second shape) corresponding to the event types, the number of event types in the unit time granularity of the second time axis T2 is encoded as the length of the histogram to form a separate histogram (i.e., a fourth shape), and an arc (i.e., a third shape) in which the event types are colors is projected from the second shape corresponding to the event types and corresponds to the fourth shapes corresponding to the time granularity on the second time axis T2, thereby clearly presenting the time axis relationship of the event types in one group data set to the user.
As shown in interface C1 of FIG. 4, each pie chart on the first time axis T1 shows the percentage of the different event types (e.g., a user being followed or a gift being sent to a user) at each time granularity (e.g., each day). The event types are encoded in different colors: for example, the colors in the pie charts in the figure represent the event types, with 'yellow' representing a follow (attention) event, 'red' a gift event, and 'blue' a like event. For example, on August 7, which is displayed with a pie chart as a node on the first time axis T1 in the figure, follow events make up the largest proportion of the generated events, gift events a smaller proportion, and like events the smallest proportion. When the user selects the node for August 7 on the first time axis T1, the event types occurring in each hour of the 24 hours of August 7, and the number of events of each type, are displayed on the second time axis T2.
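The drill-down from a selected day on the first time axis to hourly nodes on the second time axis can be sketched as follows. It assumes events are `(timestamp, event_type)` pairs; the function name `hourly_counts` and the sample events are hypothetical.

```python
from collections import Counter
from datetime import date, datetime


def hourly_counts(events, selected_day):
    """Return 24 Counters, one per hour of `selected_day`.

    Bucket h holds the number of events of each type that occurred on
    the selected day between h:00 and h:59; events on other days are
    ignored.
    """
    buckets = [Counter() for _ in range(24)]
    for timestamp, event_type in events:
        if timestamp.date() == selected_day:
            buckets[timestamp.hour][event_type] += 1
    return buckets


# Hypothetical events; the last one belongs to a different day.
drill_events = [
    (datetime(2017, 8, 7, 10, 5), "follow"),
    (datetime(2017, 8, 7, 10, 40), "gift"),
    (datetime(2017, 8, 7, 23, 59), "like"),
    (datetime(2017, 8, 8, 10, 0), "follow"),
]
per_hour = hourly_counts(drill_events, date(2017, 8, 7))
```

Each of the 24 Counters corresponds to one node on the second time axis and could be rendered as a bar chart (a fourth shape) for that hour.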
It should be noted that, the time intervals and the time granularities of the first time axis and the second time axis are not limited to the illustrated cases, and in different embodiments, the user can set the time intervals and the time granularities of the first time axis and the second time axis according to actual situations, such as time units of week, month, quarter, even year, and the like.
The user can use the presentation process and the presented statistics to examine the groups classified by the group data visualization system, and the visualization interface allows domain experts to discover or correct deficiencies in the detection algorithm. In addition, in order to more clearly display the association relationship between the two time axes, the group data visualization system further includes a first detection module (not shown). When the first detection module detects that the user has selected a first shape, the distribution of the event types occurring within the time granularity represented by that first shape is displayed dynamically, highlighted, or both, by the third shapes on the second time axis. For example, in the interface C1 shown in fig. 4, when the user selects one pie chart on the first time axis T1, each third shape connected to the bar chart that corresponds to the selected pie chart on the second time axis T2 blinks for several seconds or longer, is highlighted, or both. When the user selects another pie chart on the first time axis T1, the previously blinking and highlighted third shapes return to their original shape and color, and each third shape connected to the bar chart corresponding to the newly selected pie chart on the second time axis T2 blinks for several seconds and is highlighted.
In certain embodiments, the group data visualization system further comprises a second detection module. When the second detection module detects that the user has selected a first shape, the display module 33 displays the selected first shape enlarged, so that the user can more clearly view the comparison of the numbers of the event types characterized by the first shape. In a specific example, the first shape is displayed enlarged on one side of the first timeline when selected, as shown in interface C3 of FIG. 6a. In another specific example, the first shape is displayed enlarged within the first time axis when selected; for example, the selected first shape is enlarged about the same center on the first time axis, as shown in interface C4 of FIG. 6b.
In some embodiments, the user is concerned not only with the variation of event types in the group data set as presented by the timelines, but even more with whether the assigned groups are reasonable. This requires the user to be able to view the detailed data features in each group and the priority order of the data features used to classify the groups. The display module 33 is therefore also used to display an interface of the data set of a group. The data set is displayed as a list, showing the user detailed information on the data features within the same group. In order to help improve the classification accuracy for the group data sets, the list displayed in the interface can present the data features of a group row by row according to the classification priority used by the group data visualization system during classification. For example, please refer to FIG. 6, which shows a schematic diagram of a list interface for the data set of a group. In the list interface, the displayed data set of a group is sorted from high to low, with the similarity of the data features as the priority. When rows are tied on the data feature of first priority, they are sorted by the data feature of second priority. In the embodiment shown in fig. 7, the priority from high to low is: IP address, source of the event (source), responder of the event (target), type of the event (event_type), and time of the event (timestamp). In this embodiment, the processing module 32 encodes the importance of the different columns in the table header: the more concentrated the values of a feature, the more important the feature. In one embodiment provided herein, the group data visualization system represents this characteristic by computing the information entropy of each feature. A lower information entropy means a higher consistency.
Then, the processing module 32 sorts the features in ascending order of information entropy, and finally the display module 33 places the column headers with low information entropy first to draw the user's attention to them. In different implementations, the column headers in the displayed table may also be color rendered: for example, the column header with the lowest information entropy is rendered in the deepest color to indicate to the user that the data feature represented by that column is the most important, and the headers of the columns representing the other data features are rendered accordingly, yielding the data set list interface shown in the figure. The list interface may be displayed after the interface of the plurality of groups or the timeline display interface, or in response to a selection operation by which the user selects the list interface.
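The ordering of table columns by ascending information entropy can be sketched as follows. The sketch assumes each row is a dictionary keyed by column name and uses Shannon entropy of the column values; the function names and the sample rows are hypothetical.

```python
import math
from collections import Counter


def column_entropy(values):
    """Shannon entropy (in bits) of the values in one table column."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_columns(rows, columns):
    """Return `columns` sorted by ascending entropy of their values.

    Columns with concentrated values (low entropy) come first; ties keep
    their original relative order because Python's sort is stable.
    """
    entropies = {
        col: column_entropy([row[col] for row in rows]) for col in columns
    }
    return sorted(columns, key=lambda col: entropies[col])


# Hypothetical rows of one group's data set.
table_rows = [
    {"ip": "10.0.0.1", "event_type": "follow", "timestamp": 1},
    {"ip": "10.0.0.1", "event_type": "follow", "timestamp": 2},
    {"ip": "10.0.0.1", "event_type": "gift", "timestamp": 3},
    {"ip": "10.0.0.1", "event_type": "gift", "timestamp": 4},
]
ordered_columns = rank_columns(table_rows, ["timestamp", "event_type", "ip"])
```

Here the shared IP column ranks first and the all-distinct timestamp column last; the same entropy values could also drive the depth of the header color.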
In some embodiments, to further characterize whether the acquired data set of the group reflects the characteristics of a fraud event, the data may need to be presented from other dimensions. For example, the accuracy of the detected fraud events is further confirmed by comparing the network operation data of normal users with the group data set. To this end, the display module 33 is also configured to display an interface of the feature distribution of the data set of the group, a histogram of the feature distribution, and a distribution contrast map that places this histogram against the histogram of the entire cluster. The feature distribution interface may show the distribution of each data type in the whole network, where "the whole network" is a relative notion, for example the cluster formed by a plurality of network users. The interface can display the distribution of a certain data feature of a certain group within the cluster. Referring to fig. 2, for example, the largest dotted circle in fig. 2 represents a cluster formed by a plurality of network users; there are 11 groups in the cluster, numbered 0-10, and one of the 11 groups is selected for information display.
In some embodiments, the types of data that the feature distribution interface may present are, for example: the information entropy of the average operation time interval dimension (average operation interval entropy), the information entropy of the IP address usage dimension (IPused_amount_entropy), the information entropy of the gender dimension (sex_amount), the information entropy of the email dimension (email_amount), the information entropy of the registration time dimension (reg_time_entry), the information entropy of the operation time dimension (operation time_entry), the information entropy of the device number dimension (device_amount_entry), the information entropy of the operation type dimension (operation type_entry), the maximum entropy of information in which an IP used by others is used (maxIP used_be_amount), and the like. In the embodiment shown in fig. 7, the information entropy of the registration time dimension is taken as an example data feature, that is, fig. 7 shows the feature distribution of the information entropy of the registration time dimension (registration period) for one group in the network cluster. In order to effectively compare the difference in feature distribution between the acquired group data set and the network operation data of normal users, as shown in fig. 8, the processing module 32 performs the following steps to obtain the data for displaying a histogram of the feature distribution and a distribution contrast map that places this histogram against the histogram of the whole cluster, and the data are then displayed by the display module 33.
In step S211, one of the groups is selected and at least one data feature is determined from the data set of the group. In one embodiment, for example, the group labeled 2 in fig. 3 is selected, and a data characteristic that is user information, for example, registration time, is determined from the data set in the group labeled 2.
In step S212, the feature distribution of the determined at least one data feature in the group and cluster is counted. In this embodiment, the feature distribution of the data feature at the registration time in the group is counted, and the feature distribution of the data feature at the registration time in the whole cluster is counted.
In step S213, a histogram of the feature distribution and a distribution contrast map that places this histogram against the histogram of the entire cluster are displayed. In this embodiment, based on the encoding of the data feature, a histogram of the feature distribution of the registration time in the group is displayed, and a histogram of the feature distribution of the registration time in the entire cluster is displayed. Referring to fig. 9, which shows an interface with a histogram and a contrast map of the feature distribution of the registration time in a group according to an embodiment of the present application, interface D shows a thumbnail of the feature distribution of the registration time in the selected group labeled 2, and the enlarged view of that thumbnail is graph (d) at the bottom of interface D. As seen from graph (d), in the month from August 1 to August 31, the registration operations of the group members are concentrated on August 5, 6, 11, 12, and 16. Graph (c) in interface D is a histogram of the time distribution of registration operations by the group users in August; as seen from graph (c), the registration distribution in August has a certain regularity. Graph (b) in interface D overlays graph (d) and graph (c) to show the difference between the registration-time distribution in the whole cluster and that in the selected group. In order to enable the user to understand the differences and connections among different features, in the embodiment provided by the application, the histogram is presented in three layers, and after the user clicks one of the thumbnails, the page scrolls to the normalized distribution contrast map.
Of course, in a particular application, there may be multiple thumbnails of the data features, each representing a different data feature.
In some embodiments, the display module 33 may also distinguish or emphasize the feature distribution of a certain data feature in the group and the whole cluster by color rendering the histogram, or dynamically display (such as blinking) the data feature in the group and the whole cluster.
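The counting and normalization behind steps S212 and S213 can be illustrated with a minimal Python sketch. It is not the implementation of the application: the function name `normalized_histogram`, the sample registration dates, and the year are all hypothetical and chosen only for illustration. The sketch counts registrations per day for a group and for the whole cluster, normalizes both so they can be overlaid, and computes a per-day difference as one possible form of contrast.

```python
from collections import Counter
from datetime import date


def normalized_histogram(days, bins):
    """Return the fraction of registrations that fall on each day in `bins`."""
    counts = Counter(days)
    total = sum(counts[b] for b in bins)
    if total == 0:
        return [0.0 for _ in bins]
    return [counts[b] / total for b in bins]


# Hypothetical data: one bin per day of August (the year is arbitrary).
bins = [date(2017, 8, d) for d in range(1, 32)]
group_days = [date(2017, 8, d) for d in (5, 5, 5, 6, 6, 11, 12, 16)]
# The cluster contains the group plus one registration on every day.
cluster_days = group_days + [date(2017, 8, d) for d in range(1, 32)]

group_hist = normalized_histogram(group_days, bins)
cluster_hist = normalized_histogram(cluster_days, bins)

# Per-day contrast: positive where the group is over-represented
# relative to the whole cluster, negative where it is under-represented.
contrast = [g - c for g, c in zip(group_hist, cluster_hist)]
```

Normalizing both histograms to fractions is what makes a small group comparable with a much larger cluster when the two are drawn on the same axes.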
In some embodiments, in order to further analyze the differences between the groups in a network cluster, the display module 33 also displays an interface of the feature distribution of the data sets of a plurality of groups. Referring to fig. 10 and fig. 11, fig. 10 shows the steps, in one embodiment of the present application, for displaying how the groups are distributed in the cluster, and fig. 11 shows the corresponding interface in one embodiment. As shown in the figures, the processing module 32 executes the steps shown in fig. 10, and the display module 33 displays the interface shown in fig. 11.
In step S311, a plurality of groups are determined in a cluster composed of a plurality of network users, and the groups are characterized by different shapes, icons, labels and/or colors, respectively; in one embodiment, for example, 3 groups numbered 0, 1 and 2 in fig. 3 are selected, wherein the group numbered 0 is represented by "green", the group numbered 1 is represented by "red", and the group numbered 2 is represented by "blue".
In step S312, at least one data feature is determined from the plurality of groups of data sets; in this embodiment, a data characteristic, such as an IP address, is determined from the 3 groups of data sets.
In step S313, based on the at least one data feature, the relative entropy between every two network users in each group is analyzed as a measure of the similarity between those two users. In this embodiment, the relative entropy (the entropy in the IP usage dimension) between every two network users in the 3 groups numbered 0, 1, and 2 is analyzed based on the IP address as the measure of similarity. For example, the dimensionality reduction method t-SNE (t-distributed stochastic neighbor embedding) is adopted, with the relative entropy between two users used as the measure of distance between network users.
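As an illustration of step S313, the following Python sketch computes a relative-entropy-based distance between users from their IP usage. Several choices here are assumptions not stated in the application: the additive smoothing constant `eps`, the symmetrization of the relative entropy (plain relative entropy is not symmetric), and all function names and sample addresses.

```python
import math
from collections import Counter


def usage_distribution(ips, vocabulary, eps=1e-3):
    """Smoothed probability of each IP in `vocabulary` for one user (eps is an assumption)."""
    counts = Counter(ips)
    total = len(ips) + eps * len(vocabulary)
    return [(counts[ip] + eps) / total for ip in vocabulary]


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_distance(p, q):
    """Symmetrized relative entropy, so that d(a, b) == d(b, a) (an assumption)."""
    return 0.5 * (relative_entropy(p, q) + relative_entropy(q, p))


def pairwise_distances(user_ips):
    """Return the sorted user ids and the matrix of distances between every two users."""
    users = sorted(user_ips)
    vocabulary = sorted({ip for ips in user_ips.values() for ip in ips})
    dists = {u: usage_distribution(user_ips[u], vocabulary) for u in users}
    matrix = [[symmetric_distance(dists[a], dists[b]) for b in users] for a in users]
    return users, matrix


# Hypothetical operation-log extract: the IP addresses used by each user.
user_ips = {
    "u1": ["10.0.0.1", "10.0.0.1", "10.0.0.2"],
    "u2": ["10.0.0.1", "10.0.0.2", "10.0.0.2"],
    "u3": ["192.0.2.7", "192.0.2.7", "192.0.2.7"],
}
users, matrix = pairwise_distances(user_ips)
```

A matrix of this kind could then be passed to a t-SNE implementation that accepts precomputed distances (for example, scikit-learn's `TSNE` with `metric="precomputed"`) to obtain the two-dimensional positions that are drawn in the interface.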
In step S314, a display interface is output in which the network users are characterized by shapes, icons, and/or labels, the differences between the plurality of groups are characterized by different colors, and the degree of similarity between every two network users in each group is characterized by the displayed distance. In this embodiment, as shown in fig. 11, a dot in interface E represents a network user; "green" represents the group numbered 0, "red" represents the group numbered 1, and "blue" represents the group numbered 2. The users in the "blue" group numbered 2 are close to one another, so that group appears as a cluster; the users in the "red" group numbered 1 are likewise close to one another and appear as a cluster; the "green" dots show randomly sampled normal users, which are farther apart and more dispersed. A group that forms a dense cluster can be considered more likely to be a fraudulent group. For example, in the embodiment shown in fig. 11, the "green" group is distributed more dispersedly, so it is a normal group and the users represented by the "green" dots are normal users. Conversely, the "red" group (numbered 1) and the "blue" group (numbered 2) are each distributed as a cluster, which indicates that they are abnormal groups and that the users represented by the "red" and "blue" dots are abnormal users. In one embodiment, a user of the visualization system can interactively view the specific information and feature values of the users in each group by hovering the mouse over them.
In other embodiments, the network user may also be characterized in the output interface by a shape, an icon, and/or a label; for example, the shape may be a geometric figure such as a triangle or a rectangle, the icon may be a smiling face or a crying face, and the label may be a clearly distinguishable character or symbol.
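The observation in step S314 that a dense cluster is more likely to be a fraudulent group can be quantified in a simple way. The Python sketch below is only one possible measure and is not taken from the application: it assumes that each user already has a two-dimensional position (for example, from the dimensionality reduction of step S313) and ranks the groups by their mean pairwise distance. The coordinates and function names are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Average Euclidean distance between every two points of one group."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def rank_groups_by_density(embedding):
    """Return the group labels sorted from tightest to most dispersed, plus the spreads."""
    spread = {label: mean_pairwise_distance(pts) for label, pts in embedding.items()}
    return sorted(spread, key=spread.get), spread


# Hypothetical 2-D positions: group 0 is dispersed, groups 1 and 2 are tight.
embedding = {
    0: [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)],
    1: [(5.0, 5.0), (5.2, 5.1), (5.1, 4.9)],
    2: [(1.0, 8.0), (1.1, 8.1), (0.9, 8.0)],
}
ranking, spread = rank_groups_by_density(embedding)
# The dispersed group 0 comes last in `ranking`.
```

Such a number is only a heuristic aid; in the interface described above the judgment is made visually by the domain expert.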
It should be noted that all modules in the group data visualization system may be configured on a single computer device. Alternatively, the modules in the group data visualization system may be distributed between a client on the user side and a server on the network side, with the client connected to the server through a network. For example, the acquisition module and the processing module of the group data visualization system are installed in the server and the display module is installed in the client; the client logs in to the server by sending a request, and the server, in response to the operation requested by the client, runs the group data visualization system and displays the corresponding interface through the client. The client includes, but is not limited to, a browser interface or dedicated client software provided in the user terminal, together with the hardware that executes the display interface program.
It should also be noted that, through the above description of the embodiments, those skilled in the art can clearly understand that part or all of the present application can be implemented by software and combined with necessary general hardware platform. With this understanding in mind, the technical solutions of the present application and/or portions thereof that contribute to the prior art may be embodied in the form of a software product that may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may cause the one or more machines to perform operations in accordance with embodiments of the present application. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disc-read only memories), magneto-optical disks, ROMs (read only memories), RAMs (random access memories), EPROMs (erasable programmable read only memories), EEPROMs (electrically erasable programmable read only memories), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It should be noted that, as will be understood by those skilled in the art, some of the above-mentioned components may be programmable logic devices, including one or more of Programmable Array Logic (PAL), Generic Array Logic (GAL), Field-Programmable Gate Array (FPGA), and Complex Programmable Logic Device (CPLD); the present application is not particularly limited in this respect.
In summary, the data sets of the groups in the fraudulent event detection process are presented based on the time axis, type distribution, classification list and other modes, so that the data characteristics of the groups in the fraudulent event detection period are displayed in various relation interfaces, and the method is beneficial for field experts and algorithm experts to evaluate and revise the detection algorithm of the fraudulent event detection system.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (27)

1. A group data visualization method is applied to a fraud event detection system and is characterized by comprising the following steps:
acquiring a group of data sets, wherein the data characteristics in the data sets at least comprise event types and time information associated with the event types;
creating a first time axis and a second time axis according to the time information in the data set;
displaying a first timeline with a first shape as a node to characterize a type and number of events that occurred within each time granularity of the first timeline based on encoding of the data features;
displaying a second shape to characterize a total number of each event type occurring within a time interval of the second timeline;
displaying a second time axis, associating the event types characterized in the second shape with the time granularities of the event types in the second time axis, and distributing the event types characterized by a third shape on the second time axis; and
a fourth shape is displayed to characterize the type and number of events that the group occurred within each time granularity of the second timeline.
2. The method of claim 1, wherein the step of obtaining a group of data sets comprises:
acquiring an operation log of a cluster formed by a plurality of network users;
determining at least one data feature from the operation logs of the plurality of network users, and analyzing the similarity of at least one group of data features in the operation logs to determine the group; and
a data set for the group is obtained.
3. The method for visualizing group data as in claim 1 or 2, further comprising the step of displaying at least one group interface, wherein the group size in said group interface is characterized by the displayed geometric figure size.
4. The method for visualizing group data as in claim 1 or 2, further comprising the step of displaying an interface of a group of data sets, wherein the data characteristics of the group of data sets comprise at least two data characteristics of user information, IP address, event type, event origin, event responder and event occurrence time, and the group data sets are displayed in an ordered manner after being grouped in the interface of the group of data sets.
5. The group data visualization method according to claim 2, further comprising the step of displaying an interface of the feature distribution of the data set of the group:
selecting one of said groups and determining at least one data characteristic from the data set of said group,
counting a feature distribution of the determined at least one data feature in the group and cluster; and
displaying a histogram of the feature distribution and a distribution contrast map of the histogram in the whole cluster histogram.
6. The method of visualizing group data as in claim 1, further comprising the step of displaying an interface of feature distributions of data sets of a plurality of groups:
determining a plurality of groups in a cluster consisting of a plurality of network users, and respectively representing the difference of the groups by different shapes, icons, labels and/or colors;
determining at least one data feature from the plurality of groups of data sets;
analyzing relative information entropy between every two network users in each group based on the at least one data characteristic to serve as a measure of the similarity between every two network users; and
outputting a display interface in which the network users are characterized by shapes, icons, and/or labels, the plurality of groups are characterized by different colors, and the degree of similarity between two network users in each group is characterized by the distance displayed.
7. The method of claim 1, further comprising the step of displaying the first shape in magnified form when the first shape is selected, comprising:
the first shape, when selected, is displayed magnified on one side of the first time axis; or
the first shape, when selected, is displayed magnified in the first time axis.
8. The group data visualization method of claim 1, wherein the event type comprises at least one of a network user's attention, likes, comments, gifts, login, logout, update status, registration, and modification information.
9. The method for visualizing group data as in claim 1, wherein said step of creating a first timeline and a second timeline is a step of creating a first timeline and a second timeline according to the same time interval and time granularity pair.
10. The method for visualizing group data as recited in claim 1, wherein said step of creating a first timeline and a second timeline is a step of creating a first timeline and a second timeline according to different time interval and time granularity pairs, wherein the time interval of the second timeline is the time granularity of the first timeline.
11. The group data visualization method according to claim 9 or 10, further comprising, when the first shape is selected, displaying dynamically and/or in highlighted form, by the third shape, the distribution in the second time axis of the event types occurring within the time granularity represented by the first shape.
12. A computer device, comprising:
one or more processors; and
a presentation engine executing on the one or more processors, the presentation engine to perform the group data visualization method of any of claims 1-11.
13. A group data visualization system, comprising:
the acquisition module acquires a group of data sets through a network, wherein the data characteristics in the data sets at least comprise event types and time information associated with the event types;
the processing module is used for creating a first time axis and a second time axis according to the time information in the data set and encoding the data characteristics; and
the display module displays a first time axis and a second time axis and displays a first shape, a second shape, a third shape and a fourth shape in one interface through display equipment, wherein the first shape is used as a node of the first time axis to represent the type and the number of events of the group in each time granularity of the first time axis; the second shape characterizes a total number of each event type occurring within a time interval of the second timeline; the third shape characterizes a distribution of event types characterized in the second shape on the second timeline; the fourth shape characterizes a type and number of events that occur for the group within each time granularity of the second timeline.
14. The system of claim 13, wherein the group is determined by the acquiring module acquiring the operation logs of the plurality of network users and analyzing similarity of at least one set of data features in the operation logs.
15. The group data visualization system of claim 13, wherein the display module is further configured to display at least one group interface, the group size in the group interface being characterized by a displayed geometric figure size.
16. The system of claim 13, wherein the display module is further configured to display an interface of the group of data sets, the data characteristics of the group of data sets include at least two of user information, IP address, event type, event origin, event responder, and event occurrence time, and the group of data sets are displayed in the group of data sets in the interface in a sorted manner after being grouped.
17. The system of claim 13, wherein the display module is further configured to display an interface of a feature distribution of the dataset of the cluster, a histogram of the feature distribution, and a distribution contrast map corresponding to the histogram across the cluster histogram.
18. The system of claim 13, wherein the display module is further configured to display an interface that characterizes the network users by shapes, icons, and/or labels, characterizes the differences of the plurality of groups by different colors, and characterizes the degree of similarity between two network users in each group by a displayed distance.
19. The group data visualization system according to claim 13, further comprising a detection module, wherein the first shape displayed in the display module is displayed in a magnified manner on one side of the first time axis when the user is detected to select the first shape based on the detection module; or the first shape displayed in the display module is displayed in an enlarged scale in the first time axis.
20. The system of claim 13, wherein the first timeline and the second timeline created by the processing module have the same time intervals and time granularity.
21. The system of claim 13, wherein the first timeline and the second timeline created by the processing module are created according to different time interval and time granularity pairs, and wherein the time interval of the second timeline is the time granularity of the first timeline.
22. The group data visualization system according to claim 20 or 21, further comprising a detection module, wherein, when the detection module detects that the user selects the first shape, the distribution of the event types occurring within the time granularity represented by the first shape is displayed by the third shape in the second time axis.
23. The group data visualization system of claim 13, wherein the event type includes at least one of a network user's attention, likes, comments, gifts, login, logout, update status, registration, modify information.
24. A client connected to a server via a network, wherein the client executes the steps of the group data visualization method according to any one of claims 1 to 11 based on sending a request to log in to the server.
25. A server connected to a client via a network, wherein the server sends the process of the group data visualization method according to any one of claims 1 to 11 to the client and displays the execution result through the client based on the operation of the client to execute the request.
26. A browser, connected to a server through a network, wherein the browser executes the steps of the group data visualization method according to any one of claims 1 to 11 based on sending a request to log in to the server.
27. A computer-readable storage medium storing a data visualization computer program, wherein the data visualization computer program when executed implements the steps of the group data visualization method of any of claims 1-11.
CN201810022368.6A 2018-01-10 2018-01-10 Group event data visualization method and system Active CN108170830B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810022368.6A CN108170830B (en) 2018-01-10 2018-01-10 Group event data visualization method and system

Publications (2)

Publication Number Publication Date
CN108170830A CN108170830A (en) 2018-06-15
CN108170830B true CN108170830B (en) 2020-07-31

Family

ID=62517777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810022368.6A Active CN108170830B (en) 2018-01-10 2018-01-10 Group event data visualization method and system

Country Status (1)

Country Link
CN (1) CN108170830B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033194B (en) 2018-06-28 2019-11-08 北京百度网讯科技有限公司 Affair displaying method and device
CN109191350A (en) * 2018-07-06 2019-01-11 贵州黔商科技有限公司 A kind of census management method based on big data family tree
CN108876479B (en) * 2018-07-18 2020-06-16 口口相传(北京)网络技术有限公司 Channel attribution method and device for object entity
CN114077711A (en) * 2020-08-12 2022-02-22 杨嶷 Information connection method and device based on map and entity information unit
CN113538058B (en) * 2021-07-23 2023-04-07 四川大学 Multi-level user portrait visualization method oriented to online shopping platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6567814B1 (en) * 1998-08-26 2003-05-20 Thinkanalytics Ltd Method and apparatus for knowledge discovery in databases
CN101867489A (en) * 2010-06-11 2010-10-20 北京邮电大学 Method and system for realizing real-time displayed social network visualization
CN102629271A (en) * 2012-03-13 2012-08-08 北京工商大学 Complex data visualization method and equipment based on stacked tree graph
CN104536956A (en) * 2014-07-23 2015-04-22 中国科学院计算技术研究所 A Microblog platform based event visualization method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050043961A1 (en) * 2002-09-30 2005-02-24 Michael Torres System and method for identification, detection and investigation of maleficent acts
US20080215576A1 (en) * 2008-03-05 2008-09-04 Quantum Intelligence, Inc. Fusion and visualization for multiple anomaly detection systems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
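"Application of Visual Data Mining in Credit Fraud Detection" (可视化数据挖掘在信贷欺诈检测中的应用); Tong Xin'an et al.; Journal of Yichun University; 20100425; Vol. 32, No. 4; pp. 69-71 *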

Also Published As

Publication number Publication date
CN108170830A (en) 2018-06-15

Similar Documents

Publication Publication Date Title
CN108170830B (en) Group event data visualization method and system
Sapiezynski et al. Quantifying the impact of user attention on fair group representation in ranked lists
US11928733B2 (en) Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data
CN108268624B (en) User data visualization method and system
US20200192894A1 (en) System and method for using data incident based modeling and prediction
CN110268409B (en) Novel nonparametric statistical behavior recognition ecosystem for power fraud detection
Lin et al. Voices of victory: A computational focus group framework for tracking opinion shift in real time
CN111614690B (en) Abnormal behavior detection method and device
CN110442712B (en) Risk determination method, risk determination device, server and text examination system
CN108280644B (en) Group membership data visualization method and system
CN103793484A (en) Fraudulent conduct identification system based on machine learning in classified information website
Duval Explainable artificial intelligence (XAI)
US20150205693A1 (en) Visualization of behavior clustering of computer applications
CN110738527A (en) feature importance ranking method, device, equipment and storage medium
US20230004979A1 (en) Abnormal behavior detection method and apparatus, electronic device, and computer-readable storage medium
CN113095712A (en) Enterprise credit granting score obtaining method and device and computer equipment
CN115545103A (en) Abnormal data identification method, label identification method and abnormal data identification device
CN109478219A (en) For showing the user interface of network analysis
CN116737495A (en) Method, device, computer equipment and storage medium for determining running state
CN110489732A (en) Method for processing report data and equipment
CA3183463A1 (en) Systems and methods for generating predictive risk outcomes
CN115033891A (en) Vulnerability assessment method and device, storage medium and electronic equipment
EP3493082A1 (en) A method of exploring databases of time-stamped data in order to discover dependencies between the data and predict future trends
CN111026981B (en) Visual display method, device and equipment for hot topics
JP7412821B2 (en) information processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20181018

Address after: 100084 10 floor 1009-1, 3 building, 1 Zhongguancun East Road, Haidian District, Beijing.

Applicant after: Huakong Tsingjiao Information Technology (Beijing) Co., Ltd.

Address before: 100084 Tsinghua Yuan, Beijing, Haidian District

Applicant before: Tsinghua University

GR01 Patent grant