CN108170830A - Group event data visualization method and system - Google Patents

Publication number
CN108170830A
Authority
CN
China
Prior art keywords
group
time
shape
data
event type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810022368.6A
Other languages
Chinese (zh)
Other versions
CN108170830B (en)
Inventor
徐葳
孙娇
姚期智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huakong Tsingjiao Information Technology Beijing Co Ltd
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201810022368.6A
Publication of CN108170830A
Application granted
Publication of CN108170830B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/954 Navigation, e.g. using categorised browsing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1044 Group management mechanisms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Computer Hardware Design (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a group event data visualization method and system applied in a fraud detection system. The method comprises the following steps: obtaining the data set of a group, the data features in the data set including at least event types and time information associated with the event types; creating a first time axis and a second time axis; based on an encoding of the data features, displaying the first time axis with first shapes as nodes, to characterize the event types and quantities occurring in the group within each time granularity of the first time axis; displaying a second shape, to characterize the total quantity of each event type occurring within the time interval of the second time axis; displaying the second time axis, associating the event types characterized in the second shape with the time granularities of the second time axis in which those event types occur, and characterizing the distribution of each event type on the second time axis by a third shape; and displaying a fourth shape, to characterize the event types and quantities occurring in the group within each time granularity of the second time axis.

Description

Group event data visualization method and system
Technical field
The application relates to the field of computer processing technology, and in particular to a group event data visualization method and system.
Background
Online fraud is a well-known dark side of today's internet, and it causes immeasurable losses worldwide every year. In 2015, the Internet Crime Complaint Center received complaints about fraud on the order of millions from around the world, and online fraud also causes heavy economic losses worldwide every year. Fraudulent users are typically paid for helping to promote specific products or for spreading spam. In internet finance, fraudulent users apply for loans under false identities, buy goods with stolen credit cards, and even carry out illegal activities such as money laundering. In internet business scenarios, finding a suitable anti-fraud algorithm has therefore become more critical, and this demand keeps growing.
Although many methods now exist for identifying fraud on the internet, the fraud detection systems built on them are limited: the credibility of the data flagged for suspected fraud must subsequently be verified with a great deal of manual effort, for example by platform supervisors who investigate and verify cases one by one. As a result, revising algorithm parameters, prioritizing data features, selecting algorithm models and similar work in a fraud detection system requires not only software design by algorithm experts but, even more, the participation of domain experts. Improving the transparency of fraud identification algorithms can therefore effectively improve fraud detection accuracy, and how to visualize the data is a problem urgently needing a solution in this field.
Summary of the invention
In view of the above deficiencies of the prior art, the purpose of the application is to provide a group event data visualization method and system, to solve the problem that fraud identification algorithms in the prior art cannot be visualized.
To achieve the above and other related purposes, a first aspect of the application provides a group data visualization method applied in a fraud detection system, comprising the following steps: obtaining the data set of a group, the data features in the data set including at least event types and time information associated with the event types; creating a first time axis and a second time axis; based on an encoding of the data features, displaying the first time axis with first shapes as nodes, to characterize the event types and quantities occurring in the group within each time granularity of the first time axis; displaying a second shape, to characterize the total quantity of each event type occurring within the time interval of the second time axis; displaying the second time axis, associating the event types characterized in the second shape with the time granularities of the second time axis in which those event types occur, and characterizing the distribution of each event type on the second time axis by a third shape; and displaying a fourth shape, to characterize the event types and quantities occurring in the group within each time granularity of the second time axis.
A second aspect of the application provides a computer device, comprising: one or more processors; and a presentation engine executed on the one or more processors, the presentation engine being configured to perform the group data visualization method described in the first aspect of the application.
A third aspect of the application provides a group data visualization system, comprising: an acquisition module, which obtains the data set of a group over a network, the data features in the data set including at least event types and time information associated with the event types; a processing module, which creates a first time axis and a second time axis and encodes the data features; and a display module, which displays, in an interface on a display device, the first and second time axes and the first, second, third and fourth shapes. The first shapes serve as nodes of the first time axis to characterize the event types and quantities occurring in the group within each time granularity of the first time axis; the second shape characterizes the total quantity of each event type occurring within the time interval of the second time axis; the third shape characterizes the distribution on the second time axis of the event types characterized in the second shape; and the fourth shape characterizes the event types and quantities occurring in the group within each time granularity of the second time axis.
A fourth aspect of the application provides a client connected to a server over a network. Based on a request that it sends, the client logs in to the server to perform the steps of the group data visualization method described in the first aspect of the application.
A fifth aspect of the application provides a server connected to a client over a network. Based on an operation request from the client, the server sends the execution process of the group data visualization method described in the first aspect of the application to the client, and the execution result is displayed by the client.
A sixth aspect of the application provides a browser connected to a server over a network. Based on a request that it sends, the browser logs in to the server to perform the steps of the group data visualization method described in the first aspect of the application.
A seventh aspect of the application provides a computer-readable storage medium storing a data visualization computer program, characterized in that the data visualization computer program, when executed, implements the steps of the group data visualization method described in the first aspect of the application.
As described above, the group data visualization method and system of the application present the data sets of the groups identified during fraud detection by means of time axes, type distributions, lists and the like, so that the data features of the groups formed during fraud detection are shown through interfaces that express several kinds of relationships. This helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud detection system.
Description of the drawings
Fig. 1 is a flowchart of the group data visualization method of the application in one embodiment.
Fig. 2 is a flowchart of the step of obtaining a group data set in one embodiment of the application.
Fig. 3 shows an interface containing multiple groups displayed in one embodiment of the application.
Fig. 4 is a schematic diagram of a group data visualization display interface in one embodiment of the application.
Fig. 5 is a schematic diagram of a group data visualization display interface in another embodiment of the application.
Figs. 6a-6d are schematic diagrams of interfaces in several states displayed using the visualization method of the application.
Fig. 7 is a schematic diagram of a list interface for the data set of a group displayed in one embodiment of the application.
Fig. 8 is a flowchart of displaying an interface of the feature distribution of a group data set in one embodiment of the application.
Fig. 9 shows an interface with a histogram and a comparison chart of the feature distribution of registration times in a group displayed in one embodiment of the application.
Fig. 10 is a flowchart of the step of displaying the distribution of multiple groups within the cluster in one embodiment of the application.
Fig. 11 is a schematic diagram of an interface showing the distribution of multiple groups within the cluster in one embodiment of the application.
Fig. 12 is a schematic diagram of the module structure of the computer device provided by the application in one embodiment.
Fig. 13 is a schematic diagram of the module structure of the group data visualization system provided by the application in one embodiment.
Detailed description
The embodiments of the application are described below by way of specific examples; those skilled in the art can readily understand other advantages and effects of the application from the content disclosed in this specification.
In the following description, reference is made to the accompanying drawings, which describe several embodiments of the application. It should be understood that other embodiments may also be used, and that changes in mechanical composition, structure, electrical configuration and operation may be made without departing from the spirit and scope of the application. The following detailed description should not be considered limiting, and the scope of the embodiments of the application is limited only by the claims of the granted patent. The terminology used herein is for describing particular embodiments only and is not intended to limit the application.
Furthermore, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It should be further understood that the terms "comprising" and "including" indicate the presence of the stated features, steps, operations, elements, components, items, types and/or groups, but do not exclude the presence, occurrence or addition of one or more other features, steps, operations, elements, components, items, types and/or groups. The terms "or" and "and/or" as used herein are to be interpreted as inclusive, meaning any one or any combination. Therefore, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition arises only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
In fraud detection, domain experts supply the core fraud identification technology with experience in classifying data and with requirements on the accuracy of classification results, but they do not know the algorithm framework itself or the parameters used in the algorithm. Because domain experts have no way of knowing how data is classified during detection, when they obtain fraud detection results from a fraud detection system they can only verify those results and cannot judge their accuracy. To improve the accuracy of fraud detection systems, the application provides a group data visualization method applied to a fraud detection system. It displays the groups obtained by classification in the fraud detection system, together with their data sets, to algorithm experts and domain experts in a visual manner, so that different users (such as domain experts or algorithm experts) can explore various fraud behaviors through a variety of interactions, each to the depth they need.
The group data visualization method is mainly performed by a computer device. The computer device can be any suitable computer device, such as a handheld computer device, a tablet computer device, a notebook computer, a desktop computer, or a server. The computer device includes a display, an input unit, input/output (I/O) ports, one or more processors, memory, a non-volatile storage device, a network interface, a power supply and so on. These components may include hardware elements (such as chips and circuits), software elements (such as a tangible non-transitory computer-readable medium storing instructions), or a combination of hardware and software elements. Note also that the components can be combined into fewer components or separated into additional ones; for example, the memory and the non-volatile storage device can be included in a single component. The computer device can perform the visualization method on its own or in cooperation with other computer devices. In some embodiments, the computer device performs the visualization method and displays the corresponding visualization interface itself. For example, the computer device includes a processor and a display; a presentation engine (or display engine) executed on the processor performs the group data visualization method and shows the result on the display. The presentation engine includes, but is not limited to, software and hardware that parse content developed in a programming language for interface display, such as XML, HTML scripts, C and so on. In other embodiments, one computer device performs the visualization method and provides the corresponding visualization interface to another computer device for display. For example, a client initiates a request to a server based on the user's request operation and logs in to the server; the server performs the visualization method to form the corresponding interface data and feeds the interface data back to the client, where a browser or a customized application displays the corresponding diagram according to the interface data.
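The server-to-client hand-off described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify a wire format, so JSON is assumed here, and the function name `build_interface_data` and the payload fields `group` and `bars` are hypothetical.

```python
import json

def build_interface_data(group_id, event_counts):
    """Serialize per-event-type totals into interface data for a client.

    The payload layout is a hypothetical example, not a format defined
    by the patent.
    """
    payload = {
        "group": group_id,
        "bars": [
            {"event_type": event_type, "total": total}
            for event_type, total in sorted(event_counts.items())
        ],
    }
    return json.dumps(payload)

# Server side: form the interface data for group 3.
response_body = build_interface_data(3, {"like": 1, "follow": 3})

# Client side: the browser or customized application decodes the data
# and would then draw the corresponding diagram from it.
decoded = json.loads(response_body)
```

Keeping the payload purely descriptive (types and totals, no drawing instructions) leaves the choice of shapes and colors to the client's presentation engine.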
The visualization method is mainly performed by the fraud detection system, which may include software and hardware in one or more computer devices. The aim is to show the user the behavior of a fraud group in different time periods, and thus to answer the domain expert's question "What did the group do as a fraud group?" and the algorithm expert's question "Do users in the same group have the same behavioral habits?". To this end, the application provides a visualization method based on time axes. Referring to Fig. 1, which is a flowchart of the group data visualization method of the application in one embodiment, the method includes the following steps:
In step S11, the data set of a group is obtained. The data features in the data set include at least event types and time information associated with the event types. In some embodiments, the group is determined as described below. Referring to Fig. 2, which is a flowchart of obtaining a group data set in one embodiment of the application, step S11 further includes:
Step S111: obtain the operation logs of a cluster made up of multiple network users. In different embodiments, the cluster consists of all the network users that can be obtained. The network users in the cluster come from the same website, from different websites, or from different network channels, for example the Internet, one or more intranets, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), a suitable combination of these, or a mobile phone communication network.
Step S112: determine at least one data feature from the operation logs of the multiple network users, and analyze the similarity of the at least one data feature in the operation logs to determine the group. In a specific embodiment, online fraud inevitably leaves characteristic traces in the data that users generate on the network. The fraud detection system collects the operation logs of multiple network users from at least one website and, by analyzing the similarity of at least one data feature in the operation logs, groups the users who produced the corresponding logs, obtaining the groups in the operation logs and the data set of each group.
In some embodiments, the data set of a group includes, but is not limited to, at least two of the following data features: user information, IP address, event type, event initiator, event responder, and event occurrence time. The user information includes, for example, the user's mobile phone number, e-mail address, ID number, identity card number, gender, user device number, and registration time. One piece of user information can correspond to at least one event type, and each event type corresponds to an event initiator, an event responder, and an event occurrence time. The event types include, but are not limited to, at least one of the following: social behaviors between network users such as following, liking, commenting, and gifting (also called giving presents), and operational behaviors of a network user such as logging in, logging out, updating status, registering, and modifying information. For example, one piece of user information can correspond to multiple like events, each with its own event initiator, event responder, and event occurrence time.
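The record structure described above can be sketched as a small data class. This is a minimal illustration, not the patent's schema: the class name `EventRecord`, its field names, the sample values, and the year 2017 are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class EventRecord:
    """One log entry in a group's data set (field names are illustrative)."""
    user_id: str           # stands in for the user information
    ip_address: str
    event_type: str        # e.g. "follow", "like", "gift"
    initiator: str         # event initiator
    responder: str         # event responder
    occurred_at: datetime  # event occurrence time

records = [
    EventRecord("u1", "10.0.0.1", "like", "u1", "u9", datetime(2017, 8, 1, 9, 30)),
    EventRecord("u1", "10.0.0.1", "like", "u1", "u7", datetime(2017, 8, 1, 9, 31)),
    EventRecord("u2", "10.0.0.1", "follow", "u2", "u9", datetime(2017, 8, 2, 14, 0)),
]

# One user can map to several events of the same type, each with its own
# initiator, responder, and occurrence time.
likes_by_u1 = [r for r in records if r.user_id == "u1" and r.event_type == "like"]
```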
Step S113: obtain the data set of the group. In some embodiments, the data set can be obtained from a database that stores the groups and their data sets. The database is configured, for example, on a remote storage server or on a storage device in the local computer device, and the data set of a group is extracted from the database based on the user's input operation. For example, the fraud detection system obtains multiple groups using an unsupervised detection algorithm; the user selects one of the groups through a selection interface, and the data set of the selected group is then obtained.
Specifically, the fraud detection system first computes the similarity of data features of the same class across all the data in the operation logs. The similarity can be measured by information entropy. For example, the fraud detection system uses user information to compute the information entropy of the IP usage dimension or the maximum IP usage dimension, uses event types to compute the information entropy of the operation type dimension, and uses registration times to compute the information entropy of the registration time dimension, or operation times to compute the information entropy of the bad operation dimension. After these calculations, an unsupervised detection method is applied to the resulting entropies to divide the users into multiple groups. Examples of the unsupervised detection method include algorithms based on dense subgraphs and algorithms based on vector spaces. Each group presented by the visualization method provided herein reflects the shared resources, user relationships and so on used in fraud, so that users of the fraud detection system can determine more clearly whether the classification strategy of the unsupervised detection algorithm is reasonable. The shared resources include, but are not limited to, shared IP addresses and e-mail addresses; the user relationships include, but are not limited to, follow relationships and interaction relationships.
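As a rough illustration of measuring similarity with information entropy, the sketch below computes the Shannon entropy of one feature dimension. The text does not give the exact formula or the grouping algorithm, so Shannon entropy over the observed value distribution is an assumption, and the function name `shannon_entropy` and the sample IP addresses are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the distribution of observed values."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

# Eight log entries that all share one IP address: no diversity at all.
ips_shared = ["10.0.0.1"] * 8

# Eight log entries that each use a different IP address.
ips_varied = ["10.0.0.%d" % i for i in range(8)]

low = shannon_entropy(ips_shared)
high = shannon_entropy(ips_varied)
```

Under this reading, a low entropy on a dimension such as IP usage suggests that the users behind the entries share a resource, which is the kind of signal an unsupervised grouping step could build on.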
In one embodiment, the visualization method further includes a step of displaying at least one group interface, in which the size of a group is characterized by the size of the displayed geometric figure. Referring to Fig. 3, which shows an interface containing multiple groups displayed in one embodiment of the application, 11 groups are shown in the interface, each represented by a circle. All 11 groups lie inside one largest dashed circle, which represents, for example, a cluster made up of N network users. The group labeled 0 is, for example, a normal group. A smaller dashed circle contains 10 groups of different sizes labeled 1-10, which are, for example, abnormal groups. The size of each circle is proportional to the number of members of the group: a large circle indicates a group with many members, and a small circle a group with few. In different embodiments, the geometric figure of a group can be any shape. The color of a geometric figure can be set randomly, or can be related to the number of groups or to the number of members of the group. For example, with N preset colors, the fraud detection system randomly assigns different colors to the geometric figures characterizing the groups. As another example, the fraud detection system assigns a preset sequence of colors to the geometric figures in ascending order of member count. When the user selects a geometric figure in the display interface, the fraud detection system obtains the data set of the corresponding group.
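The size encoding described above can be sketched as follows. The text only states that circle size is proportional to member count; interpreting "size" as area (rather than radius) is an assumption, as are the function name `circle_radius`, the scale factor `area_per_member`, and the sample member counts.

```python
import math

def circle_radius(member_count, area_per_member=25.0):
    """Radius of a circle whose area is proportional to member_count.

    area_per_member is a hypothetical scale factor in square pixels.
    """
    return math.sqrt(member_count * area_per_member / math.pi)

# Hypothetical member counts for three groups, keyed by group label.
group_sizes = {0: 400, 1: 100, 2: 25}
radii = {label: circle_radius(count) for label, count in group_sizes.items()}
```

Scaling the area instead of the radius keeps a group with four times the members from looking sixteen times larger, which is a common choice when circles encode counts.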
In a preferred embodiment, the at least one displayed group interface can also include an information bar that displays group information. When the user selects a group in the group interface, the basic information of the group is shown at one side of the interface as a table or a text box. The basic information is, for example, the group code, the number of members, the most preferred data feature used to determine the group, and the group attribute (such as normal group or abnormal group).
In step S12, a first time axis and a second time axis are created. Both are created according to the time information in the data set; for example, if the time information in the data set spans at most 10 days, the maximum time interval of the first time axis or the second time axis is 10 days. In one embodiment, the first time axis and the second time axis are created with the same time interval and time granularity; in another embodiment, they are created with different time intervals and time granularities, as detailed later.
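A minimal sketch of creating the two time axes from the time information is shown below. The function name `create_time_axis`, the representation of an axis as a list of tick dates, and the sample dates (including the year) are assumptions for illustration.

```python
from datetime import date, timedelta

def create_time_axis(dates, granularity_days=1):
    """Return the tick dates spanning min(dates)..max(dates) at the given step."""
    start, end = min(dates), max(dates)
    step = timedelta(days=granularity_days)
    ticks = []
    current = start
    while current <= end:
        ticks.append(current)
        current += step
    return ticks

# Time information found in a hypothetical group data set (spans 10 days).
event_dates = [date(2017, 8, 1), date(2017, 8, 4), date(2017, 8, 10)]

# Same interval for both axes, with a daily and a 5-day granularity.
first_axis = create_time_axis(event_dates, granularity_days=1)
second_axis = create_time_axis(event_dates, granularity_days=5)
```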
In step S13, based on the encoding of the data features, the first time axis is displayed with first shapes as nodes, to characterize the event types and quantities occurring in the group within each time granularity of the first time axis. The fraud detection system counts the event types in the data set according to the time granularity of the first time axis, encodes the counted event types into figures of the preset first shape, and presents the encoded first shapes in chronological order as nodes on the first time axis. Through the display of each node on the first time axis, domain experts can clearly see how the event types are distributed, or how their quantities change, over time. The first shape includes, but is not limited to, a pie shape or a bar shape. In some implementations, the fraud detection system encodes the percentage share of each event type within one time granularity into a figure of the first shape and displays it on the first time axis, with the share regions of the same event type using the same color. Referring to Fig. 4, which is a schematic diagram of a group data visualization display interface in one embodiment of the application, the first time axis T1 is located in the lower region of the displayed interface and shows the 10-day time interval from August 1 to August 10. With a day as the time granularity, the percentage distribution of the event types counted each day is encoded as a pie chart and displayed as a node on the first time axis T1. The colors in the pie chart indicate the event type: in the figure, the color marked "yellow" denotes follow events, the color marked "red" denotes gift events, and the color marked "blue" denotes like events. For example, on August 7, shown as a pie-chart node on the first time axis T1 in the figure, follow events account for the largest share of the events that occurred, gift events for a smaller share, and like events for the smallest.
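The per-day encoding behind the pie-chart nodes can be sketched as follows. This is an illustration only: the function name `pie_nodes`, the event labels, and the sample events (including the year) are assumptions, and the sketch stops at computing the shares rather than drawing them.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical (day, event_type) pairs taken from a group's data set.
events = [
    (date(2017, 8, 7), "follow"),
    (date(2017, 8, 7), "follow"),
    (date(2017, 8, 7), "follow"),
    (date(2017, 8, 7), "gift"),
    (date(2017, 8, 7), "gift"),
    (date(2017, 8, 7), "like"),
    (date(2017, 8, 8), "like"),
]

def pie_nodes(events):
    """Map each day to the share of each event type that occurred on it."""
    per_day = defaultdict(Counter)
    for day, event_type in events:
        per_day[day][event_type] += 1
    nodes = {}
    for day, counts in sorted(per_day.items()):
        total = sum(counts.values())
        nodes[day] = {t: n / total for t, n in counts.items()}
    return nodes

nodes = pie_nodes(events)
```

Each entry of `nodes` holds the slice fractions for one pie-chart node; a renderer would then color the slices consistently per event type.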
In step S13, based on the encoding of the data features, the second shape is displayed, to characterize the total quantity of each event type occurring within the time interval of the second time axis. The fraud detection system sums the quantities of the event types in the data set over the time interval of the second time axis, encodes the total for each event type into a figure of the preset second shape, and displays the total quantity of each event type within the time interval of the second time axis. The second shape includes, but is not limited to, a histogram, a bar chart, a line chart and so on. For the time interval with which the second time axis was created, the displayed totals reflect how the event types compare in quantity within the same time interval. When the time interval of the second time axis represents one day or one week, the user can determine how the three event types compare in total quantity from the lengths of the bars showing the totals for the "red", "yellow" and "blue" event types. The displayed bars can also convey the comparison through thickness, transparency and the like. Referring again to Fig. 3, a horizontal bar chart is displayed at one side of the first time axis T1 (the right side in the figure). The bar chart shows three bars, "red", "yellow" and "blue", from top to bottom, and the length of each bar represents the total quantity of the corresponding event type generated within the time interval of the second time axis. From the second shape it can be seen that, within the time interval of the second time axis, follow events (the bar marked "yellow") are the most numerous, gift events (the bar marked "red") come second, and like events (the bar marked "blue") are the fewest.
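The totals behind the bar chart can be sketched as a count per event type restricted to the axis interval. The function name `totals_in_interval`, the descending ordering of the bars, and the sample events are assumptions for illustration.

```python
from collections import Counter
from datetime import date

def totals_in_interval(events, start, end):
    """Total count of each event type with start <= day <= end, largest first."""
    counts = Counter(t for day, t in events if start <= day <= end)
    return counts.most_common()

# Hypothetical (day, event_type) pairs; the last one lies outside the interval.
events = [
    (date(2017, 8, 1), "follow"),
    (date(2017, 8, 3), "follow"),
    (date(2017, 8, 7), "follow"),
    (date(2017, 8, 2), "gift"),
    (date(2017, 8, 9), "gift"),
    (date(2017, 8, 5), "like"),
    (date(2017, 8, 20), "like"),
]

bars = totals_in_interval(events, date(2017, 8, 1), date(2017, 8, 10))
```

Each `(event_type, total)` pair corresponds to one bar, with the total mapped to the bar length.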
By displaying the total quantity of each event type occurring within the time interval of the second time axis, domain experts can see from another perspective how the event types counted over time change in quantity. To show the association between the first time axis and the second time axis more clearly, in step S13, based on the encoding of the data features, the second time axis is displayed, the event types characterized in the second shape are associated with the time granularities of the second time axis in which those event types occur, and the distribution of each event type on the second time axis is characterized by a third shape. The second time axis is rendered as an axis with the corresponding time granularities as nodes, and the third shape links the event types distributed over the nodes to the second shape, so that the user can clearly see the association between the second shape and each time granularity of the second time axis. The third shape can be a line whose color follows the color of the corresponding event type in the second shape, so that the user can clearly identify the same event type.
Referring again to Fig. 3, as shown in the figure, the second shape and the second time axis are associated by the third shape. Here the third shape is, for example, an arc that spreads to the nodes of each time granularity of the second time axis based on the time information of each event type in the data set. For example, in the figure a dashed line (the first kind of broken line) represents the association between the gift events represented by the "red" bar and the corresponding time nodes (time granularities) on the second time axis; a solid line represents the association between the follow events represented by the "yellow" bar and the corresponding time nodes (time granularities) on the second time axis; and a dash-dot line (the second kind of broken line) represents the association between the like events represented by the "blue" bar and the corresponding time nodes (time granularities) on the second time axis. In different embodiments, the third shape uses line thickness or transparency to describe the number of events of the event type generated in the corresponding time granularity, which makes it easier to present the high-frequency periods or regularities of event occurrence.
To show more intuitively the event types and counts occurring in each time granularity on the second time axis, in step S13 a fourth shape is displayed to characterize the event types and counts that the group generates in each time granularity of the second time axis. The fraud detection system sums the counts of the event types in the data set, or computes their distribution, over the time interval of the second time axis, encodes the summed event types or the distribution as a figure of the preset fourth shape, and presents each encoded fourth shape in time order as a node of the second time axis. The corresponding fourth shape is displayed, under the guidance of the third shape, according to the time granularity of the created second time axis. Through the display of each node on the second time axis, the user can clearly see, from another perspective, how the count of each event type changes over time. The fourth shape includes but is not limited to a pie shape or a bar shape, and is chosen to differ from the first shape. In some implementations, the fraud detection system may sum the counts of each event type within a time granularity of the second time axis, encode them separately as figures of the fourth shape, and display them on the second time axis, where the same event type uses the same color in the fourth shape as in the third shape and the second shape.
The time axis is used as one of the ways of presenting group data because, for both domain experts and algorithm experts, understanding the concentrated behavior of users within a period of time is critical. For this purpose, by performing step S13, the combination of the first time axis and the second time axis describes this concentrated behavior.
Referring to Fig. 3, as shown in the figure, each pie chart on the first time axis T1 presents the proportions of the different event types (such as following a user or sending a gift to a user online) within one time granularity (such as one day). Each event type is encoded as a different color. The number of events of each event type within a unit time granularity of the first time axis T1 is encoded as the area share of the corresponding sector to form a pie chart; the number of events of each event type within the time interval of the second time axis T2 is encoded as the length of a bar to form the bar chart of the event types (i.e. the second shape); and the number of events of each event type within a unit time granularity of the second time axis T2 is encoded as the length of a bar to form an individual bar chart (i.e. the fourth shape). When the user selects a pie chart on the first time axis T1, arcs colored by event type (i.e. the third shape) are projected from the second shape of each event type onto the fourth shapes at the corresponding time granularities on the second time axis T2, so that the time-axis relationships of the event types in the data set of a group are clearly presented to the user.
In one embodiment, the first time axis and the second time axis are created with the same time interval and time granularity. For example, the fraud detection system preloads a first time axis and a second time axis with identical time granularity, and maps each event type onto the respective time axis according to the time information in the data set and the time granularity, so as to obtain at least one time interval of each time axis. As another example, the fraud detection system determines the time interval of the preset first time axis and second time axis according to the ordering of the time information in the data set, and maps each event type onto each time axis according to the time information in the data set and the time granularity. Fig. 3 shows an interface containing the first time axis T1 and the second time axis T2. The T1 and T2 time axes use one day as the time granularity and 10 days as the time interval, and by performing the foregoing steps the fraud detection system can display the data features of the data set on the first time axis T1 and the second time axis T2 according to the time information in the data set. For example, as shown in Fig. 3, the second time axis T2 uses one day as the time granularity, and the fraud detection system can encode the daily total or distribution of each event type as a bar chart and display it as a node on the second time axis T2.
In another embodiment, the first time axis and the second time axis are created with different time intervals and time granularities, where the time interval of the second time axis is the time granularity of the first time axis. For example, the preset time granularities of the first time axis and the second time axis differ, a correspondence between the time granularities of the two time axes is preset, and the fraud detection system maps each event type onto each time axis according to the time information in the data set. Fig. 5 shows an interface containing the first time axis T1 and the second time axis T2. The T1 time axis uses 10 days as the time interval and one day as the time granularity, while the T2 time axis uses one day as the time interval and one hour as the time granularity. By performing the subsequent steps, the fraud detection system can display the data features of the data set on the first time axis T1 and the second time axis T2 according to the time information in the data set. For example, in interface C2 shown in Fig. 5, the second time axis T2 uses one hour as the time granularity, and the fraud detection system can encode the hourly total of each event type as a bar chart and display it as a node on the second time axis T2.
The number of events of each event type within a time granularity of the first time axis T1 is encoded as the area share of the corresponding sector to form a pie chart. When the user selects a pie chart on the first time axis T1, the number of events of each event type within the time interval of the second time axis T2 (that is, of each event type corresponding to the selected pie chart) is encoded as the length of a bar to form the bar chart of the event types (i.e. the second shape), and the number of events of each event type within a unit time granularity of the second time axis T2 is encoded as the length of a bar to form an individual bar chart (i.e. the fourth shape). Arcs colored by event type (i.e. the third shape) are projected from the second shape of each event type onto the fourth shapes at the corresponding time granularities on the second time axis T2, so that the time-axis relationships of the event types in the data set of a group are clearly presented to the user.
In interface C1 shown in Fig. 4, each pie chart on the first time axis T1 shows the proportions of the different event types (such as following a user or sending a gift to a user online) within one time granularity (such as one day). Each event type is encoded as a different color, and the colors in the pie charts in the figure represent event types: the color labelled "yellow" represents follow events, the color labelled "red" represents gift events, and the color labelled "blue" represents like events. For example, for the node of August 7, shown as a pie chart on the first time axis T1, follow events account for the largest share of the events that occurred, gift events account for a smaller share, and like events account for the smallest share. When the user selects the node of August 7 on the first time axis T1, the second time axis T2 shows, for each hour of the 24 hours of August 7, the event types that occurred and the count of each event type.
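The drill-down from a day on the first time axis to the hours on the second time axis can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names, the `(timestamp, event_type)` record layout, and the sample dates (including the year) are assumptions made for the example.

```python
from collections import Counter
from datetime import date, datetime

def daily_counts(events):
    """First time axis: event-type counts per day (one pie chart per day)."""
    per_day = {}
    for ts, etype in events:
        per_day.setdefault(ts.date(), Counter())[etype] += 1
    return per_day

def pie_shares(counts):
    """Turn event-type counts into sector area fractions of one pie chart."""
    total = sum(counts.values())
    return {etype: n / total for etype, n in counts.items()} if total else {}

def hourly_counts(events, selected_day):
    """Second time axis: counts per hour (0-23) and event type for one day."""
    per_hour = {h: Counter() for h in range(24)}
    for ts, etype in events:
        if ts.date() == selected_day:
            per_hour[ts.hour][etype] += 1
    return per_hour

# Made-up sample records: (timestamp, event_type).
events = [
    (datetime(2017, 8, 7, 9, 15), "follow"),
    (datetime(2017, 8, 7, 9, 40), "follow"),
    (datetime(2017, 8, 7, 21, 5), "gift"),
    (datetime(2017, 8, 7, 21, 30), "follow"),
    (datetime(2017, 8, 8, 10, 0), "like"),
]
selected = date(2017, 8, 7)                            # pie chart picked on T1
shares = pie_shares(daily_counts(events)[selected])    # sector sizes on T1
bars = hourly_counts(events, selected)                 # bar lengths on T2
```

Here `shares` holds the sector fractions of the selected pie chart, and `bars[h]` holds the bar lengths for hour `h` of the selected day; a renderer would draw one arc per event type from the summary bar chart to each hour with a non-zero count.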
It should be particularly noted that the time intervals and time granularities of the first time axis and the second time axis in the above embodiments are not limited to the illustrated cases. In different embodiments, the user can set the time interval and time granularity of the first time axis and the second time axis according to actual needs, for example using weeks, months, quarters or even years as the time unit.
Using the presentation process and the displayed statistics, the user examines the groups classified by the fraud detection system, and the visual interface lets the domain expert find or correct deficiencies in the detection algorithm. In addition, to display the association between the two time axes more clearly, the visualization method further includes: when the first shape is selected, displaying through the third shape, dynamically, highlighted, or both dynamically and highlighted, the distribution on the second time axis of the event types that occurred within the time granularity characterized by the first shape. For example, in interface C1 shown in Fig. 4, when the user selects a pie chart on the first time axis T1, each third shape connected to the bar charts on the second time axis T2 that correspond to the selected pie chart flickers for several seconds or longer, or is highlighted. When the user selects another pie chart on the first time axis T1, the previously flickering and highlighted third shapes return to their original shape and color, and each third shape connected to the bar charts on the second time axis T2 that correspond to the newly selected pie chart flickers for several seconds and is highlighted.
In some embodiments, when the user selects a first shape on the first time axis, the visualization method may also perform a step of enlarging the first shape when it is selected, so that the user can examine more clearly the comparison of event type counts characterized by that first shape. In one specific example, the selected first shape is enlarged on one side of the first time axis. For example, the selected first shape is enlarged above the first time axis, as in interface C3 shown in Fig. 6a. In another specific example, the selected first shape is enlarged on the first time axis itself. For example, the selected first shape is enlarged about the same center position on the first time axis, as in interface C4 shown in Fig. 6b.
In another specific example, when the user selects a pie chart on the first time axis T1, the selected first shape is enlarged on one side of the first time axis T1 and, at the same time, each third shape connected to the bar charts on the second time axis T2 that correspond to the selected pie chart flickers for several seconds or longer. In interface C5 shown in Fig. 6c, when the user selects the pie chart characterizing August 7 on the first time axis T1, that pie chart is enlarged on one side of the first time axis T1, and each line connected to the bar charts characterizing August 7 on the second time axis T2 flickers for several seconds or longer. As a further example, in interface C6 shown in Fig. 6d, when the user selects the pie chart characterizing August 10 on the first time axis T1, that pie chart is enlarged on one side of the first time axis T1, and each line connected to the bar charts characterizing August 10 on the second time axis T2 is highlighted.
In some embodiments, the user is concerned not only with how each event type in the group data set changes along the time axis, but even more with whether the assigned groups are reasonable. This requires that the user be able to view the detailed data features in each group and the priority order of the data features used to classify the groups. The visualization method may include a step of displaying an interface of the data set of a group. The data set is displayed as a list, which shows the user the details of the data features within the same group. To improve the classification accuracy of the group data set, the list in the interface may display the data features of a group column by column according to the classification priority used by the fraud detection system when classifying. For example, Fig. 7 is a schematic diagram of the list interface of the data set of a group in one embodiment of the present application. In this list interface, the data set of the displayed group is sorted from high to low priority according to data feature similarity. When the data features at the first priority are equally similar, sorting proceeds by the data feature at the second priority. In the embodiment shown in Fig. 7, the order of priority from high to low is: IP address, event initiating source (source), event responder (target), event type (event_type) and event occurrence time (timestamp).
In this embodiment, the table header encodes the importance of the different columns: the more concentrated the values of a feature, the more important the feature. In the embodiments provided in the present application, the fraud detection system expresses this property by calculating the information entropy of each feature. A lower information entropy means higher consistency. The fraud detection system then sorts the features in order of increasing information entropy, so that the column header with the lowest information entropy is placed first to draw the user's attention. Of course, in different implementations the column headers of the list may also be color-rendered; for example, the column header with the lowest information entropy is rendered in the darkest color to indicate to the user that the data feature of that column is the most important, and the columns of the other data features are color-rendered by analogy, yielding the data set list interface shown in the figure. The list interface may follow the step of displaying multiple group interfaces or step S13, and is displayed when the user selects the list interface.
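The entropy-based column ordering can be illustrated as follows. This is a sketch under stated assumptions: it uses the Shannon entropy of the empirical value distribution of each column, and the column names and sample rows are invented for the example; the application does not specify the exact entropy estimator.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def rank_columns(rows, columns):
    """Sort column names by increasing entropy, most concentrated first."""
    return sorted(columns,
                  key=lambda col: shannon_entropy([r[col] for r in rows]))

# Made-up rows of one group's data set.
rows = [
    {"ip": "10.0.0.1", "source": "u1", "target": "v1", "event_type": "follow"},
    {"ip": "10.0.0.1", "source": "u2", "target": "v1", "event_type": "follow"},
    {"ip": "10.0.0.1", "source": "u3", "target": "v1", "event_type": "gift"},
    {"ip": "10.0.0.1", "source": "u4", "target": "v2", "event_type": "like"},
]
order = rank_columns(rows, ["source", "event_type", "ip", "target"])
```

In this sample every record shares one IP address, so the `ip` column has zero entropy and is placed first, while `source`, which differs on every row, is placed last. The same ranking could drive the header color depth mentioned above.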
In some embodiments, to further characterize whether the data set of an acquired group reflects the characteristics of fraud, the data set also needs to be displayed from other dimensions. For example, the network operation data of normal users is compared with the group data set to further confirm the accuracy of the detected fraud. For this purpose, the visualization method further includes a step of displaying an interface of the feature distribution of the data set of the group. The feature distribution interface can display the distribution of each data type in the overall network. The overall network is a relative notion: for example, if multiple network users form a cluster, the interface can display the distribution of a data feature of a group within that cluster. Referring to Fig. 2, the largest dashed circle in Fig. 2 represents a cluster formed by multiple network users; the cluster contains 11 groups, numbered 0 to 10, from which one group is selected for information display.
In some embodiments, the data types that the feature distribution interface can display are, for example: the information entropy of the average operation interval dimension (average operation interval entropy), the information entropy of the IP usage amount dimension (IP used amount entropy), the information entropy of the gender dimension (sex entropy), the information entropy of the email dimension (email entropy), the information entropy of the registration time dimension (reg time entropy), the information entropy of the operation count dimension (operation times entropy), the information entropy of the device amount dimension (device amount entropy), the information entropy of the operation type dimension (operation type entropy), the information entropy of the maximum number of other users sharing a used IP (max IP used be used amount entropy), and so on. In the embodiment shown in Fig. 7, the information entropy of the registration time dimension is displayed as the data feature, i.e. Fig. 7 shows the feature distribution, within the network cluster, of the information entropy of the registration time (registration period) dimension of a group. To effectively compare the difference in feature distribution between the acquired group data set and the network operation data of normal users, refer to Fig. 8, which is a flow chart of displaying the interface of the feature distribution of the data set of the group. As shown in the figure, the following steps are included:
In step S211, a group is selected, and at least one data feature is determined from the data set of the group. In one embodiment, for example, the group labelled 2 in Fig. 3 is selected, and a data feature that is a piece of user information, for example the registration time, is determined from the data set of the group labelled 2.
In step S212, the feature distribution of the determined at least one data feature is computed for the group and for the cluster. In this embodiment, the feature distribution of the registration time data feature within the group is computed, and the feature distribution of the registration time data feature within the entire cluster is computed.
In step S213, the histogram of the feature distribution is displayed, together with a comparison chart of that histogram against the histogram of the entire cluster. In this embodiment, based on the encoding of the data feature, a histogram of the feature distribution of registration time within the group and a histogram of the feature distribution of registration time within the entire cluster are displayed. Fig. 9 shows the interface of the histogram and comparison chart of the feature distribution of registration time in a group in one embodiment of the present application. As shown in the figure, in interface D, figure (a) is a thumbnail of the feature distribution of registration time in the selected group labelled 2, and the enlargement of that thumbnail is figure (d) at the bottom of interface D. The enlarged figure shows that, within the month from August 1 to August 31, the registration operations of the group members are concentrated on five days: August 5, August 6, August 11, August 12 and August 16. Figure (c) in interface D is the histogram of the time distribution of the registration operations performed in August by the registered users in the cluster; figure (c) shows that the registrations of the users in the cluster follow a certain regularity in August. Figure (b) in interface D overlays figure (d) and figure (c) to show the difference between the entire cluster and the selected group for the registration time data feature. To let the user understand the differences and connections between different features, the embodiment provided in the present application presents this bar chart in three layers; after the user clicks one of the thumbnails, the page scrolls to the normalized distribution comparison chart. Of course, in a specific application there may also be multiple thumbnails of data features, each representing a different data feature.
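The normalized comparison of steps S212 and S213 can be sketched as follows. The sample registration days, the bin choice (days 1 to 31 of one month) and the function names are assumptions for illustration; the application does not fix a particular normalization, so dividing by the total count is used here as the simplest option.

```python
from collections import Counter

def normalized_histogram(values, bins):
    """Fraction of `values` that falls into each bin label, in bin order."""
    counts = Counter(values)
    total = sum(counts[b] for b in bins)
    return [counts[b] / total if total else 0.0 for b in bins]

def distribution_gap(group_values, cluster_values, bins):
    """Per-bin difference between the group and the cluster distributions."""
    g = normalized_histogram(group_values, bins)
    c = normalized_histogram(cluster_values, bins)
    return [gi - ci for gi, ci in zip(g, c)]

bins = list(range(1, 32))                      # days of one month
group_days = [5, 5, 6, 11, 12, 16, 16, 16]     # made-up registration days
cluster_days = list(range(1, 32)) * 2 + group_days
group_hist = normalized_histogram(group_days, bins)
gap = distribution_gap(group_days, cluster_days, bins)
```

Because both histograms are normalized to sum to 1, a small group can be overlaid on a much larger cluster; positive entries of `gap` mark the days on which the group registers disproportionately often.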
In some embodiments, the feature distribution of a data feature in the group and in the entire cluster can also be distinguished or emphasized by color-rendering the histogram or by dynamic display (such as flickering).
In some embodiments, to further analyze the differences between multiple groups in a network cluster, the group data visualization method further includes a step of displaying an interface of the feature distribution of the data sets of multiple groups. Refer to Fig. 10 and Fig. 11: Fig. 10 is a flow chart of the steps for displaying the distribution of multiple groups in the cluster in one embodiment of the present application, and Fig. 11 shows interface E, which displays the distribution of multiple groups in the cluster in one embodiment of the present application. As shown in the figures, the steps include:
In step S311, multiple groups are determined in a cluster formed by multiple network users, and the groups are distinguished from one another by different shapes, icons, labels and/or colors. In one embodiment, for example, the 3 groups labelled 0, 1 and 2 in Fig. 3 are selected, where the group labelled 0 is shown in "green", the group labelled 1 is shown in "red", and the group labelled 2 is shown in "blue".
In step S312, at least one data feature is determined from the data sets of the multiple groups. In this embodiment, one data feature, such as the IP address, is determined from the data sets of these 3 groups.
In step S313, the relative entropy between every two network users in each group is analyzed based on the at least one data feature and used as a measure of the similarity between the two network users. In this embodiment, the relative entropy between every two network users in the 3 groups labelled 0, 1 and 2 is analyzed based on the IP address (the information entropy of the IP usage amount dimension, IP used amount entropy) and used as the measure of similarity between every two network users. For example, the data dimensionality reduction method t-SNE (t-distributed stochastic neighbor embedding) is adopted, with the relative entropy between two users used as the index that measures the distance between these network users.
In step S314, a display interface is output. In the interface, network users are characterized by shapes, icons and/or labels, the multiple groups are distinguished by different colors, and the similarity between two network users in each group is characterized by their displayed distance. In this embodiment, in interface E shown in Fig. 11, network users are characterized by dots: "green" shows the group labelled 0, "red" shows the group labelled 1, and "blue" shows the group labelled 2. The users in the group labelled 2, shown in "blue", are close to one another and the group forms a cluster-like distribution; the users in the group labelled 1, shown in "red", are also close to one another and that group also forms a cluster-like distribution; whereas the randomly sampled normal users, shown in "green", are far apart from one another and more dispersed. It can therefore be considered that the more densely a group is packed into a cluster, the more likely it is to be a fraud group. In the embodiment shown in Fig. 11, the group shown in "green" has a more dispersed distribution, which indicates that the "green" group is a normal group and that the users represented by the "green" dots are normal users. Conversely, the group shown in "red" (i.e. the group labelled 1) and the group shown in "blue" (i.e. the group labelled 2) have cluster-like distributions, which indicates that the "red" and "blue" groups are abnormal groups and that the users represented by the "red" dots and "blue" dots are abnormal users. In one embodiment, a user of the visualization system can interactively inspect the specific information and feature values of the users in each group by hovering the mouse.
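The distance used in step S313 can be sketched as a relative entropy (Kullback-Leibler divergence) between per-user distributions. Several choices below are assumptions not stated in the application: each user is represented by a distribution over the IP addresses used, a small smoothing constant avoids taking the logarithm of zero, and the two directions are averaged so that the distance is symmetric. The t-SNE projection itself is left out; the resulting pairwise distances would be the input to whatever t-SNE implementation is used.

```python
import math

def relative_entropy(p, q, eps=1e-9):
    """KL divergence D(p || q) between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    # Smooth both sides so categories missing from one user do not give log(0).
    ps = {k: p.get(k, 0.0) + eps for k in keys}
    qs = {k: q.get(k, 0.0) + eps for k in keys}
    zp, zq = sum(ps.values()), sum(qs.values())
    return sum((ps[k] / zp) * math.log((ps[k] / zp) / (qs[k] / zq))
               for k in keys)

def symmetric_distance(p, q):
    """Average of both KL directions (an assumption, to obtain symmetry)."""
    return 0.5 * (relative_entropy(p, q) + relative_entropy(q, p))

def pairwise_distances(users):
    """Distance for every unordered pair of user names."""
    names = sorted(users)
    return {(a, b): symmetric_distance(users[a], users[b])
            for i, a in enumerate(names) for b in names[i + 1:]}

# Made-up per-user IP usage fractions.
users = {
    "u1": {"10.0.0.1": 0.9, "10.0.0.2": 0.1},
    "u2": {"10.0.0.1": 0.8, "10.0.0.2": 0.2},
    "u3": {"192.168.1.7": 1.0},
}
dist = pairwise_distances(users)
```

In this sample `u1` and `u2` share nearly the same IP usage and end up much closer to each other than either is to `u3`, which is the property that lets a dense cluster of dots stand out after projection.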
In other examples, the network users may also be characterized in the output interface by shapes, icons and/or labels. For example, the shape is a geometric figure such as a triangle or a rectangle; the icon is, for example, a smiling face or crying face, a skull avatar, a robber avatar or a similar icon; and the label is, for example, text or a symbol that clearly distinguishes the users.
The group data visualization method of the present application presents the data set of a group determined during fraud detection in forms such as time axes, type distributions and lists, so that the data features of the groups identified during fraud detection are displayed through a variety of relationship interfaces. This helps domain experts and algorithm experts assess and revise the detection algorithm of the fraud detection system.
The present application also provides a computer device, which may be any suitable computer device, such as a handheld computer device, a tablet computer device, a notebook computer, a desktop computer, a server, and so on. The computer device includes a display, an input device, input/output (I/O) ports, one or more processors, memory, a non-volatile storage device, a network interface, a power supply, and so on. The various components may include hardware elements (such as chips and circuits), software elements (such as a tangible non-transitory computer-readable medium storing instructions), or a combination of hardware elements and software elements. In addition, it should be noted that the various components may be combined into fewer components or separated into additional components. For example, the memory and the non-volatile storage device may be included in a single component. The computer device may perform the visualization method alone or in cooperation with other computer devices.
Refer to Fig. 12, which is an architecture diagram of the computer device of the present application in one embodiment. As shown in the figure, in this embodiment the computer device 1 includes one or more processors and a presentation engine executed on the processors, so as to perform the above visualization method and display the corresponding visualization interface. For example, the computer device includes a processor, a display, and a presentation engine (or display engine) executed on the processor. The presentation engine is used to perform the group data visualization method described in the above embodiments and to display the result through the display; for the implementation process of the group data visualization method, refer to the description of Fig. 1 to Fig. 11. In a specific implementation, the presentation engine is, for example, stored in the memory of the local computer device or on a remote storage server, and includes but is not limited to software and hardware developed in a programming language for parsing and displaying the interface, such as XML, HTML scripts, the C language, and so on. In still other embodiments, one computer device performs the visualization method and provides the corresponding visualization interface to another computer device for display. For example, a client, based on a request operation of the user, initiates a request to a server and logs in to the server; the server performs the visualization method to form the corresponding interface data and feeds the interface data back to the client; and the browser or a customized application of the client displays the corresponding diagrams according to the interface data.
The present application also provides a client, which is connected to a server through a network. In this embodiment, the client is, for example, a web client and the server is, for example, a web server. The web client logs in to the web server by sending a web service request, performs the group data visualization method described in the above embodiments, and displays the result through a display; for the implementation process of the group data visualization method, refer to the description of Fig. 1 to Fig. 11.
The present application also provides a server, which is connected to a client through a network. In this embodiment, the client is, for example, a web client and the server is, for example, a web server. Based on a request operation of the web client, the web server performs the group data visualization method described in the above embodiments and sends the result to the client to be displayed through a display; for the implementation process of the group data visualization method, refer to the description of Fig. 1 to Fig. 11.
The present application also provides a browser, which is connected to a server through a network. The browser logs in to the server by sending a request, performs the group data visualization method described in the above embodiments, and displays the result through a display; for the implementation process of the group data visualization method, refer to the description of Fig. 1 to Fig. 11. In this embodiment, the browser is, for example, a web browser, including but not limited to QQ Browser, Internet Explorer, Firefox, Safari, Opera, Google Chrome, Baidu Browser, Sogou Browser, Cheetah Browser, 360 Browser, UC Browser, Maxthon, TheWorld Browser, and so on.
The present application also provides a group data visualization system, which may include software and hardware in one or more computer devices. To show the user the behavior of a fraud group in different time periods, and thereby answer the question raised by domain experts, "what did the group do as a fraud group", and the question raised by algorithm experts, "do the users in the same group have the same behavioral habits", the present application provides a group data visualization system that visualizes from the perspective of the time axis. Refer to Fig. 13, which is a schematic diagram of the modular structure of the group data visualization system provided by the present application. As shown in the figure, the group data visualization system 3 includes an acquisition module 31, a processing module 32 and a display module 33.
The acquisition module 31 is used to obtain the data set of a group. The data features in the data set include at least an event type and time information associated with the event type.
In certain embodiments, the acquisition module 31 obtains the operation logs of a cluster composed of multiple network users. In different embodiments, the cluster consists of all the network users that can be obtained. The network users in the cluster may come from the same website, from different websites, or from different network channels, for example the Internet, one or more intranets, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or an appropriate combination thereof, or a mobile communication network for mobile phones.
The acquisition module 31 passes the acquired operation logs to the processing module 32, which determines at least one data feature from the operation logs of the multiple network users and analyses the similarity of the at least one data feature in the operation logs to determine the group. In a specific embodiment, because network fraud inevitably leaves characteristic traces in the data that users generate on the network, the group data visualization system collects the operation logs of multiple network users from at least one website. The processing module 32 analyses the similarity of at least one data feature in the operation logs, groups the users who generated the corresponding logs, and obtains the groups and the data set of each group.
In certain embodiments, the data set of a group includes, but is not limited to, at least two of the following data features: user information, IP address, event type, event source, event target and event occurrence time. The user information is characterized by, for example, a mobile phone number, an email address, an ID number, an identity card number, gender, the number of the device used by the user, or the registration time. The same user information may correspond to at least one event type, and each event type corresponds to an event source, an event target and an event occurrence time. The event types include, but are not limited to, at least one of the following: social behaviours between network users such as following, liking, commenting and gifting (also called sending a present), and operation behaviours of a network user such as logging in, logging out, updating status, registering and modifying information. For example, the same user information may correspond to multiple like events, each with its own event source, event target and event occurrence time.
The processing module 32 may store the data set of each obtained group in a database. In some embodiments, the data set is obtained from a database that stores the groups and their data sets. The database is configured, for example, in a remote storage server or in a storage device of a local computer device, and the acquisition module 31 may extract the data set from the database based on the user's input operation. For example, the processing module 32 obtains multiple groups using an unsupervised detection algorithm, the user selects one of the groups through a selection interface, and the data set of the selected group is then obtained.
Specifically, the processing module 32 first calculates the similarity of each class of data feature across all data in the operation logs, where the similarity may be measured by information entropy. For example, the processing module 32 uses the user information to calculate the information entropy of the IP usage dimension or the maximum IP usage dimension, uses the event type to calculate the information entropy of the operation type dimension, and uses the registration time or operation time to calculate the information entropy of the registration time dimension or the bad-operation dimension, and so on. The processing module 32 then applies an unsupervised detection method to the resulting entropies to divide the users into multiple groups. Examples of the unsupervised detection method include algorithms based on dense subgraphs and algorithms based on vector spaces. Each group presented by the visualization method provided herein reflects the shared resources, user relationships and the like used in fraud, so that the user of the group data visualization system 3 can determine more clearly whether the classification strategy of the unsupervised detection algorithm is reasonable. The shared resources include, but are not limited to, shared IP addresses and mailboxes; the user relationships include, but are not limited to, following and interaction relationships between users.
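The entropy measure mentioned above can be illustrated with a minimal sketch. It assumes Shannon entropy over the empirical distribution of a feature's values; the function name `shannon_entropy` is hypothetical, and the text does not specify the exact formula or logarithm base used.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`.

    Assumption: this is the standard textbook definition; the source only
    says that similarity "may be measured by information entropy".
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical example: every member of a group uses the same IP address,
# so the IP feature is perfectly concentrated and its entropy is zero.
same_ip = ["10.0.0.1", "10.0.0.1", "10.0.0.1"]
mixed_ip = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
print(shannon_entropy(same_ip), shannon_entropy(mixed_ip))
```

A low value indicates that the feature takes few distinct values within the group, which is the kind of concentration the grouping step looks for.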
In one embodiment, the display module 33 of the group data visualization system 3 displays at least one group interface, in which the size of each group is represented by the size of the displayed geometric figure. Referring to Fig. 3, which shows an interface containing multiple groups: 11 groups are shown, each represented by a circle, and all 11 groups lie within a largest dashed circle, which represents, for example, a cluster composed of N network users. The group labelled 0 is, for example, a normal group. A smaller dashed circle contains 10 groups of different sizes labelled 1 to 10, which are, for example, abnormal groups. The size of each circle is proportional to the number of members of the group: a large circle represents a group with many members, and a small circle represents a group with few members. In different embodiments, the geometric figure of a group may be any shape. The colour of a geometric figure may be set randomly, or may be related to the number of groups or the number of members of the group. For example, N colours are preset, and the processing module 32 randomly assigns different colours to the geometric figures representing the groups, which the display module 33 shows on the display device. As another example, the processing module 32 assigns colours from a preset colour sequence to the geometric figures in ascending order of member count, and the display module 33 shows them on the display device. When the user selects a geometric figure on the display interface, the acquisition module 31 obtains the data set of the corresponding group.
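The proportionality between circle size and member count can be sketched as follows. The text does not say whether "size" means radius or area; this sketch assumes area, so the radius grows with the square root of the member count. The function name `circle_radius` and the `scale` parameter are hypothetical.

```python
import math


def circle_radius(member_count, scale=1.0):
    """Radius of a group's circle so that its area grows linearly with
    the number of members (assumed interpretation of "size")."""
    if member_count < 0:
        raise ValueError("member_count must be non-negative")
    return scale * math.sqrt(member_count)


# Hypothetical group sizes: a group with 4x the members gets 2x the radius.
print(circle_radius(25), circle_radius(100))
```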
In a preferred embodiment, the at least one group interface shown by the display module 33 may also include an information bar displaying group information. When the user selects a group in the group interface, the basic information of the group is shown at the side of the interface in a table or text box. The basic information includes, for example, the group code, the number of members, the most preferred data feature used to determine the group, and the group attribute (such as normal group or abnormal group). The display module includes, for example, a display.
To present the group data visualization system 3's analysis of the acquired group data set along a time axis, the processing module 32 creates a first time axis and a second time axis and encodes the data features. The display module 33 uses a display device to show the first and second time axes and first, second, third and fourth shapes in one interface. The first shape serves as a node of the first time axis and represents the event types and quantities of the group occurring in each time granularity of the first time axis; the second shape represents the total quantity of each event type occurring in the time interval of the second time axis; the third shape represents the distribution on the second time axis of the event types represented in the second shape; and the fourth shape represents the event types and quantities of the group occurring in each time granularity of the second time axis. The display device may be a display screen external to or integrated in the computer device, a driver for the display screen, and a specially configured presentation engine for processing display data. The presentation engine includes, but is not limited to, an image processing chip and a display program running on the image processing chip.
The first time axis and the second time axis are created from the time information in the data set. For example, if the time information in the data set spans up to 10 days, the maximum time interval of the first or second time axis is 10 days. In one embodiment, the first time axis and the second time axis are created with the same time interval and time granularity; in another embodiment, they are created with different time intervals and time granularities, as detailed later.
The processing module 32 graphically encodes all data to be presented, such as the data features, the event types and the quantity of each event type, so that the displayed interface is attractive and clear. Here, the processing module 32 counts the event types in the data set according to the time granularity of the first time axis and encodes the counted event types into figures of a preset first shape, and the display module 33 presents the encoded first shapes in chronological order as nodes on the first time axis. From the nodes on the first time axis, a domain expert can clearly see how the distribution or quantity of the event types changes over time. The first shape includes, but is not limited to, a pie shape or a bar shape. In some implementations, the processing module 32 encodes the percentage share of each event type within a time granularity into a figure of the first shape, which the display module 33 shows on the first time axis, with the share regions of the same event type using the same colour. Referring to Fig. 4, which is a schematic diagram of the display of the group data visualization system of this application in one embodiment: in the displayed interface, the first time axis T1 is located in the lower area of the display interface and covers the 10-day time interval from August 1 to August 10, with the day as the time granularity. The percentage distribution of the event types counted for each day is encoded as a pie chart and shown as a node on the first time axis T1. The colours in the pie chart represent event types: in the figure, the colour labelled "yellow" represents follow events, the colour labelled "red" represents gift events, and the colour labelled "blue" represents like events. For example, at the pie-chart node for August 7 on the first time axis T1, follow events account for the largest share of the events that occurred, gift events for a smaller share, and like events for the smallest share.
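The per-day pie-chart encoding described above can be sketched as a simple aggregation. This is an illustration only: the function name `pie_shares_per_day`, the event-type labels and the year in the sample timestamps are assumptions, not taken from the source.

```python
from collections import Counter, defaultdict
from datetime import datetime


def pie_shares_per_day(events):
    """Map each day to the share of each event type on that day.

    `events` is an iterable of (timestamp, event_type) pairs, where
    timestamp is a datetime. Each day's shares sum to 1 and correspond
    to the area fractions of one pie-chart node on the first time axis.
    """
    counts = defaultdict(Counter)
    for ts, event_type in events:
        counts[ts.date()][event_type] += 1
    shares = {}
    for day, counter in sorted(counts.items()):
        total = sum(counter.values())
        shares[day] = {etype: n / total for etype, n in counter.items()}
    return shares


# Hypothetical log entries for one day (the year is arbitrary).
sample = [
    (datetime(2017, 8, 7, 9, 15), "follow"),
    (datetime(2017, 8, 7, 9, 40), "follow"),
    (datetime(2017, 8, 7, 13, 5), "gift"),
    (datetime(2017, 8, 7, 21, 30), "like"),
]
print(pie_shares_per_day(sample))
```

Keeping the same colour per event type across all days, as the text requires, would then be a lookup from event type to colour that is independent of this aggregation.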
In addition, the processing module 32 sums the quantity of each event type in the data set over the time interval of the second time axis and encodes the totals into a figure of a preset second shape, and the display module 33 shows the total quantity of each event type in a time interval of the second time axis. The second shape includes, but is not limited to, a histogram, a bar chart or a line chart. For the time interval with which the second time axis was created, the displayed totals show how the event types compare in quantity within the same interval. When the time interval of the second time axis represents one day or one week, the user can determine how the three event types compare in total quantity from the lengths of the bars showing the totals of the "red", "yellow" and "blue" event types. The bars may also convey the comparison through thickness, transparency and the like. Referring again to Fig. 3, a horizontal bar chart is shown on the side adjacent to the first time axis T1 (the right side in the figure), containing three bars, "red", "yellow" and "blue", from top to bottom. The length of each bar represents the total quantity of that event type in the time interval of the second time axis. From this second shape it can be seen that, within the time interval of the second time axis, the follow events represented by the "yellow" bar are the most numerous, the gift events represented by the "red" bar come second, and the like events represented by the "blue" bar are the fewest.
By displaying the total quantity of each event type occurring in the time interval of the second time axis, the system lets a domain expert see clearly, from another perspective, how the counted event types change in quantity over time. To show the association between the first time axis and the second time axis more clearly, the processing module 32, based on the encoding of the data features, has the display module 33 show the second time axis. The processing module 32 then associates each event type represented in the second shape with the time granularities of the second time axis in which that event type occurs, and the display module 33 shows the distribution of each event type on the second time axis by means of the third shape. The second time axis is rendered as an axis whose nodes are the time granularities, and the third shape links the event types distributed over the nodes to the second shape, so that the user can clearly see the association between the second shape and each time granularity of the second time axis. The third shape may be a line whose colour follows the colour of the corresponding event type in the second shape, so that the user can clearly identify the same event type.
Referring again to Fig. 3, the second shape and the second time axis are associated through the third shape. The third shape is, for example, an arc that, based on the time information of each event type in the data set, fans out to the nodes of the time granularities of the second time axis. In the figure, dashed lines (the first kind of dashed line) link the gift events represented by the "red" bar to the corresponding time nodes (time granularities) on the second time axis; solid lines link the follow events represented by the "yellow" bar to the corresponding time nodes on the second time axis; and dash-dot lines (the second kind of dashed line) link the like events represented by the "blue" bar to the corresponding time nodes on the second time axis. In different embodiments, the third shape uses line thickness or transparency to indicate the quantity of the event type produced in the corresponding time granularity, which makes it easier to reveal the high-frequency periods or patterns in which events occur.
To show more intuitively the event types and quantities occurring in each time granularity of the second time axis, the display module 33, under the control of the processing module 32, also shows a fourth shape, which represents the event types and quantities of the group occurring in each time granularity of the second time axis. The processing module 32 sums, or computes the distribution of, the event types in the data set within the time interval of the second time axis and encodes the sums or the distribution into figures of a preset fourth shape, and the display module 33 presents the encoded fourth shapes in chronological order as nodes on the second time axis. According to the time granularity of the created second time axis, and guided by the third shape, the processing module 32 controls the display module 33 to show the corresponding fourth shapes. From the nodes on the second time axis, the user can clearly see, from another perspective, how the counted event types change in quantity over time. The fourth shape includes, but is not limited to, a pie shape or a bar shape, and is chosen to differ from the first shape. In some implementations, the processing module 32 encodes the sum of each event type in each time granularity of the second time axis into a figure of the fourth shape, which the display module 33 shows on the second time axis, with the sum of the same event type using the same colour as in the third shape and the second shape.
A time axis is used as one way of presenting group data because, for both domain experts and algorithm experts, understanding users' concentrated behaviour within a period of time is critical. The combination of the first time axis and the second time axis is therefore needed to describe this concentrated behaviour.
Referring to Fig. 3, each pie chart on the first time axis T1 shows the proportion of each event type (such as following a user or sending a gift to a user online) in each time granularity (such as each day). The processing module 32 encodes each event type as a different colour; encodes the quantity of each event type in a unit time granularity of the first time axis T1 as the area share of the corresponding region of a pie chart; encodes the quantity of each event type in the time interval of the second time axis T2 as the length of a bar, forming a bar chart with one bar per event type (the second shape); and encodes the quantity of each event type in a unit time granularity of the second time axis T2 as the length of a bar, forming individual bar charts (the fourth shape). When the user selects a pie chart on the first time axis T1, arcs coloured by event type (the third shape) are projected from the second shape of each event type onto the fourth shapes at the corresponding time granularities of the second time axis T2, so that the time-axis relationships of the event types in a group data set are clearly presented to the user.
In one embodiment, the first time axis and the second time axis are created with the same time interval and time granularity. For example, the processing module 32 preloads a first time axis and a second time axis with the same time granularity, and the display module 33 maps each event type onto each time axis according to the time information in the data set and the time granularity, obtaining at least one time interval for each time axis. As another example, the processing module 32 determines the time interval of the preset first and second time axes from the ordering of the time information in the data set, and the display module 33 maps each event type onto each time axis according to the time information in the data set and the time granularity. Please refer to Fig. 3, which shows an interface including the first time axis T1 and the second time axis T2. Both T1 and T2 use the day as the time granularity and 10 days as the time interval, and the processing module 32 shows the data features of the data set on the first time axis T1 and the second time axis T2 according to the time information in the data set. For example, as shown in Fig. 3, the second time axis T2 uses the day as the time granularity, and the processing module 32 encodes the sum of each event type counted for each day as a bar, which the display module 33 shows as a node on the second time axis T2.
In another implementation, the first time axis and the second time axis are created with different time intervals and time granularities, where the time interval of the second time axis is the time granularity of the first time axis. For example, the processing module 32 presets different time granularities for the first and second time axes and presets the correspondence between the time granularities of the two axes, and the display module 33 maps each event type onto each time axis according to the time information in the data set. Referring to Fig. 5, which shows an interface including the first time axis T1 and the second time axis T2: T1 has a time interval of 10 days and a time granularity of one day, and T2 has a time interval of one day and a time granularity of one hour. The display module 33 shows the data features of the data set on the first time axis T1 and the second time axis T2 according to the time information in the data set. For example, in interface C2 shown in Fig. 5, the second time axis T2 uses the hour as the time granularity, and the processing module 32 encodes the sum of the event types counted for each hour as a bar, which the display module 33 shows as a node on the second time axis T2.
The processing module 32 encodes the quantity of each event type in a time granularity of the first time axis T1 as the area share of the corresponding region of a pie chart. When the user selects a pie chart on the first time axis T1, the quantity of each event type in the time interval of the second time axis T2 (that is, each event type in the selected pie chart) is encoded as the length of a bar, forming a bar chart with one bar per event type (the second shape), and the quantity of each event type in a unit time granularity of the second time axis T2 is encoded as the length of a bar, forming individual bar charts (the fourth shape). Arcs coloured by event type (the third shape) are projected from the second shape of each event type onto the fourth shapes at the corresponding time granularities of the second time axis T2, so that the time-axis relationships of the event types in a group data set are clearly presented to the user.
In interface C1 shown in Fig. 4, each pie chart on the first time axis T1 shows the proportion of each event type (such as following a user or sending a gift to a user online) in each time granularity (such as each day). Each event type is encoded as a different colour: in the figure, the colour labelled "yellow" represents follow events, the colour labelled "red" represents gift events, and the colour labelled "blue" represents like events. For example, at the pie-chart node for August 7 on the first time axis T1, follow events account for the largest share of the events that occurred, gift events for a smaller share, and like events for the smallest share. When the user selects the node for August 7 on the first time axis T1, the second time axis T2 shows the event types that occurred in each hour of the 24 hours of August 7 and the quantity of each event type.
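The drill-down from a selected day on the first time axis to hourly nodes on the second time axis can be sketched as follows. This is an assumption-laden illustration: `drill_down`, the event-type labels and the sample year are hypothetical, and the actual system may aggregate differently.

```python
from collections import Counter
from datetime import date, datetime


def drill_down(events, selected_day):
    """Return (totals, hourly) for the day selected on the first time axis.

    totals: Counter of event type -> count over the whole day
            (the bar lengths of the second shape).
    hourly: list of 24 Counters, one per hour
            (the bar lengths of the fourth shapes on the second time axis).
    """
    totals = Counter()
    hourly = [Counter() for _ in range(24)]
    for ts, event_type in events:
        if ts.date() == selected_day:
            totals[event_type] += 1
            hourly[ts.hour][event_type] += 1
    return totals, hourly


# Hypothetical log entries; only those on the selected day are counted.
log = [
    (datetime(2017, 8, 7, 9, 15), "follow"),
    (datetime(2017, 8, 7, 9, 40), "follow"),
    (datetime(2017, 8, 7, 13, 5), "gift"),
    (datetime(2017, 8, 8, 10, 0), "like"),
]
totals, hourly = drill_down(log, date(2017, 8, 7))
print(totals, hourly[9], hourly[13])
```

Each non-empty entry of `hourly` marks an hour node that an arc (the third shape) of the matching event-type colour would connect to.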
It should be noted that the time intervals and time granularities of the first time axis and the second time axis are not limited to the cases illustrated above. In different embodiments, the user may set the time interval and time granularity of the first and second time axes according to actual needs, using units such as the week, the month, the quarter or even the year.
The user examines the groups classified by the group data visualization system using this presentation process and the displayed statistics, and the visual interface allows domain experts to find or correct deficiencies in the detection algorithm. In addition, to show the association between the two time axes more clearly, the group data visualization system further includes a first detection module (not shown). When the first detection module detects that the user has selected a first shape, the third shape shows, dynamically, highlighted, or both, the distribution on the second time axis of the event types occurring in the time granularity represented by that first shape. For example, in interface C1 shown in Fig. 4, when the user selects a pie chart on the first time axis T1, each third shape connected to the bar charts on the second time axis T2 that correspond to the selected pie chart flashes for several seconds or longer, or is highlighted. When the user selects another pie chart on the first time axis T1, the previously flashing and highlighted third shapes return to their original shape and colour, and the third shapes connected to the bar charts on the second time axis T2 that correspond to the newly selected pie chart flash for several seconds and are highlighted.
In certain embodiments, the group data visualization system further includes a second detection module. When the second detection module detects that the user has selected a first shape, the display module 33 enlarges the selected first shape so that the user can see more clearly how the quantities of the event types it represents compare. In one specific example, the selected first shape is enlarged at one side of the first time axis. For example, the selected first shape is enlarged above the first time axis, as in interface C3 shown in Fig. 6a. In another specific example, the selected first shape is enlarged on the first time axis itself. For example, the selected first shape is enlarged about its own centre position on the first time axis, as in interface C4 shown in Fig. 6b.
In some embodiments, the user is concerned not only with how the event types in a group data set change along the time axis, but even more with whether the assigned groups are reasonable. This requires that the user be able to view the detailed data features of each group and the priority order of the data features used to classify the groups. The display module 33 is therefore also used to display an interface for the data set of a group. The data set is shown as a list, which presents the details of the data features within the same group. To improve the accuracy of the group data set classification, the list in the interface may show the data features of a group column by column, according to the classification priority used by the group data visualization system when classifying. For example, referring to Fig. 6, which is a schematic diagram of the list interface for the data set of a group: the data set shown is sorted by the similarity of the data features, in order of priority from high to low. When the data features at the first priority are equally similar, the rows are sorted by the data feature at the second priority. In the embodiment shown in Fig. 7, the order of priority from high to low is: IP address, event source (source), event target (target), event type (event_type) and event occurrence time (timestamp). In this embodiment, the processing module 32 encodes the importance of the different column headers of the table: the more concentrated the values of a feature, the more important the feature. In the embodiments provided in this application, the group data visualization system represents this property by calculating the information entropy of each feature; the lower the information entropy, the higher the consistency. The processing module 32 then sorts the features in order of increasing information entropy, and the display module 33 places the column headers with low information entropy at the front. The column headers may also be rendered in different colours to draw the user's attention. For example, the header of the column with the lowest information entropy is rendered in the deepest colour to indicate that the data feature represented by that column is the most important, and the other columns are rendered accordingly, yielding the data set list interface shown in the figure. The list interface may follow from the interface displaying multiple groups or from the time-axis display interface, and is shown when the user selects the list interface.
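The column ordering by increasing information entropy can be sketched as below. The column names follow those mentioned in the text (IP address, event_type, timestamp), but the sample rows, the helper names and the use of Shannon entropy in bits are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def order_columns_by_entropy(rows):
    """Return column names sorted by increasing entropy.

    `rows` is a list of dicts sharing the same keys. Lower entropy means
    more concentrated values, so those columns come first. Ties keep the
    original column order because sorted() is stable.
    """
    if not rows:
        return []
    columns = list(rows[0].keys())
    scores = {col: entropy([row[col] for row in rows]) for col in columns}
    return sorted(columns, key=scores.get)


# Hypothetical group data: one shared IP, two event types, distinct times.
rows = [
    {"timestamp": "08-07 09:15", "event_type": "follow", "ip": "10.0.0.1"},
    {"timestamp": "08-07 09:40", "event_type": "follow", "ip": "10.0.0.1"},
    {"timestamp": "08-07 13:05", "event_type": "gift", "ip": "10.0.0.1"},
    {"timestamp": "08-07 21:30", "event_type": "gift", "ip": "10.0.0.1"},
]
print(order_columns_by_entropy(rows))
```

The same scores could drive the header colouring, with the lowest-entropy column mapped to the deepest shade.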
In certain embodiments, to further establish whether the acquired data set of the group reflects the characteristics of fraud, the data set also needs to be shown from other dimensions. For example, the accuracy of the detected fraud can be further confirmed by comparing the network operation data of normal users with the group data set. To this end, the display module 33 is also used to display a feature distribution interface for the data set of the group, which shows a histogram of the feature distribution and a comparison of that histogram against the histogram of the entire cluster. The feature distribution interface can show the distribution of each data type in the overall network. The overall network is a relative notion: for example, if a cluster is formed by multiple network users, the interface can show the distribution of a data feature of a group within that cluster. Referring to Fig. 2, the largest dashed circle represents a cluster formed by multiple network users; the cluster contains 11 groups, numbered 0 to 10, from which one group is selected for display.
In some embodiments, the data types that the feature distribution interface can show are, for example: the information entropy of the average operation interval dimension (average operation interval entropy), the information entropy of the IP address usage dimension (IP used amount entropy), the information entropy of the gender dimension (sex entropy), the information entropy of the email dimension (email entropy), the information entropy of the registration time dimension (reg time entropy), the information entropy of the number-of-operations dimension (operation times entropy), the information entropy of the number-of-devices dimension (device amount entropy), the information entropy of the operation type dimension (operation type entropy), and the information entropy of the maximum usage by others of the IPs used (max IP used be used amount entropy). In the embodiment shown in Fig. 7, the information entropy of the registration time dimension is used as the data feature; that is, Fig. 7 shows the feature distribution, within the network cluster, of the information entropy of the registration time dimension (registration period) of a group. To compare effectively the feature distribution of the acquired group data set with that of the network operation data of normal users, as shown in Fig. 8, the processing module 32 performs the following steps to obtain the data for the histogram of the feature distribution and for the comparison of that histogram against the histogram of the entire cluster, which are then shown by the display module 33.
In step S211, a group is selected, and at least one data feature is determined from the data set of the group. In one embodiment, for example, the group labeled 2 in Fig. 3 is selected, and a data feature that is an item of user information, for example the registration time, is determined from the data set of that group.
In step S212, the feature distribution of the determined at least one data feature is computed for the group and for the cluster. In this embodiment, the distribution of the registration time feature is computed within the group and within the entire cluster.
In step S213, the histogram of the feature distribution and the comparison of that histogram against the entire-cluster histogram are displayed. In this embodiment, based on the encoding of the data feature, a histogram of the distribution of the registration time feature in the group and a histogram of its distribution in the entire cluster are displayed. Fig. 9 shows, for one embodiment of the application, an interface with the histograms and the comparison chart of the registration time feature distribution in a group. In interface D, panel (a) is a thumbnail of the registration time feature distribution in the selected group labeled 2, and panel (d) at the bottom of interface D is the enlarged version of that thumbnail. The enlarged view shows that within the month from August 1 to August 31, the registration operations of the group's members are concentrated on five days: August 5, August 6, August 11, August 12, and August 16. Panel (c) in interface D is a histogram of the time distribution of registration operations performed in August by the registered users of the cluster; it shows that registrations by the cluster's users follow a certain regular pattern across August. Panel (b) in interface D overlays panels (d) and (c) to show how the registration time feature differs between the entire cluster and the selected group. To let users see the differences and connections between different features, the embodiment provided by the application presents this chart in three layers: after the user clicks one of the thumbnails, the page scrolls to the normalized distribution comparison chart. In a specific application there may of course be multiple thumbnails, each representing a different data feature.
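Steps S211 to S213 can be sketched roughly as follows, assuming registration time is binned per day and both histograms are normalized before being overlaid. The year, the helper names `daily_histogram` and `normalize`, and the cluster's registration dates are hypothetical; only the five peak days of the group come from the description of Fig. 9.

```python
from collections import Counter
from datetime import date, timedelta


def daily_histogram(reg_dates, start, end):
    """Count registrations per day over [start, end], keeping empty days."""
    counts = Counter(reg_dates)
    n_days = (end - start).days + 1
    return [counts.get(start + timedelta(days=i), 0) for i in range(n_days)]


def normalize(hist):
    """Scale a histogram so its bins sum to 1 (all zeros if it is empty)."""
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(hist)


start, end = date(2017, 8, 1), date(2017, 8, 31)  # hypothetical year

# S211: select a group and one feature (registration date).
group_dates = [date(2017, 8, d) for d in (5, 5, 6, 11, 12, 12, 16)]
# Hypothetical cluster: the group plus one unrelated registration per day.
cluster_dates = group_dates + [date(2017, 8, d) for d in range(1, 32)]

# S212: feature distribution in the group and in the entire cluster.
group_hist = normalize(daily_histogram(group_dates, start, end))
cluster_hist = normalize(daily_histogram(cluster_dates, start, end))

# S213: data for the overlaid comparison chart (group, cluster, difference).
overlay = [(g, c, g - c) for g, c in zip(group_hist, cluster_hist)]
peak_days = [start + timedelta(days=i) for i, g in enumerate(group_hist) if g > 0]
print([d.day for d in peak_days])
```

Normalizing both series is what makes a group of a few users comparable with a cluster of many; the `overlay` triples are the kind of data a rendering layer would need for a chart like panel (b).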
In some embodiments, the display module 33 can also distinguish or emphasize the feature distribution of a given data feature in the group and in the entire cluster by rendering the histograms in color or by dynamic display (for example, blinking).
In some embodiments, in order to further analyze the differences between multiple groups in a network cluster, the display module 33 also displays an interface for the feature distribution of the data sets of multiple groups. Referring to Fig. 10 and Fig. 11, Fig. 10 shows the steps by which, in one embodiment of the application, the distribution of multiple groups in the cluster is displayed, and Fig. 11 shows interface E, in which the distribution of multiple groups in the cluster is displayed in one embodiment. As shown in the figures, the processing module 32 executes the steps shown in Fig. 10, and the display module 33 displays the interface shown in Fig. 11.
In step S311, multiple groups are determined in a cluster formed by multiple network users, and the differences between the groups are characterized by different shapes, icons, labels, and/or colors. In one embodiment, for example, the 3 groups labeled 0, 1, and 2 in Fig. 3 are selected, where the group labeled 0 is shown in green, the group labeled 1 is shown in red, and the group labeled 2 is shown in blue.
In step S312, at least one data feature is determined from the data sets of the multiple groups. In this embodiment, one data feature, for example the IP address, is determined from the data sets of these 3 groups.
In step S313, based on the at least one data feature, the relative entropy between every two network users in each group is analyzed as a measure of the similarity between those two users. In this embodiment, based on the IP address, the relative entropy between every two network users in the 3 groups labeled 0, 1, and 2 (information entropy of the IP usage amount dimension, IP used amount entropy) is analyzed as the measure of similarity between the two users. For example, the dimensionality reduction method t-SNE (t-distributed stochastic neighbor embedding) is used, with the relative entropy between two users serving as the index of the distance between these network users.
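The relative entropy used in step S313 can be sketched as below. The source does not say how each user's distribution is built or whether the measure is symmetrized, so this sketch makes two assumptions: each user is described by a smoothed distribution over the IP addresses they used, and the two directions of the Kullback-Leibler divergence are averaged so the result can serve as a distance. All names and sample counts are hypothetical.

```python
import math


def to_distribution(counts, keys, eps=1e-9):
    """Turn raw counts into a smoothed probability vector over `keys`."""
    raw = [counts.get(k, 0) + eps for k in keys]
    total = sum(raw)
    return [r / total for r in raw]


def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats; assumes every q[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Average of both KL directions (an assumption, to obtain a symmetric value)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Hypothetical per-user IP usage counts.
users = {
    "u1": {"10.0.0.1": 9, "10.0.0.2": 1},
    "u2": {"10.0.0.1": 8, "10.0.0.2": 2},
    "u3": {"10.0.0.7": 5, "10.0.0.8": 5},
}
keys = sorted({ip for counts in users.values() for ip in counts})
dists = {u: to_distribution(c, keys) for u, c in users.items()}
names = sorted(users)

# Pairwise matrix; rows and columns follow the order of `names`.
distance = [[symmetric_kl(dists[a], dists[b]) for b in names] for a in names]
print(distance[0][1] < distance[0][2])
```

A matrix of this form is the sort of input that a t-SNE implementation accepting precomputed distances could embed into two dimensions for display; the embedding step itself is omitted here.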
In step S314, a display interface is output in which network users are characterized by shapes, icons, and/or labels, the differences between the multiple groups are characterized by different colors, and the similarity between two network users in each group is characterized by the displayed distance between them. In this embodiment, in interface E shown in Fig. 11, each network user is represented by a dot; green denotes the group labeled 0, red the group labeled 1, and blue the group labeled 2. The users of the blue group labeled 2 lie close to one another, so that group forms a clustered distribution; the users of the red group labeled 1 also lie close to one another and form a clustered distribution; green shows the distribution of randomly sampled normal users, who lie far apart and are more dispersed. It can therefore be considered that the more densely a group clusters, the more likely it is to be a fraud group. In the embodiment shown in Fig. 11, the green group is dispersed, which indicates that it is a normal group and that the users represented by green dots are normal users. Conversely, the red group (labeled 1) and the blue group (labeled 2) are clustered, which indicates that they are abnormal groups and that the users represented by red and blue dots are abnormal users. In one embodiment, a user of the visualization system can interactively inspect the specific information and feature values of the users in each group by hovering the mouse over them.
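In the source, the judgment that a group is "dense" is made visually from the interface. As a hedged illustration of the same idea in numbers, the sketch below computes the mean pairwise distance of each group's two-dimensional points; the coordinates, the function name, and the use of this particular statistic are all assumptions and are not part of the described system.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs of points (0.0 if fewer than 2)."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 2-D positions after dimensionality reduction, keyed by group label.
embedded = {
    0: [(0.0, 0.0), (5.0, 1.0), (-4.0, 6.0), (2.0, -7.0)],       # sampled normal users
    1: [(10.0, 10.0), (10.2, 10.1), (9.9, 10.3), (10.1, 9.8)],   # clustered
    2: [(-8.0, -8.0), (-8.1, -7.9), (-7.8, -8.2), (-8.2, -8.1)], # clustered
}

spread = {label: mean_pairwise_distance(pts) for label, pts in embedded.items()}
ranked = sorted(spread, key=spread.get)  # tightest group first
print(ranked)
```

A small spread corresponds to the clustered appearance of the red and blue groups, and a large spread to the dispersed green group; any cut-off between the two would have to be chosen by the analyst.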
In other embodiments, network users can also be characterized in the output interface by shapes, icons, and/or labels. For example, a shape may be a geometric figure such as a triangle or a rectangle, an icon may be a smiling face or a crying face, and a label may be text or a clearly distinguishable symbol.
It should be noted that all modules of the group data visualization system can be configured on a single computer device. Alternatively, the modules can be arranged separately on a client on the user side and a server on the network side, with the client connected to the server over a network. For example, the acquisition module and the processing module of the group data visualization system are installed on the server and the display module is installed on the client; the client logs in to the server by sending a request, the server runs the group data visualization system based on the operations the client requests, and the corresponding interfaces are shown through the client. The client includes, but is not limited to, a browser or dedicated client software configured on the user terminal, together with the hardware for executing the display interface program.
It should also be noted that, from the above description of the embodiments, those skilled in the art can clearly understand that part or all of the application can be implemented by software in combination with the necessary general-purpose hardware platform. Based on this understanding, the technical solution of the application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The software product may include one or more machine-readable media storing machine-executable instructions which, when executed by one or more machines such as a computer, a computer network, or other electronic devices, cause the one or more machines to perform operations according to embodiments of the application. Machine-readable media may include, but are not limited to, floppy disks, optical disks, CD-ROM (compact disc read-only memory), magneto-optical disks, ROM (read-only memory), RAM (random access memory), EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), magnetic or optical cards, flash memory, or other types of media suitable for storing machine-executable instructions.
The application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
It should be noted that those skilled in the art will understand that some of the above components can be programmable logic devices, including one or more of: programmable array logic (PAL), generic array logic (GAL), field-programmable gate arrays (FPGA), and complex programmable logic devices (CPLD). The application places no specific limitation on this.
In summary, the application presents the data sets of the groups identified during fraud detection by means such as time axes, type distributions, and tables, so that the data features of those groups are shown through multiple related interfaces. This helps domain experts and algorithm experts evaluate and revise the detection algorithms of the fraud detection system.
The above embodiments merely illustrate the principles and effects of the application and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the application. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the application.

Claims (27)

1. A group data visualization method, applied in a fraud detection system, characterized by comprising the following steps:
Obtaining a data set of a group, wherein the data features in the data set include at least an event type and time information associated with the event type;
Creating a first time axis and a second time axis;
Based on the encoding of the data features, displaying the first time axis with first shapes as nodes, to characterize the event types and quantities that occur for the group in each time granularity of the first time axis;
Displaying a second shape, to characterize the total quantity of each event type occurring in the time interval of the second time axis;
Displaying the second time axis, associating the event types characterized in the second shape with each time granularity of those event types on the second time axis, and characterizing the distribution of each event type on the second time axis by a third shape; and
Displaying a fourth shape, to characterize the event types and quantities that occur for the group in each time granularity of the second time axis.
2. The group data visualization method according to claim 1, characterized in that the step of obtaining the data set of a group comprises:
Obtaining the operation logs of a cluster formed by multiple network users;
Determining at least one data feature from the operation logs of the multiple network users, and analyzing the similarity of at least one set of data features in the operation logs to determine the group; and
Obtaining the data set of the group.
3. The group data visualization method according to claim 1 or 2, characterized by further comprising a step of displaying at least one group interface, wherein the size of a group in the group interface is characterized by the size of a displayed geometric figure.
4. The group data visualization method according to claim 1 or 2, characterized by further comprising a step of displaying an interface for the data set of a group, wherein the data features of the group's data set include at least two of user information, IP address, event type, event initiation source, event responder, and event occurrence time, and in the interface the group's data set is displayed sorted after being grouped.
5. The group data visualization method according to claim 2, characterized by further comprising a step of displaying an interface for the feature distribution of the group's data set:
Selecting a group, and determining at least one data feature from the data set of the group;
Computing the feature distribution of the determined at least one data feature in the group and in the cluster; and
Displaying the histogram of the feature distribution and the comparison of that histogram against the histogram for the entire cluster.
6. The group data visualization method according to claim 1, characterized by further comprising a step of displaying an interface for the feature distribution of the data sets of multiple groups:
Determining multiple groups in a cluster formed by multiple network users, and characterizing the differences between the multiple groups by different shapes, icons, labels, and/or colors;
Determining at least one data feature from the data sets of the multiple groups;
Based on the at least one data feature, analyzing the relative entropy between every two network users in each group as a measure of the similarity between those two network users; and
Outputting a display interface in which network users are characterized by shapes, icons, and/or labels, the differences between the multiple groups are characterized by different colors, and the similarity between two network users in each group is characterized by the displayed distance.
7. The group data visualization method according to claim 1, characterized by further comprising a step of enlarging the first shape when it is selected, comprising:
Enlarging the first shape at the side of the first time axis when it is selected; or
Enlarging the first shape within the first time axis when it is selected.
8. The group data visualization method according to claim 1, characterized in that the event type includes at least one of the following actions of a network user: following, liking, commenting, presenting, logging in, publishing, updating status, registering, and modifying information.
9. The group data visualization method according to claim 1, characterized in that the step of creating the first time axis and the second time axis creates the first time axis and the second time axis with the same time interval and time granularity.
10. The group data visualization method according to claim 1, characterized in that the step of creating the first time axis and the second time axis creates the first time axis and the second time axis with different time intervals and time granularities, the time interval of the second time axis being the time granularity of the first time axis.
11. The group data visualization method according to claim 9 or 10, characterized by further comprising: when the first shape is selected, displaying dynamically and/or with highlighting, by means of the third shape, the distribution on the second time axis of the event types that occur within the time granularity characterized by the first shape.
12. A computer device, characterized by comprising:
One or more processors; and
A presentation engine executing on the one or more processors, the presentation engine being used to perform the group data visualization method according to any one of claims 1-11.
13. A group data visualization system, characterized by comprising:
An acquisition module, which obtains the data set of a group over a network, wherein the data features in the data set include at least an event type and time information associated with the event type;
A processing module, which creates a first time axis and a second time axis and encodes the data features; and
A display module, which shows the first and second time axes and first, second, third, and fourth shapes in an interface through a display device, wherein the first shape serves as a node of the first time axis to characterize the event types and quantities that occur for the group in each time granularity of the first time axis; the second shape characterizes the total quantity of each event type occurring in the time interval of the second time axis; the third shape characterizes the distribution on the second time axis of the event types characterized in the second shape; and the fourth shape characterizes the event types and quantities that occur for the group in each time granularity of the second time axis.
14. The group data visualization system according to claim 13, characterized in that the group is determined by obtaining the operation logs of multiple network users through the acquisition module and analyzing, through the processing module, the similarity of at least one set of data features in the operation logs.
15. The group data visualization system according to claim 13, characterized in that the display module is also used to display at least one group interface, wherein the size of a group in the group interface is characterized by the size of a displayed geometric figure.
16. The group data visualization system according to claim 13, characterized in that the display module is also used to display an interface for the data set of a group, wherein the data features of the group's data set include at least two of user information, IP address, event type, event initiation source, event responder, and event occurrence time, and in the interface the group's data set is displayed sorted after being grouped.
17. The group data visualization system according to claim 13, characterized in that the display module is also used to display an interface for the feature distribution of the group's data set, showing the histogram of the feature distribution and the comparison of that histogram against the histogram for the entire cluster.
18. The group data visualization system according to claim 13, characterized in that the display module is also used to display an interface in which network users are characterized by shapes, icons, and/or labels, the differences between multiple groups are characterized by different colors, and the similarity between two network users in each group is characterized by the displayed distance.
19. The group data visualization system according to claim 13, characterized by further comprising a detection module, wherein, when the detection module detects that a user has selected the first shape, the first shape shown by the display module is enlarged at the side of the first time axis, or the first shape shown by the display module is enlarged within the first time axis.
20. The group data visualization system according to claim 13, characterized in that the first time axis and the second time axis created by the processing module have the same time interval and time granularity.
21. The group data visualization system according to claim 13, characterized in that the processing module creates the first time axis and the second time axis with different time intervals and time granularities, the time interval of the second time axis being the time granularity of the first time axis.
22. The group data visualization system according to claim 20 or 21, characterized by further comprising a detection module, wherein, when the detection module detects that a user has selected the first shape, the distribution on the second time axis of the event types that occur within the time granularity characterized by the first shape is displayed dynamically and/or with highlighting by means of the third shape.
23. The group data visualization system according to claim 13, characterized in that the event type includes at least one of the following actions of a network user: following, liking, commenting, presenting, logging in, publishing, updating status, registering, and modifying information.
24. A client, connected to a server over a network, characterized in that the client logs in to the server by sending a request in order to perform the steps of the group data visualization method according to any one of claims 1-11.
25. A server, connected to a client over a network, characterized in that the server, based on the operations requested by the client, sends to the client the execution process of the group data visualization method according to any one of claims 1-11 and shows the execution result through the client.
26. A browser, connected to a server over a network, characterized in that the browser logs in to the server by sending a request in order to perform the steps of the group data visualization method according to any one of claims 1-11.
27. A computer-readable storage medium storing a data visualization computer program, characterized in that the data visualization computer program, when executed, implements the steps of the group data visualization method according to any one of claims 1-11.
CN201810022368.6A 2018-01-10 2018-01-10 Group event data visualization method and system Active CN108170830B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810022368.6A CN108170830B (en) 2018-01-10 2018-01-10 Group event data visualization method and system

Publications (2)

Publication Number Publication Date
CN108170830A true CN108170830A (en) 2018-06-15
CN108170830B CN108170830B (en) 2020-07-31

Family

ID=62517777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810022368.6A Active CN108170830B (en) 2018-01-10 2018-01-10 Group event data visualization method and system

Country Status (1)

Country Link
CN (1) CN108170830B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876479A (en) * 2018-07-18 2018-11-23 口口相传(北京)网络技术有限公司 The channel attribution method and device of object entity
CN109033194A (en) * 2018-06-28 2018-12-18 北京百度网讯科技有限公司 Affair displaying method and device
CN109191350A (en) * 2018-07-06 2019-01-11 贵州黔商科技有限公司 A kind of census management method based on big data family tree
CN113538058A (en) * 2021-07-23 2021-10-22 四川大学 Multi-level user portrait visualization method oriented to online shopping platform
WO2022033493A1 (en) * 2020-08-12 2022-02-17 杨嶷 Information connection method and apparatus based on map and entity information unit

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6567814B1 (en) * 1998-08-26 2003-05-20 Thinkanalytics Ltd Method and apparatus for knowledge discovery in databases
US20050043961A1 (en) * 2002-09-30 2005-02-24 Michael Torres System and method for identification, detection and investigation of maleficent acts
CN101867489A (en) * 2010-06-11 2010-10-20 北京邮电大学 Method and system for realizing real-time displayed social network visualization
US20110213788A1 (en) * 2008-03-05 2011-09-01 Quantum Intelligence, Inc. Information fusion for multiple anomaly detection systems
CN102629271A (en) * 2012-03-13 2012-08-08 北京工商大学 Complex data visualization method and equipment based on stacked tree graph
CN104536956A (en) * 2014-07-23 2015-04-22 中国科学院计算技术研究所 A Microblog platform based event visualization method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
童新安 (Tong Xin'an) et al.: "Application of visual data mining in credit fraud detection", Journal of Yichun University *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033194A (en) * 2018-06-28 2018-12-18 北京百度网讯科技有限公司 Affair displaying method and device
CN109033194B (en) * 2018-06-28 2019-11-08 北京百度网讯科技有限公司 Affair displaying method and device
US11132387B2 (en) 2018-06-28 2021-09-28 Beijing Baidu Netcom Science And Technology Co., Ltd. Event display method and device
CN109191350A (en) * 2018-07-06 2019-01-11 贵州黔商科技有限公司 A kind of census management method based on big data family tree
CN108876479A (en) * 2018-07-18 2018-11-23 口口相传(北京)网络技术有限公司 The channel attribution method and device of object entity
WO2022033493A1 (en) * 2020-08-12 2022-02-17 杨嶷 Information connection method and apparatus based on map and entity information unit
CN113538058A (en) * 2021-07-23 2021-10-22 四川大学 Multi-level user portrait visualization method oriented to online shopping platform
CN113538058B (en) * 2021-07-23 2023-04-07 四川大学 Multi-level user portrait visualization method oriented to online shopping platform

Also Published As

Publication number Publication date
CN108170830B (en) 2020-07-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20181018

Address after: 100084 10 floor 1009-1, 3 building, 1 Zhongguancun East Road, Haidian District, Beijing.

Applicant after: Huakong Tsingjiao Information Technology (Beijing) Co., Ltd.

Address before: 100084 Tsinghua Yuan, Beijing, Haidian District

Applicant before: Tsinghua University

GR01 Patent grant