CN108280644A - Group member relationship data visualization method and system - Google Patents

Info

Publication number
CN108280644A
Authority
CN
China
Prior art keywords
group
data
event
interface
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810022004.8A
Other languages
Chinese (zh)
Other versions
CN108280644B (en)
Inventor
徐葳
孙娇
姚期智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huakong Tsingjiao Information Technology Beijing Co Ltd
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201810022004.8A
Publication of CN108280644A
Application granted
Publication of CN108280644B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00: Payment architectures, schemes or protocols
    • G06Q20/38: Payment protocols; Details thereof
    • G06Q20/382: Payment protocols; Details thereof insuring higher security of transaction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/248: Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application provides a group member relationship data visualization method and system. The visualization method is applied in a fraud detection system and includes the following steps: obtaining the data set of a group, where the data features of the data set include one or more of user information, IP address, event type, event source (initiator), event responder, and event occurrence time; determining a target feature from the data features; and associating the members of the group according to the event type and representing them as a point-and-line graph in the output display interface, where the points and/or lines represent the determined target feature. By presenting the data sets of the groups produced during fraud detection in terms of member relationships and similar views, the application displays the data features of those groups through multiple relationship interfaces, which helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud detection system.

Description

Group member relationship data visualization method and system
Technical field
This application relates to the field of computer processing technology, and in particular to a group member relationship data visualization method and system.
Background art
Online fraud is a well-known dark side of today's internet, and it causes immeasurable losses worldwide every year. In 2015, the Internet Crime Complaint Center received complaints about fraud on the order of millions worldwide, and online fraud also causes large economic losses worldwide every year. Fraudulent users are typically paid for helping to promote specific goods or for spreading spam. In internet finance, fraudulent users apply for loans under false identities, buy goods with stolen credit cards, and even carry out illegal activities such as money laundering. In internet business scenarios, finding a suitable anti-fraud algorithm has therefore become increasingly critical, and demand for it keeps growing.
Although many methods now exist for identifying fraud on the internet, the fraud detection systems built on them have limitations: the credibility of the data flagged as belonging to fraud suspects still requires extensive manual verification afterward. For example, platform administrators must investigate and verify the cases one by one. As a result, tasks such as revising algorithm parameters, designing the priority of data features, and selecting the algorithm model in a fraud detection system require not only software design by algorithm experts but, even more, the participation of domain experts. Improving the transparency of the fraud identification algorithm can therefore effectively improve fraud detection accuracy, and how to visualize the data is a problem that urgently needs to be solved in this field.
Summary of the invention
In view of the above shortcomings of the prior art, the purpose of this application is to provide a group member relationship data visualization method and system that address the lack of visibility into fraud identification algorithms in the prior art.
To achieve the above and other related objectives, a first aspect of the application provides a group data visualization method applied in a fraud detection system, including the following steps: obtaining the data set of a group, where the data features of the data set include one or more of user information, IP address, event type, event source, event responder, and event occurrence time; determining a target feature from the data features; and associating the members of the group according to the event type and representing them as a point-and-line graph in the output display interface, where the points and/or lines represent the determined target feature.
A second aspect of the application provides a computer device, including: a processor; and a presentation engine executed on the processor, the presentation engine being used to execute the group data visualization method.
A third aspect of the application provides a group data visualization system, including: an acquisition module for obtaining the data set of a group, where the data features of the data set include one or more of user information, IP address, event type, event source, event responder, and event occurrence time; a processing module for determining a target feature from the data features and associating the members of the group according to the event type; and a display module for displaying a point-and-line graph through a display interface, where the points and/or lines represent the determined target feature.
In a fourth aspect, the application provides a client connected to a server over a network; based on a sent request, the client logs in to the server to execute the steps of the group data visualization method.
In a fifth aspect, the application provides a server connected to a client over a network; based on an operation request executed by the client, the server sends the process of the group data visualization method to the client, and the execution result is displayed by the client.
In a sixth aspect, the application provides a browser connected to a server over a network; based on a sent request, the browser logs in to the server to execute the steps of the group data visualization method.
In a seventh aspect, the application provides a computer-readable storage medium storing a data visualization computer program, characterized in that the data visualization computer program, when executed, implements the steps of the group data visualization method.
As described above, the group member relationship data visualization method and system of the application have the following beneficial effects: by presenting the data sets of the groups produced during fraud detection in terms of member relationships, type distributions, lists, and similar views, the application displays the data features of those groups through multiple relationship interfaces, which helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud detection system.
Description of the drawings
Fig. 1 is a flowchart of the group data visualization method of the application in one embodiment.
Fig. 2 is a flowchart of obtaining the data set of a group in one embodiment of the application.
Fig. 3 shows an interface containing multiple groups displayed by the application in one embodiment.
Fig. 4 is a schematic diagram of the interface showing the association relationships of the various event types generated among the members of a group.
Fig. 5 is a schematic diagram of the interface in which a text box is displayed beside the point-and-line graph of group member relationships.
Fig. 6 is a schematic diagram of the list interface of a group's data set displayed by the application in one embodiment.
Fig. 7 is a flowchart of displaying the interface of the feature distribution of the group's data set.
Fig. 8 shows an interface with the histogram and comparison chart of the feature distribution of registration time within a group in one embodiment of the application.
Fig. 9 is a flowchart of the steps for displaying the distribution of multiple groups in the cluster in one embodiment of the application.
Fig. 10 shows interface E, the distribution of multiple groups in the cluster, displayed by the application in one embodiment.
Fig. 11 is a schematic diagram of the architecture of the computer device of the application in one embodiment.
Fig. 12 is a schematic diagram of the module structure of the group data visualization system provided by the application.
Detailed description
The embodiments of the application are described below through specific examples; those skilled in the art can readily understand other advantages and effects of the application from the content disclosed in this specification.
The following description refers to the accompanying drawings, which illustrate several embodiments of the application. It should be understood that other embodiments may also be used, and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the application. The following detailed description should not be considered limiting, and the scope of the embodiments of the application is limited only by the claims of the granted patent. The terminology used herein is intended only to describe particular embodiments and is not intended to limit the application.
Furthermore, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It should be further understood that the terms "comprising" and "including" indicate the presence of the stated features, steps, operations, elements, components, items, types, and/or groups, but do not exclude the presence, occurrence, or addition of one or more other features, steps, operations, elements, components, items, types, and/or groups. The terms "or" and "and/or" as used herein are to be interpreted as inclusive, meaning any one or any combination. Therefore, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition arises only when a combination of elements, functions, steps, or operations is inherently mutually exclusive in some way.
In fraud detection technology, domain experts contribute, for the core technology of fraud identification, their experience in data classification and their requirements on the accuracy of classification results, but they do not know the algorithm framework itself or the parameters in the algorithm. Because domain experts cannot see how data is classified during detection, when they obtain fraud detection results from a fraud detection system they can only verify those results and have no means of judging their accuracy. To improve the accuracy of fraud detection systems, the application provides a group data visualization method applied to a fraud detection system. The method presents the groups obtained by classification in the system, together with their data sets, visually to algorithm experts and domain experts, so that different users (such as domain experts or algorithm experts) can explore various fraud behaviors through multiple means of interaction and flexibly modify the fraud detection algorithm according to the features of the fraud.
The group data visualization method is mainly executed by a computer device. The computer device may be any suitable computer device, such as a handheld computer device, a tablet computer device, a notebook computer, a desktop computer, or a server. The computer device includes a display, an input device, input/output (I/O) ports, one or more processors, memory, a non-volatile storage device, a network interface, a power supply, and so on. These components may include hardware elements (such as chips and circuits), software elements (such as a tangible non-transitory computer-readable medium storing instructions), or a combination of hardware and software elements. The components may also be combined into fewer components or separated into additional components; for example, the memory and the non-volatile storage device may be included in a single component. The computer device may execute the visualization method on its own or in cooperation with other computer devices. In some embodiments, the computer device executes the visualization method and displays the corresponding visualization interface. For example, the computer device includes a processor and a display; a presentation engine (or display engine) is executed on the processor, and the presentation engine executes the group data visualization method and displays the result on the display. Here, the presentation engine includes, but is not limited to, software and hardware that can parse interface-display content developed in a programming language, such as XML, HTML scripts, or C.
In still other embodiments, one computer device executes the visualization method and provides the corresponding visualization interface to another computer device for display. For example, a client, in response to a user's operation, sends a request to a server and logs in to the server; the server executes the visualization method to form the corresponding interface data and feeds the interface data back to the client, where the client's browser or a custom application displays the corresponding diagram according to the interface data.
The visualization method is mainly executed by a fraud detection system. The fraud detection system may include software and hardware in one or more computer devices. To give domain experts and algorithm experts evidence on questions they raise when a group is classified as a fraud group, such as "do users in the same group have the same behavioral habits?", the application provides a visualization method based on the relationships among the members within a group. Fig. 1 shows a flowchart of the group data visualization method of the application in one embodiment. As shown, the group data visualization method includes the following steps:
In step S11, the data set of a group is obtained. The data features of the data set include one or more of user information, IP address, event type, event source, event responder, and event occurrence time. The user information is information that characterizes a user's identity, for example a user ID, a unique user nickname, or a certificate number. The user information also includes: mobile phone number, mailbox, ID number, gender, the number of the user equipment used by the user, registration time, and so on. The IP address is the IP address of the computer device used when the same user information generates an event on the network. The event type is the type of user behavior event recorded in the network operation log, and includes but is not limited to at least one of: social behaviors between network users such as following, liking, commenting, and gifting (also called giving a present), or operation behaviors such as a network user logging in, logging out, updating status, registering, and modifying information. The same user information may correspond to at least one event type, and each event of a given type has its own event source, event responder, and event occurrence time. For example, the same user information may correspond to multiple like events, each with its own event source, event responder, and event occurrence time.
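The data features listed for step S11 can be pictured as one record per logged event. The following is a minimal sketch only: the class name `EventRecord`, its field names, and the helper `events_by_user` are hypothetical and do not come from the application, which does not prescribe a storage format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EventRecord:
    """One logged event; all field names here are illustrative assumptions."""
    user_id: str        # user information (identity of the acting user)
    ip: str             # IP address of the device that generated the event
    event_type: str     # e.g. "follow", "like", "login"
    source_id: str      # event source (initiator)
    target_id: str      # event responder
    timestamp: datetime # event occurrence time


def events_by_user(records):
    """Group records by user, since one user may map to many events."""
    grouped = {}
    for record in records:
        grouped.setdefault(record.user_id, []).append(record)
    return grouped
```

Grouping by `user_id` mirrors the statement that the same user information may correspond to several events, each with its own source, responder, and time.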
In some embodiments, a group is determined as described below. Fig. 2 shows a flowchart of obtaining the data set of a group in one embodiment of the application. As shown, step S11 further includes:
Step S111: obtain the operation logs of a cluster composed of multiple network users. In various embodiments, the cluster consists of all the network users that can be obtained. The network users in the cluster may come from the same website, from different websites, or from different network channels, for example the internet, one or more intranets, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or a suitable combination of these, and also the mobile communication networks of mobile phones.
Step S112: determine at least one data feature from the operation logs of the multiple network users, and analyze the similarity of at least one group of data features in the operation logs to determine the group. In a specific embodiment, because online fraud inevitably leaves traces in how users use data on the network, the fraud detection system collects the operation logs of multiple network users from at least one website and, by analyzing the similarity of at least one data feature in the operation logs, groups the users who generated those logs, obtaining the groups in the operation logs and the data set of each group.
Step S113: obtain the data set of the group. In some embodiments, the data set may be obtained from a database that stores the groups and their data sets. The database is configured, for example, on a remote storage server or on a storage device in the local computer device, and the data set of a group is extracted from the database based on the user's input operation. For example, the fraud detection system obtains multiple groups using an unsupervised detection algorithm, the user selects one of the groups through a selection interface, and the data set of the selected group is then obtained.
Specifically, the fraud detection system first computes, for all data in the operation logs, the similarity within each class of data feature, where the similarity can be measured by information entropy. For example, the fraud detection system uses the user information to compute the information entropy of the IP usage or maximum IP usage dimension, uses the event type to compute the information entropy of the operation-type dimension, and uses the information entropy of the registration-time dimension, or the operation time, to compute the information entropy of the bad-operation dimension. After these computations, an unsupervised detection method is applied to the resulting information entropies to divide the data into multiple groups. Examples of the unsupervised detection method include algorithms based on dense subgraphs and algorithms based on vector spaces. Each group presented by the visualization method provided herein reflects the shared resources, user relationships, and so on used in the fraud, so that users of the fraud detection system can determine more clearly whether the classification strategy in the unsupervised detection algorithm is reasonable. The shared resources include but are not limited to shared IP addresses and mailboxes, and the user relationships include but are not limited to user follow relationships and interaction relationships.
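As an illustration of measuring similarity with information entropy, the sketch below computes the Shannon entropy of the values observed for each data feature. It assumes the standard base-2 Shannon definition, which the text does not spell out, and the function names and the dictionary-based record layout are hypothetical. The subsequent unsupervised grouping step (dense subgraph or vector space methods) is not shown.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # log2(total / c) is -log2(p); writing it this way avoids a negative zero
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def feature_entropies(records, features):
    """Entropy per feature; records are dicts keyed by feature name."""
    return {f: shannon_entropy([r[f] for r in records]) for f in features}
```

A feature whose values are identical across a set of records has entropy 0, and the entropy grows as the values spread out, so low entropy corresponds to high similarity within that feature.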
In one embodiment, the visualization method further includes a step of displaying at least one group interface, in which the size of each group is represented by the size of the geometric figure shown. Fig. 3 shows an interface containing multiple groups displayed by the application in one embodiment. As shown, interface A displays 11 groups, each represented by a circle. All 11 groups lie inside one largest dashed circle, which represents, for example, the cluster composed of N network users. The group labeled 0 is, for example, a normal group. A smaller dashed circle contains 10 groups of different sizes labeled 1 to 10; the size of each circle is proportional to the number of members in the group, so a large circle indicates a group with many members and a small circle a group with few members. The groups labeled 1 to 10 are, for example, abnormal groups. In various embodiments, the geometric figure of a group may be any shape. The color of the geometric figure may be set randomly, or may be related to the number of groups or the number of members in a group. For example, N colors are preset, and the fraud detection system randomly assigns different colors to the geometric figures representing the groups. As another example, the fraud detection system assigns colors in a preset color order to the geometric figures representing the groups, taken in ascending order of member count. When the user operates the display interface and selects a geometric figure, the fraud detection system obtains the data set of the corresponding group.
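The sizing and coloring rules for the group circles can be sketched as follows. This is a sketch under stated assumptions: the text only says circle size is proportional to member count, so making the circle area (rather than the radius) proportional is a choice made here, and `PALETTE`, `circle_radius`, `assign_colors`, and the `scale` parameter are hypothetical names.

```python
import math

# Hypothetical preset color order; the application does not list colors.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def circle_radius(member_count, scale=5.0):
    """Radius such that circle area grows linearly with member count."""
    return scale * math.sqrt(member_count)


def assign_colors(group_sizes, palette=PALETTE):
    """Map group id -> color, walking the palette in ascending member count.

    group_sizes maps group id -> number of members; the palette is reused
    cyclically when there are more groups than colors.
    """
    ordered = sorted(group_sizes, key=group_sizes.get)
    return {gid: palette[i % len(palette)] for i, gid in enumerate(ordered)}
```

Scaling the area instead of the radius keeps a group with four times the members from looking sixteen times larger, which is a common charting convention rather than something the application requires.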
In a preferred embodiment, the step of displaying at least one group interface may also include an information bar that displays group information. When the user selects a group in the group interface, the basic information of the group is shown at the side of the interface as a table or text box. The basic information is, for example: the group code, the number of members, the highest-priority data feature used to determine the group, and group attributes (such as normal group or abnormal group).
In step S12, a target feature is determined from the data features. Here, in order to present the association relationships among the user information within a detected group on the basis of the same data feature, the fraud detection system may take at least one data feature that carries an association relationship in the detected fraud as the target feature that characterizes the association relationships among group members. In one embodiment, for example in zombie-follower (fake follower account) analysis, the fraud detection system automatically uses the IP address, event source, and event responder as target features. Alternatively, the fraud detection system selects at least one data feature as the target feature based on the user's selection operation. For example, if the user selects the follow event type as the target feature, the fraud detection system takes each data feature used when building the group's member relationships from the follow event type as a target feature; or, if the user selects the IP address as the target feature, the fraud detection system takes each data feature used when building the group's member relationships from the IP address as a target feature.
In step S13, the members of the group are associated according to the event type and represented as a point-and-line graph in the output display interface, where the points and/or lines represent the determined target feature. Fig. 4 shows a schematic diagram of the interface presenting the association relationships of the various event types generated among the members of a group. In the embodiment of interface B shown in the figure, the user has, for example, selected the IP address as the target feature. A yellow point represents a member, determined from user information, that acts only as an event source across the various event types. A blue point represents a member, determined from user information, that acts as an event responder in at least one of the event types. A line between any two points indicates that at least one event type has occurred between the two members. The color of a line indicates the IP address (specifically, the IP address group) of the computer device the member used when initiating an event. With this interface, domain experts can judge how accurate the detected group classification is by analyzing the number of points of the same color, the proportion of identical IP addresses (IP address groups), the ratio between points of different colors, and so on. For example, as shown in Fig. 4, if the yellow points draw far fewer lines than the blue points and the lines from yellow points are predominantly a single color, then the members represented by yellow points are highly likely to be zombie-follower accounts, while the members represented by blue points are highly likely to have hired zombie-follower accounts.
By observing the colors of the points in the interface, the distribution of points of each color, the number of lines drawn from each point, the color of each line, and so on, domain experts can verify how accurately the group has been classified as a fraud group.
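The point-and-line encoding described for interface B can be sketched as a small data transformation. This is an illustrative sketch, not the application's implementation: the dictionary keys `source`, `target`, and `ip`, the function names, and the rule that an IP address group is the first three octets are all assumptions made here (the three-octet rule is suggested only by the `IP: 123.123.123` value shown for Fig. 5).

```python
def ip_group(ip):
    """Assumed grouping rule: keep the first three octets of an IPv4 address."""
    return ".".join(ip.split(".")[:3])


def build_point_line_graph(events):
    """Return (nodes, edges) for a group's events.

    nodes maps member id -> "yellow" (only ever an event source) or
    "blue" (an event responder in at least one event).
    edges maps (source, target) -> set of IP groups used on that link,
    which a renderer could translate into line colors.
    """
    sources, targets, edges = set(), set(), {}
    for event in events:
        sources.add(event["source"])
        targets.add(event["target"])
        link = (event["source"], event["target"])
        edges.setdefault(link, set()).add(ip_group(event["ip"]))
    nodes = {m: ("blue" if m in targets else "yellow") for m in sources | targets}
    return nodes, edges
```

With this structure, counting yellow versus blue nodes and the number of distinct IP groups per node gives the same quantities the text says a domain expert reads off the picture.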
In some embodiments, a single target feature may be used to describe the group member relationships, and the other data features are rendered in the display interface as auxiliary features. Taking Fig. 4 again as an example, the diversity of IP addresses in the interface indicates the degree of risk that the group is a fraud group. In the displayed member relationships, therefore, the color of a point, the shape of a point, or a combination of the two represents the event source and event responder of the generated event types. Lines represent the associations between members, and the color and/or shape of a line represents the determined target feature (such as the IP address group). The lines may use bright colors to draw the domain expert's attention. When the domain expert observes that the line colors are highly concentrated, the number of points of each color helps verify whether the accuracy with which the fraud detection system assigned members of the group as fraudulent members meets the design requirements.
In other embodiments, to make it easier to display data about the group and its members in the group member relationship interface, the visualization method further includes a step of displaying at least one of group information, user information, event information, and prediction information in a text box beside the point-and-line graph in the display interface.
Here, the information shown in the text box beside the point-and-line graph may be displayed based on the user's selection operation. For example, Fig. 5 shows a schematic diagram of the interface in which a text box is displayed beside the point-and-line graph of group member relationships. In interface C shown in the figure, when the user clicks a point in the point-and-line graph, the text box on the right first displays the user information represented by that point, such as at least one of user ID and gender, the role the user plays in the generated event types (such as event source or event responder), and the prediction result produced by the fraud detection system's grouping (such as fraudulent user or normal user).
A text box may also be provided in point-and-line graph interface C to show the details of each member the user selects. For example, as shown in Fig. 5, the Node_Info text box shows that the user information of the selected point includes: user ID (User_id: A1b2c3d4e5f6g), gender (Sex: Female), mailbox (Mail), user label or type (Label: anomaly_source), and registration time (Reg_time: 2017-07-15, 19:10:29).
The interface shown in Fig. 5 also displays the attribute information of the selected user, that is, the prediction result produced by the fraud detection system's grouping: Abnormal is shown if the user is a fraudulent user, and Normal if the user is a normal user.
When the member represented by a point has generated multiple event types, multiple text boxes may be used to show the role that the same user information plays in each event type. In interface C shown in Fig. 5, the information of the point representing a user is displayed as a vertical list, for example user ID (User_id: abcdefg1234567), gender (Sex: Female), and user label or type (Label: anomaly_source). The information of a line representing a relationship between users is also displayed, for example the event type (Event_type: follow), the IP address group (IP: 123.123.123), the event source (Source_id: 1234567abcdefg) and event responder (Target_id: 7654321gfedcba) in the relationship, and the event occurrence time (Timestamp: 2017-08-07, 21:49:05). For example, the interface shown in Fig. 5 indicates that the group number is 4, that the number of follow events is 131, that the number of like events is 21, and that the total is 152. In this way, all event information (including, for example, event type and role information) and prediction information corresponding to the same User_id are displayed in a separate text box. Statistics on the event types of the group's members may also be shown in the point-and-line graph interface.
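The per-type event statistics and the key-value lines of the text boxes can be produced with a few lines of code. The sketch below is illustrative only: `event_type_stats`, `info_lines`, and the `event_type` dictionary key are hypothetical names, and the application does not describe how the counts are computed.

```python
from collections import Counter


def event_type_stats(events):
    """Return (count per event type, total number of events) for a group."""
    counts = Counter(event["event_type"] for event in events)
    return dict(counts), sum(counts.values())


def info_lines(fields):
    """Format a mapping as 'Key: value' lines, one per row of a text box."""
    return [f"{key}: {value}" for key, value in fields.items()]
```

For instance, `info_lines({"Event_type": "follow", "IP": "123.123.123"})` yields one line per field in insertion order, matching the vertical layout described for the line information.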
In some embodiments, the user cares not only about the relationships among group members but even more about whether the assigned groups are reasonable. This requires that the user be able to check the detailed data features within each group and the priority order of the data features used to build the group classification. The visualization method may therefore include a step of showing an interface of the data set of a group. The data set is shown as a list, which presents the details of the data features within the same group. To help improve the classification accuracy of the group's data set, the list in the interface may show the data features of a group column by column according to the classification priority used by the fraud detection system. For example, Fig. 6 is a schematic diagram of the list interface of the data set of a group shown in one embodiment of the application. In this list interface, the data set of the group is sorted by the similarity of the data features, in order of priority from high to low. When the similarity of the first-priority data feature is identical, the rows are sorted by the second-priority data feature. In the embodiment shown in Fig. 7, the priority order from high to low is: IP address (segment or grouping of the IP address), event source (source), event responder (target), event type (event_type), and event occurrence time (timestamp).
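The priority-ordered list can be illustrated with a small sketch. It assumes each row of the data set is a dictionary keyed by hypothetical column names (ip, source, target, event_type, timestamp) and uses a plain lexicographic sort as a stand-in for the similarity ordering, so that rows sharing the same high-priority value end up adjacent; the application does not specify the actual sort implementation.

```python
from operator import itemgetter

# Assumed column names, listed from highest to lowest priority.
PRIORITY = ("ip", "source", "target", "event_type", "timestamp")


def sort_rows(rows, priority=PRIORITY):
    """Sort rows by the first-priority column, breaking ties with the next ones."""
    return sorted(rows, key=itemgetter(*priority))


rows = [
    {"ip": "10.0.2", "source": "u2", "target": "u9",
     "event_type": "follow", "timestamp": "2017-08-07 21:49:05"},
    {"ip": "10.0.1", "source": "u3", "target": "u9",
     "event_type": "like", "timestamp": "2017-08-07 21:50:00"},
    {"ip": "10.0.1", "source": "u1", "target": "u9",
     "event_type": "follow", "timestamp": "2017-08-07 21:51:00"},
]

for row in sort_rows(rows):
    print(row["ip"], row["source"], row["event_type"])
```

Rows with the same IP segment are listed together, and within one segment the event source decides the order.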
In this embodiment, the table header encodes the importance of the different columns: the more concentrated the values of a feature are, the more important the feature is. In an embodiment provided by the application, the fraud detection system represents this property by computing the information entropy of each feature. A lower information entropy means a higher consistency. The fraud detection system then sorts the features in order of increasing information entropy, so that the headers with low information entropy come first and draw the user's attention. Of course, in other implementations, the headers of the table to be shown may also be color-rendered. For example, the header with the lowest information entropy is rendered in the deepest color to indicate that the data feature of that column is the most important, and the headers of the other columns are rendered accordingly, yielding the data set list interface shown in the figure. The list interface may be presented after the step of showing the multiple-group interface or before step S13, and is displayed in response to the user's selection operation.
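A minimal sketch of the entropy-based column ranking follows. It assumes Shannon entropy in bits over the empirical value frequencies of each column; the function names and the sample rows are hypothetical and only illustrate the idea of ordering headers by increasing entropy.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rank_columns(rows, columns):
    """Order column names by increasing entropy (most concentrated first)."""
    return sorted(columns, key=lambda col: shannon_entropy([r[col] for r in rows]))


rows = [
    {"ip": "10.0.1", "event_type": "follow", "timestamp": "21:49:05"},
    {"ip": "10.0.1", "event_type": "follow", "timestamp": "21:49:07"},
    {"ip": "10.0.1", "event_type": "like", "timestamp": "21:50:11"},
    {"ip": "10.0.1", "event_type": "like", "timestamp": "21:52:30"},
]

print(rank_columns(rows, ["timestamp", "event_type", "ip"]))
```

In this sample the ip column has a single value and ranks first, while the timestamp column has four distinct values and ranks last. The same ranking could drive the depth of the header colors.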
In certain embodiments, to further determine whether the data set of the obtained group reflects the characteristics of fraud, the data set also needs to be shown from other dimensions. For example, the network operation data of normal users is compared with the data set of the group to further confirm the accuracy of the detected fraud. To this end, the visualization method further includes a step of showing a feature distribution interface of the data set of the group. The feature distribution interface can show the distribution of each data type in the overall network. The overall network is a relative notion: for example, if multiple network users form a cluster, the interface can show the distribution of a data feature of a group within that cluster. Referring to Fig. 3, the largest dashed circle in Fig. 3 represents a cluster formed by multiple network users. The cluster contains 11 groups, numbered 0 to 10, from which one group is selected for its information to be shown.
In some embodiments, the data types that the feature distribution interface can show are, for example: the information entropy of the average operation interval dimension (average operation interval entropy), the information entropy of the IP usage amount dimension (IP used amount entropy), the information entropy of the gender dimension (sex entropy), the information entropy of the e-mail dimension (email entropy), the information entropy of the registration time dimension (reg time entropy), the information entropy of the operation count dimension (operation times entropy), the information entropy of the device amount dimension (device amount entropy), the information entropy of the operation type dimension (operation type entropy), and the information entropy of the maximum amount by which a used IP is also used by others (max IP used be used amount entropy). In the embodiment shown in Fig. 6, the entropy of the registration time dimension is taken as the data feature, i.e., Fig. 6 shows the feature distribution, within the network cluster, of the information entropy of the registration time (registration period) dimension of a group. To compare effectively the difference in feature distribution between the data set of the obtained group and the network operation data of normal users, refer to Fig. 7, which is a flow chart of showing the feature distribution interface of the data set of the group. As shown, it includes the following steps:
In step S211, a group is selected, and at least one data feature is determined from the data set of the group. In one embodiment, for example, the group numbered 2 in Fig. 3 is selected, and a data feature that is user information, for example the registration time, is determined from the data set of the group numbered 2.
In step S212, the feature distribution of the determined at least one data feature in the group and in the cluster is counted. In this embodiment, the feature distribution of the registration time data feature in the group is counted, and the feature distribution of the registration time data feature in the entire cluster is counted.
In step S213, the histogram of the feature distribution is shown, together with a distribution comparison diagram that places this histogram against the histogram of the entire cluster. In this embodiment, based on the encoding of the data feature, the histogram of the feature distribution of the registration time data feature in the group is shown, and the histogram of the feature distribution of the registration time data feature in the entire cluster is shown. Refer to Fig. 8, which shows an interface of the histogram and the comparison diagram of the feature distribution of the registration time in a group in one embodiment of the application. As shown, in interface D, figure (a) is a thumbnail of the feature distribution of the registration time in the selected group numbered 2, and figure (d) at the bottom of interface D is the enlarged version of that thumbnail. The enlarged figure shows that, between August 1 and August 31, the registration operations of the group members are concentrated on five days: August 5, August 6, August 11, August 12, and August 16. Figure (c) in interface D is the histogram of the time distribution of the registration operations of the registered users of the cluster in August; figure (c) shows that the registrations of the users of the cluster in August follow a certain regularity. Figure (b) in interface D overlays figure (d) and figure (c) to show the difference of the registration time data feature between the entire cluster and the selected group. To let the user understand the differences and connections between different features, the embodiment provided by the application presents this histogram view in three layers: after the user clicks one of the thumbnails, the page scrolls to the normalized distribution comparison diagram. Of course, in a specific application, there may be multiple thumbnails of data features, each representing a different data feature.
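Steps S212 and S213 can be sketched as follows. The sketch assumes the registration time has already been reduced to a day of the month and that "normalized" means dividing each bin count by the number of values; both are assumptions for illustration, and the function names are hypothetical.

```python
from collections import Counter


def normalized_histogram(values, bins):
    """Fraction of values falling into each bin (0.0 everywhere if empty)."""
    counts = Counter(values)
    total = len(values)
    return [counts.get(b, 0) / total if total else 0.0 for b in bins]


def distribution_contrast(group_values, cluster_values, bins):
    """Per-bin tuples: (bin, group share, cluster share, difference)."""
    group = normalized_histogram(group_values, bins)
    cluster = normalized_histogram(cluster_values, bins)
    return [(b, g, c, g - c) for b, g, c in zip(bins, group, cluster)]


days = list(range(1, 32))                 # August 1 to August 31
group_days = [5, 5, 6, 11, 12, 16]        # made-up group registrations
cluster_days = days                       # made-up, evenly spread cluster

for day, g, c, diff in distribution_contrast(group_days, cluster_days, days):
    if g > 0:
        print(f"Aug {day:2d}: group={g:.2f} cluster={c:.2f} diff={diff:+.2f}")
```

Normalizing both histograms makes a small group comparable with a much larger cluster, which is what an overlay such as figure (b) relies on.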
In some embodiments, the histogram may also be color-rendered, or displayed dynamically (for example, by flashing), to distinguish or emphasize the feature distribution of a certain data feature in the group and in the entire cluster.
In some embodiments, to further analyze the differences between multiple groups in a network cluster, the group data visualization method further includes a step of showing an interface of the feature distribution of the data sets of multiple groups. Refer to Fig. 9 and Fig. 10: Fig. 9 is a flow chart of the steps of showing the distribution of multiple groups in the cluster in one embodiment of the application, and Fig. 10 shows interface E of the distribution of multiple groups in the cluster in one embodiment of the application. As shown, the step includes:
In step S311, multiple groups are determined in a cluster formed by multiple network users, and different shapes, icons, labels and/or colors are used to characterize the differences between the groups. In one embodiment, for example, the three groups numbered 0, 1, and 2 in Fig. 3 are selected, where the group numbered 0 is shown in "green", the group numbered 1 is shown in "red", and the group numbered 2 is shown in "blue".
In step S312, at least one data feature is determined from the data sets of the multiple groups. In this embodiment, one data feature, for example the IP address, is determined from the data sets of these three groups.
In step S313, based on the at least one data feature, the relative entropy between every two network users in each group is analyzed and used as the measure of the similarity between those two network users. In this embodiment, based on the IP address, the relative entropy between every two network users in the three groups numbered 0, 1, and 2 (the information entropy of the IP usage amount dimension, IP used amount entropy) is analyzed as the measure of the similarity between every two network users. For example, the dimensionality reduction method t-SNE (t-distributed stochastic neighbor embedding) is used, with the relative entropy between two users as the index that measures the distance between these network users.
In step S314, a display interface is output. In the interface, shapes, icons, and/or labels characterize the network users, different colors characterize the differences between the groups, and the displayed distance characterizes the similarity between two network users in each group. In this embodiment, in interface E shown in Fig. 10, dots characterize the network users; "green" shows the group numbered 0, "red" shows the group numbered 1, and "blue" shows the group numbered 2. The users of the group numbered 2, shown in "blue", are close to one another, so the group forms a cluster-like distribution. The users of the group numbered 1, shown in "red", are also close to one another, and that group likewise forms a cluster-like distribution. "Green" shows the distribution of randomly sampled normal users, which are farther apart and more dispersed. It can therefore be considered that the denser the cluster a group forms, the more likely the group is a fraud group. For example, in the embodiment shown in Fig. 11, the group shown in "green" has a rather dispersed distribution, which indicates that the "green" group is a normal group and that the users represented by the "green" points are normal users. Conversely, the group shown in "red" (the group numbered 1) and the group shown in "blue" (the group numbered 2) have cluster-like distributions, which indicates that the "red" and "blue" groups are abnormal groups and that the users represented by the "red" and "blue" points are abnormal users. In one embodiment, a user of the visualization system can interactively hover the mouse over a point to check the specific information and feature values of the users in each group.
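The distance used in step S313 can be sketched as a pairwise relative-entropy matrix. The sketch makes several assumptions that the application does not state: each user is summarized by the list of IP segments used, the empirical distributions are smoothed with a small constant so that the relative entropy stays finite, and the two directions of the relative entropy are summed to obtain a symmetric distance.

```python
import math
from collections import Counter


def distribution(values, support, eps=1e-9):
    """Smoothed empirical distribution of non-empty values over a shared support."""
    counts = Counter(values)
    total = len(values)
    raw = [counts.get(s, 0) / total + eps for s in support]
    norm = sum(raw)
    return [p / norm for p in raw]


def kl(p, q):
    """Relative entropy D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetrized relative entropy, usable as a distance-like score."""
    return kl(p, q) + kl(q, p)


def distance_matrix(user_values):
    """Return (sorted user ids, matrix of pairwise symmetric relative entropy)."""
    support = sorted({v for vals in user_values.values() for v in vals})
    users = sorted(user_values)
    dists = {u: distribution(user_values[u], support) for u in users}
    matrix = [[symmetric_kl(dists[a], dists[b]) for b in users] for a in users]
    return users, matrix


ip_usage = {  # hypothetical IP segments used by three users
    "u1": ["ip_a", "ip_a", "ip_b"],
    "u2": ["ip_a", "ip_a", "ip_b"],
    "u3": ["ip_c", "ip_c", "ip_c"],
}
users, matrix = distance_matrix(ip_usage)
for user, row in zip(users, matrix):
    print(user, [round(d, 3) for d in row])
```

Users with identical IP usage get distance zero and would be drawn on top of each other, while a user with disjoint IP usage ends up far away. A matrix of this kind could then be handed to a t-SNE implementation that accepts precomputed distances to obtain the two-dimensional positions.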
In other embodiments, shapes, icons, and/or labels may also be used in the output interface to characterize the network users. The shapes are, for example, geometric figures such as triangles and rectangles; the icons are, for example, a smiling face or a crying face, a skull avatar, or a robber avatar; the labels are, for example, text or clearly distinguishable symbols.
The group data visualization method of the application presents the data sets of the groups determined during fraud detection in forms such as member relationships within a group, type distributions, and lists. The data features of the groups divided during fraud detection are thereby shown through multiple relationship interfaces, which helps domain experts and algorithm experts assess and revise the detection algorithm of the fraud detection system.
The application also provides a computer device. The computer device may be any suitable computer device, such as a handheld computer device, a tablet computer device, a notebook computer, a desktop computer, or a server. The computer device includes a display, an input device, input/output (I/O) ports, one or more processors, a memory, a non-volatile storage device, a network interface, a power supply, and so on. These components may include hardware elements (such as chips and circuits), software elements (such as a tangible non-transitory computer-readable medium storing instructions), or a combination of hardware and software elements. It should also be noted that the components may be combined into fewer components or separated into additional components. For example, the memory and the non-volatile storage device may be included in a single component. The computer device may execute the visualization method on its own or in cooperation with other computer devices.
Refer to Fig. 11, which shows the architecture of the computer device of the application in one embodiment. As shown, in this embodiment the computer device 1 includes one or more processors 11 and a presentation engine 12 executed on the processor 11, to execute the above visualization method and display the corresponding visualization interface. For example, the computer device includes a processor 11, a display, and a presentation engine 12 (or display engine) executed on the processor 11; the presentation engine 12 executes the group data visualization method of the above embodiments and shows the result through the display. For the implementation process of the group data visualization method, refer to the description of Fig. 1 to Fig. 10. In a specific implementation, the presentation engine is stored, for example, in the memory of the local computer device or on a remote storage server. The presentation engine includes, but is not limited to, software and hardware that can parse interface displays developed in a programming language, such as XML, HTML scripts, or C. In still other embodiments, one computer device executes the visualization method and provides the corresponding visualization interface to another computer device for display. For example, in response to a user's request operation, the client sends a request to the server and logs in to the server; the server executes the visualization method to form the corresponding interface data and feeds the interface data back to the client, and the client's browser or a customized application displays the corresponding diagram according to the interface data.
The application also provides a client connected to a server through a network. In this embodiment, the client is, for example, a web client, and the server is, for example, a web server. The web client logs in to the web server by sending a web service request, so as to execute the group data visualization method of the above embodiments and show the result through a display. For the implementation process of the group data visualization method, refer to the description of Fig. 1 to Fig. 10.
The application also provides a server connected to a client through a network. In this embodiment, the client is, for example, a web client, and the server is, for example, a web server. In response to a request operation of the web client, the web server executes the group data visualization method of the above embodiments and sends the result to the client to be shown through a display. For the implementation process of the group data visualization method, refer to the description of Fig. 1 to Fig. 10.
The application also provides a browser connected to a server through a network. The browser logs in to the server by sending a request, so as to execute the group data visualization method of the above embodiments and show the result through a display. For the implementation process of the group data visualization method, refer to the description of Fig. 1 to Fig. 10. In this embodiment, the browser is, for example, a web browser, including but not limited to QQ Browser, Internet Explorer, Firefox, Safari, Opera, Google Chrome, Baidu Browser, Sogou Browser, Cheetah Browser, 360 Browser, UC Browser, Maxthon, and TheWorld Browser.
The application also provides a group data visualization system. The group data visualization system may include software and hardware in one or more computer devices, and visualizes the data sets of the groups detected by the fraud detection system. To answer, one group at a time, the domain expert's question of why a group is regarded as a fraud group and the algorithm expert's question of "whether users in the same group have the same behavioral habits", the application provides a group data visualization system that starts from the relationships among group members. Refer to Fig. 12, which is a schematic diagram of the module structure of the group data visualization system provided by the application. As shown, the group data visualization system 3 includes an acquisition module 31, a processing module 32, and a display module 33.
The acquisition module 31 is used to obtain the data set of a group. The data features of the data set include one or more of user information, IP address, event type, event source, event responder, and event occurrence time. The user information is information that characterizes the identity of a user, for example a user ID, a unique user nickname, or a certificate number. The user information further includes the mobile phone number, mailbox, ID number, gender, the number of the user equipment used by the user, the registration time, and so on. The IP address is the IP address of the computer device used when the same user information produces an event in the network. The event type is the type of user behavior event recorded in the network operation log, and includes, but is not limited to, at least one of social behaviors between network users such as following, liking, commenting, and giving gifts, and operation behaviors of a network user such as logging in, logging out, updating status, registering, and modifying information. The same user information may correspond to at least one event type, and each event type corresponds to an event source, an event responder, and an event occurrence time. For example, the same user information may correspond to multiple like events, each of which has its own event source, event responder, and event occurrence time.
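One possible shape of a record in such a data set is sketched below. The class name, the field names, and the second sample record are hypothetical; the identifiers in the first record reuse the sample values given for Fig. 5.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    """One event of a group data set (field names are assumptions)."""
    ip: str            # IP address or IP segment used when the event occurred
    event_type: str    # e.g. "follow" or "like"
    source_id: str     # user who initiated the event
    target_id: str     # user who received the event
    timestamp: str     # event occurrence time


def event_type_counts(records):
    """Number of events per event type, as shown in the statistics text box."""
    return Counter(r.event_type for r in records)


records = [
    EventRecord("123.123.123", "follow", "1234567abcdefg",
                "7654321gfedcba", "2017-08-07,21:49:05"),
    EventRecord("123.123.123", "like", "1234567abcdefg",
                "7654321gfedcba", "2017-08-07,21:50:12"),
]

counts = event_type_counts(records)
print(dict(counts), "total:", sum(counts.values()))
```

Keeping the source and the responder on every record lets the same user appear in different roles across events, which matches the statement that each event type has its own event source, event responder, and occurrence time.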
In certain embodiments, a group is determined as described below. Refer to Fig. 2, which is a flow chart of obtaining the data set of a group in an embodiment provided by the application. As shown, the acquisition module 31 can execute the following steps S111 to S113:
Step S111: obtain the operation logs of a cluster formed by multiple network users. In various embodiments, the cluster consists of all network users whose data can be obtained. The network users in the cluster come from the same website, from different websites, or from different network channels, which may be, for example, the Internet, one or more intranets, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or a suitable combination thereof, and may also be a mobile communication network for mobile phones.
Step S112: determine at least one data feature from the operation logs of the multiple network users, and analyze the similarity of the at least one data feature in the operation logs to determine the group. In a specific embodiment, network fraud inevitably leaves characteristic traces in the data that the users generate in the network. The acquisition module 31 collects the operation logs of multiple network users from at least one website, analyzes the similarity of at least one data feature in the operation logs, groups the users who produced the corresponding operation logs, and obtains the groups and the data sets of the groups in the operation logs.
Step S113: obtain the data set of the group. In some embodiments, the data set can be obtained from a database that stores the groups and their data sets. The database is configured, for example, on a remote storage server or in a storage device of the local computer device, and the data set of a group is extracted from the database in response to the user's input operation. For example, the acquisition module 31 obtains multiple groups using an unsupervised detection algorithm, the user selects one of the groups through a selection interface, and the data set of the selected group is then obtained.
Specifically, the acquisition module 31 first calculates, over all data in the operation logs, the similarity of data features of the same class, where the similarity can be measured by information entropy. For example, the acquisition module 31 uses the user information to calculate the information entropy of the IP usage amount dimension or of the maximum IP usage amount dimension, uses the event type to calculate the information entropy of the operation type dimension, and uses the information entropy of the registration time dimension or the operation time to calculate the information entropy of the bad operation dimension. After these calculations, an unsupervised detection method is applied to the resulting information entropies to divide the users into multiple groups. The unsupervised detection method includes, for example, an algorithm based on dense subgraphs or an algorithm based on vector space. Each group presented by the visualization method provided by the application reflects the shared resources, user relationships, and so on that are used in fraud, so that the user of the acquisition module 31 can determine more clearly whether the classification strategy of the unsupervised detection algorithm is reasonable. The shared resources include, but are not limited to, shared IP addresses and mailboxes; the user relationships include, but are not limited to, following and interaction relationships between users.
In one embodiment, the visualization method further includes a step of showing at least one group interface, in which the size of a group is characterized by the size of the geometric figure shown. Refer to Fig. 3, which shows an interface containing multiple groups in one embodiment of the application. As shown, 11 groups are shown in the interface, and the geometric figure that characterizes each group is a circle. The 11 groups are all located inside one largest dashed circle, which characterizes a cluster formed by N network users. For example, the group numbered 0 is a normal group. Inside a smaller dashed circle there are 10 groups of different sizes, numbered 1 to 10. The size of a circle is proportional to the number of members of the group, i.e., a large circle indicates a group with many members and a small circle indicates a group with few members. As another example, the groups numbered 1 to 10 are abnormal groups. In various embodiments, the geometric figure of a group may have any shape. The color of the geometric figure may be set randomly, or may be related to the number of groups or to the number of members of the group. For example, N colors are preset, and the acquisition module 31 randomly assigns different colors to the geometric figures that characterize the groups. As another example, the acquisition module 31 assigns the colors of a preset color sequence, in ascending order of the number of members, to the geometric figures that characterize the groups. When the user operates the display interface and selects a geometric figure, the acquisition module 31 obtains the data set of the corresponding group.
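The sizing and coloring rules can be sketched as follows. The text only says that the circle size is proportional to the number of members; the sketch assumes that "size" means area, so the radius grows with the square root of the member count. The palette names, the area constant, and the function names are hypothetical.

```python
import math


def circle_radius(member_count, area_per_member=20.0):
    """Radius of a circle whose area is proportional to the member count."""
    return math.sqrt(area_per_member * member_count / math.pi)


def assign_colors(member_counts, palette):
    """Map group ids to palette colors in ascending order of member count."""
    order = sorted(member_counts, key=member_counts.get)
    return {gid: palette[i % len(palette)] for i, gid in enumerate(order)}


member_counts = {1: 50, 2: 10, 3: 30}        # made-up group sizes
palette = ["light", "medium", "dark"]        # assumed preset color sequence

colors = assign_colors(member_counts, palette)
for gid, count in member_counts.items():
    print(gid, round(circle_radius(count), 2), colors[gid])
```

Scaling the area rather than the radius avoids visually exaggerating large groups; if the radius itself is meant to be proportional, circle_radius would simply return a constant times member_count.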
In a preferred embodiment, the at least one group interface may also include an information bar that shows group information. When the user selects a group in the group interface, the basic information of the group is shown at the side of the interface in the form of a table or a text box. The basic information is, for example, the group number, the number of members, the highest-priority data feature used to determine the group, and the group attribute (such as normal group or abnormal group).
The processing module 32 is used to determine a target feature from the data features. Here, in order to show, based on the same data feature, the associations between the pieces of user information in a detected group, the processing module 32 may, according to the fraud being detected, take at least one data feature that carries an association as the target feature that characterizes the associations between group members. For example, when detecting zombie-follower events, the processing module 32 automatically takes the IP address, the event source, and the event responder as target features. Alternatively, the processing module 32 selects at least one data feature as the target feature in response to the user's selection operation. For example, if the user selects the follow event type as the target feature, the processing module 32 takes as target features the data features that are used to build the member relationships of the group on the basis of the follow event type.
The display module 33 is used to show a point-and-line diagram through a display interface, where the points and/or lines characterize the determined target features. Refer to Fig. 4, which is a schematic diagram of an interface showing the associations produced by the various event types between the members of a group. As shown, a "yellow" point represents a member, determined from the user information, that is only an event source in the various event types. A "blue" point represents a member, determined from the user information, that is at least an event responder in the various event types and may also be an event source. A line between any two points indicates that at least one event type has occurred between the two members. The color of a line represents the IP address of the computer device that the member used when initiating an event. By viewing this interface, a domain expert can verify how accurate the classification of the detected group is by analyzing the number of points of the same color, the proportion of identical IP addresses, the ratio between points of different colors, and so on. For example, as shown in Fig. 4, if the "yellow" points are far fewer than the "blue" points and a single color accounts for a high share of the lines drawn from the "yellow" points, then the members represented by the "yellow" points are very likely zombie-follower accounts, and the members represented by the "blue" points are very likely to have hired zombie-follower accounts. By observing the colors of the points in the interface, the distribution of the points of each color, the number of lines drawn from each point, the colors of the lines, and so on, the domain expert can verify the accuracy of the grouping result that marks the group as a fraud group.
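The point colors and the share of the dominant line color described for Fig. 4 can be derived from the event records roughly as follows. The dictionary keys source_id, target_id, and ip and the helper names are hypothetical; the sketch only prepares the data and leaves the drawing to whatever rendering layer is used.

```python
from collections import Counter


def node_roles(events):
    """'yellow' for members that only initiate events, 'blue' for responders."""
    sources = {e["source_id"] for e in events}
    targets = {e["target_id"] for e in events}
    return {u: ("blue" if u in targets else "yellow") for u in sources | targets}


def edges_by_ip(events):
    """Map each (source, target) pair to the set of IPs used on that line."""
    edges = {}
    for e in events:
        edges.setdefault((e["source_id"], e["target_id"]), set()).add(e["ip"])
    return edges


def dominant_ip_share(events, roles):
    """Share of the most common IP among events initiated by 'yellow' members."""
    ips = Counter(e["ip"] for e in events if roles[e["source_id"]] == "yellow")
    total = sum(ips.values())
    return max(ips.values()) / total if total else 0.0


events = [  # made-up follow events
    {"source_id": "a", "target_id": "x", "ip": "ip1"},
    {"source_id": "b", "target_id": "x", "ip": "ip1"},
    {"source_id": "c", "target_id": "x", "ip": "ip2"},
    {"source_id": "x", "target_id": "y", "ip": "ip3"},
]

roles = node_roles(events)
print(roles)
print(edges_by_ip(events))
print(round(dominant_ip_share(events, roles), 2))
```

A high dominant share means that most lines drawn from initiator-only members carry the same color, which is the pattern the text associates with accounts sharing an IP address.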
In certain embodiments, there may be a single target feature for describing the relationships among group members, and the other data features are rendered in the display interface as auxiliary features. Taking Fig. 4 again as an example, the diversity of the IP addresses in the interface characterizes the degree of risk that the group is a fraud group. Therefore, in the member relationships shown, the color and/or shape of a point characterizes the event source and the event responder of the event types produced, a line characterizes the association between group members, and the color and/or shape of the line characterizes the determined target feature (i.e., the IP address). The lines may use bright colors to draw the domain expert's attention. When the domain expert observes that the colors of the lines are highly concentrated, this observation, together with the number of points of each color, can be used to verify whether the accuracy with which the fraud detection system assigns fraudulent members to the group meets the design requirement.
In other embodiments, to show the data related to the group and its members more conveniently in the above group member relationship interface, the display module 33 is also used to show at least one of group information, user information, event information, and prediction information in a text box at the side of the point-and-line diagram in the display interface.
Here, the information shown in the text box at the side of the point-and-line diagram can be shown in response to the user's selection operation. For example, refer to Fig. 5, which is a schematic diagram of an interface that shows text boxes at the side of the point-and-line diagram of the group member relationships. As shown, when the user clicks a point in the point-and-line diagram interface C, the text box on the right side can show the user information represented by that point, such as at least one of the user ID, the gender, and the role in the event types produced (such as event source or event responder), as well as the prediction result produced by the grouping of the fraud detection system (such as fraudulent user or normal user).
A text box can also be provided in the point and line chart interface C to show that the selected each member's of user is detailed Thin information.For example, as shown in figure 5, showing that the user information of the selected point of user includes in Node_Info text boxes:With Family ID (User_id:A1b2c3d4e5f6g), gender (Sex:Female), mailbox (Mail), user tag or type (Label: ) and registion time (Reg_time anomaly_source:2017-07-15,19:10:29) etc..
The interface shown in Fig. 5 also displays an attribute of the selected user, namely the prediction result produced by the grouping prediction of the fraud detection system: Abnormal is displayed if the user is a fraudulent user, and Normal is displayed if the user is a normal user.
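As a rough illustration of how the content of such a text box might be assembled, the following sketch builds the Node_Info text for one selected point. The function name render_node_info and the dictionary layout are hypothetical; the field names and sample values are those quoted from Fig. 5, and the Mail field is omitted because no value is given for it.

```python
def render_node_info(user_info, predicted_fraud):
    """Build the text shown in a Node_Info box for one selected point."""
    lines = ["Node_Info"]
    # Only fields that are actually present for the user are printed.
    for field in ("User_id", "Sex", "Mail", "Label", "Reg_time"):
        if field in user_info:
            lines.append(f"{field}: {user_info[field]}")
    # Prediction result of the fraud detection system for this user.
    lines.append("Abnormal" if predicted_fraud else "Normal")
    return "\n".join(lines)


selected = {
    "User_id": "A1b2c3d4e5f6g",
    "Sex": "Female",
    "Label": "anomaly_source",
    "Reg_time": "2017-07-15,19:10:29",
}
text = render_node_info(selected, predicted_fraud=True)
print(text)
```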
When the member represented by a point has produced multiple event types, multiple text boxes may be used to show the role that the same user plays in each event type. In interface C shown in Fig. 5, the information of the point characterizing the user is shown in a vertical column, for example: user ID (User_id: abcdefg1234567), gender (Sex: Female), and user label or type (Label: anomaly_source). The information of the line characterizing the relationship between users is also shown, for example: event type (Event_type: follow); IP address group (IP: 123.123.123); the event initiation source (Source_id: 1234567abcdefg) and the event response side (Target_id: 7654321gfedcba) in the relationship; event occurrence time (Timestamp: 2017-08-07,21:49:05); and so on. For example, the interface shown in Fig. 5 shows that the group number is 4, the number of follow events (follow) is 131, the number of like events (like) is 21, and the total is 152. In this way, all event information (including, for example, event type and role information) and prediction information corresponding to the same USER_ID are shown in a separate text box. In addition, statistics on the event types of the group members can also be shown in the point-and-line chart interface.
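The event-type statistics mentioned above amount to a simple count per event type. A minimal sketch follows, assuming each event is a dictionary with an event_type key; the function name summarize_group_events and the record layout are hypothetical, while the counts reproduce the figures quoted for group 4.

```python
from collections import Counter


def summarize_group_events(events):
    """Count the events of a group per event type and in total."""
    per_type = Counter(event["event_type"] for event in events)
    return {"per_type": dict(per_type), "total": sum(per_type.values())}


# Synthetic events matching the numbers quoted for Fig. 5.
events = [{"event_type": "follow"}] * 131 + [{"event_type": "like"}] * 21
summary = summarize_group_events(events)
print(summary)
```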
In some embodiments, the user is concerned not only with the relationships among group members but, even more, with whether the assigned groups are reasonable. This requires that the user be able to inspect the detailed data features in each group and the priority order of the data features used to form the groups. The visualization method may therefore include a step of displaying an interface of the data set of a group. The data set is displayed as a list, which presents the user with the details of the data features within the same group. To help improve the classification accuracy of the group's data set, the list in the interface may display the data features of a group column by column according to the classification priority used by the fraud detection system. For example, refer to Fig. 6, which is a schematic diagram of a list interface showing the data set of a group in one embodiment of the present application. In this list interface, the data set of the group is sorted by the similarity of the data features, in order of priority from high to low. When the data feature values at the first priority are the same, the rows are sorted by the data feature at the second priority. In the embodiment shown in Fig. 7, the order of priority from high to low is: IP address, event initiation source (source), event response side (target), event type (event_type), and event occurrence time (timestamp). In this embodiment, the table header encodes the importance of the different columns: the more concentrated the values of a feature, the more important that feature is. In the embodiments provided by the present application, the fraud detection system represents this property by calculating the information entropy of each feature; the lower the information entropy, the higher the consistency. The processing module 32 then sorts the features in ascending order of information entropy, so that the column headers with low information entropy are placed at the front to draw the user's attention. Of course, in different implementations, the display module 33 may also color the column headers of the table to be displayed, for example by rendering the header of the column with the lowest information entropy in the darkest color to indicate that the data feature represented by that column is the most important, and rendering the headers of the other columns in correspondingly lighter colors, thereby obtaining the data set list interface shown in the figure. The list interface may be displayed after the multiple group interfaces are displayed, or after the display module displays the point-and-line chart, in response to a user operation selecting the list interface.
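The entropy-based column ordering can be sketched as follows, purely as an illustration and not as the system's actual implementation. The sketch computes the Shannon entropy of each column, orders the columns by ascending entropy, and then sorts the rows column by column in that priority order. The sample rows and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def order_columns_by_entropy(rows, columns):
    """Lower entropy means more concentrated values, hence higher priority."""
    return sorted(columns, key=lambda col: shannon_entropy([r[col] for r in rows]))


def sort_rows(rows, ordered_columns):
    """Sort by the first-priority column, breaking ties with later columns."""
    return sorted(rows, key=lambda r: tuple(r[c] for c in ordered_columns))


# Hypothetical data set of one group.
rows = [
    {"ip": "1.1.1.1", "source": "b", "event_type": "like", "timestamp": "t4"},
    {"ip": "1.1.1.1", "source": "a", "event_type": "follow", "timestamp": "t2"},
    {"ip": "1.1.1.1", "source": "b", "event_type": "follow", "timestamp": "t3"},
    {"ip": "1.1.1.1", "source": "a", "event_type": "follow", "timestamp": "t1"},
]
columns = ["timestamp", "source", "event_type", "ip"]

ordered = order_columns_by_entropy(rows, columns)
table = sort_rows(rows, ordered)
```

With this sample data the IP address column has zero entropy and therefore comes first, while the timestamp column, whose values are all distinct, comes last. A header shading scheme could reuse the same ranking, with the darkest shade assigned to the first column.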
In certain embodiments, to further establish whether the data set of the acquired group reflects fraudulent characteristics, the data set also needs to be presented from other dimensions. For example, the accuracy of the detected fraud can be further confirmed by comparing the network operation data of normal users with the data set of the group. To this end, the display module 33 is further configured to display a feature distribution interface for the data set of the group. The feature distribution interface can show the distribution of each data type in the overall network. The overall network is a relative notion: for example, if multiple network users form a cluster, the interface can show the distribution, within that cluster, of a given data feature of a given group. Referring to Fig. 3, the largest dashed circle in Fig. 3 represents a cluster formed by multiple network users; the cluster contains 11 groups, numbered 0 to 10, from which one group is selected for display.
In some embodiments, the data types that the feature distribution interface can display include, for example: the information entropy of the average operation interval dimension (average operation interval entropy), the information entropy of the IP address usage amount dimension (IP used amount entropy), the information entropy of the gender dimension (sex entropy), the information entropy of the e-mail dimension (email entropy), the information entropy of the registration time dimension (reg time entropy), the information entropy of the operation count dimension (operation times entropy), the information entropy of the device count dimension (device amount entropy), the information entropy of the operation type dimension (operation type entropy), the information entropy of the maximum number of other users sharing a used IP (max IP used be used amount entropy), and so on. In the embodiment shown in Fig. 6, the entropy of the registration time dimension is used as the data feature for illustration; that is, Fig. 6 shows the feature distribution, within the network cluster, of the information entropy of the registration time (registration period) dimension of a group. To effectively compare the feature distributions of the acquired group data set and the network operation data of normal users, refer to Fig. 7, which is a flow chart for displaying the feature distribution interface of the data set of the group. As shown in the figure, the processing module 33 executes the following steps:
In step S211, a group is selected, and at least one data feature is determined from the data set of the group. In one embodiment, for example, the group numbered 2 in Fig. 3 is selected, and a data feature belonging to the user information, for example the registration time, is determined from the data set of the group numbered 2.
In step S212, the feature distributions of the determined at least one data feature in the group and in the cluster are counted. In this embodiment, the feature distribution of the registration time data feature in the group is counted, and the feature distribution of the registration time data feature in the entire cluster is counted.
In step S213, a histogram of the feature distribution is displayed, together with a distribution comparison chart of that histogram against the histogram of the entire cluster. In this embodiment, based on the encoding of the data feature, a histogram of the feature distribution of the registration time data feature in the group is displayed, and a histogram of its feature distribution in the entire cluster is displayed. Refer to Fig. 8, which shows an interface with the histograms and comparison chart of the registration time feature distribution of a group in one embodiment of the present application. As shown, in interface D, chart (a) is a thumbnail of the registration time feature distribution in the selected group numbered 2, and chart (d) at the bottom of interface D is the enlarged view of that thumbnail. The enlarged view shows that, in the month from August 1 to August 31, the registration operations of the group's members are concentrated on five days: August 5, August 6, August 11, August 12, and August 16. Chart (c) in interface D is a histogram of the time distribution of registration operations by registered users in the cluster during August; chart (c) shows that the registrations of users in the cluster follow a certain regular pattern over August. Chart (b) in interface D overlays chart (d) and chart (c) to show the difference in the registration time data feature between the entire cluster and the selected group. To let users understand the differences and connections between different features, in the embodiments provided by the present application these charts are presented in three layers; after the user clicks one of the thumbnails, the page scrolls to the normalized distribution comparison chart. Of course, in specific applications there may be multiple data feature thumbnails, each representing a different data feature.
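Steps S211 to S213 can be illustrated with a small sketch that counts registrations per day for a group and for the whole cluster and normalizes both, so that the two histograms can be overlaid. This is only a sketch under stated assumptions: the year 2017, the sample registration dates, and the function name daily_shares are hypothetical, and drawing the histograms themselves is left out.

```python
from collections import Counter
from datetime import date


def daily_shares(reg_dates, days):
    """Normalized histogram: share of registrations falling on each day."""
    counts = Counter(reg_dates)
    total = sum(counts[d] for d in days)
    return [counts[d] / total if total else 0.0 for d in days]


# August 1 to August 31; the year is an assumption for illustration.
days = [date(2017, 8, d) for d in range(1, 32)]

# Hypothetical group whose registrations fall on five days only.
group_dates = [date(2017, 8, d) for d in (5, 5, 6, 11, 12, 12, 16)]
# Hypothetical cluster: the group plus one other registration every day.
cluster_dates = group_dates + days

group_shares = daily_shares(group_dates, days)
cluster_shares = daily_shares(cluster_dates, days)

# Days on which the group is over-represented relative to the cluster.
over_represented = [
    d.day for d, g, c in zip(days, group_shares, cluster_shares) if g > c
]
print(over_represented)
```

Normalizing both histograms to shares is one way to make a small group comparable with a much larger cluster, which appears to be the purpose of the normalized comparison chart described above.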
In some embodiments, the histogram may also be color-rendered to distinguish or emphasize the feature distribution of a given data feature in the group and in the entire cluster, or displayed dynamically (for example, by flashing) to the same end.
In some embodiments, to further analyze the differences among multiple groups in a network cluster, the display module also displays an interface of the feature distribution of the data sets of multiple groups. Refer to Fig. 9 and Fig. 10: Fig. 9 is a flow chart of the steps for displaying the distribution of multiple groups in the cluster in one embodiment of the present application, and Fig. 10 shows interface E, which displays the distribution of multiple groups in the cluster in one embodiment. As shown, the steps include:
In step S311, multiple groups are determined in a cluster formed by multiple network users, and different shapes, icons, labels, and/or colors are used to characterize the differences among the groups. In one embodiment, for example, the three groups numbered 0, 1, and 2 in Fig. 3 are selected; the group numbered 0 is shown in green, the group numbered 1 in red, and the group numbered 2 in blue.
In step S312, at least one data feature is determined from the data sets of the multiple groups. In this embodiment, one data feature, for example the IP address, is determined from the data sets of these three groups.
In step S313, based on the at least one data feature, the relative entropy between every two network users in each group is analyzed and used as a measure of the similarity between those two network users. In this embodiment, based on the IP address, the relative entropy between every two network users in the three groups numbered 0, 1, and 2 (in the IP usage amount dimension, IP used amount entropy) is analyzed as a measure of the similarity between every two network users. For example, the dimensionality-reduction method t-SNE (t-distributed stochastic neighbor embedding) is used, with the relative entropy between two users serving as the measure of the distance between these network users.
In step S314, a display interface is output in which the network users are characterized by shapes, icons, and/or labels, the differences among the groups are characterized by different colors, and the similarity between two network users in each group is characterized by the displayed distance. In this embodiment, in interface E shown in Fig. 10, the network users are represented by dots; green represents the group numbered 0, red the group numbered 1, and blue the group numbered 2. The users in the blue group numbered 2 are close to one another, and the group forms a cluster-like distribution; the users in the red group numbered 1 are likewise close to one another and form a cluster-like distribution; green represents the distribution of randomly sampled normal users, who are far apart from one another and more widely dispersed. It can therefore be considered that the denser the cluster a group forms, the more likely it is to be a fraud group. For example, in the embodiment shown in Fig. 11, the group shown in green is relatively dispersed, which indicates that the green group is a normal group and that the users represented by its green dots are normal users. Conversely, the group shown in red (i.e., the group numbered 1) and the group shown in blue (i.e., the group numbered 2) are distributed in clusters, which indicates that the red and blue groups are abnormal groups and that the users represented by the red and blue dots are abnormal users. In one embodiment, a user of the visualization system can interactively inspect the specific information and feature values of the users in each group by hovering the mouse.
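The pairwise measure used in steps S313 and S314 can be sketched as follows, with several assumptions that are not stated in the application: each user is described by hypothetical per-IP usage counts, a small constant is added so that the relative entropy stays finite, and the relative entropy is summed in both directions so that the result is symmetric. The function names are hypothetical as well.

```python
import math
from itertools import combinations


def to_distribution(counts, support, eps=1e-6):
    """Smoothed probability distribution of one user over a shared support."""
    total = sum(counts.get(key, 0) + eps for key in support)
    return [(counts.get(key, 0) + eps) / total for key in support]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def pairwise_distances(users):
    """Symmetrized relative entropy between every two users."""
    support = sorted({ip for counts in users.values() for ip in counts})
    dists = {uid: to_distribution(c, support) for uid, c in users.items()}
    return {
        (a, b): relative_entropy(dists[a], dists[b])
        + relative_entropy(dists[b], dists[a])
        for a, b in combinations(sorted(users), 2)
    }


# Hypothetical IP usage counts per user.
users = {
    "u1": {"ipA": 9, "ipB": 1},
    "u2": {"ipA": 8, "ipB": 2},
    "u3": {"ipC": 10},
}
distances = pairwise_distances(users)
```

A distance table of this kind could then be handed to a dimensionality-reduction method such as the t-SNE named in step S313 to obtain two-dimensional positions for the dots; that step is not shown here. In the sample data, u1 and u2 share almost the same IP usage and end up much closer to each other than either is to u3.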
In other examples, the network users in the output interface may also be characterized by shapes, icons, and/or labels: the shapes are, for example, geometric figures such as triangles or rectangles; the icons are, for example, a smiling face, a crying face, a skull avatar, or a robber avatar; and the labels are, for example, words or clearly distinguishable symbols.
The group data visualization system of the present application presents the data sets of the groups determined during fraud detection by means such as intra-group member relationships, type distributions, and lists. The data features of the groups formed during fraud detection are thereby shown through a variety of relationship interfaces, which helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud detection system.
It should be noted that all modules of the fraud detection system may be deployed on a single computer device. Alternatively, the modules may be deployed separately on a client at the user side and a server at the network side, with the client connected to the server over a network. For example, the acquisition module and the processing module of the fraud detection system are installed on the server, and the display module is installed on the client; the client logs in to the server by sending a request, and the server runs the fraud detection system based on the operations requested by the client and displays the corresponding interfaces through the client. The client includes, but is not limited to, a browser or dedicated client software configured on a user terminal, together with hardware for executing the display interface program.
It should also be noted that, from the above description of the embodiments, those skilled in the art can clearly understand that part or all of the present application can be implemented by software in combination with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may include one or more machine-readable media on which machine-executable instructions are stored; when executed by one or more machines such as a computer, a computer network, or other electronic devices, these instructions cause the one or more machines to perform operations according to the embodiments of the present application. The machine-readable media may include, but are not limited to, floppy disks, optical discs, CD-ROM (compact disc read-only memory), magneto-optical disks, ROM (read-only memory), RAM (random access memory), EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), magnetic or optical cards, flash memory, or other types of media/machine-readable media suitable for storing machine-executable instructions.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
It should be noted that those skilled in the art will understand that some of the above components may be programmable logic devices, including one or more of: programmable array logic (Programmable Array Logic, PAL), generic array logic (Generic Array Logic, GAL), field-programmable gate array (Field-Programmable Gate Array, FPGA), and complex programmable logic device (Complex Programmable Logic Device, CPLD); the present application does not specifically limit this.
In summary, the present application presents the data sets of the groups formed during fraud detection by means such as member relationships, type distributions, and lists, so that the data features of those groups are shown through a variety of relationship interfaces, which helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud detection system.
The above embodiments merely illustrate the principles and effects of the present application and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the present application. Therefore, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed in the present application shall still be covered by the claims of the present application.

Claims (23)

1. A group data visualization method, applied in a fraud detection system, characterized by comprising the following steps:
obtaining a data set of a group, the data features of the data set including one or more of user information, IP address, event type, event initiation source, event response side, and event occurrence time;
determining a target feature from the data features; and
associating the members of the group according to the event type, and characterizing them with a point-and-line chart in an output display interface; wherein the points and/or lines are used to characterize the determined target feature.
2. The group data visualization method according to claim 1, characterized in that the color and/or shape of the points are used to characterize the event initiation source and the event response side of the event type; the lines characterize the associations between group members, and the color and/or shape of the lines are used to characterize the determined target feature.
3. The group data visualization method according to claim 1, characterized by further comprising a step of displaying, in the form of a text box at one side of the point-and-line chart in the display interface, at least one of group information, user information, event information, and prediction information.
4. The group data visualization method according to claim 1, characterized in that the step of obtaining a data set of a group comprises:
obtaining operation logs of a cluster formed by multiple network users;
determining at least one data feature from the operation logs of the multiple network users, and analyzing the similarity of at least one set of data features in the operation logs to determine the group; and
obtaining the data set of the group.
5. The group data visualization method according to claim 1 or 4, characterized by further comprising a step of displaying at least one group interface, wherein the size of a group in the group interface is characterized by the size of the displayed geometric figure.
6. The group data visualization method according to claim 1 or 4, characterized by further comprising a step of displaying an interface of the data set of a group, wherein the data features of the data set of the group include at least two of user information, IP address, event type, event initiation source, event response side, and event occurrence time, and in the interface of the group data set, the group data set is displayed in sorted order after being grouped.
7. The group data visualization method according to claim 1 or 4, characterized by further comprising a step of displaying an interface of the feature distribution of the data set of the group:
selecting a group, and determining at least one data feature from the data set of the group;
counting the feature distributions of the determined at least one data feature in the group and in the cluster; and
displaying a histogram of the feature distribution and a distribution comparison chart of that histogram against the histogram of the entire cluster.
8. The group data visualization method according to claim 1 or 4, characterized by further comprising a step of displaying an interface of the feature distribution of the data sets of multiple groups:
determining multiple groups in a cluster formed by multiple network users, and using different shapes, icons, labels, and/or colors to characterize the differences among the multiple groups;
determining at least one data feature from the data sets of the multiple groups;
analyzing, based on the at least one data feature, the relative entropy between every two network users in each group as a measure of the similarity between those two network users; and
outputting a display interface in which the network users are characterized by shapes, icons, and/or labels, the differences among the multiple groups are characterized by different colors, and the similarity between two network users in each group is characterized by the displayed distance.
9. The group data visualization method according to claim 1, characterized in that the event type includes at least one of a network user's following, liking, commenting, gifting, logging in, publishing, updating status, registering, and modifying information.
10. A computer device, characterized by comprising:
a processor; and
a presentation engine executed on the processor, the presentation engine being configured to execute the group data visualization method according to any one of claims 1-9.
11. A group data visualization system, characterized by comprising:
an acquisition module, configured to obtain a data set of a group, the data features of the data set including one or more of user information, IP address, event type, event initiation source, event response side, and event occurrence time;
a processing module, configured to determine a target feature from the data features and to associate the members of the group according to the event type; and
a display module, configured to display a point-and-line chart through a display interface; wherein the points and/or lines are used to characterize the determined target feature.
12. The group data visualization system according to claim 11, characterized in that the color and/or shape of the points are used to characterize the event initiation source and the event response side of the event type; the lines characterize the associations between group members, and the color and/or shape of the lines are used to characterize the determined target feature.
13. The group data visualization system according to claim 11, characterized in that the display module is further configured to display, through the display interface and in the form of a text box at one side of the point-and-line chart, at least one of group information, user information, event information, and prediction information.
14. The group data visualization system according to claim 11, characterized in that the group is determined by the acquisition module obtaining operation logs of multiple network users and the processing module analyzing the similarity of at least one set of data features in the operation logs.
15. The group data visualization system according to claim 11, characterized in that the display module is further configured to display at least one group interface, wherein the size of a group in the group interface is characterized by the size of the displayed geometric figure.
16. The group data visualization system according to claim 11, characterized in that the display module is further configured to display an interface of the data set of a group, wherein the data features of the data set of the group include at least two of user information, IP address, event type, event initiation source, event response side, and event occurrence time, and in the interface of the group data set, the group data set is displayed in sorted order after being grouped.
17. The group data visualization system according to claim 11, characterized in that the display module is further configured to display an interface of the feature distribution of the data set of the group, including a histogram of the feature distribution and a distribution comparison chart of that histogram against the histogram of the entire cluster.
18. The group data visualization system according to claim 1, characterized in that the display module is further configured to display an interface in which the network users are characterized by shapes, icons, and/or labels, the differences among the multiple groups are characterized by different colors, and the similarity between two network users in each group is characterized by the displayed distance.
19. The group data visualization system according to claim 11, characterized in that the event type includes at least one of a network user's following, liking, commenting, gifting, logging in, publishing, updating status, registering, and modifying information.
20. A client, connected to a server over a network, characterized in that the client logs in to the server by sending a request to execute the steps of the group data visualization method according to any one of claims 1-9.
21. A server, connected to a client over a network, characterized in that the server, based on an operation requested by the client, sends the execution process of the group data visualization method according to any one of claims 1-9 to the client and displays the execution result through the client.
22. A browser, connected to a server over a network, characterized in that the browser logs in to the server by sending a request to execute the steps of the group data visualization method according to any one of claims 1-9.
23. A computer-readable storage medium storing a data visualization computer program, characterized in that the data visualization computer program, when executed, implements the steps of the group data visualization method according to any one of claims 1-9.
CN201810022004.8A 2018-01-10 2018-01-10 Group membership data visualization method and system Active CN108280644B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810022004.8A CN108280644B (en) 2018-01-10 2018-01-10 Group membership data visualization method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810022004.8A CN108280644B (en) 2018-01-10 2018-01-10 Group membership data visualization method and system

Publications (2)

Publication Number Publication Date
CN108280644A true CN108280644A (en) 2018-07-13
CN108280644B CN108280644B (en) 2021-08-03

Family

ID=62803412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810022004.8A Active CN108280644B (en) 2018-01-10 2018-01-10 Group membership data visualization method and system

Country Status (1)

Country Link
CN (1) CN108280644B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050043961A1 (en) * 2002-09-30 2005-02-24 Michael Torres System and method for identification, detection and investigation of maleficent acts
US20110213788A1 (en) * 2008-03-05 2011-09-01 Quantum Intelligence, Inc. Information fusion for multiple anomaly detection systems
CN103279887A (en) * 2013-04-26 2013-09-04 华东师范大学 Information-theory-based visual analysis method and system for micro-blog spreading
CN104573071A (en) * 2015-01-26 2015-04-29 湖南大学 Intelligent school situation analysis system and method based on megadata technology
CN104915793A (en) * 2015-06-30 2015-09-16 北京西塔网络科技股份有限公司 Public information intelligent analysis platform based on big data analysis and mining
CN107404387A (en) * 2016-05-19 2017-11-28 阿里巴巴集团控股有限公司 The processing method of one species information, device
CN107515927A (en) * 2017-08-24 2017-12-26 深圳市云房网络科技有限公司 A kind of real estate user behavioural analysis platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
童新安 et al.: "Application of visual data mining in credit fraud detection", Journal of Yichun University (《宜春学院学报》) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968993A (en) * 2018-09-30 2020-04-07 北京国双科技有限公司 Information processing method and device, storage medium and processor
CN111127026A (en) * 2019-12-13 2020-05-08 深圳中兴飞贷金融科技有限公司 Method, device, storage medium and electronic equipment for determining user fraud behavior
CN112732398A (en) * 2021-02-02 2021-04-30 三盟科技股份有限公司 Big data visualization management method and system based on artificial intelligence
CN113837777A (en) * 2021-09-30 2021-12-24 浙江创邻科技有限公司 Graph database-based anti-fraud management and control method, device, system and storage medium
CN113837777B (en) * 2021-09-30 2024-02-20 浙江创邻科技有限公司 Anti-fraud management and control method, device and system based on graph database and storage medium

Also Published As

Publication number Publication date
CN108280644B (en) 2021-08-03

Similar Documents

Publication Publication Date Title
CN108268624A (en) User data visualization method and system
CN108170830A (en) Group event data visualization method and system
CN108280644A (en) Group member relation data visualization method and system
Zhang et al. Visual analytics for the big data era—A comparative review of state-of-the-art commercial systems
JP6294180B2 (en) Improvement effect evaluation support apparatus and method
CN107944745B (en) Risk information evaluation method and system
CN109993233A (en) Method and system for predicting data audit targets based on machine learning
CN106708729B (en) Method and device for predicting code defects
CN112380454A (en) Training course recommendation method, device, equipment and medium
CN115271957A (en) Financial risk analysis and evaluation system and method based on cloud computing
CN111915381A (en) Method and device for detecting cheating behaviors, electronic equipment and storage medium
CN107958346A (en) Method and device for identifying abnormal behavior
CN107844911A (en) Performance reporting for products and services using web portals
Werner Materiality Maps: Process Mining Data Visualization for Financial Audits
CN105488061B (en) Method and device for verifying data validity
CN112631889A (en) Portrayal method, device and equipment for application system and readable storage medium
CN110619564B (en) Anti-fraud feature generation method and device
CN108510007A (en) Webpage tamper detection method, device, electronic equipment and storage medium
CN114742412A (en) Software technology service system and method
US20210350134A1 (en) Generating event logs from video streams
US20210349941A1 (en) Assigning case identifiers to video streams
Bharathy et al. Applications of social systems modeling to political risk management
CN114155096A (en) Method for banks to detect illegal fund transfers for online gambling based on a tripartite graph
CN112651433A (en) Abnormal behavior analysis method for privileged account
KR101364768B1 (en) Financial account transaction pattern analysis system and a method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20181016

Address after: 1009-1, 10th Floor, Building 3, No. 1 Zhongguancun East Road, Haidian District, Beijing 100084

Applicant after: Huakong Tsingjiao Information Technology (Beijing) Co., Ltd.

Address before: Tsinghua Yuan, Haidian District, Beijing 100084

Applicant before: Tsinghua University

GR01 Patent grant