CN108280644A - Group member relation data method for visualizing and system - Google Patents
Group member relation data method for visualizing and system
- Publication number
- CN108280644A CN201810022004.8A
- Authority
- CN
- China
- Prior art keywords
- group
- data
- event
- interface
- user
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/382—Payment protocols; Details thereof insuring higher security of transaction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computer Security & Cryptography (AREA)
- Computational Linguistics (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
This application provides a method and system for visualizing group member relationship data. The visualization method is applied in a fraud detection system and includes the following steps: obtaining the data set of a group, where the data features of the data set include one or more of user information, IP address, event type, event initiator, event responder, and event occurrence time; determining a target feature from the data features; and associating the members of the group according to the event type and representing them as a point-and-line diagram in the output display interface, where the points and/or lines represent the determined target feature. By presenting the data sets of the groups formed during fraud detection in terms of member relationships and similar views, the application shows the data features of those groups through several relationship interfaces, which helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud detection system.
Description
Technical field
This application relates to the field of computer processing technology, and in particular to a method and system for visualizing group member relationship data.
Background
Online fraud is a well-known dark side of today's internet and causes immeasurable losses worldwide every year. In 2015, the Internet Crime Complaint Center received complaints about fraud on the order of millions worldwide, and online fraud also causes heavy economic losses worldwide each year. Fraudulent users usually act in order to be paid for helping to promote specific goods or for spreading spam. In internet finance, fraudulent users apply for loans under false identities, buy goods with stolen credit cards, and even carry out illegal activities such as money laundering. In internet business scenarios, finding suitable anti-fraud algorithms has therefore become more critical, and the demand keeps growing.
Although many methods now exist for identifying fraud on the internet, the limitations of the fraud detection systems built so far mean that the credibility of the data on the fraud suspects they screen out still needs extensive manual verification afterwards; for example, platform supervisors have to investigate and verify the cases one by one. As a result, tasks in a fraud detection system such as revising algorithm parameters, designing the priority of data features, and selecting the algorithm model require not only software design by algorithm experts but, even more, the participation of domain experts. Improving the transparency of fraud identification algorithms can therefore effectively improve detection accuracy, and how to visualize the data is an urgent problem to be solved in this field.
Summary of the invention
In view of the above shortcomings of the prior art, the purpose of this application is to provide a method and system for visualizing group member relationship data, in order to address the problem of visualizing fraud identification algorithms in the prior art.
To achieve the above and other related purposes, a first aspect of this application provides a group data visualization method applied in a fraud detection system, including the following steps: obtaining the data set of a group, where the data features of the data set include one or more of user information, IP address, event type, event initiator, event responder, and event occurrence time; determining a target feature from the data features; and associating the members of the group according to the event type and representing them as a point-and-line diagram in the output display interface, where the points and/or lines represent the determined target feature.
A second aspect of this application provides a computer device, including: a processor; and a presentation engine running on the processor, the presentation engine being used to execute the group data visualization method.
A third aspect of this application provides a group data visualization system, including: an acquisition module for obtaining the data set of a group, where the data features of the data set include one or more of user information, IP address, event type, event initiator, event responder, and event occurrence time; a processing module for determining a target feature from the data features and associating the members of the group according to the event type; and a display module for showing a point-and-line diagram through a display interface, where the points and/or lines represent the determined target feature.
In a fourth aspect, this application provides a client connected to a server over a network; the client sends a request to log in to the server, which executes the steps of the group data visualization method.
In a fifth aspect, this application provides a server connected to a client over a network; in response to a request operation performed by the client, the server sends the process of the group data visualization method to the client, and the client displays the execution result.
In a sixth aspect, this application provides a browser connected to a server over a network; the browser sends a request to log in to the server, which executes the steps of the group data visualization method.
In a seventh aspect, this application provides a computer-readable storage medium storing a data visualization computer program, characterized in that the data visualization computer program, when executed, implements the steps of the group data visualization method.
As described above, the method and system for visualizing group member relationship data of this application have the following beneficial effects: by presenting the data sets of the groups formed during fraud detection in terms of member relationships, type distributions, lists, and similar views, the application shows the data features of those groups through several relationship interfaces, which helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud detection system.
Description of the drawings
Fig. 1 is a flow chart of the group data visualization method of this application in one embodiment.
Fig. 2 is a flow chart of obtaining the data set of a group in one embodiment of this application.
Fig. 3 shows an interface containing multiple groups, as displayed in one embodiment of this application.
Fig. 4 is a schematic diagram of an interface showing the association relationships generated by the various event types among the members of a group.
Fig. 5 is a schematic diagram of an interface that displays a text box beside the point-and-line diagram of group member relationships.
Fig. 6 is a schematic diagram of the list interface for the data set of a group, as displayed in one embodiment of this application.
Fig. 7 is a flow chart of displaying the interface for the feature distribution of the group's data set.
Fig. 8 shows an interface with a histogram and a comparison chart of the registration-time feature distribution in a group, in one embodiment of this application.
Fig. 9 is a flow chart of the steps for displaying the distribution of multiple groups in the cluster, in one embodiment of this application.
Fig. 10 shows interface E, which displays the distribution of multiple groups in the cluster, in one embodiment of this application.
Fig. 11 is an architecture diagram of the computer device of this application in one embodiment.
Fig. 12 is a schematic diagram of the module structure of the group data visualization system provided by this application.
Detailed description
The embodiments of this application are described below through specific examples; those skilled in the art can readily understand other advantages and effects of the application from the content disclosed in this specification.
The following description refers to the accompanying drawings, which describe several embodiments of the application. It should be understood that other embodiments may also be used, and that changes in mechanical composition, structure, electrical arrangement, and operation may be made without departing from the spirit and scope of the application. The following detailed description should not be considered limiting, and the scope of the embodiments of this application is limited only by the claims of the granted patent. The terminology used here is only for describing particular embodiments and is not intended to limit the application.
Furthermore as used in herein, singulative " one ", "one" and "the" are intended to also include plural number shape
Formula, unless there is opposite instruction in context.It will be further understood that term "comprising", " comprising " show that there are the spies
Sign, step, operation, element, component, project, type, and/or group, but it is not excluded for other one or more features, step, behaviour
Presence, appearance or the addition of work, element, component, project, type, and/or group.Term "or" used herein and "and/or" quilt
It is construed to inclusive, or means any one or any combinations.Therefore, " A, B or C " or " A, B and/or C " mean " with
Descend any one:A;B;C;A and B;A and C;B and C;A, B and C ".Only when element, function, step or the combination of operation are in certain sides
When inherently mutually exclusive under formula, it just will appear the exception of this definition.
In fraud detection technology, domain experts supply the core fraud identification technology with experience in data classification and with requirements for the accuracy of the classification results, but they do not know the algorithm framework itself or the parameters within the algorithm. Because a domain expert has no way of knowing how the data were classified during detection, an expert who obtains fraud detection results from a fraud detection system can verify those results but has no way of judging their accuracy.
To improve the accuracy of fraud detection systems, this application provides a group data visualization method applied to a fraud detection system. It presents the groups obtained by classification in the fraud detection system, together with their data sets, visually to algorithm experts and domain experts, so that different users (such as domain experts or algorithm experts) can explore various kinds of fraud through multiple interactive means and flexibly modify the fraud detection algorithm according to the features of the fraud.
The group data visualization method is mainly executed by a computer device. The computer device may be any suitable device, such as a handheld computer, tablet computer, notebook computer, desktop computer, or server. The computer device includes a display, an input device, input/output (I/O) ports, one or more processors, memory, a non-volatile storage device, a network interface, a power supply, and so on. These components may include hardware elements (such as chips and circuits), software elements (such as a tangible non-transitory computer-readable medium storing instructions), or a combination of hardware and software elements. Note also that the components may be combined into fewer components or separated into additional ones; for example, the memory and the non-volatile storage device may be included in a single component. The computer device may execute the visualization method on its own or in cooperation with other computer devices. In some embodiments, the computer device executes the visualization method and displays the corresponding visualization interface. For example, the computer device includes a processor and a display; a presentation engine (or display engine) runs on the processor, executes the group data visualization method, and shows the result on the display. Here the presentation engine includes, but is not limited to, software and hardware that can parse interface-display content developed in a programming language, such as XML, HTML scripts, or C. In other embodiments, one computer device executes the visualization method and supplies the corresponding visualization interface to another computer device for display. For example, a client, acting on the user's request operation, sends a request to a server and logs in to it; the server executes the visualization method to produce the corresponding interface data and returns that data to the client, where a browser or a customized application displays the corresponding diagrams according to the interface data.
The visualization method is mainly executed by the fraud detection system, which may include software and hardware in one or more computer devices. To show domain experts why a group is treated as a fraud group, and to address the question raised by algorithm experts of "whether users in the same group have the same behavioural habits", this application provides a visualization method based on the relationships among the members within a group. Fig. 1 is a flow chart of the group data visualization method of this application in one embodiment. As shown, the group data visualization method includes the following steps:
In step S11, the data set of a group is obtained. The data features of the data set include one or more of user information, IP address, event type, event initiator, event responder, and event occurrence time. The user information is information that identifies a user, for example a user ID, a unique user nickname, or a certificate number. The user information also includes mobile phone number, mailbox, ID number, gender, the number of the device used by the user, registration time, and so on. The IP address is the IP address of the computer device used when the same user information generates an event on the network. The event type is the type, recorded in the network operation log, that denotes a user behaviour event. It includes, but is not limited to, at least one of: social behaviours between network users such as following, liking, commenting, and gifting (also called sending gifts), or operation behaviours of a network user such as logging in, logging out, updating status, registering, and modifying information. The same user information can correspond to at least one event type, and each event type has a corresponding event initiator, event responder, and event occurrence time. For example, the same user information can correspond to several events of the like type, each with its own event initiator, event responder, and event occurrence time.
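The data features listed in step S11 can be pictured as one record per logged event. The following is a minimal sketch, assuming a hypothetical `EventRecord` type whose name and field names do not appear in the application; it only illustrates how one user can be tied to several events of the same type, each with its own initiator, responder, and occurrence time.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EventRecord:
    """One logged event; all field names are illustrative assumptions."""
    user_id: str         # user information that identifies the member
    ip_address: str      # IP address of the device used for the event
    event_type: str      # e.g. "follow", "like", "login"
    source_id: str       # event initiator
    target_id: str       # event responder
    timestamp: datetime  # event occurrence time


def events_by_user_and_type(records):
    """Group records by (user_id, event_type)."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record.user_id, record.event_type)].append(record)
    return dict(grouped)


# Made-up sample data: one user with two "like" events and one "follow" event.
records = [
    EventRecord("u1", "10.0.0.5", "like", "u1", "u7", datetime(2017, 8, 7, 21, 49, 5)),
    EventRecord("u1", "10.0.0.5", "like", "u1", "u8", datetime(2017, 8, 7, 21, 50, 0)),
    EventRecord("u1", "10.0.0.6", "follow", "u1", "u7", datetime(2017, 8, 7, 22, 0, 0)),
]
grouped = events_by_user_and_type(records)
```

Here `grouped[("u1", "like")]` holds two records with different responders and times, which matches the case described in the text.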
In some embodiments, a group is determined as described below. Fig. 2 is a flow chart of obtaining the data set of a group in one embodiment of this application. As shown, step S11 further includes:
Step S111: obtain the operation logs of a cluster made up of multiple network users. In various embodiments, the cluster consists of all the network users that can be obtained. The network users in the cluster come from the same website, from different websites, or from different network channels, which may be, for example, the internet, one or more intranets, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or a suitable combination of these, and may also be a mobile communication network for mobile phones.
Step S112: determine at least one data feature from the operation logs of the multiple network users, and analyse the similarity of at least one group of data features in the operation logs to determine the groups. In a specific embodiment, because online fraud inevitably leaves traces of how the user used data on the network, the fraud detection system collects the operation logs of multiple network users from at least one website and, by analysing the similarity of at least one data feature in those logs, groups the users who generated the corresponding logs, obtaining the groups and the data set of each group from the operation logs.
Step S113: obtain the data set of the group. In some embodiments, the data set may be obtained from a database that stores each group and its data set. The database is configured, for example, in a remote storage server or in a storage device of the local computer device, and the data set of a group can be extracted from the database based on the user's input operation. For example, the fraud detection system obtains multiple groups using an unsupervised detection algorithm, the user selects one of them through a selection interface, and the data set of that group is then obtained.
Specifically, the fraud detection system first computes, for all data in the operation logs, the similarity within each class of data feature, where the similarity can be measured by information entropy. For example, the fraud detection system uses user information to compute the information entropy of the IP-usage or maximum-IP-usage dimension, uses event type to compute the information entropy of the operation-type dimension, and uses registration time to compute the information entropy of the registration-time dimension or operation time to compute the information entropy of the bad-operation dimension. After these calculations, an unsupervised detection method is applied to the resulting entropies to divide the users into multiple groups. Examples of the unsupervised detection method include algorithms based on dense subgraphs and algorithms based on vector spaces. Each group presented by the visualization method provided here reflects the shared resources, user relationships, and so on used in fraud, so that users of the fraud detection system can determine more clearly whether the classification strategy of the unsupervised detection algorithm is reasonable. The shared resources include, but are not limited to, shared IP addresses and mailboxes; the user relationships include, but are not limited to, follow relationships and interaction relationships between users.
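The entropy measure mentioned above can be sketched as follows. This is a minimal illustration using Shannon entropy in bits; the function names, the row layout, and the feature names `ip_group` and `event_type` are assumptions made for the example, and the unsupervised grouping step that consumes these values is not shown.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # sum of p * log2(1 / p); a single repeated value gives 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def feature_entropies(rows, features):
    """Entropy of each named feature over a list of dict rows."""
    return {f: shannon_entropy(row[f] for row in rows) for f in features}


# Made-up rows: every event shares one IP group, event types are split evenly.
rows = [
    {"ip_group": "10.0.0", "event_type": "follow"},
    {"ip_group": "10.0.0", "event_type": "follow"},
    {"ip_group": "10.0.0", "event_type": "like"},
    {"ip_group": "10.0.0", "event_type": "like"},
]
entropies = feature_entropies(rows, ["ip_group", "event_type"])
```

In this sample the `ip_group` entropy is 0.0 (all values identical) and the `event_type` entropy is 1.0 (two equally likely values), so a low value signals that the members behave alike on that feature.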
In one embodiment, the visualization method further includes the step of displaying at least one group interface, in which the size of a group is represented by the size of the geometric shape shown. Fig. 3 shows an interface containing multiple groups, as displayed in one embodiment of this application. As shown, interface A displays 11 groups, and the geometric shape representing each group is a circle. All 11 groups lie inside one largest dashed circle, which, for example, represents the cluster made up of N network users. The group labelled 0 is, for example, a normal group. Inside a smaller dashed circle there are 10 groups of different sizes labelled 1-10. The size of a circle is proportional to the number of members in the group: a large circle denotes a group with many members and a small circle a group with few. The groups labelled 1-10 are, for example, abnormal groups. In various embodiments, the geometric shape of a group may be any shape. The colour of a geometric shape may be set randomly, or may be related to the number of groups or the number of members in a group. For example, N colours are preset, and the fraud detection system randomly assigns different colours to the geometric shapes representing the groups. As another example, the fraud detection system assigns colours from a preset colour sequence to the geometric shapes in ascending order of member count. When the user operates the display interface and selects a geometric shape, the fraud detection system obtains the data set of that group.
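The sizing and colouring rules above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes that "size" means circle area (so the radius grows with the square root of the member count), and the function names, the `max_radius` parameter, and the palette are hypothetical.

```python
import math


def circle_radii(member_counts, max_radius=100.0):
    """Radius per group so that circle AREA is proportional to member count.

    Assumption: the text only says the circle size is proportional to the
    member count; mapping the count to area rather than radius is a choice
    made for this sketch.
    """
    largest = max(member_counts.values())
    return {g: max_radius * math.sqrt(n / largest) for g, n in member_counts.items()}


def colours_by_member_count(member_counts, palette):
    """Assign colours from a preset sequence in ascending order of member count."""
    ordered = sorted(member_counts, key=lambda g: (member_counts[g], g))
    return {g: palette[i % len(palette)] for i, g in enumerate(ordered)}


# Made-up member counts for three groups.
member_counts = {1: 4, 2: 16, 3: 9}
radii = circle_radii(member_counts)
colours = colours_by_member_count(member_counts, ["light", "medium", "dark"])
```

With these counts the largest group gets the full radius and the smallest group half of it, and colours run from "light" for the smallest group to "dark" for the largest.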
In a preferred embodiment, the at least one displayed group interface may also include an information bar showing group information. When the user selects a group in the group interface, the basic information of that group is shown at the side of the interface in the form of a table or text box. The basic information includes, for example, the group code, the number of members, the highest-priority data feature used to determine the group, and the group attribute (such as normal group or abnormal group).
In step S12, a target feature is determined from the data features. Here, in order to present the association relationships among the pieces of user information in a detected group on the basis of the same data features, the fraud detection system may take at least one data feature that carries an association relationship in the detected fraud as the target feature that can characterize the associations among group members. In one embodiment, for example in zombie-follower detection analysis, the fraud detection system automatically takes IP address, event initiator, and event responder as target features. Alternatively, the fraud detection system selects at least one data feature as the target feature based on the user's selection. For example, if the user selects the follow event type as the target feature, the fraud detection system takes each data feature used when building the group's member relationships based on the follow event type as a target feature; or, if the user selects IP address as the target feature, the fraud detection system takes each data feature used when building the group's member relationships based on the IP address as a target feature.
In step S13, the members of the group are associated according to the event type and represented as a point-and-line diagram in the output display interface, where the points and/or lines represent the determined target feature. Fig. 4 is a schematic diagram of an interface showing the association relationships generated by the various event types among the members of a group. As shown, in the embodiment of interface B, the user has, for example, selected IP address as the target feature. A "yellow" point denotes a member, determined from user information, who is only an event initiator in the various event types. A "blue" point denotes a member, determined from user information, who is an event responder in at least one of the event types. A line between any two points indicates that at least one event type has occurred between the two members. The colour of a line denotes the IP address (specifically, the IP address grouping) of the computer device the member used when initiating an event type. With this interface, a domain expert can verify how accurate the detected group classification is by analysing the number of points of each colour, the share of lines with the same IP address (IP address grouping), the ratio between points of different colours, and so on. For example, as shown in Fig. 4, if the numbers of "yellow" and "blue" points differ greatly and a single colour accounts for a high share of the lines drawn from the "yellow" points, then the members represented by the "yellow" points are likely to be zombie-follower accounts, and the members represented by the "blue" points are likely to have hired zombie-follower accounts. By observing the colours of the points in the interface, the distribution of the points of each colour, the number of lines drawn from each point, the colour of each line, and so on, the domain expert can verify the accuracy of the grouping result that assigns this group to the fraud groups.
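The point and line encoding described for interface B can be sketched as follows. This is a minimal illustration under stated assumptions: events are given as `(source_id, target_id, event_type, ip_group)` tuples, one line is drawn per combination of initiator, responder, and IP group, and the function names and palette are hypothetical rather than taken from the application.

```python
from collections import Counter


def build_point_line_graph(events, palette):
    """Derive point colours and coloured lines from event tuples."""
    events = list(events)
    responders = {target for _, target, _, _ in events}
    members = {source for source, _, _, _ in events} | responders
    # "blue": responder in at least one event; "yellow": initiator only.
    point_colours = {m: "blue" if m in responders else "yellow" for m in members}

    # Line colour encodes the IP group used by the initiator.
    ip_groups = sorted({ip for _, _, _, ip in events})
    ip_colour = {ip: palette[i % len(palette)] for i, ip in enumerate(ip_groups)}
    lines = {}
    for source, target, event_type, ip in events:
        line = lines.setdefault(
            (source, target, ip), {"colour": ip_colour[ip], "event_types": set()}
        )
        line["event_types"].add(event_type)
    return point_colours, lines


def dominant_line_colour_share(lines):
    """Share of lines that carry the most common colour (0.0 if no lines)."""
    if not lines:
        return 0.0
    counts = Counter(line["colour"] for line in lines.values())
    return max(counts.values()) / len(lines)


# Made-up events: three initiators act on "u9" from the same IP group.
events = [
    ("u1", "u9", "follow", "10.0.0"),
    ("u2", "u9", "follow", "10.0.0"),
    ("u3", "u9", "like", "10.0.0"),
    ("u9", "u4", "follow", "10.0.1"),
]
points, lines = build_point_line_graph(events, ["red", "green"])
share = dominant_line_colour_share(lines)
```

In this sample, `u1`, `u2`, and `u3` come out "yellow", `u9` and `u4` come out "blue", and three of the four lines share one colour, which is the kind of concentration the text says a domain expert would look for.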
In some embodiments, a single target feature may be used to describe the group member relationships, with the other data features rendered in the display interface as auxiliary features. Taking Fig. 4 again as an example, the diversity of IP addresses in the displayed interface characterizes the degree of risk that the group is a fraud group. Therefore, in the displayed member relationships, the colour of a point, the shape of a point, or a combination of the two represents the event initiator and event responder of the event types produced; lines represent the associations between members, and the colour and/or shape of a line represents the determined target feature (such as the IP address grouping). Line colours may use bright hues to draw the domain expert's attention. When the domain expert observes that the line colours are especially concentrated, the expert can, together with the number of points of each colour, verify whether the accuracy with which the fraud detection system classified members of the group as fraudulent members meets the design requirements.
In other embodiments, in order to show the data related to the group and its members more conveniently in the group member relationship interface, the visualization method further includes the step of displaying, in a text box beside the point-and-line diagram in the display interface, at least one of group information, user information, event information, and prediction information.
Exhibition.For example, referring to Fig. 5, it is shown as illustrating at the interface of the point and line chart interface side display text frame of group member relationship
Figure.It, can be first in the text box of right side when the user clicks when one in the point and line chart interface C in interface C as shown in the figure
It first shows the user information represented by the point, such as at least one of User ID, gender, plays the part of in generated event type
Role's (such as event initiates source or event response side), and by fraud detecting system regrouping prediction prediction result (such as
Belong to fraudulent user or belong to normal users).
A text box may also be provided in the point-and-line interface C to show the detailed information of each member selected by the user. For example, as shown in Fig. 5, the Node_Info text box shows the user information of the selected point, including: user ID (User_id:A1b2c3d4e5f6g), gender (Sex:Female), mailbox (Mail), user tag or type (Label:anomaly_source), and registration time (Reg_time:2017-07-15,19:10:29).
The interface shown in Fig. 5 also displays the attribute information of the selected user, that is, the prediction result of the grouping predicted by the fraud detection system: Abnormal is shown if the user is a fraudulent user, and Normal if the user is a normal user.
When the member represented by a point has produced multiple event types, multiple text boxes can be used to show the role that the same user information plays in each event type. In interface C shown in Fig. 5, the information of the point representing a user is displayed in a vertical column, for example user ID (User_id:abcdefg1234567), gender (Sex:Female), and user tag or type (Label:anomaly_source). The information of the line representing a relationship between users is also displayed, for example event type (Event_type:follow), IP address grouping (IP:123.123.123), the event initiator in the relationship (Source_id:1234567abcdefg), the event responder (Target_id:7654321gfedcba), and event occurrence time (Timestamp:2017-08-07,21:49:05). For example, the interface shown in Fig. 5 indicates that the group is numbered 4, that 131 events of the follow type and 21 events of the like type were generated, and that the total is 152. In this way, all event information (for example, event type and role information) and prediction information corresponding to the same USER_ID are shown in a separate text box. Statistics on the event types of the group members can also be shown in the point-and-line interface.
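The per-member roles and the event type statistics shown in the text boxes can be sketched as follows. This is a minimal illustration with hypothetical function names; the individual events are made up, and only the totals (131 follow events, 21 like events, 152 overall) are taken from the Fig. 5 description.

```python
from collections import Counter


def event_type_statistics(events):
    """Count events per type and in total; events are (source, target, type)."""
    counts = Counter(event_type for _, _, event_type in events)
    return dict(counts), sum(counts.values())


def roles_of(user_id, events):
    """Role the user plays in each event type: 'source', 'target', or both."""
    roles = {}
    for source, target, event_type in events:
        if user_id == source:
            roles.setdefault(event_type, set()).add("source")
        if user_id == target:
            roles.setdefault(event_type, set()).add("target")
    return roles


# Made-up events whose totals match the counts quoted for Fig. 5.
events = [("a", "b", "follow")] * 131 + [("b", "a", "like")] * 21
per_type, total = event_type_statistics(events)
roles = roles_of("a", events)
```

Here `per_type` and `total` correspond to the statistics panel, while `roles` is the kind of per-event-type role information a Node_Info style text box would list for user "a".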
In some embodiments, users care not only about the relationships among group members but even more about whether the assigned groups are reasonable. This requires that users can inspect the detailed data features in each group and the priority order of the data features used to build the classified groups. The visualization method may include the step of displaying an interface for the data set of a group. The data set is shown as a list, which gives the user the details of the data features within the same group. To improve the classification accuracy of the group's data set, the list shown in the interface may display the data features of a group column by column according to the classification priority that the fraud detection system used when classifying. For example, Fig. 6 is a schematic diagram of the list interface for the data set of a group, as displayed in one embodiment of this application. In that list interface, the data set of a group is sorted by the similarity of its data features, in order of priority from high to low. When the data feature similarity at the first priority is identical, the data are sorted by the data feature of the second priority. In the embodiment shown in Fig. 7, the priority order from high to low is: IP address (segment or grouping of IP addresses), event initiator (source), event responder (target), event type (event_type), and event occurrence time (timestamp).
In the present embodiment, the new line (gauge outfit) of table is encoded with the importance of different lines, if the value of a feature more collects
In, then this feature is more important.In an embodiment provided by the present application, the fraud detecting system is to pass through meter
The comentropy of each feature is calculated to represent this characteristic.If comentropy is lower, it means that consistency is higher.Then institute
Fraud detecting system is stated to be ranked up feature according to the incremental sequence of comentropy, it is finally that the list head of low comentropy is suitable
It is that sequence leans on prompt family note that certain, under different performances, can also according to the list head in the table that will be shown into
Row color rendering, for example finally prompt the attention at the family row to be characterized to be most deep the color rendering of the list head of low comentropy
Data characteristics it is mostly important, and so on carry out other data characteristicses that the color rendering row are characterized, and then obtain in figure
Shown in data set list interface.The list interface can be undertaken on show multiple group interfaces the step of after or step S13 it
Before, then the selection operation of the list interface is selected based on user and is shown.
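The entropy-based column ordering and the priority-based row sorting described above can be sketched as follows. This is a minimal sketch under stated assumptions: Shannon entropy in bits is used as the information entropy, and the function names, column names and sample rows are hypothetical rather than taken from the application.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    n = len(values)
    counts = Counter(values)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def order_columns_by_entropy(rows, columns):
    """Return the columns in order of increasing entropy, plus the entropies."""
    entropies = {c: shannon_entropy([r[c] for r in rows]) for c in columns}
    return sorted(columns, key=lambda c: entropies[c]), entropies


def sort_rows(rows, ordered_columns):
    """Sort rows by the first column; ties are broken by the later columns."""
    return sorted(rows, key=lambda r: tuple(r[c] for c in ordered_columns))


# Hypothetical rows of one group's data set.
rows = [
    {"ip": "123.123.123", "source": "a", "event_type": "follow"},
    {"ip": "123.123.123", "source": "b", "event_type": "follow"},
    {"ip": "123.123.123", "source": "a", "event_type": "like"},
    {"ip": "123.123.123", "source": "c", "event_type": "follow"},
]

ordered, entropies = order_columns_by_entropy(rows, ["source", "event_type", "ip"])
print(ordered)
print(sort_rows(rows, ordered))
```

In this sample every row shares the same IP segment, so the `ip` column has zero entropy and is placed first; a color scale for the column headers could be derived from the same entropy values.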
In certain embodiments, in order to further characterize whether the data set of the acquired group reflects the characteristics of fraud, the data set also needs to be displayed from other dimensions. For example, the accuracy of the detected fraud is further confirmed by comparing the network operation data of normal users with the group data set. To this end, the visualization method further includes a step of displaying a feature distribution interface of the data set of the group. The feature distribution interface can show the distribution of each data type in the whole network. The whole network is a relative notion: for example, when multiple network users form a cluster, the interface can display the distribution, within the cluster, of a given data feature of a given group. Referring to Fig. 3, the largest dashed circle in Fig. 3 represents a cluster formed by multiple network users. The cluster contains 11 groups, numbered 0-10, from which one group is selected for information display.
In some embodiments, the data types that the feature distribution interface can display are, for example: the information entropy of the average operation time interval dimension (average operation interval entropy), the information entropy of the IP usage amount dimension (IP used amount entropy), the information entropy of the gender dimension (sex entropy), the information entropy of the e-mail dimension (email entropy), the information entropy of the registration time dimension (reg time entropy), the information entropy of the number-of-operations dimension (operation times entropy), the information entropy of the number-of-devices dimension (device amount entropy), the information entropy of the operation type dimension (operation type entropy), and the information entropy of the maximum amount by which a used IP is used by others (max IP used be used amount entropy). In the embodiment shown in Fig. 6, the entropy of the registration time dimension is taken as the example data feature, that is, Fig. 6 shows the feature distribution, within the network cluster, of the information entropy of the registration time (registration period) dimension in a group. In order to compare effectively the feature distribution of the acquired group data set with that of the network operation data of normal users, refer to Fig. 7, which is a flow chart of displaying the interface of the feature distribution of the data set of the group. As shown in the figure, the flow includes the following steps:
In step S211, a group is selected, and at least one data feature is determined from the data set of the group. In one embodiment, for example, the group numbered 2 in Fig. 3 is selected, and a data feature belonging to the user information, for example the registration time, is determined from the data set of the group numbered 2.
In step S212, the feature distributions of the determined at least one data feature in the group and in the cluster are computed. In this embodiment, the feature distribution of the registration time data feature in the group is computed, and the feature distribution of the registration time data feature in the entire cluster is computed.
In step S213, the histogram of the feature distribution is displayed, together with a distribution comparison chart that sets this histogram against the histogram of the entire cluster. In this embodiment, based on the encoding of the data feature, the histogram of the feature distribution of the registration time data feature in the group is displayed, and the histogram of the feature distribution of the registration time data feature in the entire cluster is displayed. Refer to Fig. 8, which shows, in one embodiment of the present application, an interface containing the histograms of the feature distribution of the registration time in a group and the comparison chart. As shown, in interface D, panel (a) is a thumbnail of the feature distribution of the registration time in the selected group numbered 2, and the enlargement of that thumbnail is panel (d) at the bottom of interface D. The enlarged panel shows that, in the period from August 1 to August 31, the registrations of the members of the group are concentrated on five days: August 5, August 6, August 11, August 12 and August 16. Panel (c) in interface D is a histogram of the time distribution of registrations by users in the cluster during August, from which it can be seen that the registrations of users in the cluster follow a certain regularity during August. Panel (b) in interface D overlays panel (d) and panel (c) to show the difference in the registration time data feature between the entire cluster and the selected group. In order to let the user understand the differences and connections between different features, in the embodiment provided by the present application the histograms are presented in three layers; after the user clicks one of the thumbnails, the page scrolls to the normalized distribution comparison chart. Of course, in specific applications there may also be multiple thumbnails of data features, each representing a different data feature.
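Steps S211 to S213 can be sketched as follows for a single data feature such as the registration date. This is a sketch only: the normalization to fractions, the per-bin difference used as the comparison, the function names and the sample dates are assumptions made for illustration.

```python
from collections import Counter


def normalized_histogram(values, bins):
    """Fraction of values falling on each bin label (0.0 for empty bins)."""
    counts = Counter(values)
    total = len(values)
    return [counts.get(b, 0) / total if total else 0.0 for b in bins]


def distribution_contrast(group_values, cluster_values, bins):
    """Per-bin difference between the group and whole-cluster distributions."""
    g = normalized_histogram(group_values, bins)
    c = normalized_histogram(cluster_values, bins)
    return [gi - ci for gi, ci in zip(g, c)]


# Hypothetical registration dates (month-day) for one group and its cluster.
bins = ["08-05", "08-06", "08-07", "08-08"]
group_reg = ["08-05", "08-05", "08-06", "08-06"]
cluster_reg = group_reg + ["08-07", "08-08", "08-07", "08-08"]

print(normalized_histogram(group_reg, bins))    # group histogram, as in panel (d)
print(normalized_histogram(cluster_reg, bins))  # cluster histogram, as in panel (c)
print(distribution_contrast(group_reg, cluster_reg, bins))
```

Normalizing both histograms before overlaying them keeps a small group comparable with a much larger cluster, which is one plausible reading of the normalized distribution comparison chart.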
In some embodiments, the histogram may also be rendered in color to distinguish or emphasize the feature distribution of a given data feature in the group and in the entire cluster, or displayed dynamically (for example, by flashing) to distinguish or emphasize that feature distribution.
In some embodiments, in order to further analyze the differences among multiple groups in a network cluster, the group data visualization method further includes a step of displaying an interface of the feature distribution of the data sets of multiple groups. Refer to Fig. 9 and Fig. 10: Fig. 9 is a flow chart of the steps of displaying the distribution of multiple groups in the cluster in one embodiment of the present application, and Fig. 10 shows interface E, which displays the distribution of multiple groups in the cluster in one embodiment of the present application. As shown, the steps include:
In step S311, multiple groups are determined in a cluster formed by multiple network users, and the differences among the multiple groups are characterized with different shapes, icons, labels and/or colors. In one embodiment, for example, the three groups numbered 0, 1 and 2 in Fig. 3 are selected, where the group numbered 0 is shown in green, the group numbered 1 in red, and the group numbered 2 in blue.
In step S312, at least one data feature is determined from the data sets of the multiple groups. In this embodiment, one data feature, for example the IP address, is determined from the data sets of the three groups.
In step S313, based on the at least one data feature, the relative entropy between every two network users in each group is analyzed and used as a measure of the degree of similarity between the two network users. In this embodiment, based on the IP address, the relative entropy between every two network users in the three groups numbered 0, 1 and 2 (on the information entropy of the IP usage amount dimension, IP used amount entropy) is analyzed as the measure of similarity between every two network users. For example, the dimensionality reduction method t-SNE (t-distributed stochastic neighbor embedding) is used, with the relative entropy between two users as the index of the distance between these network users.
In step S314, a display interface is output. In the interface, network users are characterized by shapes, icons and/or labels, the differences among the multiple groups are characterized by different colors, and the degree of similarity between two network users in each group is characterized by the displayed distance. In this embodiment, in interface E shown in Fig. 10, network users are represented by dots: green represents the group numbered 0, red the group numbered 1, and blue the group numbered 2. The users in the blue group numbered 2 are close to one another, so the group forms a clustered distribution; the users in the red group numbered 1 are also close to one another, so that group forms a clustered distribution as well; green shows the distribution of randomly sampled normal users, who are farther apart and more dispersed. It can therefore be considered that the denser the cluster a group forms, the more likely the group is a fraud group. For example, in the embodiment shown in Fig. 11, the green group has a relatively dispersed distribution, which indicates that the green group is a normal group and that the users represented by its green points are normal users. In contrast, the red group (the group numbered 1) and the blue group (the group numbered 2) are distributed in clusters, which indicates that the red and blue groups are abnormal groups and that the users represented by the red and blue points are abnormal users. In one embodiment, a user of the visualization system can interactively hover the mouse over a point to view the specific information and feature values of the users in each group.
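The relative entropy used in step S313 as a distance between two network users can be sketched as follows. Several choices here are assumptions and not taken from the application: each user is represented by the empirical distribution of the IP segments the user used, add-alpha smoothing keeps the relative entropy finite, and the two directions are summed so that the result is symmetric. All names and sample data are hypothetical.

```python
import math
from collections import Counter


def distribution(values, support, alpha=1.0):
    """Smoothed empirical distribution over a shared support (add-alpha)."""
    counts = Counter(values)
    total = len(values) + alpha * len(support)
    return [(counts.get(s, 0) + alpha) / total for s in support]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_distance(p, q):
    """Sum of both directions of the relative entropy (symmetric by design)."""
    return relative_entropy(p, q) + relative_entropy(q, p)


def distance_matrix(usage_by_user):
    """Pairwise distances between users from their IP usage lists."""
    users = sorted(usage_by_user)
    support = sorted({ip for ips in usage_by_user.values() for ip in ips})
    dists = {u: distribution(usage_by_user[u], support) for u in users}
    matrix = [[symmetric_distance(dists[a], dists[b]) for b in users] for a in users]
    return users, matrix


# Hypothetical IP usage: u1 and u2 share IP segments, u3 does not.
usage = {
    "u1": ["ip1", "ip1", "ip2"],
    "u2": ["ip1", "ip2", "ip1", "ip1"],
    "u3": ["ip9", "ip8"],
}
users, matrix = distance_matrix(usage)
print(users)
print(matrix)
```

A matrix of this kind could then be handed to a dimensionality reduction method such as t-SNE to obtain two-dimensional positions for the dots; that step is omitted here to keep the sketch free of third-party dependencies.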
In other embodiments, the network users in the output interface may also be characterized by shapes, icons and/or labels. The shapes are, for example, geometric figures such as triangles and rectangles; the icons are, for example, a smiling face or a crying face, a skull avatar, or a robber avatar; and the labels are, for example, text or clearly distinguishable symbols.
The group data visualization method of the present application presents the data set of a group determined during fraud detection in forms such as the member relationships within the group, type distributions and lists, so that the data features of the groups divided during fraud detection are displayed through multiple relationship interfaces. This helps domain experts and algorithm experts to evaluate and revise the detection algorithm of the fraud detection system.
The present application also provides a computer device, which may be any suitable computer device, such as a handheld computer device, a tablet computer device, a notebook computer, a desktop computer or a server. The computer device includes a display, an input device, input/output (I/O) ports, one or more processors, a memory, a non-volatile storage device, a network interface, a power supply and the like. These components may include hardware elements (such as chips and circuits), software elements (such as a tangible non-transitory computer-readable medium storing instructions), or a combination of hardware and software elements. It should also be noted that the components may be combined into fewer components or separated into additional components. For example, the memory and the non-volatile storage device may be included in a single component. The computer device may execute the visualization method alone or in cooperation with other computer devices.
Refer to Fig. 11, which is a schematic diagram of the architecture of the computer device of the present application in one embodiment. As shown, in this embodiment the computer device 1 includes one or more processors 11 and a presentation engine 12 executed on the processor 11, so as to execute the above visualization method and display the corresponding visualization interface. For example, the computer device includes a processor 11, a display, and a presentation engine 12 (or display engine) executed on the processor 11. The presentation engine 12 is used to execute the group data visualization method described in the above embodiments and to display the result on the display; for the implementation of the group data visualization method, refer to the description of Fig. 1 to Fig. 10. In a specific implementation, the presentation engine is, for example, stored in the memory of the local computer device or in a remote storage server, and includes but is not limited to software and hardware capable of parsing interfaces developed in a programming language for display, such as XML, HTML scripts or the C language. In still other embodiments, one computer device executes the visualization method and provides the corresponding visualization interface to another computer device for display. For example, the client initiates a request to the server side and logs in to the server side based on a request operation of the user; the server side executes the visualization method to form the corresponding interface data and feeds the interface data back to the client; and the browser or a customized application on the client displays the corresponding charts according to the interface data.
The present application also provides a client connected to a server side through a network. In this embodiment, the client is, for example, a web client and the server side is, for example, a web server. The web client sends a web service request to log in to the web server, executes the group data visualization method described in the above embodiments, and displays the result on a display. For the implementation of the group data visualization method, refer to the description of Fig. 1 to Fig. 10.
The present application also provides a server connected to a client through a network. In this embodiment, the client is, for example, a web client and the server is, for example, a web server. Based on a request operation of the web client, the web server executes the group data visualization method described in the above embodiments and sends the result to the client for display on a display. For the implementation of the group data visualization method, refer to the description of Fig. 1 to Fig. 10.
The present application also provides a browser connected to a server side through a network. The browser sends a request to log in to the server side, executes the group data visualization method described in the above embodiments, and displays the result on a display. For the implementation of the group data visualization method, refer to the description of Fig. 1 to Fig. 10. In this embodiment, the browser is, for example, a web browser, including but not limited to QQ Browser, Internet Explorer, Firefox, Safari, Opera, Google Chrome, Baidu Browser, Sogou Browser, Cheetah Browser, 360 Browser, UC Browser, Maxthon and TheWorld Browser.
The present application also provides a group data visualization system. The group data visualization system may include software and hardware in one or more computer devices, and visualizes the data sets of the groups detected by the fraud detection system. In order to answer the question raised by domain experts, namely what each group does as a fraud group, and the question raised by algorithm experts, namely whether users in the same group have the same behavioral habits, the present application provides a group data visualization system from the perspective of group member relationships. Refer to Fig. 12, which is a schematic diagram of the module structure of the group data visualization system provided by the present application. As shown, the group data visualization system 3 includes an acquisition module 31, a processing module 32 and a display module 33.
The acquisition module 31 is used to acquire the data set of a group. The data features of the data set include one or more of user information, IP address, event type, event initiating source, event responder and event occurrence time. The user information is information that characterizes the identity of a user, for example a user ID, a unique user nickname or a certificate number. The user information further includes the mobile phone number, mailbox, ID number and gender of the user, the number of user devices used by the user, the registration time and the like. The IP address is the IP address of the computer device used when the same user information produces an event in the network. The event type is the type of user behavior event recorded in the network operation log, and includes but is not limited to at least one of social behaviors between network users, such as following, liking, commenting and gifting (also referred to as giving gifts), and operation behaviors of a network user, such as logging in, logging out, updating status, registering and modifying information. The same user information may correspond to at least one event type, and each event type corresponds to an event initiating source, an event responder and an event occurrence time. For example, the same user information may correspond to multiple events of the like event type, each of which corresponds to its own event initiating source, event responder and event occurrence time.
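The data features listed above can be sketched as a simple record type. This is an illustrative sketch only: the class name `EventRecord`, its field names and the helper `events_by_user` are assumptions, not structures defined in the application.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EventRecord:
    """One entry of a group's data set (hypothetical layout)."""
    user_id: str         # user information characterizing the user's identity
    ip: str              # IP address (or IP segment) used for the event
    event_type: str      # e.g. "follow", "like", "login"
    source_id: str       # event initiating source
    target_id: str       # event responder
    timestamp: datetime  # event occurrence time


def events_by_user(records):
    """Index records by user_id and then by event_type."""
    index = defaultdict(lambda: defaultdict(list))
    for r in records:
        index[r.user_id][r.event_type].append(r)
    return index


records = [
    EventRecord("u1", "123.123.123", "like", "u1", "u2", datetime(2017, 8, 7, 21, 49, 5)),
    EventRecord("u1", "123.123.123", "like", "u1", "u3", datetime(2017, 8, 7, 21, 50, 0)),
    EventRecord("u1", "123.123.123", "follow", "u1", "u2", datetime(2017, 8, 7, 21, 51, 0)),
]
index = events_by_user(records)
print(len(index["u1"]["like"]), len(index["u1"]["follow"]))
```

Indexing by user and then by event type mirrors the statement that one user may correspond to several events of the same type, each with its own source, responder and occurrence time.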
In certain embodiments, a group is determined as follows. Refer to Fig. 2, which is a flow chart of acquiring the data set of a group in one embodiment provided by the present application. As shown, the acquisition module 31 can execute the following steps S111-S113:
Step S111: acquire the operation logs of a cluster formed by multiple network users. In various embodiments, the cluster is a cluster formed by all the network users that can be obtained. The network users in the cluster come from the same website, from different websites, or from different network channels, which may be, for example, the Internet, one or more intranets, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or an appropriate combination thereof, and may also be a mobile communication network for mobile phones.
Step S112: determine at least one data feature from the operation logs of the multiple network users, and analyze the similarity of the at least one data feature in the operation logs to determine the groups. In a specific embodiment, because network fraud inevitably leaves traces in the data that users generate in the network, the acquisition module 31 collects the operation logs of multiple network users from at least one website, groups the users who produced the corresponding operation logs by analyzing the similarity of at least one data feature in the operation logs, and obtains the groups and the data sets of the groups from the operation logs.
Step S113: acquire the data set of the group. In some embodiments, the data set may be obtained from a database storing the groups and their data sets. The database is configured, for example, in a remote storage server or in a storage device of the local computer device, and the data set of a group is extracted from the database based on an input operation of the user. For example, the acquisition module 31 obtains multiple groups using an unsupervised detection algorithm, the user selects one of the groups through a selection interface, and the data set of the selected group is then acquired.
Specifically, the acquisition module 31 first computes the similarity of data features of the same class over all the data in the operation logs, where the similarity can be measured by information entropy. For example, the acquisition module 31 uses the user information to compute the information entropy of the IP usage amount or maximum IP usage amount dimension, uses the event type to compute the information entropy of the operation type dimension, and uses the registration time or the operation time to compute the information entropy of the registration time dimension or of the bad-operation dimension. After these computations, an unsupervised detection method is applied to the resulting information entropies to divide the users into multiple groups. The unsupervised detection method is, for example, an algorithm based on dense subgraphs or an algorithm based on vector space. Each group presented by the visualization method provided by the present application reflects the shared resources, user relationships and the like used in the fraud, so that the user of the acquisition module 31 can determine more clearly whether the classification strategy of the unsupervised detection algorithm is reasonable. The shared resources include but are not limited to shared IPs and mailboxes, and the user relationships include but are not limited to follow relationships and interaction relationships among users.
In one embodiment, the visualization method further includes a step of displaying at least one group interface, in which the size of a group is characterized by the size of the displayed geometric figure. Refer to Fig. 3, which shows an interface containing multiple groups in one embodiment of the present application. As shown, 11 groups are displayed in the interface, and the geometric figure characterizing each group is a circle. The 11 groups all lie within the largest dashed circle, which characterizes, for example, a cluster formed by N network users. For example, the group numbered 0 is a normal group, and a smaller dashed circle contains 10 groups of different sizes numbered 1-10. The size of a circle is proportional to the number of members of the group, that is, a large circle indicates a group with more members and a small circle a group with fewer members. As a further example, the groups numbered 1-10 are abnormal groups. In various embodiments, the geometric figure of a group may be of any shape. The color of the geometric figure may be set randomly, or may be related to the number of groups or to the number of members of the group. For example, N colors are preset, and the acquisition module 31 randomly assigns different colors to the geometric figures characterizing the groups. As another example, the acquisition module 31 assigns the colors of a preset color sequence, in ascending order of the number of members, to the geometric figures characterizing the groups. When the user operates the display interface and selects a geometric figure, the acquisition module 31 acquires the data set of the corresponding group.
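The sizing and coloring rules described above can be sketched as follows. The text only states that the circle size is proportional to the number of members; making the area (rather than the radius) proportional is an assumption of this sketch, as are the function names, the palette and the member counts.

```python
import math


def circle_radii(member_counts, max_radius=100.0):
    """Radius per group such that the circle area is proportional to the member count."""
    largest = max(member_counts.values())
    return {g: max_radius * math.sqrt(n / largest) for g, n in member_counts.items()}


def assign_colors(member_counts, palette):
    """Assign a preset color sequence to groups in ascending order of member count."""
    ordered = sorted(member_counts, key=lambda g: (member_counts[g], g))
    return {g: palette[i % len(palette)] for i, g in enumerate(ordered)}


# Hypothetical member counts for three groups.
member_counts = {1: 4, 2: 16, 3: 64}
palette = ["light", "medium", "dark"]

print(circle_radii(member_counts))
print(assign_colors(member_counts, palette))
```

Scaling the area instead of the radius avoids visually exaggerating large groups; if the radius itself is meant to be proportional, the square root can simply be dropped.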
In a preferred embodiment, the at least one group interface may also include an information bar displaying group information. When the user selects a group in the group interface, the basic information of the group is shown at one side of the interface in the form of a table or a text box. The basic information is, for example, the group code, the number of members, the highest-priority data feature used to determine the group, and the group attribute (for example, normal group or abnormal group).
The processing module 32 is used to determine a target feature from the data features. Here, in order to show the association relationships among the user information in a detected group based on the same data feature, the processing module 32 may, according to the detected fraud, take at least one data feature having an association relationship as the target feature capable of characterizing the association relationships among group members. For example, when zombie-fan events are detected, the processing module 32 automatically takes the IP address, the event initiating source and the event responder as target features. Alternatively, the processing module 32 selects at least one data feature as the target feature based on a selection operation of the user. For example, if the user selects the follow event type as the target feature, the processing module 32 then takes the data features used to build the member relationships of the group based on the follow event type as target features.
The display module 33 is used to display a point-and-line chart through a display interface, in which the points and/or lines characterize the determined target feature. Refer to Fig. 4, which is a schematic diagram of an interface showing the association relationships produced by the various event types among the members of a group. As shown, a yellow point represents a member, determined from the user information, that is only an event initiating source in the various event types; a blue point represents a member, determined from the user information, that is at least an event responder in the various event types and may also be an event initiating source; a line between any two points indicates that at least one event type has been produced between the two members; and the color of a line represents the IP address of the computer device used by the member when initiating an event of that event type. By viewing this interface, domain experts can verify how accurate the detected grouping is by analyzing, for example, the number of points of the same color, the proportion of identical IP addresses, and the ratio between points of different colors. For example, as shown in Fig. 4, if the yellow points are far fewer than the blue points and a single color accounts for a high proportion of the lines drawn from the yellow points, the members represented by the yellow points are highly likely to be zombie-fan accounts, and the members represented by the blue points are likewise highly likely to be accounts that employ zombie-fan accounts. By observing the colors of the points in the displayed interface, the distribution of the points of each color, the number of lines drawn from each point and the color of each line, domain experts can verify the accuracy of the grouping result that classifies the group as a fraud group.
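The mapping from event records to the points and lines of such a chart can be sketched as follows. This is a sketch under assumptions: the record keys, the function names `build_point_line_chart` and `dominant_ip_share`, and the sample events are hypothetical, and the share of the most frequent IP is only one possible way to quantify how concentrated the line colors are.

```python
from collections import Counter


def build_point_line_chart(events):
    """Derive point colors (roles) and IP-colored lines from event records."""
    sources = {e["source_id"] for e in events}
    targets = {e["target_id"] for e in events}
    points = {}
    for member in sources | targets:
        # Yellow: only ever an event initiating source.
        # Blue: an event responder at least once (may also be a source).
        points[member] = "blue" if member in targets else "yellow"
    # One line per event, keyed by the IP used when the event was initiated.
    lines = [(e["source_id"], e["target_id"], e["ip"]) for e in events]
    return points, lines


def dominant_ip_share(lines):
    """Share of lines that carry the most frequent IP (single-color proportion)."""
    counts = Counter(ip for _, _, ip in lines)
    return max(counts.values()) / len(lines)


# Hypothetical follow events inside one group.
events = [
    {"source_id": "z1", "target_id": "star", "ip": "123.123.123"},
    {"source_id": "z2", "target_id": "star", "ip": "123.123.123"},
    {"source_id": "z3", "target_id": "star", "ip": "124.124.124"},
]
points, lines = build_point_line_chart(events)
print(points)
print(dominant_ip_share(lines))
```

A rendering layer would then map each distinct IP to a line color and draw the points with the colors computed here.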
In certain embodiments, there may be a single target feature for describing the group member relationships, and the other data features are rendered in the display interface as auxiliary features. Still taking Fig. 4 as an example, the diversity of IP addresses in the displayed interface characterizes the degree of risk that the group is a fraud group. Therefore, in the displayed member relationships, the color and/or shape of a point characterizes the event initiating source and the event responder of the event types produced; a line characterizes the association between group members; and the color and/or shape of the line characterizes the determined target feature (that is, the IP address). The lines may use bright colors to draw the attention of domain experts. When a domain expert observes that the colors of the lines are highly concentrated, the expert can, with the aid of the number of points of each color, verify whether the accuracy with which the group members grouped by the fraud detection system are classified as fraudulent members meets the design requirements.
In other embodiments, in order to show the data related to the group and its members more conveniently in the above group member relationship interface, the display module 33 is also used to show, in text boxes at the side of the point-and-line chart in the display interface, at least one of group information, user information, event information and prediction information.
Here, the information shown in the text boxes at the side of the point-and-line chart may be displayed based on a selection operation of the user. For example, refer to Fig. 5, which is a schematic diagram of an interface that displays text boxes at the side of the point-and-line chart interface of the group member relationships. As shown, when the user clicks a point in the point-and-line chart interface C, the text box on the right can first show the user information represented by that point, such as at least one of the user ID, the gender, the role in the event types produced (for example, event initiating source or event responder), and the prediction result of the grouping prediction made by the fraud detection system (for example, fraudulent user or normal user).
A text box may also be provided in the point-and-line chart interface C to show the detailed information of each member selected by the user. For example, as shown in Fig. 5, the Node_Info text box shows the user information of the point selected by the user, including: user ID (User_id:A1b2c3d4e5f6g), gender (Sex:Female), mailbox (Mail), user tag or type (Label:anomaly_source) and registration time (Reg_time:2017-07-15,19:10:29).
The interface shown in Fig. 5 also shows the attribute information of the selected user, that is, the prediction result of the grouping prediction made by the fraud detection system: Abnormal is shown if the user is a fraudulent user, and Normal if the user is a normal user.
When the member represented by a point has generated multiple event types, multiple text boxes can be used to show the role that the same user plays in each event type. In interface C shown in Fig. 5, the information of the point representing a user is displayed in a vertical column, for example the user ID (User_id: abcdefg1234567), gender (Sex: Female), and user label or type (Label: anomaly_source). The information of the line representing a relationship between users is also displayed, for example the event type (Event_type: follow), the IP address group (IP: 123.123.123), the event initiator (Source_id: 1234567abcdefg) and the event responder (Target_id: 7654321gfedcba) in that relationship, and the event time (Timestamp: 2017-08-07, 21:49:05). The interface shown in Fig. 5 further shows, for example, that the group number is 4, that there are 131 events of the follow type and 21 events of the like type, and that the total is 152. In this way, all event information corresponding to the same USER_ID (including, for example, event type and role information) and the prediction information are shown in individual text boxes. Statistics on the event types of group members can also be shown in the point-and-line graph interface.
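The node information, line information, and event-type statistics described above can be derived from a list of event records. The following is a minimal sketch under stated assumptions, not the patented implementation: the record layout, the field names (chosen to mirror the Source_id, Target_id, and Event_type labels), and the helper `build_point_line_graph` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical event records; the field names mirror the text-box labels.
events = [
    {"source_id": "u1", "target_id": "u2", "event_type": "follow"},
    {"source_id": "u1", "target_id": "u3", "event_type": "like"},
    {"source_id": "u4", "target_id": "u1", "event_type": "follow"},
]


def build_point_line_graph(events):
    """Return (nodes, edges, stats) for a simple point-and-line graph.

    nodes: user -> {event_type: sorted list of roles played in that type}
    edges: list of (source, target, event_type) tuples, one per event
    stats: Counter of events per event type
    """
    roles = defaultdict(lambda: defaultdict(set))
    edges = []
    for e in events:
        roles[e["source_id"]][e["event_type"]].add("source")
        roles[e["target_id"]][e["event_type"]].add("target")
        edges.append((e["source_id"], e["target_id"], e["event_type"]))
    nodes = {
        user: {etype: sorted(r) for etype, r in by_type.items()}
        for user, by_type in roles.items()
    }
    stats = Counter(e["event_type"] for e in events)
    return nodes, edges, stats


nodes, edges, stats = build_point_line_graph(events)
```

Here `nodes["u1"]` lists one entry per event type, which corresponds to showing one text box per event type for the same user, and `stats` holds the per-type counts whose sum is the total shown in the interface.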
In some embodiments, the user cares not only about the relationships among group members but, even more, about whether the assigned groups are reasonable. This requires that the user be able to inspect the detailed data features within each group and the priority order of the data features used to form the groups. The visualization method may therefore include a step of displaying an interface for the data set of a group. The data set is shown as a list, which presents the details of the data features within the same group. To help assess the classification accuracy of the group's data set, the list shown in the interface can present the group's data features column by column, following the classification priority that the fraud detection system used when classifying. For example, refer to Fig. 6, a schematic diagram of a list interface showing the data set of a group in one embodiment of the present application. In this list interface, the data set of the group is sorted by the similarity of the data features, in order of priority from high to low. When the data features of the first priority are equally similar, the rows are sorted by the data feature of the second priority. In the embodiment shown in Fig. 7, the order of priority from high to low is: IP address, event initiator (source), event responder (target), event type (event_type), and event time (timestamp). In this embodiment, the table header encodes the importance of the different columns: the more concentrated the values of a feature, the more important the feature. In the embodiments provided by the present application, the fraud detection system represents this property by calculating the information entropy of each feature. Lower information entropy means higher consistency. The processing module 32 then sorts the features in ascending order of information entropy, so that the column header with the lowest information entropy comes first and draws the user's attention. Of course, in different implementations, the display module 33 can also color the column headers of the table to be displayed, for example rendering the header with the lowest information entropy in the deepest color to indicate that the data feature of that column is the most important, and coloring the other columns by analogy according to the data features they represent, thereby obtaining the data set list interface shown in the figure. The list interface may be displayed after the multiple-group interface is displayed, or after the display module has shown the point-and-line graph, in response to the user's operation of selecting the list interface.
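The entropy-based column ordering described above can be sketched as follows. This is an illustration under assumptions rather than the system's actual code: Shannon entropy with base-2 logarithms is assumed as the "information entropy", rows are sorted by plain value comparison as a stand-in for the similarity ordering, and the function names and sample rows are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def order_columns_by_entropy(rows, columns):
    """Order columns by ascending entropy: most concentrated feature first."""
    return sorted(columns, key=lambda col: shannon_entropy([r[col] for r in rows]))


def sort_rows(rows, priority):
    """Sort rows by the first-priority column, breaking ties with later ones."""
    return sorted(rows, key=lambda r: tuple(r[col] for col in priority))


# Hypothetical data set of one group.
rows = [
    {"ip": "1.1.1", "source": "a", "event_type": "follow"},
    {"ip": "1.1.1", "source": "b", "event_type": "follow"},
    {"ip": "1.1.1", "source": "c", "event_type": "like"},
    {"ip": "1.1.1", "source": "d", "event_type": "follow"},
]

priority = order_columns_by_entropy(rows, ["source", "event_type", "ip"])
sorted_rows = sort_rows(rows, priority)
```

In this sample every row shares one IP address, so the `ip` column has zero entropy and is placed first, while `source` takes four distinct values and is placed last. A header color scale could be derived from the same entropy values.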
In some embodiments, to further characterize whether the data set of an obtained group reflects fraudulent characteristics, the data also needs to be shown from other dimensions. For example, the accuracy of the detected fraud can be further confirmed by comparing the network operation data of normal users with the group's data set. To this end, the display module 33 is further configured to display a feature distribution interface for the group's data set. The feature distribution interface can show the distribution of each data type in the overall network. The overall network is a relative notion: for example, if multiple network users form a cluster, the interface can show the distribution, within the cluster, of a given data feature of a given group. Referring to Fig. 3, the largest dashed circle in Fig. 3 represents a cluster formed by multiple network users. The cluster contains 11 groups, numbered 0 to 10, and one of them is selected for information display.
In some embodiments, the data types that the feature distribution interface can show include, for example: the information entropy of the average operation interval dimension (average operation interval entropy), of the IP address usage dimension (IP used amount entropy), of the gender dimension (sex entropy), of the e-mail dimension (email entropy), of the registration time dimension (reg time entropy), of the number-of-operations dimension (operation times entropy), of the number-of-devices dimension (device amount entropy), of the operation type dimension (operation type entropy), and of the maximum number of other users sharing a used IP (max IP used be used amount entropy). In the embodiment shown in Fig. 6, the entropy of the registration time dimension is used as the example data feature; that is, Fig. 6 shows the feature distribution, within the network cluster, of the information entropy of the registration time (registration period) dimension in a group. To effectively compare the feature distribution of the obtained group's data set with that of the network operation data of normal users, refer to Fig. 7, a flowchart for displaying the feature distribution interface of the group's data set. As shown, processing module 33 performs the following steps:
In step S211, a group is selected, and at least one data feature is determined from the data set of the group. In one embodiment, for example, the group labeled 2 in Fig. 3 is selected, and a data feature that is user information, for example the registration time, is determined from the data set of the group labeled 2.
In step S212, the feature distribution of the determined at least one data feature is computed for both the group and the cluster. In this embodiment, the feature distribution of the registration time is computed within the group and within the entire cluster.
In step S213, a histogram of the feature distribution is displayed, together with a distribution comparison chart that sets this histogram against the histogram of the entire cluster. In this embodiment, based on the encoding of the data feature, a histogram of the feature distribution of the registration time within the group is displayed, as is a histogram of its feature distribution within the entire cluster. Refer to Fig. 8, which shows an interface containing the histograms and the comparison chart of the registration-time feature distribution in a group in one embodiment of the present application. As shown, in interface D, chart (a) is a thumbnail of the registration-time feature distribution in the selected group labeled 2, and chart (d) at the bottom of interface D is the enlarged version of that thumbnail. The enlarged chart shows that, over the 31 days from August 1 to August 31, the registration operations of the group's members are concentrated on five days: August 5, 6, 11, 12, and 16. Chart (c) in interface D is a histogram of the time distribution of registration operations performed in August by registered users in the cluster; it shows that the registrations of the cluster's users follow a certain regular pattern over August. Chart (b) in interface D overlays chart (d) and chart (c) to show how the registration-time data feature differs between the entire cluster and the selected group. To let the user understand the differences and connections among different features, the embodiments provided by the present application present these charts in three layers; after the user clicks one of the thumbnails, the page scrolls to the normalized distribution comparison chart. Of course, in a specific application there may be multiple thumbnails, each representing a different data feature.
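Steps S211 to S213 can be illustrated with a small sketch that bins registration dates by day, normalizes the group histogram and the cluster histogram so that they are comparable, and takes their per-day difference as the data behind a comparison chart. The sample dates, the daily binning, and the function names are assumptions made for illustration only; the source does not specify how the comparison chart is computed.

```python
from collections import Counter


def normalized_histogram(dates, bins):
    """Fraction of registrations falling on each day listed in bins."""
    counts = Counter(dates)
    total = sum(counts[b] for b in bins)
    if total == 0:
        return {b: 0.0 for b in bins}
    return {b: counts[b] / total for b in bins}


def contrast(group_dates, cluster_dates, bins):
    """Per-day difference between the group and cluster distributions."""
    g = normalized_histogram(group_dates, bins)
    c = normalized_histogram(cluster_dates, bins)
    return {b: g[b] - c[b] for b in bins}


# One bin per day of August 2017.
bins = [f"2017-08-{d:02d}" for d in range(1, 32)]

# Hypothetical data: the group registers on a few days only, while the
# cluster as a whole registers roughly evenly across the month.
group_dates = ["2017-08-05", "2017-08-05", "2017-08-06", "2017-08-11"]
cluster_dates = bins * 2 + group_dates

group_hist = normalized_histogram(group_dates, bins)
diff = contrast(group_dates, cluster_dates, bins)
```

Days on which `diff` is strongly positive are days where the group's registrations are over-represented relative to the cluster, which is the kind of concentration the enlarged chart is meant to reveal.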
In some embodiments, color rendering of the histograms can be used to distinguish or emphasize the feature distribution of a given data feature in the group and in the entire cluster, or a dynamic display (for example, blinking) can be used for the same purpose.
In some embodiments, to further analyze the differences among multiple groups in a network cluster, the display module also displays an interface for the feature distribution of the data sets of multiple groups. Refer to Fig. 9 and Fig. 10. Fig. 9 is a flowchart of the steps for displaying the distribution of multiple groups in the cluster in one embodiment of the present application, and Fig. 10 shows an interface E that displays the distribution of multiple groups in the cluster in one embodiment. As shown, the steps include:
In step S311, multiple groups are determined in a cluster formed by multiple network users, and different shapes, icons, labels, and/or colors are used to distinguish the groups. In one embodiment, for example, the three groups labeled 0, 1, and 2 in Fig. 3 are selected; the group labeled 0 is shown in green, the group labeled 1 in red, and the group labeled 2 in blue.
In step S312, at least one data feature is determined from the data sets of the multiple groups. In this embodiment, one data feature, for example the IP address, is determined from the data sets of the three groups.
In step S313, based on the at least one data feature, the relative entropy between every two network users in each group is analyzed and used as a measure of the similarity between those two users. In this embodiment, based on the IP address, the relative entropy of the IP usage dimension (information entropy, IP used amount entropy) between every two network users in the three groups labeled 0, 1, and 2 is analyzed as the measure of similarity between every two network users. For example, the dimensionality reduction method t-SNE (t-distributed stochastic neighbor embedding) is used, with the relative entropy between two users serving as the measure of the distance between these network users.
In step S314, a display interface is output. In the interface, network users are represented by shapes, icons, and/or labels, different colors distinguish the groups, and the displayed distance represents the similarity between two network users in each group. In this embodiment, in interface E shown in Fig. 10, network users are represented by dots: green denotes the group labeled 0, red the group labeled 1, and blue the group labeled 2. The users in the blue group labeled 2 lie close together, so that the group forms a clumped distribution; the users in the red group labeled 1 also lie close together and likewise form a clumped distribution; green shows the distribution of randomly sampled normal users, who lie farther apart and are more dispersed. It can therefore be considered that a group that is densely clustered is more likely to be a fraud group. For example, in the embodiment shown in Fig. 11, the group shown in green has a relatively dispersed distribution, which indicates that the green group is a normal group and that the users represented by green points are normal users. Conversely, the group shown in red (the group labeled 1) and the group shown in blue (the group labeled 2) have clumped distributions, which indicates that the red and blue groups are abnormal groups and that the users represented by red and blue points are abnormal users. In one embodiment, a user of the visualization system can interactively inspect the specific information and feature values of a user in each group by hovering the mouse.
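The distance measure used in step S313 can be sketched as a pairwise relative entropy (Kullback-Leibler divergence) between per-user distributions. The source does not say how each user's distribution is built or how the asymmetry of relative entropy is handled, so this sketch makes three labeled assumptions: each user is summarized by the frequency of the IP addresses used, additive smoothing keeps every probability positive, and the divergence is symmetrized by averaging both directions. All names and sample data are hypothetical.

```python
import math
from collections import Counter


def distribution(items, support, alpha=1.0):
    """Smoothed probability of each support value (assumption: additive smoothing)."""
    counts = Counter(items)
    total = len(items) + alpha * len(support)
    return [(counts[s] + alpha) / total for s in support]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distance_matrix(user_items):
    """Symmetrized relative entropy between every pair of users."""
    support = sorted({x for items in user_items.values() for x in items})
    users = sorted(user_items)
    dists = {u: distribution(user_items[u], support) for u in users}
    matrix = [
        [
            0.5 * (relative_entropy(dists[a], dists[b])
                   + relative_entropy(dists[b], dists[a]))
            for b in users
        ]
        for a in users
    ]
    return users, matrix


# Hypothetical per-user IP usage: u1 and u2 behave identically, u3 differs.
usage = {
    "u1": ["ip1", "ip1", "ip2"],
    "u2": ["ip1", "ip1", "ip2"],
    "u3": ["ip3", "ip3", "ip3"],
}
users, matrix = distance_matrix(usage)
```

A matrix of this kind could then be passed to a t-SNE implementation that accepts precomputed distances (for example, scikit-learn's `TSNE` with `metric="precomputed"` and `init="random"`) to obtain two-dimensional positions, in which users with similar behavior would be expected to lie close together.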
In other examples, network users can also be represented in the output interface by shapes, icons, and/or labels. The shape is, for example, a geometric figure such as a triangle or rectangle; the icon is, for example, a smiling face or crying face, a skull avatar, or a robber avatar; and the label is, for example, text or a clearly distinguishable symbol.
The group data visualization system of the present application presents the data sets of the groups determined during fraud detection by means such as in-group member relationships, type distributions, and lists. The data features of the groups formed while detecting fraud are thus displayed through multiple relationship interfaces, which helps domain experts and algorithm experts assess and revise the detection algorithm of the fraud detection system.
It should be noted that all modules of the fraud detection system can be deployed on a single computer device. Alternatively, the modules can be distributed between a client on the user side and a server on the network side, with the client connected to the server over a network. For example, the acquisition module and the processing module of the fraud detection system are installed on the server and the display module is installed on the client. The client logs in to the server by sending a request, the server runs the fraud detection system based on the operations requested by the client, and the corresponding interfaces are displayed by the client. The client includes, but is not limited to, a browser or dedicated client software configured on the user terminal, together with the hardware used to run the display interface program.
It should also be noted that, from the above description of the embodiments, those skilled in the art can clearly understand that part or all of the present application can be implemented by software in combination with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may include one or more machine-readable media storing machine-executable instructions that, when executed by one or more machines such as a computer, a computer network, or other electronic devices, cause the one or more machines to perform operations according to the embodiments of the present application. Machine-readable media may include, but are not limited to, floppy disks, optical discs, CD-ROMs (compact disc read-only memory), magneto-optical disks, ROM (read-only memory), RAM (random access memory), EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), magnetic or optical cards, flash memory, or other types of media suitable for storing machine-executable instructions.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
It should be noted that those skilled in the art will understand that some of the above components can be programmable logic devices, including one or more of programmable array logic (Programmable Array Logic, PAL), generic array logic (Generic Array Logic, GAL), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), and complex programmable logic devices (Complex Programmable Logic Device, CPLD); the present application places no specific limitation on this.
In conclusion, the present application presents the data sets of the groups formed during fraud detection by means such as member relationships, type distributions, and lists, so that the data features of those groups are displayed through multiple relationship interfaces. This helps domain experts and algorithm experts assess and revise the detection algorithm of the fraud detection system.
The above embodiments merely illustrate the principles and effects of the present application and are not intended to limit it. Anyone familiar with this technology can modify or change the above embodiments without departing from the spirit and scope of the present application. Therefore, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed in the present application shall still be covered by the claims of the present application.
Claims (23)
1. A group data visualization method applied in a fraud detection system, characterized by comprising the following steps:
obtaining a data set of a group, wherein the data features of the data set include one or more of user information, IP address, event type, event initiator, event responder, and event time;
determining a target feature from the data features; and
associating the members of the group according to the event type and representing them as a point-and-line graph in an output display interface, wherein the points and/or lines are used to represent the determined target feature.
2. The group data visualization method according to claim 1, characterized in that the color and/or shape of a point is used to represent the event initiator and the event responder of the event type; a line represents an association between group members, and the color and/or shape of the line is used to represent the determined target feature.
3. The group data visualization method according to claim 1, characterized by further comprising a step of displaying, in a text box beside the point-and-line graph in the display interface, at least one of group information, user information, event information, and prediction information.
4. The group data visualization method according to claim 1, characterized in that the step of obtaining the data set of a group comprises:
obtaining operation logs of a cluster formed by multiple network users;
determining at least one data feature from the operation logs of the multiple network users, and analyzing the similarity of at least one set of data features in the operation logs to determine the group; and
obtaining the data set of the group.
5. The group data visualization method according to claim 1 or 4, characterized by further comprising a step of displaying at least one group interface, wherein the size of a group in the group interface is represented by the size of a displayed geometric figure.
6. The group data visualization method according to claim 1 or 4, characterized by further comprising a step of displaying an interface for the data set of a group, wherein the data features of the data set of the group include at least two of user information, IP address, event type, event initiator, event responder, and event time, and in the interface of the group's data set, the data set is grouped and then displayed in sorted order.
7. The group data visualization method according to claim 1 or 4, characterized by further comprising a step of displaying a feature distribution interface for the data set of the group:
selecting a group and determining at least one data feature from the data set of the group;
computing the feature distribution of the determined at least one data feature in the group and in the cluster; and
displaying a histogram of the feature distribution and a distribution comparison chart of that histogram against the histogram of the entire cluster.
8. The group data visualization method according to claim 1 or 4, characterized by further comprising a step of displaying a feature distribution interface for the data sets of multiple groups:
determining multiple groups in a cluster formed by multiple network users, and using different shapes, icons, labels, and/or colors to distinguish the multiple groups;
determining at least one data feature from the data sets of the multiple groups;
analyzing, based on the at least one data feature, the relative entropy between every two network users in each group as a measure of the similarity between those two network users; and
outputting a display interface in which network users are represented by shapes, icons, and/or labels, different colors distinguish the multiple groups, and the displayed distance represents the similarity between two network users in each group.
9. The group data visualization method according to claim 1, characterized in that the event type includes at least one of a network user's following, liking, commenting, gifting, logging in, publishing, status updating, registering, and information modifying.
10. A computer device, characterized by comprising:
a processor; and
a presentation engine executed on the processor, the presentation engine being configured to perform the group data visualization method according to any one of claims 1-9.
11. A group data visualization system, characterized by comprising:
an acquisition module configured to obtain a data set of a group, wherein the data features of the data set include one or more of user information, IP address, event type, event initiator, event responder, and event time;
a processing module configured to determine a target feature from the data features and to associate the members of the group according to the event type; and
a display module configured to display a point-and-line graph through a display interface, wherein the points and/or lines are used to represent the determined target feature.
12. The group data visualization system according to claim 11, characterized in that the color and/or shape of a point is used to represent the event initiator and the event responder of the event type; a line represents an association between group members, and the color and/or shape of the line is used to represent the determined target feature.
13. The group data visualization system according to claim 11, characterized in that the display module is further configured to display, through the display interface and in a text box beside the point-and-line graph, at least one of group information, user information, event information, and prediction information.
14. The group data visualization system according to claim 11, characterized in that the group is determined by the acquisition module obtaining operation logs of multiple network users and the processing module analyzing the similarity of at least one set of data features in the operation logs.
15. The group data visualization system according to claim 11, characterized in that the display module is further configured to display at least one group interface, wherein the size of a group in the group interface is represented by the size of a displayed geometric figure.
16. The group data visualization system according to claim 11, characterized in that the display module is further configured to display an interface for the data set of a group, wherein the data features of the data set of the group include at least two of user information, IP address, event type, event initiator, event responder, and event time, and in the interface of the group's data set, the data set is grouped and then displayed in sorted order.
17. The group data visualization system according to claim 11, characterized in that the display module is further configured to display a feature distribution interface for the data set of the group, including a histogram of the feature distribution and a distribution comparison chart of that histogram against the histogram of the entire cluster.
18. The group data visualization system according to claim 1, characterized in that the display module is further configured to display an interface in which network users are represented by shapes, icons, and/or labels, different colors distinguish the multiple groups, and the displayed distance represents the similarity between two network users in each group.
19. The group data visualization system according to claim 11, characterized in that the event type includes at least one of a network user's following, liking, commenting, gifting, logging in, publishing, status updating, registering, and information modifying.
20. A client connected to a server through a network, characterized in that the client logs in to the server by sending a request and performs the steps of the group data visualization method according to any one of claims 1-9.
21. A server connected to a client through a network, characterized in that the server, based on an operation requested by the client, sends to the client the process of executing the group data visualization method according to any one of claims 1-9 and displays the execution result through the client.
22. A browser connected to a server through a network, characterized in that the browser logs in to the server by sending a request and performs the steps of the group data visualization method according to any one of claims 1-9.
23. A computer-readable storage medium storing a data visualization computer program, characterized in that the data visualization computer program, when executed, implements the steps of the group data visualization method according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810022004.8A CN108280644B (en) | 2018-01-10 | 2018-01-10 | Group membership data visualization method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108280644A true CN108280644A (en) | 2018-07-13 |
CN108280644B CN108280644B (en) | 2021-08-03 |
Family
ID=62803412
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810022004.8A Active CN108280644B (en) | 2018-01-10 | 2018-01-10 | Group membership data visualization method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108280644B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050043961A1 (en) * | 2002-09-30 | 2005-02-24 | Michael Torres | System and method for identification, detection and investigation of maleficent acts |
US20110213788A1 (en) * | 2008-03-05 | 2011-09-01 | Quantum Intelligence, Inc. | Information fusion for multiple anomaly detection systems |
CN103279887A (en) * | 2013-04-26 | 2013-09-04 | 华东师范大学 | Information-theory-based visual analysis method and system for micro-blog spreading |
CN104573071A (en) * | 2015-01-26 | 2015-04-29 | 湖南大学 | Intelligent school situation analysis system and method based on megadata technology |
CN104915793A (en) * | 2015-06-30 | 2015-09-16 | 北京西塔网络科技股份有限公司 | Public information intelligent analysis platform based on big data analysis and mining |
CN107404387A (en) * | 2016-05-19 | 2017-11-28 | 阿里巴巴集团控股有限公司 | The processing method of one species information, device |
CN107515927A (en) * | 2017-08-24 | 2017-12-26 | 深圳市云房网络科技有限公司 | A kind of real estate user behavioural analysis platform |
Non-Patent Citations (1)
Title |
---|
童新安 et al.: "Application of Visual Data Mining in Credit Fraud Detection", 《宜春学院学报》 (Journal of Yichun University) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110968993A (en) * | 2018-09-30 | 2020-04-07 | 北京国双科技有限公司 | Information processing method and device, storage medium and processor |
CN111127026A (en) * | 2019-12-13 | 2020-05-08 | 深圳中兴飞贷金融科技有限公司 | Method, device, storage medium and electronic equipment for determining user fraud behavior |
CN112732398A (en) * | 2021-02-02 | 2021-04-30 | 三盟科技股份有限公司 | Big data visualization management method and system based on artificial intelligence |
CN113837777A (en) * | 2021-09-30 | 2021-12-24 | 浙江创邻科技有限公司 | Graph database-based anti-fraud management and control method, device, system and storage medium |
CN113837777B (en) * | 2021-09-30 | 2024-02-20 | 浙江创邻科技有限公司 | Anti-fraud management and control method, device and system based on graph database and storage medium |
CN114529391A (en) * | 2022-01-28 | 2022-05-24 | 中银金融科技有限公司 | Suspicious money laundering gang information identification method and system |
Also Published As
Publication number | Publication date |
---|---|
CN108280644B (en) | 2021-08-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108280644A (en) | Group member relation data method for visualizing and system | |
CN108268624A (en) | User data method for visualizing and system | |
CN108170830A (en) | Group event data visualization method and system | |
Zhang et al. | Visual analytics for the big data era—A comparative review of state-of-the-art commercial systems | |
CN107729519B (en) | Multi-source multi-dimensional data-based evaluation method and device, and terminal | |
CN107944745B (en) | Risk information evaluation method and system | |
CN106708729B (en) | Code defect prediction method and device | |
CN110457364B (en) | User information view generation method and device | |
CN109858965A (en) | User identification method and system | |
CN112380454A (en) | Training course recommendation method, device, equipment and medium | |
CN115271957A (en) | Financial risk analysis and evaluation system and method based on cloud computing | |
Werner | Materiality Maps: Process Mining Data Visualization for Financial Audits | |
CN115545103A (en) | Abnormal data identification method, label identification method and abnormal data identification device | |
CN108510007A (en) | Webpage tamper detection method, device, electronic equipment and storage medium | |
CN114862140A (en) | Behavior analysis-based potential evaluation method, device, equipment and storage medium | |
CN105488061B (en) | Method and device for verifying data validity | |
CN115545791A (en) | Guest group portrait generation method and device, electronic equipment and storage medium | |
US11704362B2 (en) | Assigning case identifiers to video streams | |
CN110619564A (en) | Anti-fraud feature generation method and device | |
US11710313B2 (en) | Generating event logs from video streams | |
CN112100183B (en) | Report query system, device and method based on label management | |
CN113869623A (en) | Enterprise risk level determination method and device and readable storage medium | |
Kovanic | Guide to Gnostic analysis of uncertain data | |
Hui et al. | Structural and policy changes in the Chinese housing market | |
CN112651433A (en) | Abnormal behavior analysis method for privileged account |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20181016. Address after: 100084 10 floor 1009-1, 3 building, 1 Zhongguancun East Road, Haidian District, Beijing. Applicant after: Hua Ching Qing Chiao information technology (Beijing) Co., Ltd. Address before: 100084 Tsinghua Yuan, Beijing, Haidian District. Applicant before: Tsinghua University |
GR01 | Patent grant | ||