CN108268624A - User data visualization method and system - Google Patents


Info

Publication number
CN108268624A
Authority
CN
China
Prior art keywords
group
user
data
decision
decision tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810022133.7A
Other languages
Chinese (zh)
Other versions
CN108268624B (en)
Inventor
徐葳
孙娇
姚期智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huakong Tsingjiao Information Technology Beijing Co Ltd
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201810022133.7A
Publication of CN108268624A
Application granted
Publication of CN108268624B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a user data visualization method and system. The method includes: obtaining the data set of a group, where the data features of the data set include user information, IP address, event type, event initiation source, event responder, and event occurrence time, and where the data features of the data set are assigned different decision priorities; and displaying a decision tree diagram to characterize the attribute detection process of all users in the group. Displaying the decision tree diagram includes: displaying the data feature of the first priority at the root node of the decision tree diagram and its decision value range; displaying the final attribute of at least one user characterized by each leaf node of the decision tree diagram; displaying the current attribute of multiple users, the data feature of the current priority, and its decision value range characterized by each non-leaf node of the decision tree diagram; and displaying the decision paths corresponding to the root node or each non-leaf node in the decision tree diagram, the decision paths being characterized by lines of different colors, shapes, or thicknesses.

Description

User data visualization method and system
Technical field
This application relates to the field of computer processing technology, and in particular to a user data visualization method and system.
Background
Online fraud is a well-known dark side of today's internet, and every year it causes immeasurable losses worldwide. In 2015, the Internet Crime Complaint Center received complaints about fraud on the order of millions worldwide, and online fraud also causes economic losses in the tens [unit missing in the source] worldwide every year. Fraudulent users are usually paid for helping to promote specific products or for spreading spam. In internet finance, fraudulent users apply for loans under false identities, buy goods with stolen credit cards, and even carry out illegal activities such as money laundering. Therefore, in internet business scenarios, finding a suitable anti-fraud algorithm has become increasingly critical, and the demand for one keeps growing.
Although many methods exist today for identifying fraud on the internet, the limitations of the fraud detection systems built on them mean that the credibility of the data on the suspects they screen out still requires extensive manual verification afterwards; for example, platform supervisors must investigate and verify the suspects one by one. As a result, tasks in a fraud detection system such as revising algorithm parameters, designing the priority of data features, and selecting the algorithm model require not only software design by algorithm experts but, even more, the participation of domain experts. Improving the transparency of the fraud identification algorithm can therefore effectively improve fraud detection accuracy, and how to visualize the data is a problem that urgently needs to be solved in this field.
Summary of the invention
In view of the above shortcomings of the prior art, the purpose of the application is to provide a user data visualization method and system that solve the problem of visualizing fraud identification algorithms in the prior art.
To achieve the above and other related objects, a first aspect of the application provides a user data visualization method applied in a fraud detection system. The visualization method includes the following steps: obtaining the data set of a group, where the data features of the data set include user information, IP address, event type, event initiation source, event responder, and event occurrence time, and where the data features of the data set are assigned different decision priorities; and displaying a decision tree diagram to characterize the attribute detection process of all users in the group, which includes: displaying the data feature of the first priority at the root node of the decision tree diagram and its decision value range; displaying the final attribute of at least one user characterized by each leaf node of the decision tree diagram; displaying the current attribute of multiple users, the data feature of the current priority, and its decision value range characterized by each non-leaf node of the decision tree diagram; and displaying the decision paths corresponding to the root node or each non-leaf node in the decision tree diagram, the decision paths being characterized by lines of different colors, shapes, or thicknesses.
A second aspect of the application provides a computer device, including: a processor; and a presentation engine executing on the processor, the presentation engine being used to perform any of the user data visualization methods described above.
A third aspect of the application provides a user data visualization system, including: an acquisition module for obtaining the data set of a group, where the data features of the data set include user information, IP address, event type, event initiation source, event responder, and event occurrence time, and where the data features of the data set are assigned different decision priorities; and a display module for displaying a decision tree diagram to characterize the attribute detection process of all users in the group, which includes: displaying the data feature of the first priority at the root node of the decision tree diagram and its decision value range; displaying the final attribute of at least one user characterized by each leaf node of the decision tree diagram; displaying the current attribute of multiple users, the data feature of the current priority, and its decision value range characterized by each non-leaf node of the decision tree diagram; and displaying the decision paths corresponding to the root node or each non-leaf node in the decision tree diagram, the decision paths being characterized by lines of different colors, shapes, or thicknesses.
In a fourth aspect, the application provides a client connected to a server through a network, characterized in that the client sends a request to log in to the server, which performs the steps of any of the user data visualization methods described above.
In a fifth aspect, the application provides a server connected to a client through a network, characterized in that the server, based on the operation requested by the client, executes the process of any of the user data visualization methods described above, sends the execution result to the client, and displays it through the client.
In a sixth aspect, the application provides a browser connected to a server through a network, characterized in that the browser sends a request to log in to the server, which performs the steps of any of the user data visualization methods described above.
In a seventh aspect, the application provides a computer-readable storage medium storing a data visualization computer program, characterized in that the data visualization computer program, when executed, implements the steps of any of the user data visualization methods described above.
As described above, the user data visualization method and system of the application have the following beneficial effects: by presenting the user grouping process of the groups determined during fraud event detection, the distribution of data features, lists, and similar views, the groups formed during fraud detection are shown through several related interfaces, which helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud detection system.
Description of the drawings
Fig. 1 is a flowchart of the user data visualization method of the application in one embodiment.
Fig. 2 is a flowchart of obtaining the data set of a group in one embodiment provided by the application.
Fig. 3 shows an interface containing multiple groups displayed by the application in one embodiment.
Fig. 4 is a schematic of the decision tree diagram for the users of a group displayed by the application in one embodiment.
Fig. 5 shows a display interface in one embodiment in which the decision tree diagram further includes the number of users classified to each node.
Fig. 6 is a schematic of an interface in one embodiment in which the left side shows a target user's operation log on a timeline and the right side shows the group's decision tree diagram.
Fig. 7 is a schematic of the list interface for the data set of a group displayed by the application in one embodiment.
Fig. 8 is a schematic of an interface in one embodiment showing how the information entropy of the registration time dimension in the group is distributed within the network cluster.
Fig. 9 is a flowchart of displaying the feature distribution interface of the group's data set in one embodiment.
Fig. 10 is a flowchart of the steps for displaying the distribution of multiple groups in the cluster in one embodiment.
Fig. 11 shows the interface displaying the distribution of multiple groups in the cluster in one embodiment.
Fig. 12 is an architecture diagram of the computer device of the application in one embodiment.
Fig. 13 is a schematic of the module structure of the user data visualization system provided by the application.
Detailed description
The embodiments of the application are described below by way of specific examples; those skilled in the art can readily understand other advantages and effects of the application from the content disclosed in this specification.
In the following description, reference is made to the accompanying drawings, which describe several embodiments of the application. It should be understood that other embodiments may also be used, and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the application. The following detailed description should not be considered limiting, and the scope of the embodiments of the application is limited only by the claims of the issued patent. The terminology used herein is for describing particular embodiments only and is not intended to limit the application.
Furthermore, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It should be further understood that the terms "comprising" and "including" indicate the presence of the stated features, steps, operations, elements, components, items, types, and/or groups, but do not exclude the presence, occurrence, or addition of one or more other features, steps, operations, elements, components, items, types, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition occurs only when a combination of elements, functions, steps, or operations is inherently mutually exclusive in some way.
In fraud detection technology, domain experts contribute their experience in classifying data and their requirements for classification accuracy to the core technology of fraud identification, but they do not know the algorithm framework itself or the parameters used in the algorithm. Because domain experts have no way of knowing how the data was classified during detection, when they receive detection results from a fraud detection system they can only verify those results; they have no means of judging how accurate the results are. To improve the accuracy of the fraud detection system, the application provides a user data visualization method applied to a fraud detection system. The method presents the groups obtained by classification in the fraud detection system, together with their data sets, to algorithm experts and domain experts in visual form, so that different domain experts or algorithm experts can explore various frauds through a variety of interactive means and can flexibly modify the fraud detection algorithm according to the characteristics of the fraud.
The user data visualization method is mainly performed by a computer device. The computer device may be any suitable computer device, such as a handheld computer device, a tablet computer device, a notebook computer, a desktop computer, or a server. The computer device includes a display, an input device, input/output (I/O) ports, one or more processors, memory, a non-volatile storage device, a network interface, a power supply, and so on. These components may include hardware elements (such as chips and circuits), software elements (such as a tangible non-transitory computer-readable medium storing instructions), or a combination of hardware and software elements. The components may also be combined into fewer components or separated into additional components; for example, the memory and the non-volatile storage device may be included in a single component. The computer device may perform the visualization method alone or in cooperation with other computer devices. In some embodiments, the computer device performs the visualization method and displays the corresponding visualization interface itself. For example, the computer device includes a processor and a display, and a presentation engine (or display engine) executes on the processor; the presentation engine performs the user data visualization method and shows the result on the display. Here, the presentation engine includes, but is not limited to, the software and hardware that parse interface content developed in a programming language, such as XML, HTML scripts, or C. In other embodiments, one computer device performs the visualization method and provides the corresponding visualization interface to another computer device for display. For example, a client, in response to a user's operation, sends a request to a server and logs in to the server; the server performs the visualization method to form the corresponding interface data and feeds the interface data back to the client, where the client's browser or a customized application displays the corresponding diagram according to that interface data.
The visualization method can be applied to a fraud detection system. The fraud detection system may include software and hardware in one or more computer devices. To answer the domain expert's question of what a group did as a fraud group, and the algorithm expert's question of "whether users in the same group have the same behavioral habits", the application provides a visualization method from the perspective of the grouping process of a group's users. Fig. 1 shows a flowchart of the user data visualization method of the application in one embodiment. As shown in the figure, the user data visualization method includes the following steps:
In step S11, the data set of a group is obtained. The data features of the data set include at least user information, IP address, event type, event initiation source, event responder, and event occurrence time. The user information is information that characterizes a user's identity, for example a user ID, a unique user nickname, or a certificate number. The user information further includes a mobile phone number, mailbox, ID number, gender, the number of the user equipment used by the user, registration time, and so on. The IP address is the IP address, IP address segment, or IP address group of the computer device on the network at the time the same user information generates an event. The event type is the type of user behavior event recorded in the network operation log, including but not limited to at least one of: social behaviors between network users such as following, liking, commenting, and gifting; or operation behaviors of a network user such as logging in, logging out, updating status, registering, and modifying information. The same user information may correspond to at least one event type, and each event type corresponds to an event initiation source, an event responder, and an event occurrence time. For example, the same user information may correspond to multiple "like" event types, each with its own event initiation source, event responder, and event occurrence time. The event initiation source is, for example, the user information of the user who initiates an event type. The event responder includes, for example, the user information of the target user of the initiated event type.
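The data features listed for step S11 can be pictured as one record per logged event. The following is a minimal sketch only: the class name `EventRecord`, its field names, and the sample values are hypothetical and do not come from the application.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EventRecord:
    """One operation-log entry; field names are illustrative assumptions."""
    user_id: str           # user information (identity)
    ip_address: str        # IP address the event came from
    event_type: str        # e.g. "like", "follow", "login"
    initiator: str         # event initiation source
    responder: str         # event responder (target user)
    occurred_at: datetime  # event occurrence time


def events_by_user(records):
    """Group records so each user maps to the list of that user's events."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.user_id].append(record)
    return dict(grouped)


# Hypothetical sample data: one user with two "like" events, one login.
records = [
    EventRecord("u1", "10.0.0.1", "like", "u1", "u2", datetime(2018, 1, 1, 9, 0)),
    EventRecord("u1", "10.0.0.1", "like", "u1", "u3", datetime(2018, 1, 1, 9, 5)),
    EventRecord("u2", "10.0.0.7", "login", "u2", "u2", datetime(2018, 1, 1, 9, 6)),
]
per_user = events_by_user(records)
```

Here `per_user["u1"]` holds two "like" events, each with its own responder and occurrence time, which mirrors the statement that one user can correspond to several events of the same type.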
Here, the grouping of the collected users into groups is determined according to the data features of the collected cluster members. The fraud detection system is preset with a detection algorithm (such as an unsupervised detection algorithm) for the group members participating in a fraud event. To classify group members accurately, the detection algorithm classifies all members hierarchically, based on the decision priorities of all data features of the collected members. Different frauds correspond to detection algorithms with different decision priorities.
In some embodiments, the detection algorithm makes its classification decisions according to the similarity of the data features of all members. Specifically, Fig. 2 shows a flowchart of obtaining the data set of a group in one embodiment provided by the application. As shown in the figure, step S11 further comprises:
Step S111: obtain the operation logs of a cluster composed of multiple network users. In various embodiments, the cluster is composed of all network users that can be obtained. The network users in the cluster come from the same website, from different websites, or from different network channels, for example the internet, one or more intranets, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), an appropriate combination of these, or a mobile phone communication network.
Step S112: determine at least one data feature from the operation logs of the multiple network users, and analyze the similarity of at least one set of data features in the operation logs to determine the group. In a specific embodiment, network fraud inevitably leaves traces of how users use data on the network. The fraud detection system collects the operation logs of multiple network users from at least one website and, by analyzing the similarity of at least one data feature in the operation logs, groups the users who generated those logs, obtaining the groups in the operation logs and the data set of each group.
In some embodiments, the data set of a group includes, but is not limited to, at least two of the following data features: user information, IP address, event type, event initiation source, event responder, and event occurrence time. The user information is, for example, a mobile phone number, mailbox, ID number, identity card number, gender, the number of the user equipment used by the user, or registration time. The same user information may correspond to at least one event type, and each event type corresponds to an event initiation source, an event responder, and an event occurrence time. The event features include but are not limited to at least one of: social behaviors between network users such as following, liking, commenting, and gifting (also called sending gifts); or operation behaviors of a network user such as logging in, logging out, updating status, registering, and modifying information. For example, the same user information may correspond to multiple "like" event types, each with its own event initiation source, event responder, and event occurrence time.
Step S113: obtain the data set of the group. In some embodiments, the data set may be obtained from a database that stores the groups and their data sets. The database is, for example, configured on a remote storage server or in a storage device of the local computer device, and the data set of a group can then be extracted from the database based on a user's input operation. For example, the fraud detection system obtains multiple groups using an unsupervised detection algorithm; the user selects one of the groups through a selection interface, and the data set of that group is then obtained.
Specifically, the fraud detection system first calculates the similarity of data features of the same class across all data in the operation logs, where the similarity can be measured by information entropy. For example, the fraud detection system uses user information to calculate the information entropy of the IP usage or maximum IP usage dimension, uses the event type to calculate the information entropy of the operation type dimension, and uses the information entropy of the registration time dimension, or the operation time, to calculate the information entropy of the bad-operation dimension. After these calculations, an unsupervised detection method is applied to the resulting entropies to divide the users into multiple groups. The unsupervised detection method includes, for example, algorithms based on dense subgraphs or algorithms based on vector spaces. Each group presented by the visualization method provided herein reflects the shared resources, user relationships, and so on used in the fraud, so that users of the fraud detection system can determine more clearly whether the classification strategy in the unsupervised detection algorithm is reasonable. The shared resources include but are not limited to shared IPs and mailboxes; the user relationships include but are not limited to user following and interaction relationships.
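The application does not give the entropy formula it uses. As a minimal sketch, assuming the standard Shannon entropy over the empirical distribution of one feature dimension, the measurement could look like the following; the function name `shannon_entropy` and the sample IP lists are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical per-user IP lists: a user spread evenly over several IPs has
# higher entropy than a user who always uses the same address.
single_ip = ["10.0.0.1"] * 4
four_ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

entropy_single = shannon_entropy(single_ip)
entropy_spread = shannon_entropy(four_ips)
```

Users whose entropies on the same dimension are close could then be passed to whichever unsupervised grouping algorithm the system uses; that grouping step is not sketched here.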
In one embodiment, the visualization method further includes the step of displaying at least one group interface, in which the size of a group is characterized by the size of the displayed geometric figure. Fig. 3 shows an interface containing multiple groups displayed in one embodiment of the application. As shown in the figure, 11 groups are displayed in the interface, and the geometric figures characterizing the groups are circles. All 11 groups lie within one largest dashed circle, which, for example, characterizes a cluster composed of N network users. For example, the group labeled 0 is a normal group, and a smaller dashed circle contains 10 groups of different sizes labeled 1-10. The size of each circle is proportional to the number of members of the group: a large circle represents a group with more members, and a small circle a group with fewer members. As a further example, the groups labeled 1-10 are abnormal groups. In various embodiments, the geometric figure of a group may be of any shape. The color of a geometric figure may be set randomly, or may be related to the number of groups or to the number of members of a group. For example, N colors are preset, and the fraud detection system randomly assigns different colors to the geometric figures characterizing the groups. As another example, the fraud detection system assigns colors from a preset color sequence to the geometric figures characterizing the groups, in ascending order of member count. When a user operates the display interface to select a geometric figure, the fraud detection system obtains the data set of the corresponding group.
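The sizing and coloring rules above can be sketched in a few lines. The text says only that circle size is proportional to member count; this sketch assumes the circle's area, rather than its radius, is what scales, and the names `circle_radius`, `assign_colors`, `area_per_member`, the palette, and the member counts are all hypothetical.

```python
import math


def circle_radius(member_count, area_per_member=20.0):
    """Radius of a circle whose area is proportional to the member count."""
    return math.sqrt(member_count * area_per_member / math.pi)


def assign_colors(group_sizes, palette):
    """Map group labels to palette colors in ascending order of member count."""
    ordered = sorted(group_sizes, key=lambda label: group_sizes[label])
    return {label: palette[i % len(palette)] for i, label in enumerate(ordered)}


group_sizes = {1: 120, 2: 30, 3: 480}        # hypothetical member counts
palette = ["#fee8c8", "#fdbb84", "#e34a33"]  # hypothetical preset color sequence

colors = assign_colors(group_sizes, palette)
radii = {label: circle_radius(n) for label, n in group_sizes.items()}
```

Scaling by area keeps a group with four times the members from looking sixteen times as large, which is why it was chosen for this sketch; scaling the radius directly would be an equally valid reading of the text.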
In a preferred embodiment, the displayed group interface may also include an information bar showing group information. When the user selects a group in the group interface, the basic information of that group is shown at one side of the interface as a table or text box. The basic information is, for example, the group code, the number of members, the highest-priority data feature used to determine the group, and the group attribute (such as normal group or abnormal group).
To show the decision process behind the groups, the fraud detection system performs step S12 after grouping. In the form of a tree structure, it displays the process by which the corresponding detection algorithm grouped the group's users according to the decision priorities of the data features, so that domain experts and/or algorithm experts can use the visualization interface to address deficiencies and defects in the corresponding detection algorithm.
In step S12, a decision tree diagram is displayed to characterize the attribute detection process of all users in the group. The attributes of a user may include normal user (Normal) and abnormal user (Abnormal), or normal user (Normal), fraud role A (Abnormal A), fraud role B (Abnormal B), and so on. In the display interface, this step uses the route from the root node of the tree, through decision paths and non-leaf nodes, to each leaf node to characterize the process by which the fraud detection system, using the detection algorithm, classifies hierarchically from the highest priority to the lowest priority to obtain the attribute of each user in the same group. The display interface shows, for example: the data feature of the first priority at the root node of the decision tree diagram and its decision value range; the final attribute of at least one user characterized by each leaf node of the decision tree diagram; the current attribute of multiple users, the data feature of the current priority, and its decision value range characterized by each non-leaf node of the decision tree diagram; and the decision paths corresponding to the root node or each non-leaf node in the decision tree diagram, characterized by lines of different colors, shapes, or thicknesses. The decision tree diagram shows that users assigned to a leaf node have been determined to have the final attribute of normal user or abnormal user, whereas users assigned to a non-leaf node must continue to be classified until they reach a leaf node, which determines their final attribute (that is, normal user or abnormal user).
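One way to picture the displayed structure is a binary tree whose nodes carry an attribute and, for non-leaf nodes, the tested feature and its threshold. This is a minimal sketch under assumptions: the class `TreeNode`, the function `collect_edges`, the feature names, the thresholds, and the choice of "blue" for Normal and "yellow" for Abnormal are illustrative, since the text requires only that decision paths be distinguishable by color, shape, or thickness.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    """Node of the displayed tree; all field names are illustrative."""
    attribute: str                     # current (non-leaf) or final (leaf) attribute
    feature: Optional[str] = None      # data feature tested here; None for a leaf
    threshold: Optional[float] = None  # boundary of the decision value range
    low: Optional["TreeNode"] = None   # child for feature value <= threshold
    high: Optional["TreeNode"] = None  # child for feature value > threshold

    @property
    def is_leaf(self):
        return self.feature is None


def collect_edges(node, edges=None):
    """Return (parent_label, child_label, line_color) for every decision path."""
    if edges is None:
        edges = []
    if node.is_leaf:
        return edges
    for child in (node.low, node.high):
        color = "blue" if child.attribute == "Normal" else "yellow"
        label = child.attribute if child.is_leaf else child.feature
        edges.append((node.feature, label, color))
        collect_edges(child, edges)
    return edges


# Hypothetical two-level tree with placeholder features and thresholds.
tree = TreeNode("Abnormal", "feature_a", 10.0,
                low=TreeNode("Normal"),
                high=TreeNode("Abnormal", "feature_b", 5.0,
                              low=TreeNode("Normal"),
                              high=TreeNode("Abnormal")))
edges = collect_edges(tree)
```

The resulting edge list is the kind of interface data a front end could draw as colored lines between node boxes.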
The decision result of the decision tree is obtained by classifying level by level, according to the relationship between the original value of each user's data feature and the corresponding decision threshold. For example, for certain selected users in the fraud detection system, after the decision tree is pruned, all users are grouped by decision using, in order of priority from high to low, the maximum IP usage, the user's out-degree (out_degree) in the social network, and the IP usage calculated from user information. The unsupervised detection algorithm includes, for example, algorithms based on dense subgraphs or algorithms based on vector spaces. Each group presented by the visualization method provided herein reflects the shared resources, user relationships, and so on used in the fraud, so that users of the fraud detection system can determine more clearly whether the classification strategy in the unsupervised detection algorithm is reasonable. The shared resources include but are not limited to shared IPs and mailboxes; the user relationships include but are not limited to user following and interaction relationships.
Please refer to Fig. 4, which is a schematic diagram of the decision tree graph of the users of a group. As shown in the figure, the root node of the decision tree graph shows that the data feature of the highest priority is the maximum IP usage amount, which is used to weigh the attribute classification of the users of the group: when the maximum IP usage amount (max_IP_used_be_used_amount) corresponding to a user is ≤ 80.5, the user is classified along the "blue" decision path to the first non-leaf node; otherwise, the user is classified along the "yellow" decision path to the first leaf node. The first non-leaf node takes the event initiation source as the data feature of the second priority and continues to classify the users it has received: when the out-degree (out_degree) in the social network of the event initiation source corresponding to a user is ≤ 711.0, the user is classified along the "blue" decision path to the second leaf node; otherwise, the user is classified along the "yellow" decision path to the second non-leaf node. The second non-leaf node takes the IP usage amount as the data feature of the third priority and continues to classify the users it has received: when the IP usage amount (IP_used_amount) corresponding to a user is ≤ 870.0, the user is classified along the "blue" decision path to the third leaf node; otherwise, the user is classified along the "yellow" decision path to the fourth leaf node.
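By way of illustration only, the traversal described for Fig. 4 can be sketched as follows. The nested-dictionary layout, the function name `classify`, and the leaf labels are assumptions made for this sketch (the leaf labels follow the stated convention that "blue" paths lead to normal users and "yellow" paths to abnormal users); only the feature names and thresholds come from the description.

```python
# Hypothetical encoding of the Fig. 4 tree: a dict is a decision node,
# a string is a leaf holding the final attribute.
TREE = {
    "feature": "max_IP_used_be_used_amount",
    "threshold": 80.5,
    "yellow": "Abnormal",                      # first leaf node
    "blue": {                                  # first non-leaf node
        "feature": "out_degree",
        "threshold": 711.0,
        "blue": "Normal",                      # second leaf node
        "yellow": {                            # second non-leaf node
            "feature": "IP_used_amount",
            "threshold": 870.0,
            "blue": "Normal",                  # third leaf node
            "yellow": "Abnormal",              # fourth leaf node
        },
    },
}


def classify(user, node=TREE):
    """Walk the tree and return (final attribute, list of (feature, path color))."""
    path = []
    while isinstance(node, dict):
        # value <= threshold follows the "blue" path, otherwise the "yellow" path
        color = "blue" if user[node["feature"]] <= node["threshold"] else "yellow"
        path.append((node["feature"], color))
        node = node[color]
    return node, path
```

The returned path records which colored decision path was taken at each level, which is the information the decision tree graph presents visually.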
Here, "blue" denotes the decision path along which users are classified as "normal user" at the non-leaf node of the current priority, and "yellow" denotes the decision path along which users are classified as "abnormal user" at the current priority. The values shown in each non-leaf node for weighing the corresponding information, such as 80.5, 711.0, and 870.0 in the figure, are the decision value range of the data feature at the current priority.
In various embodiments, the difference between decision paths may also be characterized in the display by lines of different shapes. For example, a solid line may indicate that the decided user attribute is normal user and a dashed line that it is abnormal user; or a straight line may indicate normal user and a curved line abnormal user. Alternatively, the thickness of the lines may characterize the difference between decision paths; for example, a thin line may indicate that the decided user attribute is normal user and a thick line that it is abnormal user.
To show more clearly the number of users received by each non-leaf node and leaf node, while the decision tree graph is displayed to characterize the attribute detection process for all users in the group, the root node of the decision tree graph also displays the number of users in the group (that is, the sample size given to the root node), and each non-leaf node of the decision tree graph also displays the number of users having the current attribute (that is, the sample size received by the current non-leaf node). Please refer to Fig. 5, which shows a display interface in which the decision tree graph also includes the number of users classified to each node. The sample_size shown in the root node is the sample size given to the root node, that is, the total number of group members; the sample_size shown in each other non-leaf node is the sample size received by that non-leaf node; and the sample_size shown in a leaf node is the number of users classified to that node from the level above.
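As a rough sketch of how the sample_size values of Fig. 5 could be derived, the snippet below counts the users arriving at each decision level of the three-level tree described for Fig. 4. The `LEVELS` table, the function name `sample_sizes`, and the output layout are assumptions made for illustration and are not taken from the application.

```python
# Each level: (feature, threshold, side whose users continue to the next level).
# The thresholds and structure follow the textual description of Fig. 4.
LEVELS = [
    ("max_IP_used_be_used_amount", 80.5, "blue"),
    ("out_degree", 711.0, "yellow"),
    ("IP_used_amount", 870.0, None),  # both branches end in leaf nodes
]


def sample_sizes(users):
    """Return, per decision level, the users received and the split sizes."""
    sizes = []
    current = list(users)
    for feature, threshold, continue_side in LEVELS:
        blue = [u for u in current if u[feature] <= threshold]
        yellow = [u for u in current if u[feature] > threshold]
        sizes.append({
            "feature": feature,
            "sample_size": len(current),  # users received by this node
            "blue": len(blue),            # users sent down the blue path
            "yellow": len(yellow),        # users sent down the yellow path
        })
        # Only one side continues to a further decision node.
        current = blue if continue_side == "blue" else yellow
    return sizes
```

The first entry's sample_size equals the total number of group members, matching the value shown in the root node.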
It should be noted that, depending on the type of fraud and the design of the unsupervised detection algorithm, the priority of each data feature, the decision value range at each priority, the relationship between adjacent priorities, the decision paths at each level, and so on may all differ between the decision classification processes of different groups in the operation log. Moreover, in order to obtain the group decision result for each user in the operation log more quickly, the unsupervised detection algorithm may trim the selected data features according to its convergence during training: once the trained detection algorithm has met the convergence condition, the remaining data features are pruned, and the pruned data features are not displayed in the display interface of the decision tree graph. Alternatively, if all users in the acquired group have already been classified within the first few levels of the detection algorithm, the remaining data features may be pruned, and the display module displays only a decision tree graph containing all decision paths and the nodes they connect. When the visualization method described herein is used to display the classification decision process of a group, domain experts and algorithm experts can more easily evaluate the accuracy of the detection algorithm.
In the display interface of the decision tree graph, or in another display interface reached in response to an acquired operation instruction, the visualization method further includes the steps of: determining a user in the group as the target user; and displaying a time axis beside the decision tree graph, on which the operation log of the target user is presented.
Here, when a domain expert or algorithm expert clicks a leaf node and selects a user link in the pop-up window of that leaf node, the operation log of the corresponding user is displayed on a time axis beside the decision tree graph. Please refer to Fig. 6, which is a schematic diagram of an interface that shows the operation log of the target user on a time axis on the left and the group decision tree graph on the right. As shown in the figure, the time axis marks the time nodes of the operation log from top to bottom in chronological order, and beside each time node it displays the following fields of the operation log at the corresponding time point: event type (e.g., event_type), event occurrence time (e.g., timestamp), user information (e.g., user_id), IP address (e.g., a complete IP address or an IP segment), event responder (e.g., target_user), and event content (e.g., comment_id, comment_lenth, amount, object_id, target_video). By displaying the operation history of each user of the group on a time axis, domain experts and algorithm experts can examine in detail the accuracy of the detected attributes of users in the same group, as well as the commonalities between normal users and abnormal users belonging to the same group, and thus identify the deficiencies and defects of the detection algorithm.
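A minimal sketch of preparing the data behind such a time axis is given below, assuming the operation log is available as a list of dictionaries keyed by the field names mentioned above. The function name `timeline` and the sample records are hypothetical.

```python
def timeline(operation_log, target_user_id):
    """Return the target user's log entries in chronological order."""
    entries = [e for e in operation_log if e["user_id"] == target_user_id]
    # Earliest entry first, so the axis can be drawn from top to bottom.
    return sorted(entries, key=lambda e: e["timestamp"])


# Hypothetical sample log; field names follow the description, values are made up.
log = [
    {"user_id": "u1", "timestamp": 300, "event_type": "comment", "target_user": "u9"},
    {"user_id": "u2", "timestamp": 100, "event_type": "login", "target_user": None},
    {"user_id": "u1", "timestamp": 200, "event_type": "follow", "target_user": "u7"},
]

for entry in timeline(log, "u1"):
    print(entry["timestamp"], entry["event_type"], entry["target_user"])
```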
In other embodiments, domain experts and algorithm experts care not only about the attribute classification process of group members but also about whether the assigned groups are reasonable. This requires that they be able to view the detailed data features of each group and to see, from another dimension, the priority order of the data features used to build the classified group. The visualization method may therefore include a step of displaying an interface for the data set of a group. The data set is displayed as a list, which shows the user the details of the data features within the same group. To help assess the classification accuracy of the group data set, the list arranges the data features of the group in columns according to the classification priority used by the fraud detection system. For example, please refer to Fig. 7, which is a schematic diagram of the list interface of the data set of a group shown by the present application in one embodiment. In this list interface, the data set of the displayed group is sorted by data feature similarity in order of priority from high to low: when the data feature similarity at the first priority is the same, the rows are sorted by the data feature of the second priority. In the embodiment shown in Fig. 7, the priority order from high to low is: IP address, event initiation source (source), event responder (target), event type (event_type), and event occurrence time (timestamp).
In the present embodiment, the table header encodes the importance of the different columns: the more concentrated the values of a feature, the more important the feature. In the embodiments provided in the present application, the fraud detection system represents this property by calculating the information entropy of each feature; a lower information entropy means a higher consistency. The fraud detection system then sorts the features in order of increasing information entropy, so that the column header with the lowest information entropy is placed first to draw the user's attention. Of course, in other implementations, the column headers of the displayed table may also be color-rendered; for example, the column header with the lowest information entropy is rendered in the darkest color to indicate to the user that the data feature of that column is the most important, and the headers of the other columns are rendered accordingly, yielding the data set list interface shown in the figure. The list interface may be displayed after the step of displaying the interface of multiple groups, or before or after step S12, in response to the user's operation of selecting the list interface.
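The entropy-based column ordering described above can be illustrated with the following sketch. It uses the standard Shannon entropy over the observed values of each column; the function names and the sample rows are assumptions for illustration, and the application does not state which logarithm base or estimator the fraud detection system uses.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def order_columns(rows):
    """Return column names sorted by increasing entropy, plus the entropies."""
    columns = list(rows[0].keys())
    entropy = {col: shannon_entropy([r[col] for r in rows]) for col in columns}
    # Lowest entropy first: the most concentrated feature leads the table.
    return sorted(columns, key=lambda col: entropy[col]), entropy


# Hypothetical rows of a group data set.
rows = [
    {"timestamp": 1, "event_type": "follow", "ip": "10.0.0.1"},
    {"timestamp": 2, "event_type": "follow", "ip": "10.0.0.1"},
    {"timestamp": 3, "event_type": "comment", "ip": "10.0.0.1"},
    {"timestamp": 4, "event_type": "comment", "ip": "10.0.0.1"},
]
ordered, entropy = order_columns(rows)
```

In this sample the shared IP address has zero entropy and therefore leads the table, while the timestamps, which are all distinct, come last.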
In certain embodiments, to further characterize whether the data set of the acquired group reflects the characteristics of fraud, the data set also needs to be shown from other dimensions. For example, the accuracy of the detected fraud may be further confirmed by comparing the network operation data of normal users with the group data set. To this end, the visualization method further includes a step of displaying a feature distribution interface for the data set of the group. The feature distribution interface can show the distribution of each data type in the overall network. The overall network is a relative notion: for example, if multiple network users form a cluster, the interface can show how a data feature of a given group is distributed within that cluster. Please refer to Fig. 3: the largest dashed circle in Fig. 3 represents a cluster formed by multiple network users; the cluster contains 11 groups, numbered 0-10, from which one group is selected for information display.
In some embodiments, the data types that the feature distribution interface can display are, for example: the information entropy of the average operation interval dimension (average operation interval entropy), the information entropy of the IP usage amount dimension (IP used amount entropy), the information entropy of the gender dimension (sex entropy), the information entropy of the email dimension (email entropy), the information entropy of the registration time dimension (reg time entropy), the information entropy of the operation count dimension (operation times entropy), the information entropy of the device count dimension (device amount entropy), the information entropy of the operation type dimension (operation type entropy), and the information entropy of the dimension of the maximum amount by which an IP used by the user is also used by others (max IP used be used amount entropy). In the embodiment shown in Fig. 8, the information entropy of the registration time dimension is taken as the data feature to be displayed; that is, Fig. 8 shows the feature distribution, within the network cluster, of the information entropy of the registration time (registration period) dimension of a group. In order to compare effectively the feature distributions of the acquired group data set and of the network operation data of normal users, please refer to Fig. 9, which is a flowchart of displaying the interface of the feature distribution of the data set of the group. As shown in the figure, the process includes the following steps:
In step S211, a group is selected, and at least one data feature is determined from the data set of the group. In one embodiment, for example, the group labeled 2 in Fig. 3 is selected, and a data feature belonging to the user information, such as the registration time, is determined from the data set of the group labeled 2.
In step S212, the feature distribution of the determined at least one data feature is computed both in the group and in the cluster. In the present embodiment, the feature distribution of the registration time data feature in the group and the feature distribution of the registration time data feature in the entire cluster are computed.
In step S213, the histogram of the feature distribution is displayed, together with a comparison chart of that histogram against the histogram for the entire cluster. In the present embodiment, based on the encoding of the data feature, a histogram of the feature distribution of the registration time in the group and a histogram of the feature distribution of the registration time in the entire cluster are displayed. As shown in Fig. 8, in interface D, panel (a) is a thumbnail of the feature distribution of the registration time in the selected group labeled 2, and the enlarged view corresponding to this thumbnail is panel (d) at the bottom of interface D. The enlarged view shows that, during the month from August 1 to August 31, the registration operations of users in this group were concentrated on five days: August 5, 6, 11, 12, and 16. Panel (c) in interface D is a histogram of the time distribution of registration operations by registered users of the cluster in August; it shows that the registrations of users in the cluster follow a certain regular pattern over August. Panel (b) in interface D overlays panels (d) and (c) to show the difference between the entire cluster and the selected group for the registration time data feature. To let users understand the differences and connections between different features, the embodiment provided in the present application presents the histograms in three layers: after the user clicks one of the thumbnails, the page scrolls to the normalized distribution comparison chart. Of course, in a specific application, there may be multiple thumbnails, each representing a different data feature.
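Steps S211 to S213 can be sketched as follows for a registration-day feature. The binning by day of month, the function names, and the sample values are assumptions for illustration; the description states only that the compared distributions are normalized.

```python
from collections import Counter


def normalized_histogram(values, bins):
    """Fraction of `values` falling on each bin (here: one bin per day)."""
    if not values:
        return [0.0 for _ in bins]
    counts = Counter(values)
    return [counts.get(b, 0) / len(values) for b in bins]


def distribution_difference(group_values, cluster_values, bins):
    """Per-bin difference between the group and the whole cluster."""
    group_hist = normalized_histogram(group_values, bins)
    cluster_hist = normalized_histogram(cluster_values, bins)
    return [g - c for g, c in zip(group_hist, cluster_hist)]


# Hypothetical registration days (day of month) for a group and its cluster.
days = [5, 6, 11, 20]
group_days = [5, 5, 6, 11]
cluster_days = [5, 6, 11, 20]
diff = distribution_difference(group_days, cluster_days, days)
```

Normalizing both histograms to fractions makes a small group comparable with a much larger cluster; positive entries in `diff` mark days on which the group registers disproportionately often.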
In some embodiments, the feature distribution of a given data feature in the group and in the entire cluster may also be distinguished or emphasized by color-rendering the histogram, or by dynamic display (for example, flashing).
In some embodiments, in order to further analyze the differences between multiple groups in a network cluster, the user data visualization method further includes a step of displaying an interface of the feature distribution of the data sets of multiple groups. Please refer to Fig. 10 and Fig. 11. Fig. 10 is a flowchart of the steps by which the present application, in one embodiment, displays the distribution of multiple groups in the cluster, and Fig. 11 shows interface E, in which the present application, in one embodiment, displays the distribution of multiple groups in the cluster. As shown in the figures, the steps include:
In step S311, multiple groups are determined in a cluster made up of multiple network users, and the differences between the groups are characterized by different shapes, icons, labels, and/or colors. In one embodiment, for example, the three groups labeled 0, 1, and 2 in Fig. 3 are selected; the group labeled 0 is shown in "green", the group labeled 1 in "red", and the group labeled 2 in "blue".
In step S312, at least one data feature is determined from the data sets of the multiple groups. In the present embodiment, one data feature, such as the IP address, is determined from the data sets of these three groups.
In step S313, based on the at least one data feature, the relative entropy between every two network users in each group is analyzed and used as a measure of the degree of similarity between those two network users. In the present embodiment, based on the IP address, the relative entropy between every two network users in the three groups labeled 0, 1, and 2 (for the information entropy of the IP usage amount dimension, IP used amount entropy) is analyzed and used as the measure of the degree of similarity between every two network users. For example, the dimensionality reduction method t-SNE (t-distributed stochastic neighbor embedding) is applied, with the relative entropy between two users serving as the measure of the distance between those network users.
In step S314, a display interface is output. In the interface, network users are characterized by shapes, icons, and/or labels, the differences between the multiple groups are characterized by different colors, and the degree of similarity between two network users in each group is characterized by the displayed distance between them. In the present embodiment, in interface E shown in Fig. 11, each network user is represented by a dot; "green" denotes the group labeled 0, "red" the group labeled 1, and "blue" the group labeled 2. The users of the group labeled 2, shown in "blue", lie close to one another and form a clustered distribution; the users of the group labeled 1, shown in "red", also lie close to one another and form a clustered distribution; and "green" shows the distribution of randomly sampled normal users, who lie farther apart and are more dispersed. It can therefore be considered that the more densely a group is clustered, the more likely it is to be a fraud group. In the embodiment shown in Fig. 11, for example, the group shown in "green" is distributed in a dispersed manner, which indicates that the "green" group is a normal group and that the users represented by its "green" dots are normal users. Conversely, the group shown in "red" (the group labeled 1) and the group shown in "blue" (the group labeled 2) are distributed in clusters, which indicates that the "red" and "blue" groups are abnormal groups and that the users represented by the "red" and "blue" dots are abnormal users. In one embodiment, a user of the visualization system can interactively hover the mouse over a dot to view the specific information and feature values of a user in each group.
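The pairwise measure of step S313 can be sketched as below. The description names relative entropy but gives no formula, so this sketch assumes the standard Kullback-Leibler divergence over each user's empirical distribution of IP addresses. The additive smoothing and the symmetrization, which make the value usable as a distance, are assumptions of this sketch and not features of the application. The resulting matrix could then be passed to a t-SNE implementation that accepts precomputed distances.

```python
import math
from collections import Counter


def distribution(values, support, eps=1e-9):
    """Smoothed empirical distribution of `values` over a shared support."""
    counts = Counter(values)
    total = len(values) + eps * len(support)
    return [(counts.get(s, 0) + eps) / total for s in support]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def user_distance(values_a, values_b):
    """Symmetrized relative entropy between two users' value lists."""
    support = sorted(set(values_a) | set(values_b))
    p = distribution(values_a, support)
    q = distribution(values_b, support)
    return 0.5 * (relative_entropy(p, q) + relative_entropy(q, p))


def distance_matrix(users):
    """Pairwise distances for a dict of user id -> list of observed values."""
    ids = sorted(users)
    return ids, [[user_distance(users[a], users[b]) for b in ids] for a in ids]


# Hypothetical IP usage: u1 and u2 share the same IPs, u3 uses a different one.
users = {
    "u1": ["1.1.1.1", "1.1.1.1", "2.2.2.2"],
    "u2": ["1.1.1.1", "2.2.2.2", "1.1.1.1"],
    "u3": ["3.3.3.3", "3.3.3.3", "3.3.3.3"],
}
ids, matrix = distance_matrix(users)
```

Users who share the same IP usage pattern obtain a distance of zero and would be drawn close together, which is the clustered appearance the description associates with fraud groups.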
In other embodiments, the network users may also be characterized in the output interface by shapes, icons, and/or labels. For example, the shapes may be geometric figures such as triangles or rectangles; the icons may be a smiling face or a crying face, a skull avatar, a robber avatar, and the like; and the labels may be text or symbols that allow clear distinction.
The user data visualization method of the present application presents the grouping process, the data feature distribution, the list view, and other aspects of the groups of users determined during fraud detection. The groups identified during fraud detection are thereby displayed in interfaces showing multiple kinds of relationships, which helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud detection system.
The present application also provides a computer device, which may be any suitable computer device, such as a handheld computer device, a tablet computer device, a notebook computer, a desktop computer, or a server. The computer device may include a display, an input device, input/output (I/O) ports, one or more processors, a memory, a non-volatile storage device, a network interface, a power supply, and so on. These components may include hardware elements (such as chips and circuits), software elements (such as a tangible non-transitory computer-readable medium storing instructions), or a combination of hardware and software elements. In addition, it should be noted that the components may be combined into fewer components or separated into additional components. For example, the memory and the non-volatile storage device may be included in a single component. The computer device may perform the visualization method alone or in cooperation with other computer devices.
Please refer to Fig. 12, which is a schematic diagram of the architecture of the computer device of the present application in one embodiment. As shown in the figure, in the present embodiment the computer device 1 includes one or more processors 11 and a presentation engine 12 executed on the processor 11, in order to perform the above visualization method and display the corresponding visualization interface. For example, the computer device includes a processor 11, a display, and a presentation engine 12 (or display engine) executed on the processor 11; the presentation engine 12 performs the user data visualization method described in the above embodiments and displays the result on the display. For a description of the implementation process of the user data visualization method, refer to the description given for Fig. 1 to Fig. 11. In a specific implementation, the presentation engine is, for example, stored in the memory of the local computer device or in a remote storage server, and the presentation engine includes but is not limited to software and hardware for parsing interface displays developed in a programming language, such as XML, HTML scripts, or the C language. In still other embodiments, one computer device performs the visualization method and supplies the corresponding visualization interface to another computer device for display. For example, a client, in response to a request operation by the user, sends a request to a server side and logs in to the server side; the server side performs the visualization method to form the corresponding interface data and feeds the interface data back to the client; and a browser or a customized application on the client displays the corresponding graphics according to the interface data.
The present application also provides a client, which is connected to a server side via a network. In the present embodiment, the client is, for example, a web client, and the server side is, for example, a web server. The web client sends a web service request to log in to the web server, performs the user data visualization method described in the above embodiments, and displays the result on a display. For a description of the implementation process of the user data visualization method, refer to the description given for Fig. 1 to Fig. 11.
The present application also provides a server, which is connected to a client via a network. In the present embodiment, the client is, for example, a web client, and the server is, for example, a web server. In response to a request operation by the web client, the web server performs the user data visualization method described in the above embodiments and sends the result to the client for display on a display. For a description of the implementation process of the user data visualization method, refer to the description given for Fig. 1 to Fig. 11.
The present application also provides a browser, which is connected to a server side via a network. The browser sends a request to log in to the server side, performs the user data visualization method described in the above embodiments, and displays the result on a display. For a description of the implementation process of the user data visualization method, refer to the description given for Fig. 1 to Fig. 11. In the present embodiment, the browser is, for example, a web browser, including but not limited to QQ Browser, Internet Explorer, Firefox, Safari, Opera, Google Chrome, Baidu Browser, Sogou Browser, Cheetah Browser, 360 Browser, UC Browser, Maxthon, and TheWorld Browser.
The present application also provides a user data visualization system. The user data visualization system may include software and hardware in one or more computer devices, and visualizes the data sets of the groups detected by a fraud detection system. In order to answer the question raised by domain experts, "why is a given group considered a fraud group", and the question raised by algorithm experts, "do users of the same group have the same behavioral habits", the present application provides a user data visualization system that starts from the relationships between group members. Please refer to Fig. 13, which is a schematic diagram of the modular structure of the user data visualization system provided herein. As shown in the figure, the user data visualization system 3 includes an acquisition module 31 and a display module 32.
The acquisition module 31 is used to obtain the data set of a group. The data features of the data set include at least user information, IP address, event type, event initiation source, event responder, and event occurrence time. The user information is information that can characterize the identity of a user, for example a user ID, a unique user nickname, or a certificate number. The user information may further include: phone number, mailbox, ID number, gender, number of user devices used by the user, registration time, and so on. The IP address is the IP address, IP address segment, or IP address group of the computer device on which the same user information generated an event in the network. The event type is the type of user behavior event recorded in the network operation log, including but not limited to at least one of: social behaviors between network users such as following, liking, commenting, and presenting (also called gift giving); and operation behaviors of a network user such as logging in, logging out, updating status, registering, and modifying information. The same user information may correspond to at least one event type, and each event type corresponds to an event initiation source, an event responder, and an event occurrence time. For example, the same user information may correspond to multiple like events, each of which corresponds to its own event initiation source, event responder, and event occurrence time. The event initiation source is, for example, the user information of the user who initiates an event of a given type. The event responder includes, for example, the information of the target user of the initiated event.
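One possible in-memory layout for a record of such a data set is sketched below. The class name `OperationRecord`, its field names, and the helper `events_by_type` are hypothetical choices for illustration; the application lists the data features but does not prescribe a storage format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationRecord:
    """One operation-log entry carrying the data features listed above."""
    user_id: str      # user information
    ip: str           # IP address, segment, or group
    event_type: str   # e.g. "like", "follow", "login"
    source: str       # event initiation source
    target: str       # event responder
    timestamp: int    # event occurrence time


def events_by_type(records, user_id):
    """Group one user's events by type as (source, target, timestamp) tuples."""
    grouped = defaultdict(list)
    for r in records:
        if r.user_id == user_id:
            grouped[r.event_type].append((r.source, r.target, r.timestamp))
    return dict(grouped)


# Hypothetical records: one user with two like events and one follow event.
records = [
    OperationRecord("u1", "10.0.0.1", "like", "u1", "u8", 100),
    OperationRecord("u1", "10.0.0.1", "like", "u1", "u9", 140),
    OperationRecord("u1", "10.0.0.2", "follow", "u1", "u9", 150),
    OperationRecord("u2", "10.0.0.1", "login", "u2", "", 160),
]
grouped = events_by_type(records, "u1")
```

This mirrors the statement that the same user information can correspond to several events of one type, each with its own source, responder, and time.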
Here, the grouping of the collected users into groups is determined by clustering the collected members according to their data features. A detection algorithm (for example, an unsupervised detection algorithm) for detecting the group members who take part in a fraud event is preset in the fraud detection system. In order to classify group members accurately, the detection algorithm classifies all members level by level according to the decision priority of all data features of the collected members. Different frauds correspond to unsupervised detection algorithms with different decision priorities.
In some embodiments, the detection algorithm performs decision classification according to the similarity of the data features of all members. Specifically, please refer to Fig. 2, which is a flowchart of one embodiment provided herein for obtaining a group data set. As shown in the figure, the acquisition module may obtain the data set of a group from the data sets of multiple groups obtained through the following steps:
Step S111: obtain the operation logs of a cluster made up of multiple network users. In various embodiments, the cluster consists of all network users that can be obtained. The network users in the cluster may come from the same website, from different websites, or from different network channels, for example the Internet, one or more intranets, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or a suitable combination thereof, or a mobile phone communication network.
Step S112: determine at least one data feature from the operation logs of the multiple network users, and analyze the similarity of the at least one data feature in the operation logs to determine the groups. In a specific embodiment, because network fraud inevitably leaves traces in the data that users generate on the network, the fraud detection system collects the operation logs of multiple network users from at least one website, analyzes the similarity of at least one data feature in the operation logs, groups the users who generated the corresponding operation logs, and thereby obtains the groups in the operation logs and the data set of each group.
In certain embodiments, the data set of a group includes but is not limited to at least two of the following data features: user information, IP address, event type, event initiation source, event responder, and event occurrence time. The user information is, for example, a phone number, mailbox, ID number, identity card number, gender, number of user devices used by the user, or registration time. The same user information may correspond to at least one event type, and each event type corresponds to an event initiation source, an event responder, and an event occurrence time. The event type includes but is not limited to at least one of: social behaviors between network users such as following, liking, commenting, and presenting; and operation behaviors of a network user such as logging in, logging out, updating status, registering, and modifying information. For example, the same user information may correspond to multiple like events, each of which corresponds to its own event initiation source, event responder, and event occurrence time.
Step S113: obtain the data set of the group. In some embodiments, the data set may be obtained from a database that stores the groups and their data sets. The database is, for example, deployed on a remote storage server or in a storage device of the local computer device, and the data set of a group is extracted from the database in response to an input operation by the user. For example, the fraud detection system obtains multiple groups using the unsupervised detection algorithm, the user selects one of the groups through a selection interface, and the data set of the selected group is then obtained.
Specifically, the fraud detection system first computes, over all data in the operation logs, the similarity of data features of the same class. The similarity can be measured by information entropy. For example, the fraud detection system uses user information to compute the information entropy of the IP usage dimension or of the maximum IP usage dimension, uses event type to compute the information entropy of the operation type dimension, and uses registration time or operation time to compute the information entropy of the registration time dimension or of the bad-operation dimension. After these computations, an unsupervised detection method is applied to the resulting entropies to divide the users into multiple groups. The unsupervised detection method is, for example, an algorithm based on dense subgraphs or an algorithm based on vector spaces. Each group presented by the visualization method provided herein reflects the shared resources, user relationships, and so on used in the fraud, so that users of the fraud detection system can more clearly determine whether the classification strategy of the unsupervised detection algorithm is reasonable. The shared resources include, but are not limited to, shared IPs and mailboxes; the user relationships include, but are not limited to, following and interaction relationships between users.
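The entropy-based similarity measure described above can be illustrated with a minimal sketch. The log layout, the user identifiers, and the helper names `shannon_entropy` and `per_user_entropy` are assumptions made for illustration; the description does not specify how the system actually computes or stores these values.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # log2(total / c) is the same as -log2(c / total)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def per_user_entropy(records, column):
    """Group log records by user (column 0) and compute the entropy of one column."""
    by_user = {}
    for rec in records:
        by_user.setdefault(rec[0], []).append(rec[column])
    return {user: shannon_entropy(vals) for user, vals in by_user.items()}

# Hypothetical log records: (user_id, ip_address, event_type)
log = [
    ("u1", "10.0.0.1", "like"),
    ("u1", "10.0.0.1", "like"),
    ("u1", "10.0.0.1", "like"),
    ("u1", "10.0.0.1", "like"),
    ("u2", "10.0.0.1", "like"),
    ("u2", "10.0.0.2", "comment"),
    ("u2", "10.0.0.3", "follow"),
    ("u2", "10.0.0.4", "login"),
]

ip_entropy = per_user_entropy(log, 1)      # IP usage dimension
type_entropy = per_user_entropy(log, 2)    # operation type dimension
print(ip_entropy, type_entropy)
```

A user whose records all share one value has entropy 0, while a user whose values are spread evenly has the maximum entropy for that number of distinct values. Per-user entropies of this kind could then serve as inputs to whichever unsupervised grouping algorithm is used.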
In one embodiment, the display module 32 of the user data visualization system can display at least one group interface, in which the size of each group is represented by the size of the displayed geometric figure. Referring to Fig. 3, which shows an interface containing multiple groups in one embodiment of the application: 11 groups are displayed, the geometric figures representing the groups are circles, and all 11 groups lie inside one largest dashed circle, which represents, for example, a cluster composed of N network users. The group labeled 0 is, for example, a normal group. Inside a smaller dashed circle there are 10 groups of different sizes labeled 1-10; the size of each circle is proportional to the number of group members, that is, a large circle indicates more members and a small circle fewer. The groups labeled 1-10 are, for example, abnormal groups. In various embodiments, the geometric figure of a group can be of any shape. The color of a geometric figure can be set randomly or can be related to the number of groups or the number of group members. For example, N colors are preset, and the fraud detection system randomly assigns different colors to the geometric figures representing the groups. As another example, the fraud detection system assigns the colors in a preset color sequence to the geometric figures in ascending order of member count. When the user operates the display interface to select a geometric figure, the fraud detection system obtains the data set of the corresponding group.
In a preferred embodiment, the at least one group interface can also include an information bar that displays group information. When the user selects a group in the group interface, the basic information of that group is displayed at the side of the interface as a table or text box. The basic information is, for example, the group code, the number of members, the most preferred data feature for determining the group, and the group attribute (such as normal group or abnormal group).
To show the decision process behind the resulting groups, after the fraud detection system has performed the grouping, the display module 32 presents, in the form of a tree structure, the process by which the detection algorithm classified the users of a group according to data features ordered by decision priority, so that domain experts and/or algorithm experts can use the visualization interface to identify deficiencies and defects in the detection algorithm.
The display module 32 is used to display a decision tree graph that represents the attribute test process of all users in the group. The user attributes may include normal user (Normal) and abnormal user (Abnormal), or normal user (Normal), fraud role A (Abnormal A), fraud role B (Abnormal B), and so on. In the display interface, the display module 32 draws the tree from the root node, through the decision paths and non-leaf nodes, down to the leaf nodes, thereby representing how the detection algorithm of the fraud detection system classifies level by level, from the highest priority to the lowest, to arrive at the attribute of each user in the same group. The display interface shows the following: the data feature of first priority at the root node of the decision tree graph and its decision value range; the final attribute of the at least one user represented by each leaf node; the current attribute of the multiple users represented by each non-leaf node, together with the data feature of the current priority and its decision value range; and the decision paths leaving the root node or each non-leaf node, which are distinguished by lines of different color, shape, or thickness. As the decision tree graph shows, users assigned to a leaf node have been given a final attribute of normal user or abnormal user, whereas users assigned to a non-leaf node must be classified further until they reach a leaf node that determines their final attribute (i.e. normal user or abnormal user).
The decision result of the decision tree is obtained by analyzing, level by level, the relationship between the original value of each user data feature and the corresponding decision threshold, and classifying accordingly. For example, for a given selection in the fraud detection system, after decision tree pruning, all users are grouped by decisions that use, in descending order of priority, the maximum IP usage, the user's out-degree (out_degree) in the social network, and the IP usage computed from user information.
Referring to Fig. 4, which shows a schematic of the decision tree graph for the users of one group. As shown, the root node indicates that the data feature of highest priority is the maximum IP usage, which is used to classify the attributes of the group's users: when the maximum IP usage (max_IP_used_be_used_amount) of a user is ≤ 80.5, the user is classified along the "blue" decision path to the first non-leaf node; otherwise the user is classified along the "yellow" decision path to the first leaf node. The first non-leaf node uses the event initiation source as the data feature of second priority to continue classifying the users it received: when a user's out-degree (out_degree) as an event initiation source in the social network is ≤ 711.0, the user is classified along the "blue" decision path to the second leaf node; otherwise the user is classified along the "yellow" decision path to the second non-leaf node. The second non-leaf node uses the IP usage as the data feature of third priority to continue classifying the users it received: when a user's IP usage (IP_used_amount) is ≤ 870.0, the user is classified along the "blue" decision path to the third leaf node; otherwise the user is classified along the "yellow" decision path to the fourth leaf node.
Here, "blue" indicates a decision path whose classification at the current priority is "normal user", and "yellow" indicates a decision path whose classification at the current priority is "abnormal user". The values shown in each non-leaf node for evaluating the corresponding information, such as 80.5, 711.0, and 870.0 in the figure, are the decision value ranges of the data feature at the corresponding priority.
In various embodiments, the decision paths can also be distinguished by lines of different shapes. For example, a solid line indicates that the decided user attribute is normal user and a dashed line indicates abnormal user; or a straight line indicates normal user and a curved line indicates abnormal user; or the paths are distinguished by line thickness, for example a thin line indicates normal user and a thick line indicates abnormal user.
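The three-level walk described for Fig. 4 can be written out as a short sketch. The thresholds and feature names come from the description; the function name `classify_user`, the leaf names, and the mapping of leaf attributes from path color (blue for normal, yellow for abnormal) are assumptions made for illustration, not the system's actual implementation.

```python
def classify_user(user):
    """Walk the three-level tree described for Fig. 4.

    Returns (leaf_name, attribute, path_colors). The leaf attribute is
    inferred from the color of the last path taken, which is an assumption.
    """
    path = []

    # Root node: first priority, maximum IP usage
    if user["max_IP_used_be_used_amount"] <= 80.5:
        path.append("blue")
    else:
        path.append("yellow")
        return "leaf_1", "abnormal", path

    # First non-leaf node: second priority, out-degree in the social network
    if user["out_degree"] <= 711.0:
        path.append("blue")
        return "leaf_2", "normal", path
    path.append("yellow")

    # Second non-leaf node: third priority, IP usage
    if user["IP_used_amount"] <= 870.0:
        path.append("blue")
        return "leaf_3", "normal", path
    path.append("yellow")
    return "leaf_4", "abnormal", path

# Hypothetical users
print(classify_user({"max_IP_used_be_used_amount": 100.0}))
print(classify_user({"max_IP_used_be_used_amount": 50.0,
                     "out_degree": 800.0,
                     "IP_used_amount": 900.0}))
```

Returning the list of path colors alongside the leaf makes it straightforward for a renderer to highlight the route a given user took through the tree.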
So that the number of users received by each non-leaf node and each leaf node can be seen more clearly, when the decision tree graph is displayed to represent the attribute test process of all users in the group, the root node of the decision tree graph also shows the number of users in the group (i.e. the sample size given to the root node), and each non-leaf node also shows the number of users with the current attribute (i.e. the sample size received by that non-leaf node). Referring to Fig. 5, the displayed decision tree graph additionally shows the number of users classified to each node: the sample_size shown in the root node is the sample size given to the root node, i.e. the total number of group members; the sample_size shown in each other non-leaf node is the sample size received by that node; and the sample_size shown in a leaf node is the number of users classified to that node from the level above.
It should be noted that, depending on the type of fraud and the design of the unsupervised detection algorithm, the priority of each data feature, the decision value range at each priority, the relationship between adjacent priority levels, and the decision paths at each level may all differ in the decision classification process of each group detected in the operation logs. Moreover, to obtain the group decision result for each user in the operation logs more quickly, the unsupervised detection algorithm used may prune the selected data features according to convergence during training: once the trained detection algorithm has met its convergence condition, the remaining data features are pruned, and pruned data features are not displayed in the decision tree graph. Alternatively, if all users in the acquired group have already been classified within the first few levels of the detection algorithm, the remaining data features can be pruned, and the display module 32 displays only a decision tree graph containing all decision paths and the nodes that those paths connect. When the user data visualization system described herein displays the classification decision process of a group, domain experts and algorithm experts can more easily evaluate the accuracy of the detection algorithm.
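The per-node sample_size values can be obtained by routing every group member through the tree and counting visits. The sketch below uses a simplified two-level tree; the nested-dictionary encoding, the node names, and the helper `count_sample_sizes` are hypothetical and only illustrate the counting idea.

```python
from collections import Counter

# Hypothetical encoding: a non-leaf node tests `feature <= threshold`;
# "le" is the branch taken when the test holds, "gt" otherwise.
TREE = {
    "name": "root", "feature": "max_IP_used_be_used_amount", "threshold": 80.5,
    "le": {
        "name": "nonleaf_1", "feature": "out_degree", "threshold": 711.0,
        "le": {"name": "leaf_2"},
        "gt": {"name": "leaf_3"},
    },
    "gt": {"name": "leaf_1"},
}

def count_sample_sizes(tree, users):
    """Count how many users pass through (or end at) each node."""
    sizes = Counter()
    for user in users:
        node = tree
        while True:
            sizes[node["name"]] += 1
            if "feature" not in node:      # leaf node reached
                break
            branch = "le" if user[node["feature"]] <= node["threshold"] else "gt"
            node = node[branch]
    return dict(sizes)

# Hypothetical group members
users = [
    {"max_IP_used_be_used_amount": 10, "out_degree": 5},
    {"max_IP_used_be_used_amount": 20, "out_degree": 900},
    {"max_IP_used_be_used_amount": 200, "out_degree": 3},
    {"max_IP_used_be_used_amount": 300, "out_degree": 1000},
]
sizes = count_sample_sizes(TREE, users)
print(sizes)
```

In this scheme the count at the root equals the group's member total, and the counts of a node's children always add up to the count of the node itself.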
In the display interface of the decision tree graph, or in another display interface reached by a jump based on an acquired operation instruction, the display module 32 is also used to determine a user in the group as the target user, and to display a time axis at the side of the decision tree graph so as to present the target user's operation log on the time axis.
Here, when a domain expert or algorithm expert clicks a leaf node and selects a user link in the leaf node's pop-up window, the operation log of that user is shown on a time axis at the side of the decision tree graph. Referring to Fig. 6, which is a schematic of an interface whose left side shows the target user's operation log on a time axis and whose right side shows the group decision tree graph. As shown, the time axis marks the time nodes of the operation log from top to bottom in chronological order, and each time node shows the following items from the operation log for the corresponding time point: event type (e.g. event_type), event occurrence time (e.g. timestamp), user information (e.g. user_id), IP address (e.g. the complete IP address or an IP segment), event responder (e.g. target_user), and event content (e.g. comment_id, comment_lenth, amount, object_id, target_video). Showing each user's operation history on a time axis allows domain experts and algorithm experts to check in detail the accuracy of the detected attributes of users in the same group, to examine what normal users and abnormal users in the same group have in common, and thereby to confirm deficiencies and defects in the detection algorithm.
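Preparing the time-axis view amounts to filtering the operation log for the target user and ordering the entries by event time. The sketch below reuses the field names mentioned in the description (user_id, timestamp, event_type, target_user); the record values, the ISO timestamp format, and the function `build_timeline` are assumptions made for illustration.

```python
from datetime import datetime

def build_timeline(records, user_id):
    """Return the target user's log entries ordered by event time, earliest first."""
    own = [r for r in records if r["user_id"] == user_id]
    return sorted(own, key=lambda r: datetime.fromisoformat(r["timestamp"]))

# Hypothetical operation log entries
records = [
    {"user_id": "u7", "timestamp": "2017-08-05T10:15:00", "event_type": "comment", "target_user": "u2"},
    {"user_id": "u9", "timestamp": "2017-08-05T09:00:00", "event_type": "like", "target_user": "u2"},
    {"user_id": "u7", "timestamp": "2017-08-05T09:30:00", "event_type": "register", "target_user": None},
    {"user_id": "u7", "timestamp": "2017-08-05T09:45:00", "event_type": "follow", "target_user": "u2"},
]

timeline = build_timeline(records, "u7")
for entry in timeline:
    # One line per time node, top to bottom in chronological order
    print(entry["timestamp"], entry["event_type"], entry["target_user"])
```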
In other embodiments, domain experts and algorithm experts care not only about how the attributes of group members are classified but also about whether the assigned groups are reasonable. This requires that they be able to inspect the detailed data features within each group and, from another dimension, see the preference order of the data features used to form the group. The visualization method is therefore also used to display an interface for the data set of a group. The data set is shown as a list, giving the user the details of the data features within one group. To improve the classification accuracy of the group data set, the list can present the data features of a group column by column, ordered by the classification priority that the fraud detection system used when classifying. For example, referring to Fig. 7, which shows a schematic of the list interface for the data set of a group in one embodiment of the application: the data set of the displayed group is sorted by the similarity of the data features, in order of priority from high to low. When rows have the same similarity on the data feature of first priority, they are sorted by the data feature of second priority. In the embodiment shown in Fig. 7, the priority order from high to low is: IP address (a segment or grouping of IP addresses), event initiation source (source), event responder (target), event type (event_type), and event occurrence time (timestamp). In this embodiment, the table header encodes the importance of the different columns: the more concentrated the values of a feature, the more important the feature. In one embodiment provided by the application, the fraud detection system expresses this property by computing the information entropy of each feature; lower entropy means higher consistency. The fraud detection system then sorts the features by increasing entropy and places the column with the lowest entropy at the front of the list to draw the user's attention. Of course, in other implementations, the column headers of the displayed table can also be color-rendered: for example, the header of the column with the lowest entropy is rendered in the darkest color to indicate that the data feature of that column is the most important, and the other columns are rendered accordingly, yielding the data set list interface shown in the figure. The list interface can follow the step of displaying multiple group interfaces, or come before or after step S12, and is displayed in response to the user's operation of selecting the list interface.
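The column ordering by increasing information entropy can be sketched as follows. The row values, the column names, and the helpers `entropy` and `order_columns` are hypothetical; the sketch only shows how lower-entropy (more concentrated) features end up at the front of the list and how rows can then be sorted by that priority.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def order_columns(rows, columns):
    """Order columns by increasing entropy; lower entropy means a more concentrated feature."""
    scores = {col: entropy([row[col] for row in rows]) for col in columns}
    return sorted(columns, key=lambda col: scores[col]), scores

# Hypothetical data set of one group
rows = [
    {"ip_segment": "10.0.0.*", "event_type": "like",    "timestamp": "09:04"},
    {"ip_segment": "10.0.0.*", "event_type": "comment", "timestamp": "09:01"},
    {"ip_segment": "10.0.0.*", "event_type": "like",    "timestamp": "09:02"},
    {"ip_segment": "10.0.0.*", "event_type": "comment", "timestamp": "09:03"},
]

ordered, scores = order_columns(rows, ["timestamp", "event_type", "ip_segment"])

# Sort rows by the columns in priority order: ties on an earlier column
# are broken by the next column.
sorted_rows = sorted(rows, key=lambda row: tuple(row[col] for col in ordered))
print(ordered)
print([row["timestamp"] for row in sorted_rows])
```

The same entropy scores could also drive header shading, with the lowest-entropy column receiving the darkest color.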
In certain embodiments, to further characterize whether the data set of the acquired group reflects the characteristics of fraud, the data set also needs to be shown from other dimensions. For example, the accuracy of the detected fraud can be further confirmed by comparing the network operation data of normal users with the group data set. To this end, the display module 32 is also used to display a feature distribution interface for the data set of the group. The feature distribution interface can show the distribution of each data type in the overall network. The overall network is a relative notion: for example, multiple network users form a cluster, and the interface can then show how a given data feature of a given group is distributed within that cluster. Referring to Fig. 3, the largest dashed circle represents a cluster formed by multiple network users; the cluster contains 11 groups, numbered 0-10, from which one group is selected for display.
In some embodiments, the data types that the feature distribution interface can show are, for example: the information entropy of the average operation interval dimension (average operation interval entropy), of the IP address usage dimension (IP used amount entropy), of the gender dimension (sex entropy), of the email dimension (email entropy), of the registration time dimension (reg time entropy), of the operation count dimension (operation times entropy), of the device count dimension (device amount entropy), of the operation type dimension (operation type entropy), and of the maximum number of other users sharing a used IP (max IP used be used amount entropy). In the embodiment shown in Fig. 8, the information entropy of the registration time dimension is the displayed data feature; that is, Fig. 8 shows the feature distribution, within the network cluster, of the information entropy of registration time (registration period) in one group. To effectively compare the feature distribution of the acquired group data set with that of the network operation data of normal users, refer to Fig. 9, which is a flow chart for displaying the interface of the feature distribution of the data set of the group. As shown, the user data visualization system performs the following steps so that the display module 32 shows each generated chart in the corresponding interface:
In step S211, a group is selected, and at least one data feature is determined from the data set of the group. In one embodiment, for example, the group labeled 2 in Fig. 3 is selected, and a data feature that is user information, for example registration time, is determined from the data set of the group labeled 2.
In step S212, the feature distribution of the determined at least one data feature is counted in the group and in the cluster. In this embodiment, the feature distribution of the registration time data feature is counted within the group, and the feature distribution of the registration time data feature is counted within the entire cluster.
In step S213, the histogram of the feature distribution is displayed, together with a distribution comparison chart that places this histogram against the histogram for the entire cluster. In this embodiment, based on the encoding of the data feature, a histogram of the feature distribution of registration time within the group and a histogram of the feature distribution of registration time within the entire cluster are displayed. As shown in Fig. 8, in interface D, chart (a) is a thumbnail of the feature distribution of registration time in the selected group labeled 2, and chart (d), at the bottom of interface D, is the enlarged view of that thumbnail. The enlarged view shows that, within the month from August 1 to August 31, the registrations of this group's users are concentrated on five days: August 5, 6, 11, 12, and 16. Chart (c) in interface D is a histogram of when registered users in the cluster registered during August; it shows that registrations across the cluster follow a certain regular pattern over the month. Chart (b) in interface D overlays chart (d) and chart (c) to show how the registration time data feature differs between the entire cluster and the selected group. To let users understand the differences and connections between features, the embodiment provided by the application presents these histograms in three layers; after the user clicks one of the thumbnails, the page scrolls to the normalized distribution comparison chart. Of course, in a specific application there can be multiple thumbnails, each representing a different data feature.
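Steps S211 to S213 can be illustrated by computing normalized per-day histograms for the group and for the cluster and comparing them. In the sketch below, the year, the registration counts, and the helper `normalized_histogram` are assumptions; only the five concentration days (August 5, 6, 11, 12, and 16) come from the description of Fig. 8.

```python
from collections import Counter
from datetime import date

def normalized_histogram(dates, days):
    """Fraction of registrations falling on each day in `days`."""
    counts = Counter(dates)
    total = len(dates)
    return [counts[d] / total if total else 0.0 for d in days]

YEAR = 2017  # hypothetical; the description only says "August"
days = [date(YEAR, 8, d) for d in range(1, 32)]

# Hypothetical group: registrations concentrated on five days
group_dates = (
    [date(YEAR, 8, 5)] * 3
    + [date(YEAR, 8, 6)] * 2
    + [date(YEAR, 8, 11)] * 2
    + [date(YEAR, 8, 12)] * 2
    + [date(YEAR, 8, 16)]
)
# Hypothetical cluster: one background registration per day plus the group's
cluster_dates = days + group_dates

group_hist = normalized_histogram(group_dates, days)
cluster_hist = normalized_histogram(cluster_dates, days)

# Overlay comparison: positive values mark days over-represented in the group
difference = [g - c for g, c in zip(group_hist, cluster_hist)]
peak_days = [d.day for d, g in zip(days, group_hist) if g > 0]
print(peak_days)
```

Normalizing both histograms to fractions keeps a small group comparable with a much larger cluster when the two are drawn on the same axes.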
In some embodiments, the feature distribution of a data feature in the group and in the entire cluster can also be distinguished or emphasized by color-rendering the histograms or by dynamic display (for example, flashing).
In some embodiments, to further analyze the differences between multiple groups within a network cluster, the display module 32 is also used to display an interface for the feature distribution of the data sets of multiple groups. Referring to Fig. 10 and Fig. 11: Fig. 10 is a flow chart of the steps by which the display module 32 shows, in one embodiment, how multiple groups are distributed in the cluster, and Fig. 11 shows interface E, in which the distribution of multiple groups in the cluster is displayed in one embodiment of the application. As shown, the steps include:
In step S311, multiple groups are determined in the cluster composed of multiple network users, and the groups are distinguished by different shapes, icons, labels, and/or colors. In one embodiment, for example, the three groups labeled 0, 1, and 2 in Fig. 3 are selected; the group labeled 0 is shown in "green", the group labeled 1 in "red", and the group labeled 2 in "blue".
In step S312, at least one data feature is determined from the data sets of the multiple groups. In this embodiment, one data feature, for example IP address, is determined from the data sets of these three groups.
In step S313, based on the at least one data feature, the relative entropy between every two network users in each group is analyzed as the measure of similarity between those two network users. In this embodiment, based on the IP address, the relative entropy between every two network users in the three groups labeled 0, 1, and 2 (for the information entropy of the IP usage dimension, IP used amount entropy) is analyzed as the measure of similarity between every two network users. For example, the dimensionality reduction method t-SNE (t-distributed stochastic neighbor embedding) is used, with the relative entropy between two users serving as the index of the distance between these network users.
In step S314, a display interface is output. In the interface, network users are represented by shapes, icons, and/or labels, the multiple groups are distinguished by different colors, and the similarity between two network users in each group is represented by the displayed distance between them. In this embodiment, in interface E shown in Fig. 11, network users are represented by dots; "green" indicates the group labeled 0, "red" the group labeled 1, and "blue" the group labeled 2. The users of the "blue" group labeled 2 lie close together and form a cluster-like distribution, and the users of the "red" group labeled 1 also lie close together and form a cluster-like distribution. "Green" shows the distribution of randomly sampled normal users, who lie farther apart and are more dispersed. It can therefore be considered that the denser the cluster a group forms, the more likely the group is a fraud group. In the embodiment shown in Fig. 11, the "green" group has a dispersed distribution, which indicates that it is a normal group and that the users represented by its "green" dots are normal users. Conversely, the "red" group (labeled 1) and the "blue" group (labeled 2) have cluster-like distributions, which indicates that they are abnormal groups and that the users represented by the "red" and "blue" dots are abnormal users. In one embodiment, a user of the visualization system can interactively inspect the specific information and feature values of each user in a group by hovering the mouse.
In other embodiments, the output interface can also represent network users with other shapes, icons, and/or labels. The shape is, for example, a geometric figure such as a triangle or rectangle; the icon is, for example, a smiling face, a crying face, a skull, or a robber avatar; and the label is, for example, text or a symbol that distinguishes the users clearly.
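The relative-entropy distance used in steps S313 and S314 can be sketched as a pairwise matrix. The per-user IP usage distributions, the symmetrization of the relative entropy, and the helper names below are assumptions made for illustration: the description only states that relative entropy measures the distance between users, and averaging the two directions is one common way to obtain a symmetric distance.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) of two discrete distributions.

    `eps` avoids division by zero when q has empty bins (an assumption of
    this sketch, not something the description specifies).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Average of both directions, so that the distance is symmetric."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

def pairwise_distances(distributions):
    """Symmetric distance matrix between all pairs of users."""
    n = len(distributions)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = symmetric_kl(distributions[i], distributions[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix

# Hypothetical per-user distributions of activity over four IP segments
users = [
    [0.7, 0.2, 0.1, 0.0],   # user A (shares IPs with user B)
    [0.6, 0.3, 0.1, 0.0],   # user B
    [0.1, 0.1, 0.2, 0.6],   # user C (randomly sampled normal user)
]
distances = pairwise_distances(users)
print(distances[0][1], distances[0][2])
```

A matrix of this kind could then be passed to a dimensionality reduction implementation that accepts precomputed distances in order to obtain the two-dimensional dot positions; users with similar distributions would be expected to land close together.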
By presenting the grouping process of the users of a determined group, the distribution of data features, lists, and so on, the user data visualization system of the application shows the groups produced during fraud detection through interfaces of multiple kinds, which helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud detection system.
It should be noted that all modules of the user data visualization system can be configured on a single computer device. Alternatively, the modules of the user data visualization system can be distributed between a client on the user side and a server on the network side, with the client connected to the server over a network. For example, the acquisition module of the user data visualization system is installed on the server and the display module is installed on the client; the client sends a request to log in to the server, the server runs the user data visualization system based on the operation requests made by the client, and the corresponding interfaces are displayed through the client. The client includes, but is not limited to, a browser or dedicated client software interface configured on the user terminal, and the hardware used to run the display interface program.
It should also be noted that, from the above description of the embodiments, those skilled in the art can clearly understand that part or all of the application can be implemented by software in combination with a necessary general-purpose hardware platform. Based on this understanding, the part of the technical solution of the application that in essence contributes to the prior art can be embodied in the form of a software product. The computer software product may include one or more machine-readable media storing machine-executable instructions that, when executed by one or more machines such as a computer, a computer network, or other electronic equipment, cause the one or more machines to perform operations according to the embodiments of the application. Machine-readable media may include, but are not limited to, floppy disks, optical discs, CD-ROM (compact disc read-only memory), magneto-optical disks, ROM (read-only memory), RAM (random access memory), EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), magnetic or optical cards, flash memory, or other types of media suitable for storing machine-executable instructions.
The application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The application can also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
It should be noted that those skilled in the art will understand that some of the above components can be programmable logic devices, including one or more of: programmable array logic (Programmable Array Logic, PAL), generic array logic (Generic Array Logic, GAL), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), and complex programmable logic devices (Complex Programmable Logic Device, CPLD); the application does not specifically limit this.
The above embodiments merely illustrate the principles and effects of the application and are not intended to limit it. Anyone familiar with this technology can modify or change the above embodiments without departing from the spirit and scope of the application. Therefore, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the application.

Claims (25)

1. A user data visualization method, applied in a fraud detection system, characterized by comprising the following steps:
Obtaining a data set of a group, the data features of the data set including user information, IP address, event type, event initiator, event responder, and event occurrence time; wherein the data features of the data set are assigned different decision priorities;
Displaying a decision tree graph to characterize the attribute test process of all users in the group, wherein:
Displaying the first-priority data feature of the root node of the decision tree graph and its decision value range;
Displaying the final attribute of at least one user characterized by each leaf node of the decision tree graph;
Displaying the current attribute of the multiple users characterized by each non-leaf node of the decision tree graph, the data feature of the current priority, and its decision value range; and
Displaying the decision paths corresponding to the root node or each non-leaf node in the decision tree graph, the decision paths being characterized by lines of different colors, shapes, or thicknesses.
2. The user data visualization method according to claim 1, characterized in that, in the step of displaying a decision tree graph to characterize the attribute test process of all users in the group, the root node of the decision tree graph further displays the number of users in the group, and each non-leaf node of the decision tree graph further displays the number of users having the current attribute.
3. The user data visualization method according to claim 1 or 2, characterized in that the displayed decision tree graph is a decision tree graph that has been pruned.
4. The user data visualization method according to claim 1, characterized by further comprising the following steps:
Determining a user in the group as a target user;
Displaying a time axis at one side of the decision tree graph, and presenting the operation log of the target user on the time axis.
5. The user data visualization method according to claim 1, characterized in that the step of obtaining a data set of a group comprises:
Obtaining the operation logs of a cluster made up of multiple network users;
Determining at least one data feature from the operation logs of the multiple network users, and analyzing the similarity of at least one set of data features in the operation logs to determine the group; and
Obtaining the data set of the group.
6. The user data visualization method according to claim 1 or 5, characterized by further comprising a step of displaying at least one group interface, wherein the size of a group in the group interface is characterized by the size of the displayed geometric figure.
7. The user data visualization method according to claim 1 or 5, characterized by further comprising a step of displaying an interface of the data set of a group, wherein the data features of the data set of the group include at least two of user information, IP address, event type, event initiator, event responder, and event occurrence time, and in the interface of the group data set, the group data set is grouped and then displayed in sorted order.
8. The user data visualization method according to claim 1 or 5, characterized by further comprising a step of displaying an interface of the feature distribution of the data set of the group:
Selecting a group, and determining at least one data feature from the data set of the group;
Counting the feature distribution of the determined at least one data feature in the group and in the cluster; and
Displaying a histogram of the feature distribution, and a distribution comparison chart of that histogram against the corresponding histogram for the entire cluster.
9. The user data visualization method according to claim 1 or 5, characterized by further comprising a step of displaying an interface of the feature distribution of the data sets of multiple groups:
Determining multiple groups in a cluster made up of multiple network users, and characterizing the differences between the multiple groups with different shapes, icons, labels, and/or colors;
Determining at least one data feature from the data sets of the multiple groups;
Based on the at least one data feature, analyzing the relative entropy between every two network users in each group as a measure of the degree of similarity between the two network users; and
Outputting a display interface in which network users are characterized by shapes, icons, and/or labels, the differences between the multiple groups are characterized by different colors, and the degree of similarity between two network users in each group is characterized by the displayed distance.
10. The user data visualization method according to claim 1, characterized in that the event type includes at least one of a network user's following, liking, commenting, presenting, logging in, publishing, updating status, registering, and modifying information.
11. A computer device, characterized by comprising:
A processor;
A presentation engine executed on the processor, the presentation engine being used to execute the user data visualization method according to any one of claims 1-10.
12. A user data visualization system, characterized by comprising:
An acquisition module, for obtaining a data set of a group, the data features of the data set including user information, IP address, event type, event initiator, event responder, and event occurrence time; wherein the data features of the data set are assigned different decision priorities; and
A display module, for displaying a decision tree graph to characterize the attribute test process of all users in the group, wherein the display module displays: the first-priority data feature of the root node of the decision tree graph and its decision value range; the final attribute of at least one user characterized by each leaf node of the decision tree graph; the current attribute of the multiple users characterized by each non-leaf node of the decision tree graph, the data feature of the current priority, and its decision value range; and the decision paths corresponding to the root node or each non-leaf node in the decision tree graph, the decision paths being characterized by lines of different colors, shapes, or thicknesses.
13. The user data visualization system according to claim 12, characterized in that the display module is further used for displaying the number of users in the group at the root node of the decision tree graph, and for displaying the number of users having the current attribute at each non-leaf node of the decision tree graph.
14. The user data visualization system according to claim 12 or 13, characterized in that the decision tree graph displayed by the display module is a decision tree graph that has been pruned.
15. The user data visualization system according to claim 12 or 13, characterized in that the display module is further used for displaying a time axis at one side of the decision tree graph and presenting the operation log of a target user on the time axis, the target user being determined by an input operation.
16. The user data visualization system according to claim 12, characterized in that the group is determined by the processing module analyzing the similarity of at least one set of data features in the operation logs of multiple network users obtained by the acquisition module.
17. The user data visualization system according to claim 12 or 16, characterized in that the display module is further used for displaying at least one group interface, wherein the size of a group in the group interface is characterized by the size of the displayed geometric figure.
18. The user data visualization system according to claim 12 or 16, characterized in that the display module is further used for displaying an interface of the data set of a group, wherein the data features of the data set of the group include at least two of user information, IP address, event type, event initiator, event responder, and event occurrence time, and in the interface of the group data set, the group data set is grouped and then displayed in sorted order.
19. The user data visualization system according to claim 12 or 16, characterized in that the display module is further used for displaying an interface of the feature distribution of the data set of the group, the interface showing a histogram of the feature distribution and a distribution comparison chart of that histogram against the corresponding histogram for the entire cluster.
20. The user data visualization system according to claim 12 or 16, characterized in that the display module is further used for displaying an interface in which network users are characterized by shapes, icons, and/or labels, the differences between multiple groups are characterized by different colors, and the degree of similarity between two network users in each group is characterized by the displayed distance.
21. The user data visualization system according to claim 12, characterized in that the event type includes at least one of a network user's following, liking, commenting, presenting, logging in, publishing, updating status, registering, and modifying information.
22. A client, connected to a server through a network, characterized in that the client, based on sending a request to log in to the server, performs the steps of the user data visualization method according to any one of claims 1-10.
23. A server, connected to a client through a network, characterized in that the server, based on the operation requested by the client, sends the process of the user data visualization method according to any one of claims 1-10 to the client, and the execution result is displayed by the client.
24. A browser, connected to a server through a network, characterized in that the browser, based on sending a request to log in to the server, performs the steps of the user data visualization method according to any one of claims 1-10.
25. A computer-readable storage medium storing a data visualization computer program, characterized in that the data visualization computer program, when executed, implements the steps of the user data visualization method according to any one of claims 1-10.
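As a reading aid for claims 1 and 2, the following is a minimal Python sketch of how a decision tree whose tests follow a fixed feature priority could be built and rendered as text. It is only an illustration under stated assumptions: the names `Node`, `build_tree`, and `render`, the sample records, and the `label` field standing in for a user's attribute are all hypothetical, and the claims do not prescribe a data structure, a split criterion, or a drawing library. The `EDGE_STYLES` strings are stand-ins for lines of different colors, shapes, or thicknesses.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical edge styles standing in for line color, shape, or thickness.
EDGE_STYLES = ["==", "--", ".."]

@dataclass
class Node:
    user_count: int                 # users reaching this node (claim 2)
    attribute: str                  # current attribute, or final attribute for a leaf
    feature: Optional[str] = None   # data feature tested here; None for a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)  # decision value -> child

def majority(records):
    """Most common label among the records (ties go to the first label seen)."""
    return Counter(r["label"] for r in records).most_common(1)[0][0]

def build_tree(records, priorities):
    """Test features in priority order; stop when labels agree or features run out."""
    node = Node(user_count=len(records), attribute=majority(records))
    if len({r["label"] for r in records}) == 1 or not priorities:
        return node
    node.feature = priorities[0]
    buckets = {}
    for r in records:
        buckets.setdefault(r[node.feature], []).append(r)
    for value, subset in sorted(buckets.items()):
        node.children[value] = build_tree(subset, priorities[1:])
    return node

def render(node, indent=0, edge=""):
    """Return text lines: one per node, with a styled edge for each decision path."""
    pad = "  " * indent
    if node.feature is None:
        lines = [f"{pad}{edge}leaf: {node.attribute} (users={node.user_count})"]
    else:
        domain = sorted(node.children)  # decision value range observed at this node
        lines = [f"{pad}{edge}test {node.feature} in {domain} "
                 f"(current={node.attribute}, users={node.user_count})"]
    for i, (value, child) in enumerate(sorted(node.children.items())):
        style = EDGE_STYLES[i % len(EDGE_STYLES)]
        lines.extend(render(child, indent + 1, f"{style}[{value}]{style}> "))
    return lines

# Made-up sample data; "label" is an assumed per-user attribute.
records = [
    {"user": "u1", "ip": "10.0.0.1", "event_type": "follow", "label": "suspicious"},
    {"user": "u2", "ip": "10.0.0.1", "event_type": "follow", "label": "suspicious"},
    {"user": "u3", "ip": "10.0.0.2", "event_type": "follow", "label": "normal"},
    {"user": "u4", "ip": "10.0.0.2", "event_type": "comment", "label": "normal"},
]
tree = build_tree(records, priorities=["event_type", "ip"])
print("\n".join(render(tree)))
```

In this sketch the root tests the first-priority feature (`event_type`), each non-leaf node reports its current majority attribute and user count, and each leaf reports a final attribute. A real implementation would more likely learn the split order from data and draw the tree graphically.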
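Claims 8 and 19 compare the distribution of a data feature inside one group with its distribution in the entire cluster. A minimal sketch of that comparison follows, assuming each record carries a hypothetical `group` field and one categorical feature. The function names and the text-bar output are illustrative only, and the claims do not specify how the histograms are computed or drawn.

```python
from collections import Counter

def feature_histogram(records, feature):
    """Relative frequency of each value of `feature` among the records."""
    counts = Counter(r[feature] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def comparison_rows(cluster, group_id, feature):
    """One row per feature value: (value, share within the group, share within the cluster)."""
    group = [r for r in cluster if r["group"] == group_id]
    g_hist = feature_histogram(group, feature)
    c_hist = feature_histogram(cluster, feature)
    return [(v, g_hist.get(v, 0.0), c_hist[v]) for v in sorted(c_hist)]

# Made-up sample cluster with two groups.
cluster = [
    {"group": "g1", "event_type": "follow"},
    {"group": "g1", "event_type": "follow"},
    {"group": "g1", "event_type": "like"},
    {"group": "g2", "event_type": "comment"},
]
rows = comparison_rows(cluster, "g1", "event_type")
for value, in_group, in_cluster in rows:
    # Crude side-by-side bars; a real interface would draw two histograms.
    print(f"{value:8s} group {'#' * round(in_group * 20):20s} "
          f"cluster {'#' * round(in_cluster * 20)}")
```

Using relative frequencies rather than raw counts keeps a small group comparable with the much larger cluster on the same axis.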
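Claims 9 and 20 measure the similarity between two network users by relative entropy (Kullback-Leibler divergence) and show it as a displayed distance. The sketch below assumes each user is summarized by the empirical distribution of one categorical data feature, here the event type. The additive smoothing, the symmetrization, and all names are illustrative choices that the claims do not state.

```python
import math
from collections import Counter

def distribution(events, categories, alpha=1.0):
    """Empirical distribution over `categories` with additive smoothing (assumed)."""
    counts = Counter(events)
    total = len(events) + alpha * len(categories)
    return [(counts[c] + alpha) / total for c in categories]

def relative_entropy(p, q):
    """D(p || q) = sum_i p_i * log(p_i / q_i), in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_divergence(p, q):
    """Symmetrized relative entropy, usable as a distance-like quantity for plotting."""
    return relative_entropy(p, q) + relative_entropy(q, p)

# Made-up event logs for three users.
categories = ["follow", "like", "comment", "login"]
user_a = distribution(["follow", "follow", "like", "login"], categories)
user_b = distribution(["follow", "follow", "like", "login"], categories)
user_c = distribution(["comment", "comment", "comment", "login"], categories)

d_ab = symmetric_divergence(user_a, user_b)  # identical behavior
d_ac = symmetric_divergence(user_a, user_c)  # different behavior
print(d_ab, d_ac)
```

Relative entropy is not symmetric, so the sketch sums both directions before treating the value as a distance. The result is still not a true metric, and a layout step (not shown) would be needed to place users on screen so that on-screen distance reflects it.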
CN201810022133.7A 2018-01-10 2018-01-10 User data visualization method and system Active CN108268624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810022133.7A CN108268624B (en) 2018-01-10 2018-01-10 User data visualization method and system

Publications (2)

Publication Number Publication Date
CN108268624A true CN108268624A (en) 2018-07-10
CN108268624B CN108268624B (en) 2020-04-24

Family

ID=62773340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810022133.7A Active CN108268624B (en) 2018-01-10 2018-01-10 User data visualization method and system

Country Status (1)

Country Link
CN (1) CN108268624B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6278464B1 (en) * 1997-03-07 2001-08-21 Silicon Graphics, Inc. Method, system, and computer program product for visualizing a decision-tree classifier
CN105408894A (en) * 2014-06-25 2016-03-16 华为技术有限公司 Method and device for determining user identity category
CN105956122A (en) * 2016-05-03 2016-09-21 无锡雅座在线科技发展有限公司 Object attribute determining method and device
CN106295702A (en) * 2016-08-15 2017-01-04 西北工业大学 Social platform user classification method based on individual affective behavior analysis
CN107438050A (en) * 2016-05-26 2017-12-05 北京京东尚科信息技术有限公司 Method and system for identifying potential malicious users of a website

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
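Ding Shuangsi (丁爽斯): "Research on Identifying Internet Financial Fraud Based on Big Data" (基于大数据的互联网金融欺诈行为识别研究), China Master's Theses Full-text Database, Economics and Management Sciences *
Wang Bo (王博): "Malicious Code Classification and Visualization Based on Behavior Analysis" (基于行为分析的恶意代码分类与可视化), China Master's Theses Full-text Database, Information Science and Technology *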

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063131A (en) * 2018-08-02 2018-12-21 陶雷 System and method for outputting content based on structured data processing
CN109213904A (en) * 2018-08-02 2019-01-15 陶雷 System and method for processing presentation data based on structured scheme
CN109063131B (en) * 2018-08-02 2021-09-28 陶雷 System and method for outputting content based on structured data processing
CN109213904B (en) * 2018-08-02 2021-09-28 陶雷 System and method for processing presentation data based on structured scheme
CN109767269A (en) * 2019-01-15 2019-05-17 网易(杭州)网络有限公司 Game data processing method and device
CN109767269B (en) * 2019-01-15 2022-02-22 网易(杭州)网络有限公司 Game data processing method and device
CN111125658A (en) * 2019-12-31 2020-05-08 深圳市分期乐网络科技有限公司 Method, device, server and storage medium for identifying fraudulent users
CN111125658B (en) * 2019-12-31 2024-03-22 深圳市分期乐网络科技有限公司 Method, apparatus, server and storage medium for identifying fraudulent user
CN112347343A (en) * 2020-09-25 2021-02-09 北京淇瑀信息科技有限公司 Customized information pushing method and device and electronic equipment
CN113806594A (en) * 2020-12-30 2021-12-17 京东科技控股股份有限公司 Business data processing method, device, equipment and storage medium based on decision tree

Also Published As

Publication number Publication date
CN108268624B (en) 2020-04-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20181024
Address after: Room 1009-1, 10th Floor, Building 3, No. 1 Zhongguancun East Road, Haidian District, Beijing 100084
Applicant after: Huakong Tsingjiao Information Technology (Beijing) Co., Ltd.
Address before: Tsinghua Yuan, Haidian District, Beijing 100084
Applicant before: Tsinghua University
GR01 Patent grant