CN112508074A - Visualization display method and system and readable storage medium - Google Patents


Info

Publication number: CN112508074A
Authority: CN (China)
Legal status: Granted
Application number: CN202011386790.3A
Other languages: Chinese (zh)
Other versions: CN112508074B (en)
Inventors: 刘颖麒, 林家玮
Current assignee: Shenzhen Feiquan Cloud Data Service Co ltd
Original assignee: Shenzhen Feiquan Cloud Data Service Co ltd
Events: application CN202011386790.3A filed by Shenzhen Feiquan Cloud Data Service Co ltd; publication of CN112508074A; application granted; publication of CN112508074B; legal status: active.


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/23 Clustering techniques


Abstract

The invention discloses a visualization display method comprising the following steps: acquiring feature data corresponding to different clients and using the feature data as the sample data of those clients; performing cluster analysis on the sample data of the different clients to obtain their cluster analysis results, the cluster analysis results being the client categories to which the clients of the respective sample data belong; and forming a training set from the cluster analysis results and the sample data, training a decision tree on that training set, and visually displaying the trained decision tree so that the cluster analysis of the sample data becomes visible. The invention also discloses a visualization display system and a readable storage medium. The client category can be determined from the cluster analysis result, and the cluster analysis can be visualized through the decision tree, so that targeted business strategies can be formulated for different categories of clients and the effectiveness of business strategy formulation is improved.

Description

Visualization display method and system and readable storage medium
Technical Field
The invention relates to the technical field of visualization, in particular to a visualization display method, a visualization display system and a readable storage medium.
Background
In the Internet era, companies accumulate large amounts of user data, which creates demand for clustering customers based on that data. User data typically has many feature dimensions, and once the dimensionality exceeds what people can comprehend and compute by hand, the data generally has to be clustered with machine-learning algorithms.
However, current customer clustering mainly relies on clustering models for classification. From a cluster analysis result, business personnel usually learn only which category each customer has been assigned to, not the specific differences between the customer groups, so they cannot formulate targeted business strategies from the result. In other words, current customer classification schemes cannot reconstruct the cluster analysis process, and because the differences between customer categories cannot be discerned from the cluster analysis result alone, effective business strategies cannot be formulated for the different categories.
Disclosure of Invention
The main purpose of the invention is to provide a visualization display method, a visualization display system, and a readable storage medium that solve the following prior-art problem: cluster analysis results do not reveal the differences between different categories of clients, so effective business strategies cannot be formulated for those categories.
In order to achieve the above object, the present invention provides a visual display method, which comprises the following steps:
acquiring characteristic data corresponding to different clients, and taking the characteristic data as sample data corresponding to the different clients;
performing cluster analysis on the sample data corresponding to different clients to obtain cluster analysis results corresponding to the different clients, wherein the cluster analysis results are the client categories to which the clients of the different sample data belong;
and forming a training set from the cluster analysis results and the sample data, training a decision tree on the training set, and visually displaying the trained decision tree so that the cluster analysis of the sample data is visualized.
Optionally, the step of training a decision tree using the training set includes:
taking the sample data in the training set as the input of the decision tree and the cluster analysis results in the training set as its output, and training the decision tree.
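The training step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the features are assumed numeric, the cluster labels are assumed to come from an earlier clustering step, and the CART-style Gini splitting and the helper names `build_tree` and `predict` are hypothetical choices made here.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return the (feature_index, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score, n = None, gini(y), len(y)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Grow a small CART-style tree (hypothetical helper); leaves store the majority cluster label."""
    majority = Counter(y).most_common(1)[0][0]
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return {"leaf": True, "label": majority, "n": len(y)}
    j, t = split
    li = [i for i in range(len(y)) if X[i][j] <= t]
    ri = [i for i in range(len(y)) if X[i][j] > t]
    return {
        "leaf": False, "feature": j, "threshold": t, "n": len(y),
        "left": build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
        "right": build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the splits from the root to a leaf and return its cluster label."""
    while not tree["leaf"]:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["label"]

# Training set: sample data (age, monthly income) as input, cluster labels as output.
X = [[25, 3000], [30, 3200], [45, 9000], [50, 9500]]
y = [0, 0, 1, 1]
tree = build_tree(X, y)
```

Because the tree is fitted to reproduce the cluster labels, its split thresholds (here, age 37.5) act as readable rules describing how the clusters differ.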
Optionally, the step of visually displaying the decision tree trained by using the training set includes:
traversing the decision tree trained on the training set to obtain every decision path from input to output, and obtaining the display output information of each node in the decision tree;
and visually displaying the decision tree according to the obtained decision paths and the display output information of each node.
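One way to realise the traversal is sketched below. It assumes, hypothetically, that the trained tree is stored as nested dicts with `feature`, `threshold`, `left`, `right`, `label` and `n` keys; the feature names and category names are invented for the example. `decision_paths` collects each root-to-leaf path, and `render` builds a text view carrying each node's display information.

```python
def decision_paths(node, feature_names, path=()):
    """Yield (conditions, category, sample_count) for every root-to-leaf path."""
    if node["leaf"]:
        yield list(path), node["label"], node["n"]
        return
    name, t = feature_names[node["feature"]], node["threshold"]
    yield from decision_paths(node["left"], feature_names, path + (f"{name} <= {t}",))
    yield from decision_paths(node["right"], feature_names, path + (f"{name} > {t}",))

def render(node, feature_names, indent=0):
    """Return an indented text view of the tree with each node's display info."""
    pad = "  " * indent
    if node["leaf"]:
        return [f"{pad}-> category {node['label']} (samples={node['n']})"]
    name, t = feature_names[node["feature"]], node["threshold"]
    lines = [f"{pad}[{name} <= {t}] (samples={node['n']})"]
    lines += render(node["left"], feature_names, indent + 1)
    lines.append(f"{pad}[{name} > {t}]")
    lines += render(node["right"], feature_names, indent + 1)
    return lines

# Hypothetical trained tree in a nested-dict layout; names are illustrative.
tree = {"leaf": False, "feature": 1, "threshold": 5000, "n": 100,
        "left": {"leaf": True, "label": "low-level", "n": 60},
        "right": {"leaf": False, "feature": 0, "threshold": 12, "n": 40,
                  "left": {"leaf": True, "label": "medium-level", "n": 25},
                  "right": {"leaf": True, "label": "high-level", "n": 15}}}
feature_names = ["purchase_count", "income"]
paths = list(decision_paths(tree, feature_names))
view = render(tree, feature_names)
```

Each path reads as a business rule, for example "income > 5000 and purchase_count > 12 leads to the high-level category"; a graphical front end could draw the same information as boxes and edges.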
Optionally, the feature data includes data corresponding to a plurality of features, and after the step of visually displaying the decision tree trained on the training set so as to visualize the cluster analysis of the sample data, the method further includes:
calculating feature parameters for the different features, the feature parameters representing the importance of each feature;
visually displaying the feature parameters of the different features in the decision tree;
and determining target features from among the features according to the displayed feature parameters, and formulating a business strategy based on the cluster analysis results corresponding to the target features.
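The patent does not say how the feature parameters are computed. A common choice for decision trees, used here purely as an assumption, is mean decrease in impurity: each split credits its feature with the weighted drop in Gini impurity it produces, and the totals are normalised. The tree below, including its stored `gini` values, and the helper `target_features` are hypothetical.

```python
def gini_importance(node, n_features):
    """Mean-decrease-in-impurity importance per feature, normalised to sum to 1."""
    raw = [0.0] * n_features

    def visit(nd):
        if nd["leaf"]:
            return
        left, right = nd["left"], nd["right"]
        # Weighted impurity decrease achieved by this split.
        raw[nd["feature"]] += (nd["n"] * nd["gini"]
                               - left["n"] * left["gini"] - right["n"] * right["gini"])
        visit(left)
        visit(right)

    visit(node)
    total = sum(raw)
    return [v / total if total else 0.0 for v in raw]

def target_features(importances, names, k=1):
    """Names of the k most important features (candidate target features)."""
    ranked = sorted(zip(names, importances), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical trained tree: each node stores its sample count and Gini impurity.
tree = {"leaf": False, "feature": 1, "n": 100, "gini": 0.5,
        "left": {"leaf": True, "n": 60, "gini": 0.25},
        "right": {"leaf": False, "feature": 0, "n": 40, "gini": 0.375,
                  "left": {"leaf": True, "n": 20, "gini": 0.0},
                  "right": {"leaf": True, "n": 20, "gini": 0.0}}}
names = ["purchase_count", "income"]
importances = gini_importance(tree, len(names))
```

Here the root split on `income` removes more weighted impurity (20) than the lower split on `purchase_count` (15), so `income` ranks first.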
Optionally, after the step of calculating the feature parameters corresponding to different features, the method further includes:
determining the importance levels corresponding to different features according to the feature parameters;
and visually displaying the importance levels of the different features so that corresponding business strategies can be formulated for the different categories of customers according to the displayed importance levels.
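A minimal sketch of turning importance scores into levels and a text display follows. The thresholds 0.3 and 0.1, the three level names and the sample scores are illustrative assumptions, not values from the patent.

```python
def importance_levels(importances, names, high=0.3, medium=0.1):
    """Map each feature's importance score to a level label (thresholds are illustrative)."""
    levels = {}
    for name, score in zip(names, importances):
        if score >= high:
            levels[name] = "high"
        elif score >= medium:
            levels[name] = "medium"
        else:
            levels[name] = "low"
    return levels

def bar_chart(importances, names, width=20):
    """Plain-text bar chart, most important feature first."""
    rows = sorted(zip(names, importances), key=lambda p: p[1], reverse=True)
    return [f"{n:<15}{'#' * round(s * width):<{width}} {s:.2f}" for n, s in rows]

names = ["income", "purchase_count", "age", "gender"]
scores = [0.55, 0.30, 0.10, 0.05]
levels = importance_levels(scores, names)
chart = bar_chart(scores, names)
```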
Optionally, the step of performing cluster analysis on the sample data of the different clients to obtain their cluster analysis results includes:
performing data cleaning on the sample data to obtain cleaned target sample data;
determining the number of tiers (i.e., the number of clusters) of the cluster analysis model using a preset tiering algorithm;
and performing cluster analysis on the target sample data of the different clients according to the number of tiers to obtain the cluster analysis results corresponding to the different clients.
Optionally, the step of performing data cleaning on the sample data includes:
determining the feature tags corresponding to the different types of data in the sample data, and determining a service type;
and screening out, according to the feature tags, the sample data matching the service type, and performing data cleaning on the sample data corresponding to the service type.
Optionally, the step of performing data cleaning on the sample data includes:
acquiring association degree data among different characteristics;
screening out mutually independent sample data from the sample data according to the association degree data;
and carrying out data cleaning on the screened mutually independent sample data.
In addition, to achieve the above object, the present invention further provides a visualization display system comprising a memory, a processor, and a visualization display program stored in the memory and executable on the processor, wherein the processor implements the steps of the visualization display method described above when executing the visualization display program.
In addition, to achieve the above object, the present invention further provides a readable storage medium storing a visualization display program which, when executed by a processor, implements the steps of the visualization display method described above.
In the embodiments of the invention, feature data corresponding to different clients is acquired, cluster analysis is performed on the sample data of the different clients to obtain their cluster analysis results, a decision tree is then trained on a training set formed from the cluster analysis results and the sample data, and the trained decision tree is visually displayed so that the cluster analysis of the sample data becomes visible. This avoids the situation in which clustering the sample data yields only the cluster analysis results while the specific differences between clients remain unknown. Because the decision tree is trained on the cluster analysis results together with the sample data and then displayed, the differences and connections between client categories can be grasped intuitively from the displayed tree, business strategies can be formulated in a targeted way according to those differences and connections, and the effectiveness of business strategy formulation is improved.
Drawings
Fig. 1 is a schematic structural diagram of a visualization display system in the hardware operating environment according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a first embodiment of the visualization display method according to the present invention;
Fig. 3 is a schematic flowchart of a second embodiment of the visualization display method according to the present invention.
The realization of the objects, the functional features, and the advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the invention is: acquiring feature data corresponding to different clients and using it as the sample data of those clients; performing cluster analysis on the sample data to obtain cluster analysis results, the results being the client categories to which the clients of the respective sample data belong; and forming a training set from the cluster analysis results and the sample data, training a decision tree on the training set, and visually displaying the trained decision tree so that the cluster analysis of the sample data is visualized.
Current customer-group classification schemes generally rely on cluster analysis. Cluster analysis, however, outputs only the cluster analysis results: the classification process remains opaque, the differences between the resulting categories are hard to grasp, and effective business strategies cannot be formulated from them. The invention therefore provides a visualization display method, system, and readable storage medium that acquire feature data corresponding to different clients as their sample data, perform cluster analysis on that sample data to obtain the cluster analysis results (the client categories to which the clients of the respective sample data belong), train a decision tree on a training set formed from the cluster analysis results and the sample data, and visually display the trained decision tree so that the cluster analysis of the sample data is visualized. This prevents the loss of effectiveness in business strategy formulation that occurs when cluster analysis alone cannot distinguish the specific differences between client categories, and so improves the effectiveness of business strategy formulation.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a visualization display system of a hardware operating environment according to an embodiment of the present invention.
As shown in Fig. 1, the visualization display system may include a processor 1001 such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 provides connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as a magnetic disk memory, and may optionally be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the structure of the visualization display system shown in Fig. 1 does not limit the system, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
In the visualization display system shown in Fig. 1, the network interface 1004 is mainly used to connect to and communicate with a back-end server; the user interface 1003 is mainly used to connect to a client (user side) and exchange data with it; and the processor 1001 may be configured to call the visualization display program stored in the memory 1005 and perform the following operations:
acquiring characteristic data corresponding to different clients, and taking the characteristic data as sample data corresponding to the different clients;
performing cluster analysis on the sample data corresponding to different clients to obtain cluster analysis results corresponding to the different clients, wherein the cluster analysis results are the client categories to which the clients of the different sample data belong;
and forming a training set from the cluster analysis results and the sample data, training a decision tree on the training set, and visually displaying the trained decision tree so that the cluster analysis of the sample data is visualized.
Optionally, the processor 1001 may call the visualization display program stored in the memory 1005 and further perform the following operation:
taking the sample data in the training set as the input of the decision tree and the cluster analysis results in the training set as its output, and training the decision tree.
Optionally, the processor 1001 may call the visualization display program stored in the memory 1005 and further perform the following operations:
traversing the decision tree trained on the training set to obtain every decision path from input to output, and obtaining the display output information of each node in the decision tree;
and visually displaying the decision tree according to the obtained decision paths and the display output information of each node.
Optionally, the feature data includes data corresponding to a plurality of features, and after the step of visually displaying the decision tree trained on the training set so as to visualize the cluster analysis of the sample data, the processor 1001 may call the visualization display program stored in the memory 1005 and further perform the following operations:
calculating feature parameters for the different features, the feature parameters representing the importance of each feature;
visually displaying the feature parameters of the different features in the decision tree;
and determining target features from among the features according to the displayed feature parameters, and formulating a business strategy based on the cluster analysis results corresponding to the target features.
Optionally, after the step of calculating the feature parameters corresponding to different features, the processor 1001 may call the visualization display program stored in the memory 1005, and further perform the following operations:
determining the importance levels corresponding to different features according to the feature parameters;
and visually displaying the importance levels of the different features so that corresponding business strategies can be formulated for the different categories of customers according to the displayed importance levels.
Optionally, the processor 1001 may call the visualization display program stored in the memory 1005 and further perform the following operations:
performing data cleaning on the sample data to obtain cleaned target sample data;
determining the number of tiers (i.e., the number of clusters) of the cluster analysis model using a preset tiering algorithm;
and performing cluster analysis on the target sample data of the different clients according to the number of tiers to obtain the cluster analysis results corresponding to the different clients.
Optionally, the processor 1001 may call the visualization display program stored in the memory 1005 and further perform the following operations:
determining the feature tags corresponding to the different types of data in the sample data, and determining a service type;
and screening out, according to the feature tags, the sample data matching the service type, and performing data cleaning on the sample data corresponding to the service type.
Optionally, before the step of performing data cleaning on the sample data to obtain the cleaned target sample data, the processor 1001 may call the visualization display program stored in the memory 1005 and further perform the following operations:
acquiring association degree data among different characteristics;
screening out mutually independent sample data from the sample data according to the association degree data;
and carrying out data cleaning on the screened mutually independent sample data.
Referring to Fig. 2, which is a flowchart of the first embodiment of the visualization display method of the present invention, in this embodiment the visualization display method includes the following steps:
step S10: acquiring characteristic data corresponding to different clients, and taking the characteristic data as sample data corresponding to the different clients;
When customer groups are divided by cluster analysis, the result obtained is only the category to which each customer belongs. This satisfies the classification requirement, but the specific differences between customers in different categories remain unknown, the cluster analysis process cannot be mapped onto business rules, and business strategies cannot be updated in a targeted manner according to those differences. For example, if customers are divided into 3 categories, the output of the cluster analysis model does not reveal what distinguishes the 3 categories or how business strategies should be formulated for each. Therefore, to avoid being unable to formulate effective business strategies because the specific differences between client categories are unknown, this embodiment introduces a highly interpretable supervised learning model: the decision tree. After the sample data has been clustered, a decision tree is trained on the cluster analysis results output by the cluster analysis model, so that the cluster analysis is visualized through the decision tree and the weak interpretability of cluster analysis is compensated for. That is, the visualization scheme of this embodiment has two parts: the first performs cluster analysis on the sample data, and the second trains a decision tree by combining the cluster analysis results with the sample data of the different clients.
Cluster analysis divides a set of physical or abstract objects into multiple classes of similar objects, whereas the decision tree belongs to the category of classification: it assigns new data to classes under an existing classification standard.
Specifically, before cluster analysis is performed, sample data corresponding to different clients is acquired. In this embodiment, the feature data of the different customers is used as the sample data. The feature data may correspond to a single feature, such as income, or to multiple features, for example attribute information characterizing each customer, such as gender, place of residence, age, height, weight, income, and purchase frequency. Moreover, different business requirements correspond to different service types, and different service types may call for different feature data. For the marketing service type, for example, the feature data may include identity information such as gender and age, and consumption-level information reflecting purchasing power, such as income level and number of purchases; for the advertising service type, it may include user-habit information reflecting customer preferences, such as browsing duration and number of views. The sample data to acquire can therefore be selected according to the service requirement.
In a specific embodiment, after the sample data is obtained, the feature tags corresponding to the different sample data need to be determined so that cluster analysis can subsequently be performed on the sample data of the different clients. That is, to distinguish the sample data, a feature tag is added to each kind of data: for example, a gender tag to data representing the customer's gender, an age tag to data representing the customer's age, and a transaction-amount tag to data representing the customer's transaction amount. In practice, features that suit the business requirement can be selected, from the feature tags available within the enterprise and those supplied by third-party providers, as the sample features for training the cluster analysis model.
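As an illustration of selecting features by service type, the sketch below keeps only the tagged features a given service needs. The tag names and the `FEATURES_BY_SERVICE` mapping are hypothetical, loosely modelled on the marketing and advertising examples above.

```python
# Hypothetical mapping from service type to the feature tags it uses.
FEATURES_BY_SERVICE = {
    "marketing": ["gender", "age", "income", "purchase_count"],
    "advertising": ["browse_duration", "browse_count"],
}

def select_features(tagged_rows, service_type):
    """Project each client's tagged data onto the features the service needs.

    tagged_rows: list of dicts mapping feature tag -> value, one per client.
    A missing tag becomes None so that later null-value handling can treat it.
    """
    wanted = FEATURES_BY_SERVICE[service_type]
    return [{tag: row.get(tag) for tag in wanted} for row in tagged_rows]

rows = [
    {"gender": "F", "age": 30, "income": 8000, "purchase_count": 5, "browse_count": 40},
    {"gender": "M", "age": 41, "income": 12000, "browse_duration": 310, "browse_count": 12},
]
marketing = select_features(rows, "marketing")
advertising = select_features(rows, "advertising")
```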
Step S20: performing cluster analysis on the sample data corresponding to different clients to obtain cluster analysis results corresponding to the different clients, wherein the cluster analysis results are the client categories to which the clients of the different sample data belong;
After the sample data of the different clients is obtained, a cluster analysis model can perform cluster analysis on it to obtain the cluster analysis results of the different clients. A cluster analysis result is the category, among several preset client categories, to which the client of each piece of sample data belongs, and the client categories can be defined according to the service requirement. For example, customers can be divided by consumption level into high-level, medium-level, and low-level customers, or by purchase or browsing data into important customers, potential customers, and customers with development potential.
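Clustering models return arbitrary cluster ids, so attaching business categories such as the consumption-level tiers above is a separate step. The sketch below does this, as an assumption, by ranking clusters on average spend; the patent does not specify the mapping, and the tier names and data are illustrative.

```python
from statistics import mean

def name_clusters(labels, spend, names=("low-level", "medium-level", "high-level")):
    """Rank cluster ids by average spend and attach business-facing names."""
    clusters = sorted(set(labels))
    if len(clusters) != len(names):
        raise ValueError("need exactly one name per cluster")
    avg = {c: mean(s for lab, s in zip(labels, spend) if lab == c) for c in clusters}
    ranked = sorted(clusters, key=avg.get)  # lowest average spend first
    return {c: names[i] for i, c in enumerate(ranked)}

labels = [0, 2, 1, 0, 2, 1]                  # illustrative cluster ids from a clustering model
spend = [9000, 500, 3000, 11000, 700, 2500]  # illustrative spend per client
mapping = name_clusters(labels, spend)
```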
Specifically, the cluster analysis model may be trained with a cluster analysis algorithm such as K-means, DBSCAN, or GMM. K-means assigns each point to the best cluster based on distance (similarity) between points; the value of K must be specified in advance, and the algorithm is fast and suited to finding spherical clusters, but is sensitive to outliers. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based spatial clustering algorithm: it can find clusters of arbitrary shape and is robust to noise (it can identify outliers), and a sample point is added to a nearby cluster as long as the density around it exceeds a certain threshold. GMM (Gaussian Mixture Model) models each class with its own Gaussian component and is easy to understand and fast. Once trained, the cluster analysis model has learned the rules for dividing clients into the preset client categories, so inputting the sample data into the model yields the cluster analysis results, i.e., the categories of the different clients.
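To make the density-based idea concrete, here is a compact DBSCAN sketch in plain Python (O(n²), suitable only for small data). `eps` (neighbourhood radius) and `min_pts` (minimum neighbours, including the point itself, for a core point) are the standard DBSCAN parameters; the values and points in the example are arbitrary.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:   # not a core point; may later become a border point
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:  # border point reached from a core point
                labels[j] = cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # j is a core point: expand the cluster through it
                queue.extend(nb)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The isolated point (50, 50) receives the noise label -1, which is how DBSCAN surfaces outliers.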
In a specific embodiment, to improve the accuracy and efficiency of classification before clustering the sample data of the different clients, the sample data may first be cleaned. After the cleaned target sample data is obtained, the number of tiers (i.e., the number of clusters) of the cluster analysis model can be determined with a preset tiering algorithm, and the cluster analysis model then clusters the cleaned target sample data according to that number, so that the cluster analysis results of the different clients are obtained quickly and effectively. The preset tiering algorithm may be chosen according to the clustering algorithm used by the model; for example, when the algorithm is K-means, the elbow method can be used to determine the number of tiers. Because K-means measures sample similarity by Euclidean distance, a good clustering result requires a sufficiently small within-group distance. The within-group distance is computed for different numbers of tiers; it keeps decreasing as the number grows, and the number at which it stops decreasing markedly and begins to level off (the "elbow") is taken as the final number of tiers. For example, if the within-group distance begins to converge at 4 tiers, 4 is chosen as the optimal number. In addition, data cleaning of the sample data may include null-value processing, singular-value processing, text digitization, normalization, and de-duplication.
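The elbow procedure can be sketched as below, assuming K-means with Euclidean distance as the text describes. The farthest-point initialisation, the stopping rule (choose the smallest k after which one more cluster reduces the within-cluster sum of squares by less than 5% of its k = 1 value) and the sample points are all assumptions for illustration; in practice the elbow is often read off a plotted curve.

```python
import math

def init_centroids(points, k):
    """Deterministic farthest-point initialisation (an assumed choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=100):
    """Lloyd's k-means; returns (centroids, labels)."""
    centroids, labels = init_centroids(points, k), []
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, labels

def sse(points, centroids, labels):
    """Within-cluster sum of squared Euclidean distances."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def elbow_k(points, k_max, tol=0.05):
    """Smallest k beyond which one more cluster cuts SSE by less than tol * SSE(1)."""
    curve = {k: sse(points, *kmeans(points, k)) for k in range(1, k_max + 1)}
    for k in range(1, k_max):
        if curve[k] - curve[k + 1] < tol * curve[1]:
            return k, curve
    return k_max, curve

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
best_k, curve = elbow_k(points, k_max=5)
```

With three well-separated groups, the within-cluster distance drops sharply up to k = 3 and barely moves afterwards, so 3 is selected.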
(1) Null values can be handled in various ways, chosen according to the data volume, data distribution, business requirements, and so on. For example, when the total sample size is large enough, samples with null values can be deleted directly; when it is small, filling can be considered, replacing null values with the mean, 0, -1, or similar. (2) Singular-value processing handles extreme values (singular values) that do not fit the business situation. For example, a gender feature should have only two values, so any third value is a singular value. Such values are usually few and can simply be removed; if they are caused by a system bug, the data can be regenerated once the bug is fixed. (3) Text digitization applies to features such as gender whose values are text: the text values must be converted into numbers so that different classification models can be trained on them. (4) Normalization addresses the very different value ranges of different features. For example, transaction amounts may run into the thousands while transaction counts have at most two digits; without processing, the data is highly dispersed and model convergence slows significantly. A normalization method is usually applied, such as subtracting the mean from every value and dividing by the standard deviation (z-score standardization), which gives every feature zero mean and unit standard deviation so that the features share a comparable scale. (5) De-duplication deletes repeated records from the sample data, keeping only one copy of identical data.
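The five cleaning steps can be sketched on a list of dict records as follows. The specific choices (mean-filling numeric nulls, treating a `gender` field with the hypothetical allowed values `"F"`/`"M"` as the singular-value check, integer-encoding text, z-score normalisation) are illustrative assumptions drawn from the options the text lists, and the step order is also a choice made here.

```python
from statistics import mean, pstdev

def clean(rows, numeric_keys, text_key="gender", allowed=("F", "M")):
    """Illustrative cleaning of dict rows; returns new, cleaned rows."""
    # (2) Singular values: drop rows whose text field is outside the allowed set.
    # Rows where that field is missing are dropped too (deletion strategy for nulls).
    rows = [dict(r) for r in rows if r.get(text_key) in allowed]
    # (1) Null values: fill missing numeric fields with the column mean.
    for k in numeric_keys:
        present = [r[k] for r in rows if r.get(k) is not None]
        fill = mean(present) if present else 0
        for r in rows:
            if r.get(k) is None:
                r[k] = fill
    # (5) De-duplication: keep only the first copy of identical rows.
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    rows = unique
    if not rows:
        return rows
    # (3) Text digitisation: map each allowed text value to an integer code.
    codes = {v: i for i, v in enumerate(allowed)}
    for r in rows:
        r[text_key] = codes[r[text_key]]
    # (4) Normalisation: z-score each numeric column (mean 0, standard deviation 1).
    for k in numeric_keys:
        col = [r[k] for r in rows]
        mu, sd = mean(col), pstdev(col)
        for r in rows:
            r[k] = (r[k] - mu) / sd if sd else 0.0
    return rows

raw = [
    {"gender": "F", "amount": 1000, "count": 2},
    {"gender": "M", "amount": None, "count": 4},   # null value
    {"gender": "X", "amount": 500, "count": 1},    # singular value
    {"gender": "F", "amount": 1000, "count": 2},   # duplicate
    {"gender": "M", "amount": 3000, "count": 6},
]
cleaned = clean(raw, numeric_keys=["amount", "count"])
```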
In another embodiment, before data cleaning, the sample data relevant to the service requirement is first screened from the acquired sample data of different clients, and only that subset is cleaned, which improves the reliability of the analysis results output by the cluster analysis model. Because a customer group may be associated with several service types, feature screening should be performed on the sample data according to the specific service type and its properties. For example, the customer features of a payment company may cover two transaction modes, code scanning and card swiping, while the cluster analysis may target only the code-scanning transaction service. In that case, although card-swiping transaction records are available, they are not needed to train the layering model; only sample data containing code-scanning transaction records needs to be screened out, which improves classification efficiency.
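One way to express this service-type screening is a tag table recording which service each field belongs to, in the spirit of the feature labels of claim 7. The table `FIELD_TAGS`, its field names and the rule that a customer is kept only if it has at least one non-null value for the chosen service are hypothetical, chosen to mirror the code-scanning example above.

```python
# Hypothetical feature-tag table: which service type each field belongs to.
FIELD_TAGS = {
    "customer_id": "common",
    "scan_amount": "code_scan",
    "scan_count": "code_scan",
    "swipe_amount": "card_swipe",
    "swipe_count": "card_swipe",
}


def screen_by_service(records, service_type, field_tags=FIELD_TAGS):
    """Keep only the fields (and customers) relevant to `service_type`."""
    keep = [f for f, tag in field_tags.items() if tag in ("common", service_type)]
    service_fields = [f for f, tag in field_tags.items() if tag == service_type]
    screened = []
    for r in records:
        # A customer with no data for this service type contributes nothing to it.
        if any(r.get(f) is not None for f in service_fields):
            screened.append({f: r.get(f) for f in keep})
    return screened
```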
In another embodiment, to speed up model training and reduce the labor and material costs of training, association degree data between different features is acquired during data cleaning, and mutually independent sample data is screened from the sample data according to it. The association degree data represents how strongly different features are related and may be, for example, a correlation coefficient. When the association degree between two features is below a preset association threshold, they can be regarded as mutually independent; when it exceeds the threshold, they are regarded as mutually dependent and only one of them needs to be kept. The preset threshold may be set according to the specific application requirements and is not limited here. For example, the correlation coefficients between attribute features may be calculated and the features with lower absolute correlation coefficients selected to train the model; if the correlation coefficient between transaction time and transaction amount is found to be 0.9, only one of the two features is kept, reducing the training cost.
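The correlation-based screening can be sketched with the Pearson coefficient and a greedy pass over the features: a feature is kept only if its absolute correlation with every already-kept feature stays at or below the threshold. The threshold of 0.8 and the keep-the-first-seen order are assumptions; the text leaves both open.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / (sx * sy)


def select_independent(columns, threshold=0.8):
    """Greedy screening: keep a feature only if |r| <= threshold against every kept feature."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

With this greedy order, whichever of two strongly correlated features appears first is the one kept; ranking features by business relevance beforehand would change which survives.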
Of course, both screenings may be combined during data cleaning: sample data matching the service type can be screened according to the service type, and mutually independent sample data can be screened according to the association degree data between different features. The order of these two screening steps is not limited; they may be performed one after the other in a preset order or at the same time.
Step S30: forming a training set from the cluster analysis results and the sample data, training a decision tree with the training set, and visually displaying the trained decision tree so that the cluster analysis operation of the sample data is visualized;
After the cluster analysis results are obtained, the client category of each client is known. To visualize the cluster analysis of the sample data, the cluster analysis result is used as the target feature: the labeled results are combined with the acquired feature data to form the training set of the decision tree, and the decision tree is trained on it. Decision trees have low computational complexity and produce output that is easy to understand, so displaying the trained decision tree overcomes the difficulty of interpreting cluster analysis output. From the displayed tree, the association between each cluster analysis result (client category) and the different features can be read directly, and from these associations the differences between client categories can be determined. In other words, visualizing the cluster analysis of the sample data amounts to visualizing the differences between the different categories of clients.
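Forming the training set amounts to pairing each sample's feature values (the input) with its cluster label (the target output). A minimal sketch, assuming samples are dictionaries of feature values, cluster labels come back in the same order as the samples, and a hypothetical `feature_names` list fixes the column order:

```python
def build_training_set(samples, cluster_labels, feature_names):
    """Return (X, y): feature rows as model input, cluster labels as the target output."""
    if len(samples) != len(cluster_labels):
        raise ValueError("each sample needs exactly one cluster label")
    # Fixed column order so that feature index j means the same thing in every row.
    X = [[s[f] for f in feature_names] for s in samples]
    y = list(cluster_labels)
    return X, y
```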
Based on the presented decision tree, business personnel can then quickly determine the specific differences between different categories of customers, from which customized services or customized marketing strategies, etc. can be provided for the different categories of customers. Of course, the visual display system may also identify differences between different categories of customers according to the displayed decision tree, and then automatically match corresponding customized services or customized marketing strategies for different categories of customers according to the differences.
In this embodiment, feature data corresponding to different clients is acquired and used as their sample data; cluster analysis is performed on the sample data to obtain the cluster analysis results, namely the client categories to which the clients belong; a decision tree is trained with the training set formed by the cluster analysis results and the sample data; and the trained decision tree is displayed visually. This visualizes the cluster analysis of the sample data and makes it easy to find the differences between clients of different categories. It avoids the weak interpretability of a cluster analysis model whose category differences cannot be found, which would hamper formulating targeted business strategies, and thus makes business strategy formulation more effective.
Referring to fig. 3, fig. 3 is a flowchart of a visualization displaying method according to a second embodiment of the present invention, in this embodiment, the visualization displaying method includes the following steps:
step S11: acquiring characteristic data corresponding to different clients, and taking the characteristic data as sample data corresponding to the different clients;
step S12: performing cluster analysis on the sample data corresponding to different clients to obtain cluster analysis results corresponding to the different clients, wherein the cluster analysis results are the client categories to which the clients of the different sample data belong;
step S13: according to a training set formed by the clustering analysis result and the sample data, taking the sample data in the training set as the input of the decision tree, taking the clustering analysis result in the training set as the output of the decision tree, and training the decision tree;
step S14: visually displaying the decision tree trained with the training set so that the cluster analysis operation of the sample data is visualized.
In this embodiment, after feature data corresponding to different clients is acquired as sample data and cluster analysis yields the cluster analysis results for the different clients, a decision tree algorithm must first be selected so that the decision tree can be trained effectively and, once displayed, reproduces the cluster analysis process of the sample data. The selected algorithm may be ID3, C4.5, CART or the like. A tree built by ID3 can have multiple branches per node, but ID3 cannot handle continuous feature data, and at each step it greedily selects the locally best splitting feature without considering whether the choice is globally optimal. C4.5 uses the information gain ratio as its splitting criterion, penalizing features with many distinct values through the split information term, and it can handle continuous feature data, which ID3 cannot; however, its performance drops because continuous attribute values must be scanned and sorted. CART builds a binary tree: each split cuts the data into two parts that enter the left and right subtrees, so every non-leaf node has exactly two children, and CART can be used for both classification and regression. Because these algorithms have different characteristics, the decision tree algorithm can be chosen according to the specific application requirements.
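To make the difference between the three splitting criteria concrete, the sketch below computes information gain (ID3), gain ratio (C4.5) and the Gini decrease of a binary split (CART) on categorical data. It is a textbook-style illustration, not code from the patent. On a four-sample toy set, a feature that takes a distinct value for every sample earns the same information gain as a genuinely informative two-valued feature but only half its gain ratio, which is the penalty the split information term introduces.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group the labels by the value the feature takes (a multiway split)."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def information_gain(values, labels):  # ID3 criterion
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in partition(values, labels))


def gain_ratio(values, labels):  # C4.5 criterion
    n = len(labels)
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in partition(values, labels))
    return information_gain(values, labels) / split_info if split_info else 0.0


def gini_decrease(values, labels, left_values):  # CART: binary split into two subsets
    left = [y for v, y in zip(values, labels) if v in left_values]
    right = [y for v, y in zip(values, labels) if v not in left_values]
    n = len(labels)
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)
```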
After the decision tree algorithm is determined, the cluster analysis results in the training set serve as the output of the decision tree (its leaf nodes), and the sample data serves as the input (the features tested at the root node and the intermediate child nodes). The chosen algorithm then learns the correspondence between output and input, that is, the decision rules leading from the root node through the child nodes to the leaf nodes. The decision tree is trained according to these decision rules and then displayed visually.
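A compact CART-style classifier in pure Python illustrates the training step: sample features are the input, cluster labels are the output, and each internal node stores a learned decision rule. This is a simplified sketch under stated assumptions (numeric features only, candidate thresholds taken at observed values, Gini impurity, a depth limit instead of pruning), and the nested-dictionary node layout is hypothetical; a production system would more likely rely on an established library implementation.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Best binary split (feature index, threshold, weighted Gini), or None."""
    n, best = len(y), None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples both ways
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a binary tree: internal nodes hold rules, leaves hold cluster labels."""
    majority = Counter(y).most_common(1)[0][0]
    split = None if depth >= max_depth or gini(y) == 0.0 else best_split(X, y)
    if split is None:
        return {"leaf": True, "label": majority, "n": len(y)}
    j, t, _ = split
    go_left = [row[j] <= t for row in X]
    left = [i for i, g in enumerate(go_left) if g]
    right = [i for i, g in enumerate(go_left) if not g]
    return {
        "leaf": False, "feature": j, "threshold": t, "n": len(y),
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }


def predict(tree, x):
    """Follow the decision rules from the root to a leaf and return its cluster label."""
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["label"]
```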
In one embodiment, when the decision tree trained with the training set is displayed, each decision path from input to output (that is, from the root node to each leaf node) may be shown, together with the display output information of each node in the tree. The display output information may include the decision condition at each child node (such as whether the age is 18 to 20, or whether the monthly income is above 8000) and the cluster analysis result at each leaf node (such as good melon or bad melon). Accordingly, by traversing the trained decision tree, all decision paths from the root node to the leaf nodes and the display output information of every node (the root node, the leaf nodes and the child nodes between them) can be obtained, and the decision tree is then displayed according to these decision paths and this display output information, so that the cluster analysis operation of the sample data is visualized.
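Traversing a trained tree into decision paths and per-node display output information could look like the following. The tree is written out by hand in a hypothetical nested-dictionary layout (`feature`, `threshold`, `label`, `n`), the feature names are illustrative, and the indented text rendering only stands in for whatever chart component an actual display system would use.

```python
def decision_paths(node, names, conditions=()):
    """Depth-first traversal yielding (conditions from the root, leaf label, sample count)."""
    if node["leaf"]:
        yield list(conditions), node["label"], node["n"]
        return
    name, t = names[node["feature"]], node["threshold"]
    yield from decision_paths(node["left"], names, conditions + (f"{name} <= {t}",))
    yield from decision_paths(node["right"], names, conditions + (f"{name} > {t}",))


def render(node, names, depth=0):
    """Indented text view: each node shows its decision condition or cluster result."""
    pad = "    " * depth
    if node["leaf"]:
        return [f"{pad}-> cluster {node['label']} (n={node['n']})"]
    cond = f"{names[node['feature']]} <= {node['threshold']}"
    return ([f"{pad}if {cond}:  (n={node['n']})"]
            + render(node["left"], names, depth + 1)
            + [f"{pad}else:"]
            + render(node["right"], names, depth + 1))


# Hypothetical trained tree over two features, used only for illustration.
TREE = {"leaf": False, "feature": 0, "threshold": 20, "n": 6,
        "left": {"leaf": True, "label": 0, "n": 3},
        "right": {"leaf": False, "feature": 1, "threshold": 8000, "n": 3,
                  "left": {"leaf": True, "label": 1, "n": 1},
                  "right": {"leaf": True, "label": 2, "n": 2}}}
NAMES = ["age", "monthly_income"]
```

Printing `"\n".join(render(TREE, NAMES))` gives one line per node, and each tuple from `decision_paths` lists the conditions a client must satisfy to land in a given cluster.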
In another embodiment, the decision tree trained with the training set can be displayed according to the importance of the different features. This avoids the situation in which, with many features and many decision paths, the cluster analysis operation of the sample data remains hard to understand even when the decision paths are displayed. Specifically, feature parameters corresponding to the different features are calculated first; these parameters characterize the importance of each feature. Importance can be quantified by recording the total number of splits made on each feature, or the total or average information gain it brings during training. For example, each feature may be scored by the number of times it is used in the whole decision tree model or by the total/average information gain it brings, and these scores are used as the feature parameters. The importance of features can also be calculated quantitatively by applying the trained decision tree model to test data.
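The two training-time measures mentioned above, split counts and accumulated impurity decrease, can be read directly off a trained tree. The sketch assumes a hypothetical node layout in which each node stores its sample count `n` and Gini `impurity`; normalizing the decreases so they sum to 1 is a common convention, not something the text prescribes.

```python
from collections import defaultdict


def split_counts(root):
    """Number of internal nodes that split on each feature."""
    counts, stack = defaultdict(int), [root]
    while stack:
        node = stack.pop()
        if not node["leaf"]:
            counts[node["feature"]] += 1
            stack += [node["left"], node["right"]]
    return dict(counts)


def impurity_importance(root):
    """Total weighted impurity decrease per feature, normalized to sum to 1."""
    totals, stack = defaultdict(float), [root]
    while stack:
        node = stack.pop()
        if node["leaf"]:
            continue
        left, right = node["left"], node["right"]
        # Decrease = parent impurity mass minus the impurity mass left in the children.
        totals[node["feature"]] += (node["n"] * node["impurity"]
                                    - left["n"] * left["impurity"]
                                    - right["n"] * right["impurity"])
        stack += [left, right]
    total = sum(totals.values())
    return {f: v / total for f, v in totals.items()} if total else dict(totals)
```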
For example, the trained decision tree model can score the test data and the evaluation index for the current service type is calculated; the values of one feature are then randomly shuffled across the test samples, the shuffled data is scored again and the evaluation index is recalculated. The rate of change between the first and the second evaluation index, computed for each feature in turn, is used as that feature's parameter. Of course, in some other embodiments, the normalized reduction in information entropy or in the Gini index may be used as the feature parameter. Specifically, the calculation method may be chosen according to the service type and the decision algorithm of the decision tree model, which is not limited here. The decision tree is then displayed according to the feature parameters of the different features; target features are identified from the sample data according to the displayed feature parameters, so that attention can be focused on the cluster analysis results associated with them and the corresponding business strategy can be formulated sensibly.
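The test-data approach can be sketched as permutation importance: score the model once, shuffle one feature column at a time, score again, and use the relative drop in the evaluation index as that feature's parameter. Accuracy is used here as the evaluation index purely as an assumption (the text says the index should match the current service type), `predict` stands for any trained model, and rows are assumed to be lists.

```python
import random


def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, metric=accuracy, repeats=10, seed=0):
    """Average relative drop in `metric` when each feature column is shuffled."""
    rng = random.Random(seed)
    base = metric(predict, X, y)
    result = {}
    for j in range(len(X[0])):
        rates = []
        for _ in range(repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = metric(predict, X_perm, y)
            rates.append((base - score) / base if base else 0.0)
        result[j] = sum(rates) / repeats
    return result
```

A feature the model never consults gets a change rate of exactly 0, since shuffling it cannot alter any prediction.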
In another embodiment, after the feature parameters of the different features are calculated, several importance levels may be defined in advance according to ranges of the feature parameter values. The importance level of each feature is then determined from these pre-divided levels according to its feature parameter, and the importance levels are displayed directly. When the feature dimension is large, for example when it exceeds a certain number, this lets the target features that deserve attention be identified more intuitively and quickly, and corresponding operation strategies can be formulated for the different categories of clients according to the importance levels.
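Mapping feature parameters onto pre-divided importance levels reduces to a range lookup. The cut points (0.1 and 0.3) and the three level names are hypothetical placeholders for the parameter ranges the text says are divided in advance; a value exactly on a cut point is placed in the higher level.

```python
import bisect


def importance_levels(params, cut_points=(0.1, 0.3), names=("low", "medium", "high")):
    """Map each feature parameter to a level; cut_points must be ascending."""
    if len(names) != len(cut_points) + 1:
        raise ValueError("need exactly one more level name than cut points")
    # bisect_right counts how many cut points the value has reached.
    return {f: names[bisect.bisect_right(cut_points, v)] for f, v in params.items()}
```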
It should be noted that displaying the decision paths and the display output information of each node, displaying the feature parameters of the different features, and displaying the importance levels of the different features may be performed in sequence in a preset order, for example first the decision paths, then the feature parameters, and then the importance levels. They may also be performed independently: for example, when the feature dimension exceeds a first number, the feature parameters of the different features are displayed; when it exceeds a second number, the importance levels of the different features are displayed; and when it is less than the first number, the decision paths and the display output information of the nodes are displayed, where the first number is less than the second number. Of course, the decision paths, the feature parameters and the importance levels of the different features may also be displayed simultaneously on the same display interface, so that the differences between categories of clients can be distinguished from them and business strategies can be made more specifically. The specific presentation may take various forms, such as a tree diagram or a bar chart, which is not limited here.
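The independent-display rule above can be written as a small dispatch on feature dimension, paired with a plain-text bar chart of feature parameters. The thresholds are caller-supplied, treating a dimension equal to the first number as "small" is an assumption (the text does not cover that case), and the text chart is only a stand-in for a real bar-chart component.

```python
def choose_view(n_features, first_number, second_number):
    """Pick what to display for a given feature dimension (first_number < second_number)."""
    if first_number >= second_number:
        raise ValueError("first_number must be less than second_number")
    if n_features > second_number:
        return "importance_levels"
    if n_features > first_number:
        return "feature_parameters"
    return "decision_paths"


def text_bar_chart(params, width=20):
    """One line per feature, longest bar for the most important feature."""
    top = max(params.values())
    lines = []
    for name, value in sorted(params.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * value / top) if top else ""
        lines.append(f"{name:<10} {bar} {value:.2f}")
    return lines
```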
In this embodiment, after a training set is formed from the cluster analysis results and the sample data, the sample data in the training set is used as the input of the decision tree and the cluster analysis results as its output, and the decision tree is trained; in this way a decision tree corresponding to the cluster analysis operation can be constructed.
In addition, an embodiment of the present invention further provides a visualization display system, which includes a memory, a processor, and a visualization display program stored on the memory and executable on the processor; the processor implements the steps of the visualization display method described above when executing the visualization display program.
In addition, an embodiment of the present invention further provides a readable storage medium, where the readable storage medium stores a visualization displaying program, and the visualization displaying program, when executed by a processor, implements the steps of the visualization displaying method as described above.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a television, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A visual display method, characterized by comprising the following steps:
acquiring characteristic data corresponding to different clients, and taking the characteristic data as sample data corresponding to the different clients;
performing cluster analysis on the sample data corresponding to different clients to obtain cluster analysis results corresponding to the different clients, wherein the cluster analysis results are the client categories to which the clients of the different sample data belong;
and training a decision tree with a training set formed by the cluster analysis results and the sample data, and visually displaying the decision tree trained with the training set so that the cluster analysis operation of the sample data is visualized.
2. A visual presentation method as claimed in claim 1 wherein said step of training a decision tree using said training set comprises:
and taking the sample data in the training set as the input of the decision tree, taking the cluster analysis result in the training set as the output of the decision tree, and training the decision tree.
3. A visual presentation method as claimed in claim 2 wherein said step of visually presenting a decision tree trained using said training set comprises:
traversing a decision tree trained by the training set, acquiring each decision path from input to output in the decision tree, and acquiring display output information of each node in the decision tree;
and visually displaying the decision tree according to each acquired decision path and the acquired display output information of each node.
4. The visual presentation method of claim 1, wherein the feature data comprises data corresponding to a plurality of features, and the step of visually presenting the decision tree trained by using the training set to visualize the cluster analysis operation of the sample data comprises:
calculating characteristic parameters corresponding to different characteristics, wherein the characteristic parameters are parameters representing the importance of the different characteristics;
visually displaying the characteristic parameters corresponding to different characteristics in the decision tree;
and determining target characteristics from the characteristics according to the displayed characteristic parameters, and formulating a business strategy based on a cluster analysis result corresponding to the target characteristics.
5. A visualization presentation method as claimed in claim 4, wherein after said step of calculating feature parameters corresponding to different features, said method further comprises:
determining the importance levels corresponding to different features according to the feature parameters;
and visually displaying the importance levels corresponding to the different characteristics so as to make corresponding business strategies for different classes of customers according to the displayed importance levels.
6. A visual presentation method as claimed in claim 1, wherein said step of performing cluster analysis on said sample data corresponding to different customers to obtain cluster analysis results corresponding to different customers comprises:
carrying out data cleaning on the sample data to obtain cleaned target sample data;
determining the layering number of the cluster analysis model by adopting a preset layering algorithm;
and according to the number of the layers, performing cluster analysis on the target sample data corresponding to different clients to obtain cluster analysis results corresponding to the different clients.
7. A visual presentation method as claimed in claim 6 wherein said step of data cleansing said sample data comprises:
determining characteristic labels corresponding to different types of data in the sample data, and determining a service type;
and screening out sample data matched with the service type from the sample data according to the characteristic tag, and carrying out data cleaning on the sample data corresponding to the service type.
8. A visual presentation method as claimed in claim 6 wherein said step of data cleansing said sample data comprises:
acquiring association degree data among different characteristics;
screening out mutually independent sample data from the sample data according to the association degree data;
and carrying out data cleaning on the screened mutually independent sample data.
9. A visualization display system, comprising a memory, a processor and a visualization display program stored on the memory and executable on the processor, wherein the processor implements the steps of the visualization display method according to any one of claims 1 to 8 when executing the visualization display program.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a visualization presentation program, which when executed by a processor implements the steps of the visualization presentation method according to any one of claims 1 to 8.
CN202011386790.3A 2020-11-30 Visual display method, system and readable storage medium Active CN112508074B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011386790.3A CN112508074B (en) 2020-11-30 Visual display method, system and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011386790.3A CN112508074B (en) 2020-11-30 Visual display method, system and readable storage medium

Publications (2)

Publication Number Publication Date
CN112508074A true CN112508074A (en) 2021-03-16
CN112508074B CN112508074B (en) 2024-05-14

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070078849A1 (en) * 2005-08-19 2007-04-05 Slothouber Louis P System and method for recommending items of interest to a user
CN103714138A (en) * 2013-12-20 2014-04-09 南京理工大学 Area data visualization method based on density clustering
CN106096748A (en) * 2016-04-28 2016-11-09 武汉宝钢华中贸易有限公司 Entrucking forecast model in man-hour based on cluster analysis and decision Tree algorithms
US20170083920A1 (en) * 2015-09-21 2017-03-23 Fair Isaac Corporation Hybrid method of decision tree and clustering technology
CN106682915A (en) * 2016-12-25 2017-05-17 东北电力大学 User cluster analysis method in customer care system
CN107862342A (en) * 2017-11-27 2018-03-30 清华大学 Lift the visual analysis system and method for tree-model
CN108256907A (en) * 2018-01-09 2018-07-06 北京腾云天下科技有限公司 A kind of construction method and computing device of customer grouping model
CN108492194A (en) * 2018-03-06 2018-09-04 平安科技(深圳)有限公司 Products Show method, apparatus and storage medium
CN109376759A (en) * 2018-09-10 2019-02-22 平安科技(深圳)有限公司 User information classification method, device, computer equipment and storage medium
CN110276382A (en) * 2019-05-30 2019-09-24 平安科技(深圳)有限公司 Listener clustering method, apparatus and medium based on spectral clustering
CN110874604A (en) * 2018-08-30 2020-03-10 Tcl集团股份有限公司 Model training method and terminal equipment
CN111783840A (en) * 2020-06-09 2020-10-16 苏宁金融科技(南京)有限公司 Visualization method and device for random forest model and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant