CN114936252A - Credit card customer attrition data dimension reduction visual analysis method based on semantic feature set - Google Patents

Info

Publication number
CN114936252A
Authority
CN
China
Prior art keywords
data
credit card
semantic
analysis
dimension reduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210436751.2A
Other languages
Chinese (zh)
Other versions
CN114936252B (en)
Inventor
王可
罗孟华
龙洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou University of Finance and Economics
Original Assignee
Guizhou University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou University of Finance and Economics
Priority to CN202210436751.2A
Publication of CN114936252A
Application granted
Publication of CN114936252B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Abstract

The invention discloses a dimension reduction visual analysis method for credit card customer attrition data based on a semantic feature set, and relates to the technical field of data visualization analysis. The method comprises: visually analyzing the features of the data set by multi-view fusion to examine the credit card customer attrition data; performing dimension reduction visual analysis on the credit card customer data with the T-SNE algorithm and the PCA method to analyze the customer attrition data; and analyzing the credit card customer data on the basis of semantic feature groups, followed by dimension reduction visual analysis of the customer attrition data. Starting from the essence of the visualization task, the invention proposes a dimension reduction visualization method based on a semantic feature set. The multi-view fusion visualization method and the T-SNE and PCA dimension reduction visualization methods are compared, and a dimension reduction visual analysis method for credit card customer attrition data based on a semantic feature set is realized.

Description

Credit card customer attrition data dimension reduction visual analysis method based on semantic feature set
Technical Field
The invention relates to the technical field of data visualization analysis, in particular to a credit card customer attrition data dimension reduction visualization analysis method based on a semantic feature set.
Background
As ever larger volumes of data are analyzed, visualization technology is increasingly valued by professionals and ordinary users alike because it is intuitive, effective and easy to understand.
In the era of the digital economy, the financial industry, which bears on social production and people's daily lives, is a major site of data generation and analysis, and its demand for visualization technology keeps growing. As online lending loses its current unchecked momentum, credit cards will gradually return to being the mainstream business for small personal loans, so visualization research on credit card customer data is of great significance.
With the development of hardware technology, large-screen interaction and multi-view hybrid visualization have brought unprecedented visual impact to the industry, gained considerable influence, and become the mainstream of visualization technology as currently applied in industry. The idea of multiple views is to observe and analyze the same thing from several angles so as to obtain a more comprehensive understanding and better analysis results. It is widely applied in visual analysis, machine learning and other fields. From the perspective of visualization technology, however, multi-view display essentially places several charts on the same page so that they supplement one another, which to some extent achieves the effect of viewing data from multiple angles. The multi-view method therefore makes no substantial progress in visualization technology or in the understanding of the data itself; it is an improvement in the form of presentation only.
Compared with the multi-view method, single-view presentation is the more classical approach. However, presenting high-dimensional data well in a single view remains a difficulty of current visualization research and is also the key to breakthroughs in visualization technology. Single-view visualization of multidimensional data mainly reduces the dimensionality of the data and then analyzes and displays the result visually, using either a linear dimension reduction method, represented by Principal Component Analysis (PCA), or a nonlinear dimension reduction algorithm, represented by T-distributed Stochastic Neighbor Embedding (T-SNE). Because linear data are relatively rare in real data sets, the T-SNE algorithm is widely used.
In much of this research, visual analysis of high-dimensional data and dimension reduction methods are the core. Discussing dimension reduction purely in terms of the data values, without considering the semantics of the data, is a one-sided view from the early stage of data science. At present, more and more scholars [20] hold that combining dimension reduction algorithms with analysis of the actual semantics of the features makes the fullest use of the effective information in the data.
Although most dimension reduction visualization methods proposed so far rely on mathematical methods such as T-SNE, their results on many real data sets are not ideal. For credit card data in particular, numerical and non-numerical data coexist, which makes dimension reduction visualization harder. To solve these problems, the invention sorts out the actual semantics of each feature of a public credit card customer churn data set and proposes a semantic feature grouping method for it. It also compares single-view and multi-view results in combination with mathematical dimension reduction methods, thereby realizing a dimension reduction visual display guided by the actual problem.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a dimension reduction visual analysis method for credit card customer attrition data based on a semantic feature set, with the following technical scheme:
A dimension reduction visual analysis method for credit card customer attrition data based on semantic feature sets comprises the following steps:
Step 1: visually analyzing the features of the data set by multi-view fusion, and examining the credit card customer attrition data;
Step 2: performing dimension reduction visual analysis on the credit card customer data based on the T-SNE algorithm and the PCA method, and analyzing the credit card customer attrition data;
Step 3: analyzing the credit card customer data based on semantic feature groups, performing dimension reduction visual analysis, and analyzing the credit card customer attrition data.
Preferably, step 1 specifically comprises:
Step 1.1: performing semantic analysis on all data features and establishing groups, each data feature being assigned to exactly one semantic group according to its semantics so that no groups overlap;
the semantic analysis of all data features is performed according to formula (1) and formula (2):
[Formula (1): equation image BDA0003612801080000031, not reproduced in this text]
[Formula (2): equation image BDA0003612801080000032, not reproduced in this text]
Step 1.2: when two semantic features contain the same semantic interpretation factor, this is defined as semantic redundancy, as shown in formula (3), and the redundant feature is deleted; at the same time the correlation coefficient shown in formula (4) is calculated as a reference for determining the final semantic feature groups:
[Formula (3): equation image BDA0003612801080000033, not reproduced in this text]
[Formula (4): equation image BDA0003612801080000034, not reproduced in this text]
Step 1.3: after the groups are selected, the features within each group are projected numerically according to their covariance, as shown in formula (5), to meet the final dimension reduction requirement:
[Formula (5): equation image BDA0003612801080000035, not reproduced in this text]
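As an illustration only, steps 1.1 and 1.2 can be sketched as follows. Because the images of formulas (1) to (4) are not available in this text, the sketch assumes that the grouping condition is a plain partition check and that the correlation coefficient is the Pearson coefficient; the feature names, the group names and the helper functions `check_partition` and `pearson` are hypothetical.

```python
from math import sqrt

# Hypothetical feature names and semantic groups, for illustration only.
groups = {
    "personal": ["age", "gender", "marital_status"],
    "transaction": ["trans_count", "trans_amount", "utilization"],
}

def check_partition(groups, all_features):
    """True if every feature belongs to exactly one semantic group (no overlap)."""
    seen = []
    for members in groups.values():
        seen.extend(members)
    return len(seen) == len(set(seen)) and set(seen) == set(all_features)

def pearson(x, y):
    """Pearson correlation coefficient (an assumed reading of formula (4))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A pair of features in the same group whose coefficient is high would be a candidate for deletion under the semantic redundancy rule; which of the two to keep remains a semantic decision that this sketch does not make.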
Preferably, step 2 specifically comprises:
Step 2.1: analyzing the credit card customer attrition data with the currently popular multi-view fusion method, displayed from the angles of customer attrition ratio, age level, education level, the proportion of married male gold-card customers with a bachelor's degree, the values of most data features, spiral plots of transaction data, and the like; removing the customer number from the original data, converting the non-numerical data to numerical values, performing T-SNE dimension reduction, and color-labeling attrition customers and non-attrition customers;
Step 2.2: performing PCA dimension reduction on the same credit card customer attrition data set, and color-labeling attrition customers and non-attrition customers;
Step 2.3: sorting out the features of the credit card customer attrition data, grouping the 19 semantic features that remain after removing the customer number and the data label, deleting semantic features with reference to the correlation coefficients calculated by formula (4), and finally confirming the groups.
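The preprocessing described in step 2.1 (removing the customer number, converting non-numerical values, and color-labeling by attrition status) might look like the minimal sketch below. The records, column names and the helper `encode` are hypothetical, and simple integer coding is an assumption: the text does not say which numerical conversion is used.

```python
# Hypothetical records; column names and values are illustrative, not taken from the data set.
records = [
    {"client_id": 101, "gender": "M", "card": "Gold", "age": 45, "churned": True},
    {"client_id": 102, "gender": "F", "card": "Blue", "age": 38, "churned": False},
    {"client_id": 103, "gender": "F", "card": "Gold", "age": 52, "churned": False},
]

def encode(records, drop=("client_id",), label="churned"):
    """Drop identifier columns, map non-numerical values to integer codes, split off labels."""
    codes = {}  # per-column mapping: category -> integer code (assigned in order of appearance)
    rows, colors = [], []
    for rec in records:
        row = []
        for key in sorted(rec):
            if key in drop or key == label:
                continue
            value = rec[key]
            if isinstance(value, str):
                mapping = codes.setdefault(key, {})
                value = mapping.setdefault(value, len(mapping))
            row.append(float(value))
        rows.append(row)
        # Red for attrition customers, blue for the rest, matching the colors named in the text.
        colors.append("red" if rec[label] else "blue")
    return rows, colors, codes

rows, colors, codes = encode(records)
```

Integer codes impose an arbitrary order on categories, which can distort distance-based methods such as T-SNE; one-hot coding is a common alternative when that matters.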
Preferably, step 3 specifically comprises:
roughly dividing all semantic features into two groups, customer personal information and transaction information, where the personal information group contains many non-numerical features and the transaction information group contains only numerical features; from the perspective of semantic feature groups, the personal information group is analyzed mainly for the association between attrition and personal information; the customer's age, gender and marital status are relatively independent of one another and constitute the customer's basic information; education, annual income and credit card type are important bases on which the customer applied for the credit card; according to the semantic redundancy rule, the customer's age, gender and marital status are taken as the customer personal information semantic feature group;
with reference to the correlation coefficient results, the number of months the account has been open is highly correlated with the customer's age; the total number of transactions is somewhat correlated with the transaction count and the total transaction amount over the past 12 months, and the correlation between the total number of transactions and the total transaction amount is higher; the credit card limit is highly correlated with the open-to-buy credit line; the total revolving balance is highly correlated with the utilization rate; and the number of transactions in the fourth and first quarters is related to the change in transaction amount.
Preferably, after deletion according to the semantic redundancy rule, the correlation coefficients are used as a reference, retaining one feature where the coefficient exceeds 0.6 and two where it lies between 0.2 and 0.6; following the principle of retaining the fewest of several related semantic features, the transaction information semantic feature group is finally confirmed, namely the number of transactions, the transaction amount in the fourth and first quarters, the change in transaction amount, and the average utilization rate.
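The 0.6 and 0.2 thresholds mentioned above can be applied mechanically before the semantic decision is made. The sketch below only sorts coefficients into bands; the function name `correlation_band`, the band names, the use of the absolute value, and the treatment of the exact boundary values are assumptions, and the rule for how many features to keep per band is deliberately left out.

```python
def correlation_band(r, strong=0.6, weak=0.2):
    """Classify a correlation coefficient by absolute value using the 0.6 and 0.2 thresholds."""
    a = abs(r)
    if a > strong:
        return "strong"
    if a >= weak:
        return "moderate"
    return "weak"

# Hypothetical pairwise coefficients between generic features, for illustration only.
pairs = {
    ("feature_a", "feature_b"): 0.79,
    ("feature_a", "feature_c"): -0.45,
    ("feature_b", "feature_c"): 0.08,
}
bands = {pair: correlation_band(r) for pair, r in pairs.items()}
```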
A credit card customer attrition data dimension reduction visualization analysis system based on semantic feature sets, the system comprising:
a view fusion module, configured to visually analyze the features of the data set by multi-view fusion and examine the credit card customer attrition data;
a dimension reduction analysis module, configured to perform dimension reduction visual analysis on the credit card customer data based on the T-SNE algorithm and the PCA method and analyze the credit card customer attrition data;
and a semantic analysis module, configured to analyze the credit card customer data based on semantic feature groups, perform dimension reduction visual analysis, and analyze the credit card customer attrition data.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the above method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
The invention has the following beneficial effects:
Starting from the essence of the visualization task, the invention proposes a dimension reduction visualization method based on a semantic feature set. The visualization of credit card customer attrition data is studied on a public Kaggle data set; the multi-view fusion visualization method and the T-SNE and PCA dimension reduction visualization methods are compared; and the dimension reduction visual analysis method for credit card customer attrition data based on semantic feature groups is realized. On this basis, the non-uniqueness and good interpretability of semantic feature groups are further analyzed and verified, offering a new approach for visualization technology and financial data analysis.
Drawings
FIG. 1 is a flow chart of a semantic feature set dimension reduction visualization method;
FIG. 2 is a credit card customer data analysis based on multi-view visualization;
FIG. 3 is a two-dimensional visual analysis of credit card customer attrition data based on the T-SNE algorithm;
FIG. 4 is a three-dimensional visualization analysis of credit card customer attrition data based on the T-SNE algorithm;
FIG. 5 is a two-dimensional visualization analysis of credit card customer attrition data based on the PCA algorithm;
FIG. 6 is a three-dimensional visualization analysis of credit card customer attrition data based on the PCA algorithm;
FIG. 7 is a calculation of correlation coefficients for credit card customer churn data;
FIG. 8 is a two-dimensional visualization analysis of credit card customer attrition data based on semantic feature sets;
FIG. 9 is a three-dimensional visualization analysis of credit card customer attrition data based on semantic feature sets;
FIG. 10 is a two-dimensional visualization analysis based on another set of semantic feature groups;
FIG. 11 is a three-dimensional visualization analysis based on another set of semantic feature sets;
FIG. 12 is a two-dimensional visualization analysis validation based on semantic feature sets;
FIG. 13 is a three-dimensional visualization analysis validation based on semantic feature sets.
Detailed Description
The present invention is described in detail below with reference to specific examples.
The first embodiment is as follows:
As shown in FIGS. 1 to 13, the specific technical solution adopted by the invention to solve the above technical problems is a dimension reduction visual analysis method for credit card customer attrition data based on semantic feature groups.
A dimension reduction visual analysis method for credit card customer attrition data based on semantic feature sets comprises the following steps:
Step 1: visually analyzing the features of the data set by multi-view fusion, and examining the credit card customer attrition data;
step 1 specifically comprises the following steps:
Step 1.1: performing semantic analysis on all data features and establishing groups, each data feature being assigned to exactly one semantic group according to its semantics so that no groups overlap;
the semantic analysis of all data features is performed according to formula (1) and formula (2):
[Formula (1): equation image BDA0003612801080000071, not reproduced in this text]
[Formula (2): equation image BDA0003612801080000072, not reproduced in this text]
Step 1.2: when two semantic features contain the same semantic interpretation factor, this is defined as semantic redundancy, as shown in formula (3), and the redundant feature is deleted; at the same time the correlation coefficient shown in formula (4) is calculated as a reference for determining the final semantic feature groups:
[Formula (3): equation image BDA0003612801080000073, not reproduced in this text]
[Formula (4): equation image BDA0003612801080000074, not reproduced in this text]
Step 1.3: after the groups are selected, the features within each group are projected numerically according to their covariance, as shown in formula (5), to meet the final dimension reduction requirement:
[Formula (5): equation image BDA0003612801080000075, not reproduced in this text]
Step 2: performing dimension reduction visual analysis on the credit card customer data based on the T-SNE algorithm and the PCA method, and analyzing the credit card customer attrition data;
step 2 specifically comprises the following steps:
Step 2.1: analyzing the credit card customer attrition data with the currently popular multi-view fusion method, displayed from the angles of customer attrition ratio, age level, education level, the proportion of married male gold-card customers with a bachelor's degree, the values of most data features, spiral plots of transaction data, and the like; removing the customer number from the original data, converting the non-numerical data to numerical values, performing T-SNE dimension reduction, and color-labeling attrition customers and non-attrition customers;
Step 2.2: performing PCA dimension reduction on the same credit card customer attrition data set, and color-labeling attrition customers and non-attrition customers;
Step 2.3: sorting out the features of the credit card customer attrition data, grouping the 19 semantic features that remain after removing the customer number and the data label, deleting semantic features with reference to the correlation coefficients calculated by formula (4), and finally confirming the groups.
Step 3: analyzing the credit card customer data based on semantic feature groups, performing dimension reduction visual analysis, and analyzing the credit card customer attrition data.
Step 3 specifically comprises the following steps:
roughly dividing all semantic features into two groups, customer personal information and transaction information, where the personal information group contains many non-numerical features and the transaction information group contains only numerical features; from the perspective of semantic feature groups, the personal information group is analyzed mainly for the association between attrition and personal information; the customer's age, gender and marital status are relatively independent of one another and constitute the customer's basic information; education, annual income and credit card type are important bases on which the customer applied for the credit card; according to the semantic redundancy rule, the customer's age, gender and marital status are taken as the customer personal information semantic feature group;
with reference to the correlation coefficient results, the number of months the account has been open is highly correlated with the customer's age; the total number of transactions is somewhat correlated with the transaction count and the total transaction amount over the past 12 months, and the correlation between the total number of transactions and the total transaction amount is higher; the credit card limit is highly correlated with the open-to-buy credit line; the total revolving balance is highly correlated with the utilization rate; and the number of transactions in the fourth and first quarters is related to the change in transaction amount.
Preferably, after deletion according to the semantic redundancy rule, the correlation coefficients are used as a reference, retaining one feature where the coefficient exceeds 0.6 and two where it lies between 0.2 and 0.6; following the principle of retaining the fewest of several related semantic features, the transaction information semantic feature group is finally confirmed, namely the number of transactions, the transaction amount in the fourth and first quarters, the change in transaction amount, and the average utilization rate.
The invention also provides a credit card customer attrition data dimension reduction visualization analysis system based on the semantic feature group, which comprises:
a view fusion module, configured to visually analyze the features of the data set by multi-view fusion and examine the credit card customer attrition data;
a dimension reduction analysis module, configured to perform dimension reduction visual analysis on the credit card customer data based on the T-SNE algorithm and the PCA method and analyze the credit card customer attrition data;
and a semantic analysis module, configured to analyze the credit card customer data based on semantic feature groups, perform dimension reduction visual analysis, and analyze the credit card customer attrition data.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the above method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
The second embodiment is as follows:
The credit card customer attrition data set used by the invention comes from a public data set of a Kaggle competition. The data set contains 10127 customer records with 21 feature dimensions; the specific feature information and statistics are shown in Table 1. All data features are sorted out from the perspective of financial data analysis to explore the main patterns for analyzing whether a credit card customer will churn, which are then verified and displayed visually to support the analysis of credit card customer data. The visualization tasks of the invention are defined as follows:
Task 1: visually analyze the features of the data set by multi-view fusion and explore the data characteristics of credit card customer attrition.
Task 2: perform dimension reduction visual analysis on the credit card customer data and examine the data characteristics of customer attrition from the perspective of the T-SNE algorithm and the PCA method.
Task 3: analyze the credit card customer data from the perspective of semantic feature groups, perform dimension reduction visual analysis, and examine the data characteristics of credit card customer attrition.
Table 1 Data set feature information
[Table 1: table images BDA0003612801080000101 and BDA0003612801080000111, not reproduced in this text]
With the surge of research interest in machine learning, mainstream data dimension reduction relies mainly on pure mathematical calculation and does not consider the semantics of the data features. The traditional approach, by contrast, starts from domain knowledge and pays little attention to the characteristics of the data itself. The two approaches emphasize different angles and seem opposed, but taken together they are not contradictory. In a real problem, domain knowledge represents prior experience of the professional problem, while the data values represent the particularity of the problem. Researchers need to respect domain knowledge without being bound by it, and at the same time solve the actual problem in a targeted way from the characteristics of the data. On this basis, the invention proposes the Semantic feature Group (SG) method. First, semantic analysis is performed on all data features according to formula (1) and formula (2), and groups are established. Every data feature must be assigned to exactly one semantic group according to its semantics, so that the groups do not overlap. If the semantic features admit several possible groupings, the data feature set has more than one valid grouping, and the visualization result obtained after grouping can be used to verify whether a grouping is reasonable.
Semantic features are then deleted according to their importance to the actual problem. If two semantic features contain the same semantic interpretation factor, this is defined as semantic redundancy (formula (3)) and the redundant feature is deleted, with the correlation coefficient (formula (4)) calculated as a reference for determining the final semantic feature groups. Because of the dimensional limits of dimension reduction visualization, the result mostly consists of 1 to 3 groups, that is, planar or three-dimensional visualization. After the groups are selected, the features within each group are projected numerically according to their covariance (formula (5)) to meet the final dimension reduction requirement. The specific flow is shown in FIG. 1.
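For orientation, the standard textbook definitions of the sample covariance and the Pearson correlation coefficient of two features X and Y over n records are given below. They are offered as an assumption about what the correlation and covariance steps compute; the patent's own formulas (4) and (5) are not available in this text and may differ in form.

```latex
\operatorname{cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}),
\qquad
r_{XY} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
              {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}
```

Here \bar{x} and \bar{y} denote the sample means; r_{XY} always lies between -1 and 1.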
Current dimension reduction methods mainly consider certain statistical properties of the data, or project all the data into a low-dimensional space satisfying some condition, in order to reduce the dimensionality for visualization. In fact, such semantics-independent methods can cause hidden problems in data analysis. Take the data in Table 2: if the semantics of feature A is a person's height, the 4th record is evidently an outlier. If the semantics of feature C is the customer's monthly income, then projecting the monthly income data onto the height data during dimension reduction makes the new data hard to interpret. In important applications, a poorly interpretable method or model can serve only as a reference, not as a real decision support solution.
[Equation images BDA0003612801080000131 to BDA0003612801080000135 (formulas (1) to (5)), not reproduced in this text]
Table 2 Semantic feature group data example
No.   A     B     C       D
1     1.6   80    15000   31
2     1.8   71    8000    42
3     2.0   100   5000    25
4     3.1   120   20000   13
5     1.4   40    500     8
Pure numerical dimension reduction can freely choose the number of target dimensions. From the perspective of the actual semantics of the data features, however, the invention holds that once a data set is concretized by the specific data obtained, there is a lower limit to dimension reduction. For the data in Table 2, if the feature semantics are height, weight, monthly income and number of months of credit card transactions, the features can be divided into two groups, customer personal information and financial information, so the data can be reduced to 2 dimensions at the lowest; forcing it down to 1 dimension by mathematical means is bound to harm the interpretability and reasonableness of the data.
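The Table 2 discussion can be made concrete with a small sketch. The height range, the assumption that height is measured in metres, and the helper `out_of_range_rows` are hypothetical; the point is only that a semantic reading of feature A exposes record 4, and that the number of semantic groups gives the lowest interpretable dimension.

```python
# Table 2 data; semantics as given in the text: A = height, B = weight,
# C = monthly income, D = number of months of credit card transactions.
table2 = {
    "A": [1.6, 1.8, 2.0, 3.1, 1.4],
    "B": [80, 71, 100, 120, 40],
    "C": [15000, 8000, 5000, 20000, 500],
    "D": [31, 42, 25, 13, 8],
}

# Assumed plausible range for a human height in metres (hypothetical bounds).
HEIGHT_RANGE = (0.5, 2.5)

def out_of_range_rows(values, low, high):
    """Return 1-based record numbers whose value falls outside [low, high]."""
    return [i for i, v in enumerate(values, start=1) if not (low <= v <= high)]

# Semantic groups from the text: personal information and financial information.
semantic_groups = {"personal": ["A", "B"], "financial": ["C", "D"]}
min_dimensions = len(semantic_groups)  # one axis per group

outliers = out_of_range_rows(table2["A"], *HEIGHT_RANGE)
print(outliers, min_dimensions)  # prints: [4] 2
```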
First, the currently popular multi-view fusion method is used to analyze the credit card customer attrition data. The data are displayed from the angles of customer attrition ratio, age level, education level, the proportion of married male gold-card customers with a bachelor's degree, the values of most data features, spiral plots of the various transaction data, and the like, as shown in FIG. 2.
As can be seen from FIG. 2, the multi-view fusion visualization method offers a certain convenience in analyzing the credit card customer attrition data: different sides of the same data can be seen from multiple angles, so that different pieces of information supplement one another. However, it is relatively difficult to form an overall assessment quickly, and the result is still some distance from a visualization that clearly distinguishes attrition customers from non-attrition customers.
The T-SNE algorithm is a nonlinear mapping method that converts the actual distances between data points into probabilities. It uses the long-tailed t-distribution to address the crowding and optimization problems of the mapping, is good at mapping high-dimensional data to 2 or 3 dimensions, and has an important influence among existing dimension reduction visualization methods.
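The conversion of distances into probabilities can be sketched as follows. This is a simplified illustration and not a full T-SNE implementation: it uses one fixed Gaussian bandwidth `sigma` for every point (T-SNE normally tunes it per point from a perplexity setting) and omits the gradient descent that moves the low-dimensional points. The function names are hypothetical.

```python
from math import exp

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def high_dim_conditional(points, i, sigma=1.0):
    """p_{j|i}: Gaussian-based probability that point i picks point j as its neighbour."""
    weights = [0.0 if j == i else exp(-sq_dist(points[i], p) / (2 * sigma ** 2))
               for j, p in enumerate(points)]
    total = sum(weights)
    return [w / total for w in weights]

def low_dim_joint(embedding):
    """q_{ij}: long-tailed Student-t (one degree of freedom) similarities in the map."""
    n = len(embedding)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
          for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]
```

The heavier tail of the Student-t kernel lets moderately distant points sit further apart in the map than a Gaussian would allow, which is how the crowding problem is eased.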
According to the invention, the client number of the original data is removed, the non-numerical data is subjected to numerical conversion, the T-SNE dimension reduction is carried out on the non-numerical data, the color marking is carried out on the lost client and the non-lost client, and the two-dimensional and three-dimensional visualization effects are shown in figures 3 and 4.
As can be seen from the figure, the T-SNE algorithm has a certain effect on the visual analysis of the credit card customer churn data, but most of red and blue overlapped areas show that the difference between churn customers and non-churn customers is controlled, and the T-SNE algorithm is not suitable and does not achieve a good distinguishing effect.
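To make the idea of converting distances into probabilities concrete, the following NumPy sketch computes the two similarity distributions that t-SNE compares and the Kullback-Leibler cost between them. It is not the full algorithm and not the code used in the invention: it assumes a fixed Gaussian bandwidth `sigma` instead of the usual per-point perplexity search, and it omits the gradient descent that moves the embedding. All function names are illustrative.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X * X, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(d, 0.0)  # clip tiny negatives caused by rounding


def high_dim_affinities(X, sigma=1.0):
    """Symmetric Gaussian affinities P (fixed sigma, an assumption of this sketch)."""
    logits = -pairwise_sq_dists(X) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)           # a point is not its own neighbour
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    cond = np.exp(logits)
    cond /= cond.sum(axis=1, keepdims=True)      # row i holds p_{j|i}
    return (cond + cond.T) / (2.0 * X.shape[0])  # symmetrise, total mass 1


def low_dim_affinities(Y):
    """Long-tailed Student-t affinities Q (one degree of freedom) in the embedding."""
    w = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(w, 0.0)
    return w / w.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), the cost that t-SNE minimises over the embedding Y."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))
```

The heavy tail of the Student-t kernel lets moderately distant points sit far apart in the low-dimensional map, which is how the crowding problem mentioned above is eased.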
The PCA algorithm is a linear dimensionality reduction method and is the basis of many algorithms. It projects multidimensional data onto the few directions with the largest eigenvalues of the covariance matrix, thereby retaining as much of the information in the data as possible. Although it has appeared on its own less often in computer vision and natural language processing in recent years, the method is simple, convenient and computationally efficient, so it can be combined with many other methods and still has wide application and research value in numerical computation.
The invention applies PCA dimension reduction to the same credit card churn data set and marks churned and non-churned customers in different colors. The two-dimensional and three-dimensional visualizations are shown in Figs. 5 and 6.
As can be seen from Fig. 5, the PCA algorithm has some effect on the visual analysis of the credit card customer churn data: churned customers occupy a relatively independent red region of their own, but the red-blue overlap is still pronounced. As can be seen from Fig. 6, churned and non-churned customers are also hard to tell apart in three-dimensional space, so the PCA algorithm does not achieve the expected visual distinguishing effect for the problem addressed by the invention.
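The projection onto the directions with the largest covariance eigenvalues can be sketched in a few lines of NumPy. This is a generic PCA illustration rather than the code behind the figures. The function name `pca_project` is hypothetical, and whether the features are standardized beforehand is left to the caller.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project rows of X onto the leading eigenvectors of the covariance matrix."""
    Xc = X - X.mean(axis=0)                 # centre each feature
    cov = np.cov(Xc, rowvar=False)          # features are columns
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained = eigvals[order] / eigvals.sum()
    return Xc @ components, explained
```

Calling `pca_project(X, 2)` or `pca_project(X, 3)` gives the coordinates for a two-dimensional or three-dimensional scatter plot, and `explained` reports the share of total variance kept by each retained direction.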
The invention sorts out the features of the credit card customer churn data, groups the 19 semantic features that remain after the customer number and the data label are removed, deletes semantic features with reference to the calculated correlation coefficients (formula 4; results shown in Fig. 7), and then confirms the final grouping.
First, based on the specific information in Table 1, the semantic features as a whole are roughly divided into two groups: customer personal information (rows 3-9 in Table 1) and transaction information (rows 10-21 in Table 1). The personal information group contains a good deal of non-numerical data, while the transaction information group consists entirely of numerical data. Although non-numerical data can be converted into numerical data for quantitative calculation, from the perspective of the semantic feature groups of the invention the values obtained by the existing quantitative conversion are better treated as auxiliary reference information than as decisive information. Therefore, in the personal information group the work is mainly a semantic analysis of how customer churn relates to personal information. The customer's age, gender and marital status are largely unrelated to one another and constitute the customer's basic information. Education level, annual income and credit card type are important when a customer applies for a credit card, but they have no necessary connection with whether the customer churns. The number of family members and the customer's marital status carry partly redundant information. According to the semantic redundancy rule (formula 3), the customer's age, gender and marital status are taken as the customer personal information semantic feature group.
Next, with reference to the correlation coefficient results in Fig. 7, the number of months the account has been open is highly correlated with the customer's age; the total transaction number is correlated to some degree with the transaction count and the transaction amount over the past 12 months, and the transaction count and transaction amount are more strongly correlated with each other; the credit card limit is highly correlated with the open-to-buy credit limit; the total turnover is highly correlated with the utilization rate; and the change in transaction count between the fourth and first quarters is related to the change in transaction amount. After features are removed according to the semantic redundancy rule, the correlation coefficients are applied as follows: 1 feature is kept where the coefficient exceeds 0.6, 2 are kept where it lies between 0.2 and 0.6, and where several semantic features are mutually correlated the smallest number is retained. The final confirmation gives the transaction information semantic feature group: the transaction count, the changes in transaction count and transaction amount between the fourth and first quarters, and the average utilization rate.
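The correlation screening can be sketched as follows. Formula (4) is not reproduced in this text, so the sketch assumes it is the ordinary Pearson correlation coefficient. The 0.6 and 0.2 thresholds come from the description, but the code only sorts feature pairs into bands and leaves the decision about which features to keep to the analyst. The function name is hypothetical.

```python
import numpy as np


def correlation_bands(X, names, strong=0.6, moderate=0.2):
    """Classify every feature pair by absolute Pearson correlation.

    X has one column per feature; names gives the column labels.
    """
    r = np.corrcoef(X, rowvar=False)
    bands = {"strong": [], "moderate": []}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            value = float(r[i, j])
            entry = (names[i], names[j], round(value, 3))
            if abs(value) > strong:
                bands["strong"].append(entry)      # candidates for keeping 1 feature
            elif abs(value) >= moderate:
                bands["moderate"].append(entry)    # weaker, still worth reviewing
    return bands
```

Pairs in the `strong` band are those for which the description keeps a single representative, for example the credit limit and the open-to-buy credit limit.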
Then the data are projected into a 2- or 3-dimensional space according to the magnitude of the covariance, and two-dimensional and three-dimensional visualizations are produced. The results are shown in Figs. 8 and 9.
As can be seen from Figs. 8 and 9, the semantic feature group method has a clear effect on the visual analysis of the credit card customer churn data: churned and non-churned customers are essentially separated, and there is almost no red-blue overlap. The semantic feature group visualization analysis method achieves the expected visual distinguishing effect for the problem studied by the invention.
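One possible reading of the group-wise projection is sketched below. Formula (5) is not reproduced in this text, so the sketch assumes that each semantic feature group is standardized and collapsed onto its own largest-variance direction, giving one plot axis per group. The function names and the column layout are hypothetical.

```python
import numpy as np


def first_component(block):
    """Scores of a feature block on its largest-variance direction."""
    centred = block - block.mean(axis=0)
    std = centred.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant columns
    z = centred / std
    cov = np.atleast_2d(np.cov(z, rowvar=False))
    _, eigvecs = np.linalg.eigh(cov)         # ascending eigenvalues
    return z @ eigvecs[:, -1]                # last column is the top direction


def project_by_group(X, groups):
    """One output axis per semantic group; groups maps a name to column indices."""
    return np.column_stack([first_component(X[:, cols]) for cols in groups.values()])
```

For example, `project_by_group(X, {"personal": [0, 1, 2], "transaction": [3, 4, 5, 6]})` returns a two-column array that can be drawn as a scatter plot with churned and non-churned customers in different colors.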
The method uses the credit card churn data from a Kaggle competition, is compared against the multi-view fusion visualization method and the t-SNE and PCA dimension reduction visualization methods, and achieves a clear visual display of the credit card churn customer data by means of semantic feature groups.
Further research and analysis show that the grouping is not unique, because the semantics of the features can be understood in different ways. For example, from the perspective of financial data analysis, the customer's actual transaction data matter greatly for judging whether the customer will churn, whereas the customer's personal information matters more when the credit card is applied for. From a semantic point of view, therefore, the customer personal information group can be deleted and dimension reduction visualization analysis performed directly on the customer transaction information semantic feature group. The features correlated with the personal information group are removed, and following the same rules the final features are confirmed as the transaction count, the changes in transaction count and transaction amount between the fourth and first quarters, and the average utilization rate. The visualization results are shown in Figs. 10 and 11.
As can be seen from Figs. 10 and 11, this semantic feature group also distinguishes churned from non-churned customers well in the visual analysis of the credit card customer churn data.
Figs. 8 to 11 further verify the analysis: customer transaction information is the key to analyzing whether a customer churns, and customer personal information plays an auxiliary role. The invention therefore provides an auxiliary visualization by forming a customer expenditure semantic feature group from the customer's number of family members and annual income, which yields Figs. 12 and 13. This group of visualization results verifies the effectiveness of the method of the invention.
The above is only a preferred embodiment of the credit card customer churn data dimension reduction visualization analysis method based on semantic feature groups. The scope of protection of the method is not limited to the above embodiment, and all technical solutions that follow the same idea fall within the scope of protection of the invention. It should be noted that modifications and variations that those skilled in the art may make without departing from the principle of the invention are also regarded as within the scope of protection of the invention.

Claims (8)

1. A credit card customer attrition data dimension reduction visualization analysis method based on semantic feature sets, characterized in that the method comprises the following steps:
step 1: performing visual analysis on the features of the data set in a multi-view fusion manner, and collecting credit card customer attrition data;
step 2: performing dimension reduction visualization analysis on the credit card customers based on the t-SNE algorithm and the PCA method, and analyzing the credit card customer attrition data;
step 3: analyzing the credit card customer data based on semantic feature groups, performing dimension reduction visualization analysis, and analyzing the credit card customer attrition data.
2. The credit card customer attrition data dimension reduction visualization analysis method based on semantic feature sets as claimed in claim 1, wherein step 1 specifically comprises the following steps:
step 1.1: performing semantic analysis on all data features and establishing groups, dividing every data feature into exactly one semantic group according to its semantics, and ensuring that the groups do not overlap;
the semantic analysis of all data features is performed according to formula (1) and formula (2):
[Formula (1): image FDA0003612801070000011]
[Formula (2): image FDA0003612801070000012]
step 1.2: when two semantic features contain the same semantic interpretation factor, semantic redundancy is defined as shown in formula (3) and the redundant feature is deleted, and a correlation coefficient is calculated as shown in formula (4) as a reference for determining the final semantic feature group:
[Formula (3): image FDA0003612801070000013]
[Formula (4): image FDA0003612801070000014]
step 1.3: after the groups are selected, the features within each group are projected by numerical analysis according to covariance, as shown in formula (5), so as to meet the required final number of reduced dimensions:
[Formula (5): image FDA0003612801070000021]
3. The credit card customer attrition data dimension reduction visualization analysis method based on semantic feature sets as claimed in claim 2, wherein step 2 specifically comprises the following steps:
step 2.1: analyzing the credit card customer attrition data with the currently popular multi-view fusion method; displaying the data from angles including the customer attrition ratio, age level, education level, the share of card customers who are married male university graduates, the characteristic values of most of the data, and spiral plots of the various transaction data; removing the customer number from the original data, converting non-numerical data to numerical values, applying t-SNE dimension reduction to the result, and marking attrition customers and non-attrition customers in different colors;
step 2.2: applying PCA (principal component analysis) dimension reduction to the same credit card attrition customer data set, and marking attrition customers and non-attrition customers in different colors;
step 2.3: sorting out the features of the credit card customer attrition data, grouping the 19 semantic features that remain after the customer number and the data label are removed, deleting semantic features with reference to the correlation coefficients calculated by formula (4), and then confirming the final grouping.
4. The credit card customer attrition data dimension reduction visualization analysis method based on semantic feature sets as claimed in claim 3, wherein step 3 specifically comprises the following steps:
roughly dividing all semantic features into two groups, customer personal information and transaction information, wherein the personal information group contains a good deal of non-numerical data and the transaction information group consists entirely of numerical data; from the perspective of the semantic feature groups, performing in the personal information group mainly a semantic analysis of how customer attrition relates to personal information; the customer's age, gender and marital status are largely unrelated to one another and constitute the customer's basic information; education level, annual income and credit card type are important when the customer applies for the credit card; according to the semantic redundancy rule, taking the customer's age, gender and marital status as the customer personal information semantic feature group;
with reference to the correlation coefficient results, the number of months the account has been open is highly correlated with the customer's age; the total transaction number is correlated to some degree with the transaction count and the transaction amount over the past 12 months, and the transaction count and transaction amount are more strongly correlated with each other; the credit card limit is highly correlated with the open-to-buy credit limit; the total turnover is highly correlated with the utilization rate; and the change in transaction count between the fourth and first quarters is related to the change in transaction amount.
5. The credit card customer attrition data dimension reduction visualization analysis method based on semantic feature sets as claimed in claim 4, wherein: after features are removed according to the semantic redundancy rule, the correlation coefficients are applied such that 1 feature is kept where the coefficient exceeds 0.6, 2 are kept where it lies between 0.2 and 0.6, and where several semantic features are mutually correlated the smallest number is retained for final confirmation, so that the resulting transaction information semantic feature group is the transaction count, the changes in transaction count and transaction amount between the fourth and first quarters, and the average utilization rate.
6. A credit card customer attrition data dimension reduction visualization analysis system based on semantic feature sets, characterized in that the system comprises:
a view fusion module, used for performing visual analysis on the features of the data set in a multi-view fusion manner and collecting credit card customer attrition data;
a dimension reduction analysis module, used for performing dimension reduction visualization analysis on the credit card customers based on the t-SNE algorithm and the PCA method and analyzing the credit card customer attrition data;
and a semantic analysis module, used for analyzing the credit card customer data based on semantic feature groups, performing dimension reduction visualization analysis, and analyzing the credit card customer attrition data.
7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that: the processor, when executing the computer program, realizes the steps of the method of any one of claims 1 to 5.
8. A computer-readable storage medium having a computer program stored thereon, characterized in that: the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5.
CN202210436751.2A 2022-04-24 2022-04-24 Credit card customer attrition data dimension reduction visual analysis method based on semantic feature set Active CN114936252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210436751.2A CN114936252B (en) 2022-04-24 2022-04-24 Credit card customer attrition data dimension reduction visual analysis method based on semantic feature set

Publications (2)

Publication Number Publication Date
CN114936252A true CN114936252A (en) 2022-08-23
CN114936252B CN114936252B (en) 2023-01-31

Family

ID=82861435

Country Status (1)

Country Link
CN (1) CN114936252B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050234762A1 (en) * 2004-04-16 2005-10-20 Pinto Stephen K Dimension reduction in predictive model development
US20110244919A1 (en) * 2010-03-19 2011-10-06 Aller Joshua V Methods and Systems for Determining Image Processing Operations Relevant to Particular Imagery
KR20110125075A (en) * 2010-05-12 2011-11-18 주식회사 아이네크 Semantic search method and system for supporting meaningful grouping and visual navigation based on bibliography metadata
CN104750795A (en) * 2015-03-12 2015-07-01 北京云知声信息技术有限公司 Intelligent semantic searching system and method
CN107239448A (en) * 2017-06-07 2017-10-10 长沙学院 A kind of explanatory principal component analytical method
CN107832713A (en) * 2017-11-13 2018-03-23 南京邮电大学 A kind of human posture recognition method based on OptiTrack
CN108805361A (en) * 2018-06-21 2018-11-13 国网安徽省电力公司合肥供电公司 A kind of method for visualizing of fusion city electricity consumption and distributed power generation
CN109285063A (en) * 2017-07-19 2019-01-29 上海臻客商务咨询有限公司 A kind of credit card equity tendency judgement system based on big data analysis
US20190087409A1 (en) * 2017-09-15 2019-03-21 International Business Machines Corporation Visual comparison of documents using latent semantic differences
CN109583482A (en) * 2018-11-13 2019-04-05 河海大学 A kind of infrared human body target image identification method based on multiple features fusion Yu multicore transfer learning
CN111611323A (en) * 2020-04-09 2020-09-01 山东财经大学 Data fusion-oriented iterative structured multi-view subspace clustering method, device and readable storage medium
US20210209125A1 (en) * 2019-05-21 2021-07-08 Sisense Ltd. System and method for generating analytical insights utilizing a semantic knowledge graph

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SHIPING HUANG等: "Exploration of dimensionality reduction for text visualization", 《LOW POWER ELECTRONICS AND DESIGN》 *
ZOHEB H. BORBORA等: "User Behavior Modelling Approach for Churn Prediction in Online Games", 《2012 INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY, RISK AND TRUST AND 2012 INTERNATIONAL CONFERNECE ON SOCIAL COMPUTING》 *
张长青: "基于自表达的多视角子空间聚类方法研究", 《中国博士学位论文全文数据库 信息科技辑》 *
郑亚茹: "基于深度学习的视觉语义SLAM技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant