CN111046111A

CN111046111A - Data processing method and terminal equipment

Info

Publication number: CN111046111A
Application number: CN201911084175.4A
Authority: CN
Inventors: 李炜; 李挺; 崔文豪
Original assignee: Shanghai Zhuoxue Technology Co ltd
Current assignee: Shanghai Zhuoxue Technology Co ltd
Priority date: 2019-11-07
Filing date: 2019-11-07
Publication date: 2020-04-21

Abstract

The invention discloses a data processing method and terminal equipment. The method comprises the following steps: S1: acquiring user information and clustering it, calculating similarity by a computer program, and marking the information of non-purchasing users whose similarity is greater than a set threshold as first data; S2: acquiring information on historical financial products and pushed financial products, calculating their similarity by the computer program, and marking as second data the information of users who purchased historical financial products whose similarity is greater than a set threshold but have not purchased the pushed financial products; S3: screening, by the computer program, the information of users holding financial products in a database, associating the held financial products with the pushed financial products, and marking as third data the information of users whose association degree is greater than a set threshold and who have not purchased the pushed financial products; S4: merging and deduplicating, by the computer program, the obtained first data, second data and third data to obtain final data information.

Description

Data processing method and terminal equipment

Technical Field

The invention relates to the field of financial marketing, and in particular to a data processing method based on a multi-dimensional hybrid algorithm and to a terminal device.

Background

With the development of the market economy and the general improvement of people's financial awareness, the number of customers of the major financial institutions has increased sharply. How to retain old customers and attract new ones has become a key problem, and mining potential customer groups for a pushed financial product is a matter of concern to financial institutions. The major financial institutions have therefore proposed various strategies and launched many financial products, and they inevitably need to mine potential customer groups based on the characteristics of those products.

Based on project practice experience, if a conventional, classical single recommendation algorithm is used in the product recommendation scheme of a financial institution, on the one hand the amount of calculation is large and the recommendation efficiency is low; on the other hand, because only a single classical recommendation method is used, the potential customer group is not mined sufficiently. In view of the above, a data processing method is provided.

Disclosure of Invention

The invention aims to provide a data processing method, which combines a database and a data processing module to perform comprehensive calculation and data processing from multiple dimensions to obtain final data information.

In order to realize the task, the invention adopts the following technical scheme:

A data processing method comprising the following steps:

S1: acquiring user information in a database; clustering the users by a computer program according to whether a user has purchased a financial product; after clustering, calculating, by the computer program, the similarity between purchasing users and non-purchasing users from their characteristic parameters; and marking the information of non-purchasing users whose similarity is greater than a set threshold as first data;

S2: acquiring information on historical financial products and pushed financial products in the database; calculating, by the computer program, the similarity between the historical financial products and the pushed financial products from the characteristic parameters of the financial products; and marking as second data the information of users who purchased historical financial products whose similarity is greater than a set threshold but have not purchased the pushed financial products;

S3: screening, by the computer program, the information of users holding financial products in the database; associating the held financial products with the pushed financial products; screening out the information of users whose association degree is greater than a set threshold and who have not purchased the pushed financial products; and marking that user information as third data;

S4: merging and deduplicating, by the computer program, the obtained first data, second data and third data to obtain final data information.

As a further improvement of the present invention, the user information in said step S1 includes user ID, user transaction information, user asset information, and user held financial product information.

As a further improvement of the present invention, the characteristic parameters are specifically a user wide table established by the computer program from the user ID, user transaction information, user asset information and information on the financial products held by the user.

As a further improvement of the present invention, the establishment of the user wide table specifically includes:

The computer program associates the user transaction information, the user asset information and the information on the financial products held by the user through the user ID to form the user wide table.
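The association by user ID can be sketched as a SQL join. The following is a minimal illustration using Python's built-in sqlite3 module; the table names (`users`, `user_tx`, `assets`, `holdings`) and column names are hypothetical and are not taken from the source. A LEFT JOIN is assumed here so that users with no record in one of the tables still appear in the wide table, with the missing fields left empty.

```python
import sqlite3

def build_wide_table(conn):
    """Join the per-user tables on user_id into one wide row per user."""
    conn.executescript("""
        CREATE TABLE wide AS
        SELECT u.user_id,
               t.total_amount, t.total_count,   -- transaction information
               a.point_assets,                  -- asset information
               h.held_count                     -- held-product information
        FROM users u
        LEFT JOIN user_tx  t ON t.user_id = u.user_id
        LEFT JOIN assets   a ON a.user_id = u.user_id
        LEFT JOIN holdings h ON h.user_id = u.user_id;
    """)
    return conn.execute("SELECT * FROM wide ORDER BY user_id").fetchall()

# Hypothetical sample data: u2 has no transaction or holding records.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users(user_id TEXT PRIMARY KEY);
    CREATE TABLE user_tx(user_id TEXT, total_amount REAL, total_count INTEGER);
    CREATE TABLE assets(user_id TEXT, point_assets REAL);
    CREATE TABLE holdings(user_id TEXT, held_count INTEGER);
    INSERT INTO users VALUES ('u1'), ('u2');
    INSERT INTO user_tx VALUES ('u1', 1200.0, 3);
    INSERT INTO assets VALUES ('u1', 50000.0), ('u2', 800.0);
    INSERT INTO holdings VALUES ('u1', 2);
""")
rows = build_wide_table(conn)
```

The empty fields produced for `u2` are the kind of missing values that the supplement of the user wide table is meant to handle.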

As a further improvement of the present invention, the method further includes supplementing the user wide table, wherein the supplement specifically includes:

filling the values missing from the user transaction information, the user asset information and the information on the financial products held by the user with a value of 0 or 1.

As a further improvement of the present invention, the clustering in step S1 is specifically to use Kohonen algorithm modeling to cluster the users.

As a further improvement of the present invention, the step S2 is preceded by the construction of a financial product feature table; the financial product feature table includes a historical financial product feature table and a pushed financial product feature table, and the computer program calculates the similarity between the historical financial products and the pushed financial products according to the feature parameters in the two feature tables.

As a further improvement of the invention, the financial product characteristic table comprises the product type, purchase condition, product characteristics and applicable population of the financial product.

As a further improvement of the present invention, the step S3 specifically includes:

the computer program screens out the information of users holding financial products in the database, associates the financial products held by those users with the pushed financial products by means of support and confidence, screens out the information of users whose association degree is greater than a set threshold and who have not purchased the pushed financial products, and marks that user information as third data.
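The source does not spell out how support and confidence are defined. Under the usual association-rule definitions, and assuming a rule of the form "holds product A, therefore purchases pushed product P", they can be written as follows. The symbols $N$ (total number of users) and $H_u$ (set of products held or purchased by user $u$) are introduced here for illustration only.

```latex
\mathrm{support}(A \Rightarrow P) = \frac{\left|\{\, u : A \in H_u \ \text{and}\ P \in H_u \,\}\right|}{N}
\qquad
\mathrm{confidence}(A \Rightarrow P) = \frac{\left|\{\, u : A \in H_u \ \text{and}\ P \in H_u \,\}\right|}{\left|\{\, u : A \in H_u \,\}\right|}
```

Support measures how often the two products occur together among all users, while confidence measures how often holders of A also purchase P.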

The invention also discloses a terminal device comprising a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and when the processor executes the computer program, the following steps are realized:

Compared with the prior art, the invention has the following technical characteristics:

1. The data processing method provided by the invention is based on project practice experience and mines potential customer groups from different angles and with different methods. Because the users are clustered first, the amount of calculation is smaller and the efficiency is higher than with financial product recommendation based on user similarity alone. In addition, compared with a single recommendation algorithm based on product similarity or on customer similarity, the hybrid recommendation algorithm mines potential customer groups more fully and accurately, because it also draws on the purchasers of historical financial products and on the degree of association between historical financial products and the pushed financial products;

2. Because the potential customer groups are mined more fully and with higher accuracy, a customer manager of the financial institution can formulate a more targeted marketing strategy when carrying out precision marketing, which improves the marketing effect and increases the income of the financial institution.

Drawings

FIG. 1 is a flow chart of Example 1 provided by the present invention;

FIG. 2 is a flow chart of Example 2 provided by the present invention;

FIG. 3 is a first partial flow chart of Example 2 provided by the present invention;

FIG. 4 is a second partial flow chart of Example 2 provided by the present invention;

FIG. 5 is a third partial flow chart of Example 2 provided by the present invention.

Detailed Description

The invention is further illustrated with reference to the following figures and detailed description.

Example 1

Referring to fig. 1, the data processing method of the present invention includes the following steps:

Specifically, the user information includes a user ID, user transaction information, user asset information, and information on the financial products held by the user.

In this embodiment, the characteristic parameter is specifically a user wide table established by the computer program through the user ID, the user transaction information, the user asset information, and the information of the financial product held by the user.

By establishing the user wide table, more user information can be obtained in time, and further, the data information is more comprehensive when data is processed.

The establishment of the user wide table specifically comprises the following steps:

After the association, the ID of each user corresponds to the corresponding transaction information, asset information and information of the held financial product, so that the search is convenient.

The method further comprises the supplement of the user wide table, wherein the supplement specifically comprises the following steps:

Supplementing the user wide table avoids the computation problems that missing data would otherwise cause.

The clustering specifically uses Kohonen algorithm modeling to cluster the users.
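As an illustration of what Kohonen (self-organizing map) clustering involves, the following is a minimal pure-Python sketch of a one-dimensional map. It is not the implementation used in the source: the map size, learning-rate schedule, step-shaped neighbourhood and the choice to initialise units from evenly spaced samples are all assumptions made for this example. Each sample is assigned to its best-matching unit, and the unit's weight vector plays the role of the cluster center.

```python
import random

def best_unit(units, x):
    """Index of the unit whose weight vector is closest to x (the BMU)."""
    return min(range(len(units)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(units[k], x)))

def train_som(samples, n_units=2, epochs=30, lr0=0.5, seed=0):
    """Train a 1-D Kohonen map; returns the unit weight vectors (cluster centers)."""
    rng = random.Random(seed)
    step = len(samples) // n_units              # assumes len(samples) >= n_units
    units = [list(samples[k * step]) for k in range(n_units)]
    radius0 = n_units / 2.0
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs            # shrinks from 1 towards 0
        lr, radius = lr0 * decay, radius0 * decay
        for x in rng.sample(samples, len(samples)):   # shuffled pass over the data
            bmu = best_unit(units, x)
            for k, w in enumerate(units):
                # Update the BMU and any map neighbours inside the current radius
                if k == bmu or abs(k - bmu) < radius:
                    for d in range(len(w)):
                        w[d] += lr * (x[d] - w[d])
    return units

# Hypothetical, already-normalised customer feature vectors forming two groups.
samples = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
           (0.9, 1.0), (1.0, 0.9), (0.95, 0.95)]
units = train_som(samples)
labels = [best_unit(units, x) for x in samples]
```

With only two units the neighbourhood never reaches the other unit, so this example reduces to plain competitive learning; larger maps would also pull neighbouring units towards each sample during early epochs.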

Specifically, the step S2 is preceded by constructing a financial product feature table; the financial product feature table includes a historical financial product feature table and a pushed financial product feature table, and the computer program calculates the similarity between the historical financial products and the pushed financial products according to the feature parameters in the two feature tables.

The financial product feature table comprises the product type, purchase conditions, product characteristics and applicable population of each financial product.

The computer program screens out the information of users holding financial products in the database, associates the held financial products with the pushed financial products by means of support and confidence, screens out the information of users whose association degree is greater than a set threshold and who have not purchased the pushed financial products, and marks that user information as third data.

In this embodiment, a terminal device is further disclosed, which specifically includes a memory and a processor, where a computer program operable on the processor is stored in the memory, and is characterized in that when the processor executes the computer program, the following steps are implemented:

S3: the computer program screens out the information of users holding financial products in the database, associates the held financial products with the pushed financial products, screens out the information of users whose association degree is greater than a set threshold and who have not purchased the pushed financial products, and marks that user information as third data.

Example 2

Referring to FIGS. 2 to 5, this embodiment provides a specific data processing method comprising the following steps:

Step 1, extracting the following tables from a data warehouse deployed by a bank: a customer basic information table including customer age, gender, contact details, customer grade, home address, children, risk grade, blacklist status and the like; a transaction behavior table including the customer's total transaction amount, total number of transactions, monthly daily-average transaction amount and the like; an asset information table including the customer's point-in-time assets, monthly daily-average assets, yearly daily-average assets, daily-average assets over the last three months and the like; a held product information table including the number of products currently held by the customer, the amount held, the names of the held products and the like; and a basic information table of the held products;

Step 2, associating the customer basic information table, the customer transaction behavior table, the customer asset information table and the customer held product information table by customer ID using SQL to form a customer wide table;

Step 3, cleaning abnormal values from the data based on the constructed wide table, specifically: removing customers with abnormal behavior, such as blacklisted customers, customers with multiple recent defaults, dormant customers, zero-asset customers and customers with missing contact information;

specifically, a customer default count field is added when the customer wide table is constructed, so that it is directly available when filtering with the data filtering node of the Tempo big data analysis platform.

Step 4, based on the wide table processed in step 3, filling missing values in numerical fields such as point-in-time assets, monthly daily-average assets, transaction amount and number of transactions with zero; filling missing values in the customer age field with the median; and filling the gender field with the mode and applying a data attribute transformation;

filling missing values in the associated wide table, specifically: filling numerical fields such as the asset fields and the number-of-held-products field with 0; filling the age field with the median; and applying a data attribute transformation to the gender field, in which male is replaced with 0 and female is replaced with 1, with missing values filled by the mode. Based on practical experience, a customer's gender has a certain influence on preference for the pushed product in financial product recommendation, so filling missing gender values with the mode can improve recommendation accuracy.

Step 5, based on the processed wide table obtained in step 4 and following the exploratory approach of the data exploration stage, using statistical methods to explore the common characteristics of the customer group that purchased the financial product, from the aspects of basic customer attributes, customer assets, income and expenditure, product holdings and the like;

specifically, from the customer basic information, extracting common characteristics such as gender, age, education level and marital status; from the customer asset information, extracting asset-related common characteristics such as customer asset class and daily-average AUM; from the transaction behavior table, identifying common characteristics of the customer group's transaction behavior; and from the held product information table, finding common characteristics of the customer group's product-holding behavior;

Step 6, according to the common characteristics of the customer groups obtained in step 5, clustering the purchasing customers and the non-purchasing customers respectively by Kohonen algorithm modeling, from the aspects of basic customer attributes, customer assets, income and expenditure, and product holdings, so that the purchasing and non-purchasing customer groups are each divided into clusters; calculating the similarity between the centers of the non-purchasing clusters and the centers of the purchasing clusters through a similarity measurement scheme (Euclidean distance); and taking the customer groups with similarity greater than the set threshold of 0.85 as first data;

specifically, the similarity is taken as the reciprocal of the Euclidean distance and is calculated as follows:

$$\mathrm{sim}(X_i, X_j) = \frac{1}{\sqrt{(X_{i1}-X_{j1})^2 + (X_{i2}-X_{j2})^2 + \cdots + (X_{in}-X_{jn})^2}}$$

wherein $X_i$ and $X_j$ represent two financial product sample data, and $X_{i1}, \ldots, X_{in}$ represent the characteristic data of financial product $i$.

In this embodiment, the obtaining of the first data specifically includes:

clustering the purchasing customer group using the Kohonen clustering algorithm to obtain several subclasses, each with its own cluster center; performing the same operation on the non-purchasing customer group to obtain its cluster centers; then calculating the similarity between the cluster centers of the purchasing customer group and the cluster centers of the non-purchasing customer group, and taking as first data the customers belonging to those non-purchasing clusters whose center has a similarity greater than the set threshold with a cluster center of the purchasing customer group.

Step 7, based on the feature information table of the products launched by the bank, calculating the similarity between the historical products launched by the financial institution and the currently pushed product through a similarity measurement scheme (Euclidean distance);

specifically, the similarity between a product held by the customer and the pushed product is determined according to the product type, purchase conditions, product characteristics and applicable population.

Step 8, according to the similarity obtained in step 7, screening out the historical products whose similarity with the current financial product is greater than the set threshold of 0.85;

specifically, based on the product feature information table, calculating the similarity between the historical products launched by the financial institution and the current product through a similarity measurement scheme (the reciprocal of the Euclidean distance), and, according to the obtained similarity, screening out the historical products whose similarity with the current financial product is greater than the set threshold;

Step 9, screening out, as second data, the customer group that purchased the historical products obtained in step 8 but has not purchased the current financial product;

Step 10, based on the data table of products held by customers, mining the product with the highest support and confidence with respect to the currently pushed financial product using the Apriori association rule algorithm;

Step 11, screening out the customer group that purchased the product obtained in step 10 but has not purchased the current financial product, to obtain third data;

Step 12, merging and deduplicating the first data, second data and third data obtained in steps 6, 9 and 11 to obtain the potential customer group for precision marketing.
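The merge-and-deduplicate operation of step 12 can be sketched as follows. This is a minimal illustration that assumes each of the three data sets is a list of customer IDs; recording which data sets proposed each customer is an addition made here for traceability and is not described in the source.

```python
def merge_candidates(first, second, third):
    """Union the three candidate lists, keeping first-seen order.

    Returns a dict mapping each customer ID to the data sets that proposed it.
    """
    merged = {}
    for source, ids in (("first", first), ("second", second), ("third", third)):
        for cid in ids:
            sources = merged.setdefault(cid, [])
            if source not in sources:       # ignore repeats within one data set
                sources.append(source)
    return merged

# Hypothetical customer IDs from steps 6, 9 and 11.
final = merge_candidates(["c1", "c2"], ["c2", "c3"], ["c3", "c1", "c4"])
potential_customers = list(final)
```

A customer proposed by more than one of the three methods appears only once in `potential_customers`.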

Based on the wide table constructed above, the step of removing abnormal behavior clients by using data filtering nodes of the Tempo big data analysis platform specifically includes:

Blacklisted customers, customers with multiple recent defaults, dormant customers, zero-asset customers and customers with missing contact information are removed, so that the mined customer group is more valuable.
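The source performs this filtering with the data filtering node of the Tempo platform. An equivalent rule can be sketched in plain Python as below; the field names and the cut-off for "multiple" defaults (`max_defaults`) are hypothetical, since the source does not state them.

```python
def is_valid_customer(row, max_defaults=1):
    """True if the customer passes every abnormal-behavior filter."""
    return (not row["blacklisted"]
            and row["default_count"] <= max_defaults   # no multiple recent defaults
            and not row["dormant"]
            and row["point_assets"] > 0                # exclude zero-asset customers
            and bool(row["contact"]))                  # contact information present

# Hypothetical wide-table rows.
customers = [
    {"id": "c1", "blacklisted": False, "default_count": 0, "dormant": False,
     "point_assets": 5000.0, "contact": "138-0000-0000"},
    {"id": "c2", "blacklisted": True, "default_count": 0, "dormant": False,
     "point_assets": 9000.0, "contact": "139-0000-0000"},
    {"id": "c3", "blacklisted": False, "default_count": 3, "dormant": False,
     "point_assets": 700.0, "contact": "137-0000-0000"},
    {"id": "c4", "blacklisted": False, "default_count": 0, "dormant": False,
     "point_assets": 0.0, "contact": ""},
]
kept = [c["id"] for c in customers if is_valid_customer(c)]
```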

The missing value filling and data attribute transformation based on the wide table processed in step 3 includes:

filling missing values in the associated wide table, specifically: filling numerical fields such as the asset fields and the number-of-held-products field with 0; filling the age field with the median; and applying a data attribute transformation to the gender field, in which male is replaced with 0 and female is replaced with 1, with missing values filled by the mode.
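The filling rules can be sketched with Python's standard statistics module. The field names below are hypothetical; the rules themselves (zero for numerical fields, median for age, male as 0 and female as 1, mode for missing gender) follow the text.

```python
from statistics import median, mode

GENDER_CODE = {"male": 0, "female": 1}             # encoding stated in the text
ZERO_FILL_FIELDS = ("point_assets", "tx_amount")   # hypothetical field names

def fill_missing(rows):
    """Return copies of the wide-table rows with missing values filled."""
    ages = [r["age"] for r in rows if r["age"] is not None]
    genders = [GENDER_CODE[r["gender"]] for r in rows if r["gender"] is not None]
    age_fill, gender_fill = median(ages), mode(genders)
    filled = []
    for r in rows:
        out = dict(r)
        for field in ZERO_FILL_FIELDS:
            if out[field] is None:
                out[field] = 0                      # numerical fields: zero
        if out["age"] is None:
            out["age"] = age_fill                   # age: median
        # gender: encode as 0/1, using the mode where the value is missing
        out["gender"] = gender_fill if r["gender"] is None else GENDER_CODE[r["gender"]]
        filled.append(out)
    return filled

rows = [
    {"age": 30, "gender": "male", "point_assets": 1000.0, "tx_amount": None},
    {"age": None, "gender": "female", "point_assets": None, "tx_amount": 50.0},
    {"age": 50, "gender": None, "point_assets": 200.0, "tx_amount": 10.0},
    {"age": 40, "gender": "female", "point_assets": 0.0, "tx_amount": 5.0},
]
clean = fill_missing(rows)
```

In this sample the median age is 40 and the most common gender code is 1, so those values fill the gaps in the second and third rows.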

Specifically, the step 5 of exploring the common characteristics of the customer group that purchased the financial product, based on the processed wide table and following the exploratory approach of the data exploration stage, specifically includes:

screening out the customer group that purchased the currently pushed financial product; from the customer basic information, extracting common characteristics such as gender, age, education level and marital status; from the customer asset information, extracting asset-related common characteristics such as customer asset class and daily-average AUM; from the transaction behavior table, identifying common characteristics of the customer group's transaction behavior; and from the held product information table, finding common characteristics of the customer group's product-holding behavior.

Specifically, based on the customer group common characteristics obtained in step 5, the purchasing customers and the non-purchasing customers are clustered respectively, the similarity between the cluster centers of the non-purchasing customers and the cluster centers of the purchasing customers is calculated through a similarity measurement scheme, and the customer groups with similarity greater than the set threshold of 0.85 are taken as potential customer group 1, wherein the similarity measurement scheme is as follows:

The similarity between customers is calculated using the Euclidean distance.
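A minimal sketch of the similarity measure follows, assuming the reciprocal-of-Euclidean-distance form used elsewhere in this description. Returning infinity for identical vectors is a choice made here to avoid division by zero; the source does not say how that case is handled.

```python
import math

def similarity(x, y):
    """Reciprocal of the Euclidean distance between two feature vectors."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.inf if dist == 0 else 1.0 / dist

# Worked example: the distance between (0, 0) and (3, 4) is 5, so sim = 0.2.
example = similarity((0.0, 0.0), (3.0, 4.0))
```

Because the measure is a reciprocal, the threshold of 0.85 corresponds to a distance below 1/0.85, roughly 1.18. The threshold is therefore only meaningful relative to the scale of the features, which suggests that the features would need to be normalised before comparison; the source does not state this.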

In this embodiment, the Kohonen clustering algorithm is used to cluster the purchasing customer group into several subclasses, each with its own cluster center; the same operation is performed on the non-purchasing customer group to obtain its cluster centers; the similarity between the cluster centers of the purchasing customer group and the cluster centers of the non-purchasing customer group is then calculated, and the customers belonging to those non-purchasing clusters whose center has a similarity greater than the set threshold with a cluster center of the purchasing customer group are taken as the potential recommended group.
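The selection of the potential recommended group from the cluster centers can be sketched as follows. The function and variable names are hypothetical, and the code assumes one reading of the text: a non-purchasing cluster qualifies if its center exceeds the threshold against at least one purchasing cluster center.

```python
import math

def _sim(x, y):
    """Reciprocal Euclidean distance; infinity for identical vectors."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.inf if dist == 0 else 1.0 / dist

def select_first_data(buyer_centers, nonbuyer_clusters, threshold=0.85):
    """Collect members of non-purchasing clusters whose center is similar
    enough to at least one purchasing cluster center."""
    selected = []
    for center, members in nonbuyer_clusters:
        best = max(_sim(center, b) for b in buyer_centers)
        if best > threshold:
            selected.extend(members)
    return selected

# Hypothetical cluster centers and cluster memberships.
buyer_centers = [(0.2, 0.8), (0.9, 0.1)]
nonbuyer_clusters = [
    ((0.3, 0.7), ["n1", "n2"]),   # close to the first purchasing center
    ((5.0, 5.0), ["n3"]),         # far from every purchasing center
]
first_data = select_first_data(buyer_centers, nonbuyer_clusters)
```

Comparing cluster centers rather than individual customers is what keeps the number of similarity calculations small.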

Specifically, based on the product information table, the similarity between the historical products launched by the financial institution and the currently pushed product is calculated through a similarity measurement scheme (Euclidean distance), and the customer group that purchased the historical products obtained in step 8 but has not purchased the currently pushed financial product is screened out as potential customer group 2, wherein the similarity measurement scheme is as follows:

The similarity between financial products is calculated using the Euclidean distance.
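The derivation of potential customer group 2 can be sketched as below. The sketch assumes that the product features (product type, purchase conditions, product characteristics, applicable population) have already been encoded as numeric vectors; the source does not describe the encoding, and all identifiers and sample values are hypothetical.

```python
import math

def second_data(pushed_id, product_features, purchases, threshold=0.85):
    """Customers who bought a historical product similar to the pushed
    product but have not bought the pushed product itself."""
    target = product_features[pushed_id]
    similar = set()
    for pid, feats in product_features.items():
        if pid == pushed_id:
            continue
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(feats, target)))
        # Similarity is the reciprocal of the distance; identical products qualify
        if dist == 0 or 1.0 / dist > threshold:
            similar.add(pid)
    return sorted(c for c, held in purchases.items()
                  if held & similar and pushed_id not in held)

# Hypothetical encoded feature vectors and purchase records.
product_features = {
    "new": (1.0, 0.0, 0.5),
    "h1": (1.0, 0.0, 0.0),    # distance 0.5 from "new", similarity 2.0
    "h2": (0.0, 1.0, 3.0),    # distance about 2.87, similarity about 0.35
}
purchases = {"c1": {"h1"}, "c2": {"h1", "new"}, "c3": {"h2"}}
group2 = second_data("new", product_features, purchases)
```

Here only `h1` passes the threshold, and `c2` is excluded because that customer already holds the pushed product.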

Specifically, based on the data on products held by customers, the product with the highest support and confidence with respect to the current financial product is mined using an association rule algorithm, and the customer group that purchased the product obtained in step 10 but has not purchased the current financial product is screened out as potential customer group 3, wherein the association rule algorithm is as follows:

The product with the maximum support and confidence with respect to the current financial product is obtained through the Apriori association rule algorithm.
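The following sketch illustrates the support and confidence calculation for rules of the form "holds product A, therefore purchases the pushed product". It is a simplification rather than a full Apriori implementation: only the two-item level is needed for such rules, so no larger itemsets are generated. The minimum support value and the ranking order (confidence first, then support) are assumptions, since the source does not say how the two measures are combined.

```python
from collections import Counter

def best_associated_product(baskets, pushed_id, min_support=0.2):
    """Return (confidence, support, product) for the strongest rule
    'product => pushed_id', or None if no rule meets min_support."""
    n = len(baskets)
    item_count = Counter(p for held in baskets.values() for p in held)
    pair_count = Counter(p for held in baskets.values() if pushed_id in held
                         for p in held if p != pushed_id)
    rules = []
    for pid, both in pair_count.items():
        support = both / n                     # share of users holding both
        if support >= min_support:             # Apriori-style pruning
            rules.append((both / item_count[pid], support, pid))
    return max(rules) if rules else None

# Hypothetical holdings: customer ID mapped to the set of products held.
baskets = {
    "c1": {"fund", "new"}, "c2": {"fund", "new"}, "c3": {"fund"},
    "c4": {"bond", "new"}, "c5": {"bond", "fund"},
}
rule = best_associated_product(baskets, "new")
group3 = sorted(c for c, held in baskets.items()
                if rule[2] in held and "new" not in held)
```

In this sample both candidate rules have a confidence of 0.5, and `fund` is chosen because its support (0.4) is higher than that of `bond` (0.2).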

With the data processing method provided by the application, as shown in FIG. 1, several classical recommendation algorithms are combined and applied to the field of financial product recommendation. In addition, because financial institutions have a huge number of customers, calculating similarity over all of them is too expensive and makes recommendations untimely; applying a clustering algorithm before the similarity calculation reduces the amount of calculation and improves efficiency, and, compared with mining potential customer groups from a single angle with a single method, the potential customer groups are mined more fully and accurately.

The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. A data processing method, characterized by comprising the steps of:

2. A data processing method according to claim 1, wherein the user information in step S1 includes user ID, user transaction information, user asset information, and user held financial product information.

3. A data processing method according to claim 2, wherein the characteristic parameter is specifically a user wide table created by a computer program through user ID, user transaction information, user asset information and user held financial product information.

4. A data processing method according to claim 3, wherein the establishing of the user wide table specifically comprises:

5. The data processing method according to claim 4, further comprising a user wide table supplement, wherein the supplement is specifically:

6. The data processing method according to claim 1, wherein the clustering in step S1 is performed on the users by using Kohonen algorithm modeling.

7. The data processing method according to claim 1, wherein the step S2 is preceded by constructing a financial product feature table, wherein the financial product feature table comprises a historical financial product feature table and a pushed financial product feature table, and the computer program calculates the similarity between the historical financial products and the pushed financial products according to the feature parameters in the two feature tables.

8. The data processing method of claim 7, wherein the financial product characteristics table comprises product types, purchase conditions, product characteristics, and applicable groups of financial products.

9. The data processing method according to one of claims 1 to 8, wherein the step S3 is specifically:

10. A terminal device comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and wherein the processor executes the computer program to perform the steps of: