CN111967973A - Bank client data processing method and device - Google Patents

Bank client data processing method and device

Info

Publication number
CN111967973A
Authority
CN
China
Prior art keywords
data
machine learning
bank
sub
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010834016.8A
Other languages
Chinese (zh)
Inventor
徐晓健
童楚婕
李福洋
严洁
栾英英
彭勃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN202010834016.8A
Publication of CN111967973A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02: Banking, e.g. interest calculation or account maintenance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/906: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/12: Computing arrangements based on biological models using genetic models
    • G06N3/126: Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Abstract

The invention discloses a bank customer data processing method and device. The method comprises: obtaining bank customer data, the bank customer data comprising personal data, transaction data, liability data or any combination thereof; and classifying the bank customer data according to the bank customer data and a pre-established classification model, wherein the classification model is pre-established according to a plurality of trained machine learning models and a weight value set for each trained machine learning model, the weight value of each trained machine learning model is set by using a genetic algorithm, and each machine learning model is trained on historical bank customer data. The invention facilitates the processing of bank customer data and achieves customer data classification with high accuracy and reliability.

Description

Bank client data processing method and device
Technical Field
The invention relates to the technical field of data analysis, in particular to a bank customer data processing method and device.
Background
In order to provide more targeted, personalized service to each customer, a commercial bank needs to classify customer data. The classification results can be used in many financial service scenarios, such as precision marketing and product promotion based on customer groups.
As the number of customers of a commercial bank grows, customer consumption behavior becomes extremely complex and the data volume increases over time. Existing clustering algorithms and manual classification methods can no longer meet the requirements that commercial banking places on customer data classification results, and they suffer from poor classification accuracy and poor reliability.
Therefore, there is a need for a bank customer data processing scheme that overcomes the above problems.
Disclosure of Invention
The embodiment of the invention provides a bank client data processing method, which is used for processing bank client data and realizing high-accuracy and reliable client data classification and comprises the following steps:
obtaining bank customer data, the bank customer data comprising: personal data, transaction data, liability data or any combination thereof;
classifying the bank client data according to the bank client data and a pre-established classification model, wherein the classification model is pre-established according to a plurality of trained machine learning models and a set weight value corresponding to each trained machine learning model, the weight value corresponding to each trained machine learning model is set by using a genetic algorithm, and each machine learning model is trained according to bank client historical data.
The embodiment of the invention provides a bank customer data processing device, which is used for processing bank customer data and realizing high-accuracy and reliable customer data classification, and comprises the following components:
the data acquisition module is used for acquiring bank customer data, and the bank customer data comprises: personal data, transaction data, liability data or any combination thereof;
and the data classification module is used for classifying the bank client data according to the bank client data and a pre-established classification model, wherein the classification model is pre-established according to a plurality of trained machine learning models and a set weight value corresponding to each trained machine learning model, the weight value corresponding to each trained machine learning model is set by using a genetic algorithm, and each machine learning model is trained according to bank client historical data.
The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the bank client data processing method when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium, and the computer readable storage medium stores a computer program for executing the bank customer data processing method.
Compared with prior-art schemes that classify customer data by a clustering algorithm or manually, the embodiment of the invention obtains bank customer data comprising personal data, transaction data, liability data or any combination thereof, and classifies the bank customer data according to the bank customer data and a pre-established classification model, wherein the classification model is pre-established according to a plurality of trained machine learning models and a weight value set for each trained machine learning model, the weight value of each trained machine learning model is set by using a genetic algorithm, and each machine learning model is trained on historical bank customer data. In addition, considering that each machine learning model emphasizes different features when learning, the embodiment of the invention sets a weight value for each trained machine learning model by using a genetic algorithm and establishes the classification model from the trained machine learning models and their weight values. The weight values can thus be adjusted adaptively according to different business requirements, which improves the feature learning capability, mines hidden features in the data as far as possible, and improves the accuracy and reliability of the customer data classification result.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a schematic diagram of a bank customer data processing method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a bank customer data processing device according to an embodiment of the present invention;
FIG. 3 is a block diagram of a bank customer data processing device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
First, terms referred to in the embodiments of the present application are described:
LightGBM model: the LightGBM model is a gradient boosting framework proposed by Microsoft that uses tree-based learning algorithms. The model is fast and accurate and can process large-scale data.
Customer group classification: customer group classification refers to grouping customers with similar behavior characteristics and value characteristics into the same group according to certain criteria.
XGBoost model: the XGBoost model is a tree-based ensemble learning method that combines a plurality of weak classifiers through a boosting framework and uses the negative gradient as its learning strategy. It has shown excellent effectiveness and efficiency in practice and is therefore widely adopted in the industry.
Neural network: a neural network is an algorithmic model that imitates the behavior of animal neural networks and performs distributed, parallel information processing. The model processes information by adjusting the interconnections among a large number of internal nodes.
Genetic algorithm: a genetic algorithm is an optimization algorithm that searches for an optimal solution by simulating the process of natural evolution. Without predetermined rules, the algorithm automatically acquires and guides the optimized search space and adaptively adjusts the search direction.
As mentioned above, as the number of customers of a commercial bank grows, customer consumption behavior becomes extremely complex and the data volume increases over time, so existing clustering algorithms and manual classification methods can no longer meet the requirements that commercial banking places on customer data classification results. Clustering-based customer grouping cannot mine hidden features in the data, so data utilization is low and the reliability and accuracy of the results are poor. The features that a single algorithm can learn are limited, so the accuracy of a customer group classification model based on a single algorithm is limited. The applicability of a classification method based on a single model is also constrained by the algorithm itself, so it cannot be applied well to all scenarios.
In order to process bank customer data and achieve high-accuracy and reliable customer data classification, an embodiment of the present invention provides a bank customer data processing method, as shown in fig. 1, where the method may include:
step 101, obtaining bank customer data, wherein the bank customer data comprises: personal data, transaction data, liability data or any combination thereof;
step 102, classifying the bank customer data according to the bank customer data and a pre-established classification model, wherein the classification model is pre-established according to a plurality of trained machine learning models and a weight value set for each trained machine learning model, the weight value of each trained machine learning model is set by using a genetic algorithm, and each machine learning model is trained on historical bank customer data.
As shown in FIG. 1, the embodiment of the present invention obtains bank customer data comprising personal data, transaction data, liability data or any combination thereof, and classifies the bank customer data according to the bank customer data and a pre-established classification model, wherein the classification model is pre-established according to a plurality of trained machine learning models and a weight value set for each trained machine learning model, the weight value of each trained machine learning model is set by using a genetic algorithm, and each machine learning model is trained on historical bank customer data. In addition, considering that each machine learning model emphasizes different features when learning, the embodiment of the invention sets a weight value for each trained machine learning model by using a genetic algorithm and establishes the classification model from the trained machine learning models and their weight values. The weight values can thus be adjusted adaptively according to different business requirements, which improves the feature learning capability, mines hidden features in the data as far as possible, and improves the accuracy and reliability of the customer data classification result.
In specific implementation, bank customer data is obtained, and the bank customer data comprises: personal data, transaction data, liability data, or any combination thereof.
In an embodiment, the personal data may include one or any combination of: gender data, age data, education data, occupation data, account opening data, attribution (home location) data, mobile phone number, first account opening time, deposit information, wealth management information, securities position information, number of credit cards held and credit card credit line. The transaction data may include one or any combination of: recent credit card transaction count, recent credit card transaction amount data, recent transfer count, recent transfer amount data, mobile banking registration time, mobile banking shopping data, payment information, mobile banking login count, mobile banking function click information and mobile banking dwell time information. The liability data may include loan pre-approved credit line information.
In the embodiment, after the bank customer data is obtained, the bank customer data is associated and partitioned: records are joined on the primary keys provided by the different data sources, using the user ID, so that the data is integrated. Specifically, data integration means combining data of different sources and different content according to the customer number, so that all collected data can be attributed to each individual customer. Taking a customer A as an example, all bank customer data of customer A is extracted from the corresponding data sources and the result is stored.
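The integration step described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the field name `customer_id`, the source names and the sample values are hypothetical and do not come from the patent, which only states that records are joined on the user ID.

```python
from collections import defaultdict


def integrate_by_customer(sources):
    """Merge records from several data sources into one record per customer ID.

    `sources` maps a source name to a list of dict records. Each record is
    assumed to carry a 'customer_id' key (hypothetical name for the shared key).
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            cid = record["customer_id"]
            for field, value in record.items():
                if field != "customer_id":
                    # Prefix with the source name so fields from different
                    # sources never overwrite each other.
                    merged[cid][f"{source_name}.{field}"] = value
    return dict(merged)


# Hypothetical sample data for customers A and B.
sources = {
    "personal": [{"customer_id": "A", "age": 35}, {"customer_id": "B", "age": 52}],
    "transaction": [{"customer_id": "A", "recent_transfers": 4}],
    "liability": [{"customer_id": "A", "pre_credit_line": 50000}],
}
profiles = integrate_by_customer(sources)
print(profiles["A"])
```

Prefixing each field with its source name is a design choice of this sketch, not something the description prescribes; it keeps the provenance of every attribute visible after the merge.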
In specific implementation, the bank client data are classified according to the bank client data and a pre-established classification model, wherein the classification model is pre-established according to a plurality of trained machine learning models and a set weight value corresponding to each trained machine learning model, the weight value corresponding to each trained machine learning model is set by using a genetic algorithm, and each machine learning model is trained according to bank client historical data.
In an embodiment, the plurality of trained machine learning models comprises: an XGBoost machine learning model, a LightGBM machine learning model and a neural network model. These models are prior art that those skilled in the art can learn about by consulting the literature, so they are not described in detail here. Those skilled in the art will understand that the machine learning models listed above are examples; different machine learning models can be chosen as required in an implementation, and all such variations fall within the scope of the present invention.
In an embodiment, the weight value corresponding to each trained machine learning model is set as follows: initializing the weight value corresponding to each trained machine learning model; and iteratively updating the weight values multiple times by using a genetic algorithm, wherein, after each iterative update, a first preset number of offspring individuals is obtained, and the next iterative update of the weight values is performed according to the first preset number of offspring individuals.
In this embodiment, obtaining the first preset number of offspring individuals after each iterative update and performing the next iterative update according to them includes: for each iterative update, obtaining a first preset number of offspring individuals from the population after the update; performing crossover on the first preset number of offspring individuals to obtain a second preset number of offspring individuals; replacing a second preset number of individuals in the next-generation population with the second preset number of offspring individuals; and performing the next iterative update of the weight values according to the next-generation population after the replacement.
In this embodiment, the weight values are set after every machine learning model has been trained, and each machine learning model corresponds to one weight value. At the beginning, a group of weight values is set randomly and the machine learning models are used to process the data. The difference between the output value of the machine learning models and the target value is then taken as the objective function, and the genetic algorithm continuously optimizes the weights to obtain the final weights. The genetic algorithm used in the embodiment of the invention differs from the traditional genetic algorithm in the replacement procedure described above. For example, after each optimization iteration is completed, the 2 offspring with the highest fitness are selected, crossover is applied only to these two to generate a 3rd offspring, and the three offspring are copied directly into the next-generation population, where they replace the 3 individuals with the lowest fitness. This yields the final next-generation population, on which optimization continues. Customer group classification targets differ across business requirements, as do the features used, so the classification algorithm is selected adaptively according to the business requirement.
The genetic algorithm automatically adjusts the fusion weights of the models, and the weights are adapted to different business requirements, which makes the algorithm adaptive to those requirements and extends the application scenarios of the model. It should be noted that the adaptive processing mentioned in this application refers to the handling of different tasks: the data of different tasks differs, and so does the relationship between the data and the result. The embodiment of the invention weights and readjusts the results of a plurality of models. Different models learn different features, and adjusting the results with weights gives the overall model more room for adjustment. The traditional genetic algorithm copies only the offspring with the highest fitness to the next generation and generates the rest randomly, so some high-quality offspring may be discarded, which affects the result and the performance of the algorithm. The embodiment of the invention selects the 2 offspring with the highest fitness, generates a third offspring from them by crossover, and adds the three offspring to the next generation to be optimized, until the optimization is finally completed.
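A minimal sketch of the weight search described above is given below. Several details are assumptions of this sketch rather than statements of the patent: fitness is taken as the negative squared error between the fused output and the target, crossover averages the two parents, and the regular update step (which the description does not specify) is modelled as Gaussian mutation of randomly chosen parents. Only the elite step, that is, the two fittest individuals plus their crossover child replacing the three least fit members of the next generation, follows the text.

```python
import random


def fitness(weights, model_outputs, targets):
    """Negative squared error between the weighted model outputs and the targets."""
    err = 0.0
    for outputs, target in zip(model_outputs, targets):
        fused = sum(w * o for w, o in zip(weights, outputs))
        err += (fused - target) ** 2
    return -err


def crossover(a, b):
    """Arithmetic crossover (assumed operator): average the two parents."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def mutate(individual, rng, scale=0.1):
    """Gaussian perturbation (assumed operator for the regular update step)."""
    return [w + rng.gauss(0, scale) for w in individual]


def optimise_weights(model_outputs, targets, n_models=3, pop_size=20,
                     generations=50, seed=0):
    rng = random.Random(seed)

    def score(individual):
        return fitness(individual, model_outputs, targets)

    # Random initial weight vectors, one weight per model.
    population = [[rng.random() for _ in range(n_models)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        history.append(score(ranked[0]))
        # Elite step from the description: two fittest plus their crossover child.
        elite = [ranked[0], ranked[1], crossover(ranked[0], ranked[1])]
        # Assumed update rule: mutated copies of randomly chosen parents.
        next_gen = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        # The elite replaces the three least fit members of the next generation.
        next_gen.sort(key=score, reverse=True)
        next_gen[-3:] = elite
        population = next_gen
    best = max(population, key=score)
    history.append(score(best))
    return best, history


# Hypothetical data: per-sample outputs of three models and targets generated
# from known weights, so the search has something to recover.
model_outputs = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3], [0.6, 0.4, 0.5], [0.1, 0.9, 0.2]]
true_weights = [0.5, 0.3, 0.2]
targets = [sum(w * o for w, o in zip(true_weights, outs)) for outs in model_outputs]

best, history = optimise_weights(model_outputs, targets)
print(best, history[-1])
```

Because the fittest individual is always carried over, the best fitness in this sketch never decreases from one generation to the next, which is the property the elite step is meant to provide.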
In an embodiment, the bank customer data processing method further includes: after the bank customer data is obtained, cleaning the bank customer data by using a three-standard-deviation (three-sigma) detection algorithm, filling missing values in the cleaned bank customer data by using a mean filling algorithm, and vectorizing the bank customer data after the missing values have been filled. Classifying the bank customer data according to the bank customer data and the pre-established classification model then includes: classifying the bank customer data according to the vectorized bank customer data and the pre-established classification model.
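The cleaning and filling steps named above can be sketched as follows. This is a sketch under stated assumptions: rows are lists of numbers with `None` marking a missing value, the population standard deviation is used, and a sample with exactly 20% of its fields missing is kept (the 20% threshold itself is taken from this embodiment's description; how the boundary case is handled is not specified there).

```python
from statistics import mean, pstdev


def three_sigma_clean(rows):
    """Drop rows holding any value more than 3 standard deviations from its column mean."""
    n_cols = len(rows[0])
    stats = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        stats.append((mean(observed), pstdev(observed)))

    def is_outlier(row):
        return any(
            v is not None and sd > 0 and abs(v - mu) > 3 * sd
            for v, (mu, sd) in zip(row, stats)
        )

    return [r for r in rows if not is_outlier(r)]


def drop_sparse(rows, max_missing=0.2):
    """Remove rows whose share of missing fields exceeds max_missing."""
    return [r for r in rows if sum(v is None for v in r) / len(r) <= max_missing]


def mean_fill(rows):
    """Replace each missing value by the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = [mean(r[c] for r in rows if r[c] is not None) for c in range(n_cols)]
    return [[col_means[c] if v is None else v for c, v in enumerate(r)] for r in rows]


# Hypothetical data: 20 ordinary samples, one outlier, one sample with 20% of
# its fields missing (filled) and one with 40% missing (removed).
rows = [[10.0, 1.0, 2.0, 3.0, 4.0] for _ in range(20)]
rows.append([1000.0, 1.0, 2.0, 3.0, 4.0])
rows.append([None, 1.0, 2.0, 3.0, 4.0])
rows.append([None, None, 2.0, 3.0, 4.0])

cleaned = mean_fill(drop_sparse(three_sigma_clean(rows)))
print(len(cleaned), cleaned[-1])
```

Note that the three-sigma test needs a reasonable number of samples per column to be able to flag anything, since a single extreme value also inflates the standard deviation it is compared against.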
In this embodiment, after the bank customer data is obtained, it is cleaned by using the three-standard-deviation detection algorithm. Abnormal data are a small portion of samples that differ markedly from the bulk of the data in their characteristics, often because of human factors or accidental errors. The three-standard-deviation detection algorithm is based on Chebyshev's inequality, which bounds how far data can lie from the mean in terms of the variance, a statistic that measures the dispersion of the data. When the data obey a normal distribution, the probability that a value deviates from the mean by more than three standard deviations is only 0.27%. Data whose absolute difference from the mean exceeds three standard deviations can therefore be regarded as outliers. For samples with less than 20% of their information missing, the missing values are filled, that is, the mean filling algorithm is applied to the cleaned bank customer data; samples with more than 20% of their information missing are removed directly. The attributes in users' personal information and in merchant information take various forms: gender, for example, takes the values male and female, and occupation takes values such as teacher, doctor and student. To facilitate subsequent data mining, the data can be expressed in a vector space model (VSM), that is, the bank customer data is vectorized after the missing values have been filled. The basic principle of the vector space model is to represent a user or a merchant by a series of attributes, each attribute being one dimension of the feature space coordinate system. Thus, each user or merchant $d_i$ is represented as a feature vector of attribute-weight pairs: $d_i = (\langle t_{i1}, w_{i1}\rangle, \langle t_{i2}, w_{i2}\rangle, \ldots, \langle t_{iM}, w_{iM}\rangle)$, where $w_{ik}$ represents the weight of the feature attribute $t_{ik}$.
In the vectorization process, once the feature attributes are determined, the vector space model can be simplified to the weight vector form $d_i = (w_{i1}, w_{i2}, \ldots, w_{iM})$. A commonly used method of calculating attribute weights is the Boolean weight method, the simplest way of defining weights, in which user information or merchant information is quantized into a 0/1 vector. The Boolean weight marks the presence or absence of a feature attribute with the Boolean values 1 and 0: if the attribute is present, the corresponding vector dimension is marked 1, and otherwise 0. The formula is expressed as follows:
$$w_{ik} = \begin{cases} 1, & \text{if the feature attribute } t_{ik} \text{ is present in } d_i \\ 0, & \text{otherwise} \end{cases}$$
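The Boolean weight method can be illustrated with a short sketch. The helper names and the sample records are hypothetical; each distinct (attribute, value) pair is treated as one feature attribute, which is one reasonable reading of the description rather than a detail it states.

```python
def build_vocabulary(records):
    """Collect every (attribute, value) pair seen in the data, in a stable order."""
    return sorted({(k, v) for rec in records for k, v in rec.items()})


def boolean_vector(record, vocabulary):
    """Return the 0/1 weight vector: 1 if the record has that attribute value, else 0."""
    return [1 if record.get(attr) == value else 0 for attr, value in vocabulary]


# Hypothetical user records with categorical attributes.
records = [
    {"gender": "female", "occupation": "teacher"},
    {"gender": "male", "occupation": "doctor"},
    {"gender": "male", "occupation": "student"},
]
vocab = build_vocabulary(records)
vectors = [boolean_vector(r, vocab) for r in records]
for record, vector in zip(records, vectors):
    print(record, vector)
```

With this encoding every user is described by a vector of the same length, which is what the downstream models require as input.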
In the embodiment, feature engineering is performed separately for the XGBoost machine learning model, the LightGBM machine learning model and the neural network model, and the parameters of the three models are adjusted continuously on the preprocessed data until each model reaches its best performance. Because the features that a single algorithm can learn are limited, the accuracy of a customer group classification model based on a single algorithm is limited, which affects the final classification performance. To avoid this as far as possible, the embodiment of the present invention uses the XGBoost machine learning model, the LightGBM machine learning model and the neural network together to perform customer group classification. Different algorithms emphasize different features when learning, so the method can mine different hidden features in the data as far as possible, which further improves model performance and classification accuracy. The classification model is then called to classify the customer groups: the results output by the XGBoost machine learning model, the LightGBM machine learning model and the neural network model are weighted with the adjusted weights, and the final customer group classification result is obtained from the weighted result. Different models learn different features, and these features differ in how strongly they influence the result. Applying weights to the outputs of the different models increases the influence of the important features on the result, that is, it adaptively adjusts feature importance and further improves the accuracy of the result.
For example, assume that the results of the XGBoost machine learning model, the LightGBM machine learning model and the neural network model are x1, x2 and x3, each a multidimensional vector, and that the weights are k1, k2 and k3; the final result is then k1 × x1 + k2 × x2 + k3 × x3.
By performing customer group classification through multi-model fusion, the embodiment of the invention can fully mine hidden features in the data, achieves high data utilization, and greatly improves the accuracy and reliability of the results. Customers are classified by a plurality of classification models at the same time, and because different algorithms emphasize different features, the processing method provided by the invention can mine hidden features in the data as far as possible, giving better model performance and more accurate classification results. The weight of each classification algorithm is adjusted automatically by the genetic algorithm and adapted to different business requirements, which makes the algorithm adaptive to those requirements. Customer group classification is completed automatically by directly processing customers' basic information and historical transaction data; the method is simple, convenient and efficient to use, and can save a large amount of time and labor. The method integrates customer group classification into an end-to-end process and can be extended to classification problems other than customer group classification simply by replacing the corresponding data set, so it has a wide range of application and a low cost of adoption.
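The weighted combination in this example can be sketched as follows. The score vectors and weights are hypothetical numbers, and choosing the class with the largest fused score is an assumption of this sketch: the description only states that the final result is obtained according to the weighted result.

```python
def fuse(outputs, weights):
    """Weighted sum k1*x1 + k2*x2 + k3*x3 of the per-model class-score vectors."""
    n_classes = len(outputs[0])
    return [sum(k * x[c] for k, x in zip(weights, outputs)) for c in range(n_classes)]


def predict_group(outputs, weights):
    """Index of the customer group with the largest fused score (assumed decision rule)."""
    fused = fuse(outputs, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical class-score vectors from the three models for one customer.
x1 = [0.7, 0.2, 0.1]  # e.g. XGBoost
x2 = [0.4, 0.5, 0.1]  # e.g. LightGBM
x3 = [0.3, 0.3, 0.4]  # e.g. neural network
k = [0.5, 0.3, 0.2]   # weights k1, k2, k3, e.g. as found by the genetic algorithm

fused = fuse([x1, x2, x3], k)
group = predict_group([x1, x2, x3], k)
print(fused, group)
```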
Based on the same inventive concept, the embodiment of the present invention further provides a bank customer data processing device, as described in the following embodiments. Because the device solves the problem on principles similar to those of the bank customer data processing method, its implementation can refer to the implementation of the method, and repeated details are omitted.
FIG. 2 is a block diagram of a bank customer data processing device according to an embodiment of the present invention. As shown in FIG. 2, the device includes:
a data obtaining module 201, configured to obtain bank customer data, where the bank customer data includes: personal data, transaction data, liability data or any combination thereof;
the data classification module 202 is configured to classify the bank client data according to the bank client data and a pre-established classification model, where the classification model is pre-established according to a plurality of trained machine learning models and a set weight value corresponding to each trained machine learning model, the weight value corresponding to each trained machine learning model is set by using a genetic algorithm, and each machine learning model is trained according to bank client historical data.
In one embodiment, the data classification module 202 is further configured to:
initializing a weight value corresponding to each trained machine learning model;
and iteratively updating the weight values multiple times by using a genetic algorithm, wherein, after each iterative update, a first preset number of offspring individuals is obtained, and the next iterative update of the weight values is performed according to the first preset number of offspring individuals.
In one embodiment, the data classification module 202 is further configured to:
for each iterative update, acquire a first preset number of sub-generation data from the population after the iterative update;
perform crossover on the first preset number of sub-generation data to obtain a second preset number of sub-generation data;
replace a second preset number of sub-generation data in the next-generation population with the second preset number of sub-generation data obtained by the crossover;
and perform the next iterative update of the weight value according to the next-generation population after the replacement.
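The iterative update described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the fitness function is supplied by the caller (for example, validation accuracy of the weighted models), the "first preset number" of sub-generation data is taken to be the fittest individuals, crossover is an arithmetic blend of two weight vectors, and the individuals replaced in the next-generation population are the least fit ones. None of these choices, nor the names `evolve_weights`, `n_select` and `n_children`, is specified in the description, and mutation is omitted because this passage describes only crossover and replacement.

```python
import random
from typing import Callable, List

Weights = List[float]


def normalize(weights: Weights) -> Weights:
    """Scale a weight vector so that its entries sum to 1."""
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def crossover(a: Weights, b: Weights, rng: random.Random) -> Weights:
    """Arithmetic crossover: blend two parents with a random mixing ratio."""
    alpha = rng.random()
    return normalize([alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)])


def evolve_weights(
    fitness: Callable[[Weights], float],
    n_models: int,
    pop_size: int = 20,
    n_select: int = 6,    # "first preset number" (assumed value)
    n_children: int = 4,  # "second preset number" (assumed value)
    n_iters: int = 30,
    seed: int = 0,
) -> Weights:
    if n_select < 2 or n_children < 1 or n_select + n_children > pop_size:
        raise ValueError("require n_select >= 2, n_children >= 1, "
                         "and n_select + n_children <= pop_size")
    rng = random.Random(seed)
    # Initialize one candidate weight vector per individual.
    population = [normalize([rng.random() for _ in range(n_models)])
                  for _ in range(pop_size)]
    for _ in range(n_iters):
        # Rank the population and take the fittest individuals.
        population.sort(key=fitness, reverse=True)
        selected = population[:n_select]
        # Cross random pairs of the selected individuals.
        children = [crossover(*rng.sample(selected, 2), rng)
                    for _ in range(n_children)]
        # Assumption: the children replace the least fit individuals.
        population[-n_children:] = children
    return max(population, key=fitness)
```

Because the children only overwrite the tail of the ranked population, the best weight vector found so far is never discarded in this sketch, so the best fitness cannot decrease from one iteration to the next.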
In one embodiment, the plurality of trained machine learning models comprises: an XGBoost machine learning model, a LightGBM machine learning model, and a neural network model.
In one embodiment, as shown in fig. 3, the bank customer data processing apparatus of fig. 2 further includes:
a preprocessing module 203, configured to, after the bank client data is obtained, clean the bank client data by using a three-standard-deviation (3σ) detection algorithm, fill missing values in the cleaned bank client data by using a mean filling algorithm, and vectorize the bank client data after the missing values are filled;
the data classification module 202 is further configured to classify the bank client data according to the vectorized bank client data and a pre-established classification model.
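The three preprocessing steps can be sketched in Python as follows, using only the standard library. The description does not say what "cleaning" does with a detected outlier, which standard deviation is used, or how vectorization is performed. This sketch therefore assumes that outliers are marked as missing and then filled with the mean, that the population standard deviation is used, and that categorical fields are one-hot encoded. The function names and the field names in the usage note are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Sequence


def three_sigma_clean(column: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Mark values farther than 3 standard deviations from the mean as missing."""
    present = [v for v in column if v is not None]
    if len(present) < 2:
        return list(column)
    mu, sigma = mean(present), pstdev(present)
    if sigma == 0:
        return list(column)
    return [v if v is None or abs(v - mu) <= 3 * sigma else None
            for v in column]


def mean_fill(column: Sequence[Optional[float]]) -> List[float]:
    """Fill missing values with the mean of the values that are present."""
    present = [v for v in column if v is not None]
    fill = mean(present) if present else 0.0
    return [fill if v is None else float(v) for v in column]


def preprocess(
    records: Sequence[Dict[str, object]],
    numeric_fields: Sequence[str],
    categorical_fields: Sequence[str],
) -> List[List[float]]:
    """Clean, fill and vectorize records; returns one feature vector per record."""
    columns: List[List[float]] = []
    for field in numeric_fields:
        raw = [r.get(field) for r in records]
        columns.append(mean_fill(three_sigma_clean(raw)))
    for field in categorical_fields:
        # Assumption: one-hot encoding over the sorted observed categories.
        categories = sorted({r[field] for r in records
                             if r.get(field) is not None})
        for category in categories:
            columns.append([1.0 if r.get(field) == category else 0.0
                            for r in records])
    return [list(row) for row in zip(*columns)]
```

For example, calling `preprocess(records, ["balance"], ["segment"])` on a list of dictionaries yields one numeric vector per customer, in which an extreme or missing balance has been replaced by the mean of the remaining balances.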
In summary, in the embodiments of the present invention, bank customer data is obtained, where the bank customer data includes personal data, transaction data, liability data, or any combination thereof. The bank client data is then classified according to the bank client data and a pre-established classification model, where the classification model is pre-established according to a plurality of trained machine learning models and a set weight value corresponding to each trained machine learning model, the weight value corresponding to each trained machine learning model is set by using a genetic algorithm, and each machine learning model is trained according to bank client historical data. Because different machine learning models emphasize different features during learning, the embodiments of the present invention use a genetic algorithm to set a corresponding weight value for each trained machine learning model and establish the classification model from the plurality of trained machine learning models and their set weight values. The weight values can therefore be adjusted adaptively to different business requirements, which improves the feature learning capability, mines hidden features in the data as far as possible, and improves the accuracy and reliability of the classification result of the client data.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (12)

1. A bank customer data processing method is characterized by comprising the following steps:
obtaining bank customer data, the bank customer data comprising: personal data, transaction data, liability data or any combination thereof;
classifying the bank client data according to the bank client data and a pre-established classification model, wherein the classification model is pre-established according to a plurality of trained machine learning models and a set weight value corresponding to each trained machine learning model, the weight value corresponding to each trained machine learning model is set by using a genetic algorithm, and each machine learning model is trained according to bank client historical data.
2. The bank customer data processing method according to claim 1, wherein the weight value corresponding to each trained machine learning model is set as follows:
initializing a weight value corresponding to each trained machine learning model;
and iteratively updating the weight value corresponding to each trained machine learning model multiple times by using a genetic algorithm, wherein for each iterative update, a first preset number of sub-generation data is obtained after the iterative update, and the next iterative update of the weight value is performed according to the first preset number of sub-generation data.
3. The bank customer data processing method according to claim 2, wherein for each iteration update, a first preset number of sub-generation data is obtained after the iteration update, and the next iteration update is performed on the weight value according to the first preset number of sub-generation data, and the method comprises the following steps:
for each iteration update, acquiring a first preset number of sub-generation data in the population after the iteration update;
performing crossover on the first preset number of sub-generation data to obtain a second preset number of sub-generation data;
replacing a second preset number of sub-generation data in the next generation population with the second preset number of sub-generation data;
and carrying out next iteration updating on the weight value according to the replaced sub-generation data in the next generation population.
4. The bank customer data processing method according to claim 1, wherein the plurality of trained machine learning models comprises: an xgboost machine learning model, a lightgbm machine learning model and a neural network model.
5. The bank customer data processing method according to claim 1, further comprising: after the bank client data is obtained, cleaning the bank client data by using a three-standard-deviation (3σ) detection algorithm, filling missing values in the cleaned bank client data by using a mean filling algorithm, and vectorizing the bank client data after the missing values are filled;
classifying the bank customer data according to the bank customer data and a pre-established classification model, wherein the classifying comprises the following steps: and classifying the bank client data according to the bank client data after the data vectorization processing and a pre-established classification model.
6. A bank customer data processing apparatus, comprising:
a data acquisition module, configured to acquire bank customer data, the bank customer data comprising: personal data, transaction data, liability data or any combination thereof;
and a data classification module, configured to classify the bank client data according to the bank client data and a pre-established classification model, wherein the classification model is pre-established according to a plurality of trained machine learning models and a set weight value corresponding to each trained machine learning model, the weight value corresponding to each trained machine learning model is set by using a genetic algorithm, and each machine learning model is trained according to bank client historical data.
7. The bank customer data processing apparatus of claim 6, wherein the data classification module is further configured to:
initialize a weight value corresponding to each trained machine learning model;
and iteratively update the weight value corresponding to each trained machine learning model multiple times by using a genetic algorithm, wherein for each iterative update, a first preset number of sub-generation data is obtained after the iterative update, and the next iterative update of the weight value is performed according to the first preset number of sub-generation data.
8. The bank customer data processing apparatus of claim 7, wherein the data classification module is further configured to:
for each iterative update, acquire a first preset number of sub-generation data from the population after the iterative update;
perform crossover on the first preset number of sub-generation data to obtain a second preset number of sub-generation data;
replace a second preset number of sub-generation data in the next-generation population with the second preset number of sub-generation data obtained by the crossover;
and perform the next iterative update of the weight value according to the next-generation population after the replacement.
9. The bank customer data processing apparatus according to claim 6, wherein the plurality of trained machine learning models comprises: an xgboost machine learning model, a lightgbm machine learning model and a neural network model.
10. The bank customer data processing apparatus according to claim 6, further comprising:
a preprocessing module, configured to, after the bank client data is obtained, clean the bank client data by using a three-standard-deviation (3σ) detection algorithm, fill missing values in the cleaned bank client data by using a mean filling algorithm, and vectorize the bank client data after the missing values are filled;
wherein the data classification module is further configured to classify the bank client data according to the vectorized bank client data and a pre-established classification model.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 5 when executing the computer program.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 5.
CN202010834016.8A 2020-08-18 2020-08-18 Bank client data processing method and device Pending CN111967973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010834016.8A CN111967973A (en) 2020-08-18 2020-08-18 Bank client data processing method and device

Publications (1)

Publication Number Publication Date
CN111967973A true CN111967973A (en) 2020-11-20

Family

ID=73389621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010834016.8A Pending CN111967973A (en) 2020-08-18 2020-08-18 Bank client data processing method and device

Country Status (1)

Country Link
CN (1) CN111967973A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971162A (en) * 2014-04-04 2014-08-06 华南理工大学 Method for improving BP (back propagation) neural network and based on genetic algorithm
CN108830292A (en) * 2018-05-08 2018-11-16 西北大学 Data classification model optimization method and classification method
CN109034658A (en) * 2018-08-22 2018-12-18 重庆邮电大学 A kind of promise breaking consumer's risk prediction technique based on big data finance
CN111523604A (en) * 2020-04-27 2020-08-11 中国银行股份有限公司 User classification method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination