CN112270595A

CN112270595A - Data reconciliation decision method, device, server and storage medium

Info

Publication number: CN112270595A
Application number: CN202011080428.3A
Authority: CN
Inventors: 杨国为; 张凡龙; 黄璞; 万鸣华; 杨章静; 詹天明
Original assignee: NANJING AUDIT UNIVERSITY
Current assignee: NANJING AUDIT UNIVERSITY
Priority date: 2020-10-10
Filing date: 2020-10-10
Publication date: 2021-01-26

Abstract

The invention belongs to the technical field of financial services and discloses a method, a device, a server and a storage medium for data reconciliation decision. The method comprises the following steps. Establishing a user classification model: a user consumption classification model is constructed with a decision tree algorithm as its core, and the consumption characteristics of different types of consumer groups are mined. Data preprocessing: fragmentation and incompleteness are addressed by methods such as aggregation and mean-value replacement, redundant data are deleted, and a consumption data record set meeting the classification mining requirement is generated. Generating a personalized statement: after the consumption of each cardholder is analyzed with the established classification models, targeted advertisements are delivered to the user through a personalized recommendation algorithm according to comprehensive information such as the relation between different customer groups and advertisement types, the ranking of the cardholder within the group classification, and the number of advertisement copies and the delivery time; a personalized bill generation model for credit card customers is constructed, and a personalized statement is generated for each customer.

Description

Data reconciliation decision method, device, server and storage medium

Technical Field

The invention belongs to the technical field of financial services, and particularly relates to a data reconciliation decision method, a device, a server and a storage medium.

Background

At present, after a cardholder applies for a credit card, the cardholder rarely contacts the bank on their own initiative other than to deposit or withdraw money or to change the card. The monthly statement that the bank sends to the cardholder therefore becomes the carrier through which the bank communicates with the cardholder. The statement is a document that users pay close attention to and tend to keep; it is not a mere accessory to the credit card but an important communication medium between the bank and the user. Its characteristics are: users receive it periodically every month, it is a highly effective marketing and service promotion medium, its reading rate is high, and its feedback rate increases year by year.

Providing customers with mailed statements has become a necessary service, but sending statements to customers on schedule requires a considerable investment of both manpower and material resources. In the face of this large expenditure, advertising revenue cannot be ignored and is an important way of offsetting the cost. The users who receive statements form a large group of potential consumers whose advertising value is difficult to estimate. According to research by various enterprises, publicity delivered in this way is more effective than in other media and more readily accepted by users. Business advertisements are mainly printed on the back or in the blank areas of the statement to introduce services, tariffs, marketing measures, questionnaire surveys and the like, which both communicates with users and promotes consumption while saving publicity costs. Because bank statements cover a wide area, are delivered frequently and have a high advertising hit rate, merchants closely related to users, such as those in mobile phones, communication terminals, household appliances, tourism, catering, real estate and home services, are gradually paying attention to this carrier.

Through the above analysis, the problems and defects of the prior art are as follows: sending statements to users on schedule requires a large investment of manpower and material resources, and considerable resources are wasted.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a method, a device, a server and a storage medium for data reconciliation decision.

The invention is realized in such a way that a data reconciliation decision method comprises the following steps:

step one, collecting and storing consumption data of users in a big data network through a data collection module;

the collecting consumption data of the users in the big data network through the data collecting module comprises the following steps:

(1.1) counting the consumption characteristic vector of each user according to records in the network data set to form a rough data set;

(1.2) screening out consumption characteristic vectors corresponding to all known first-class objects from the rough data set, and filtering the screened consumption characteristic vectors to obtain consumption sample data;

(1.3) carrying out the recoding processing of missing values, singular values and discrete character type fields on each field in the acquired sample data;

(1.4) carrying out normalization processing on each field in the sample data in a z-score mode, and carrying out discretization processing on each field in the sample after the normalization processing;

(1.5) constructing a regression model based on the processed consumption sample data, and then determining whether each of all known second-class target objects potentially belongs to the first-class target object using the constructed regression model;

step two, preliminarily classifying the acquired data to obtain required large-class data information;

the preliminary classifying the collected data comprises:

(2.1) obtaining a randomly generated rice population comprising a plurality of rice individuals;

(2.2) acquiring the collected consumption data, and calculating the fitness value of each rice individual according to the consumption data;

(2.3) dividing the plurality of rice individuals into maintainer line individuals, sterile line individuals and restorer line individuals according to the fitness value;

(2.4) obtaining the optimal individual for hybridization generated in the process of hybridizing the maintainer line individual and the sterile line individual;

(2.5) obtaining an optimal selfing individual generated in the selfing process of the individual of the restorer line;

(2.6) obtaining an individual with the optimal fitness value in the optimal individual for hybridization and the optimal individual for self-crossing;

(2.7) using the individual with the optimal fitness value as a clustering center to classify the data to be classified; eliminating data in sterile line individuals to obtain required large-scale data information;

step three, data preprocessing: solving the problems of fragmentation and incompleteness by adopting methods such as aggregation, average value replacement and the like, deleting redundant data, and generating a consumption data record set meeting the classification mining requirement;

the specific method for deleting the redundant data comprises the following steps:

S11, dividing the collected data objects into data blocks in each logical channel using a fixed-length algorithm, with a different data block length in each channel;

S12, finding the first redundant data blocks of the current data object, for the data preliminarily classified in step two, based on the standard used in each logical channel and the data blocks into which the current data object is divided in that logical channel using the standard;

S13, eliminating the overlapping portion existing between two or more first redundant data blocks based on the offset and the length of the first redundant data blocks found in each of the plurality of logical channels;

S14, deleting the duplicate data of the current data object by deleting second redundant data blocks, wherein the second redundant data blocks comprise the first redundant data blocks after the overlapping portion is eliminated;

step four, receiving the required big data formed by cleaning the redundant data or temporarily stored by the big data management module according to a preset management strategy;

step five, uploading the required big data to a big data server in a storage module, storing the required big data in blocks in the big data server, and distinguishing the subclasses again;

step six, establishing a user classification model: constructing a user consumption classification model by taking a decision tree algorithm as a core, and mining consumption characteristics of different types of consumption groups;

step seven, generating a personalized statement: analyzing the consumption condition of each card holder by using the established classification model;

and step eight, according to the relation between different client groups and the advertisement types, the ranking of cardholders in group classification, the advertisement distribution number and time and other comprehensive information, carrying out targeted advertisement distribution on the users through a personalized recommendation algorithm, constructing a personalized bill generation model of the credit card clients, and generating personalized statement bills for each client.
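The preprocessing named in step three (aggregation, mean-value replacement and deletion of redundant data) can be illustrated with a minimal sketch. The record layout `(txn_id, cardholder_id, category, amount)`, the function name `preprocess`, and the choice to treat a repeated transaction id as redundant are assumptions made for illustration; the text does not specify them.

```python
from collections import defaultdict
from statistics import mean


def preprocess(records):
    """Deduplicate, fill missing amounts, and aggregate per cardholder and category.

    `records` holds (txn_id, cardholder_id, category, amount) tuples; amount
    may be None. This record layout is hypothetical.
    """
    # Delete redundant data: keep the first record seen for each transaction id.
    unique = {}
    for txn_id, cardholder, category, amount in records:
        unique.setdefault(txn_id, (cardholder, category, amount))

    # Collect the known amounts per category for mean-value replacement.
    known = defaultdict(list)
    for _, category, amount in unique.values():
        if amount is not None:
            known[category].append(amount)
    all_known = [a for amounts in known.values() for a in amounts]
    fallback = mean(all_known) if all_known else 0.0

    # Aggregate fragmented records into one total per (cardholder, category),
    # replacing each missing amount with the mean of its category.
    totals = defaultdict(float)
    for cardholder, category, amount in unique.values():
        if amount is None:
            amounts = known.get(category)
            amount = mean(amounts) if amounts else fallback
        totals[(cardholder, category)] += amount
    return dict(totals)
```

The result is a consumption record set keyed by cardholder and category, which is the kind of input a later classification step could consume.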

Further, in step one, when the consumption data of users in the big data network are collected, the customers are classified according to geographic location, customer income, differences in knowledge background and education level, and the differences in financial management and lifestyle among age groups.

Further, in step three, after data preprocessing, the consumption patterns of credit card customers are classified with a decision tree according to the cardholders' consumption information, and the specific classification method comprises the following steps:

S21, selecting a classification keyword: the customer's total annual consumption is selected as the keyword for classifying customers, and the total amount the customer spends through the credit card in one year is taken as the standard for judging the customer's consumption capacity;

S22, identifying the customer category;

S23, selecting training samples to construct a classification model;

and S24, analyzing the decision tree classification result.

Further, in the sixth step, after constructing the user consumption classification model, firstly, parameter optimization is performed on the classification model, and the adopted specific method comprises the following steps:

S31, determining the number of construction parameters in the classification model, and generating a set number of parameter-related vectors whose dimension equals the number of parameters;

S32, initializing each parameter-related vector to obtain the set number of initial parameter-related vectors containing initial component information;

S33, iteratively updating each initial parameter-related vector according to a set updating strategy to obtain a target parameter-related vector containing globally optimal component information;

and S34, determining the optimal parameter value of each construction parameter according to the global optimal component information.

Another object of the present invention is to provide an apparatus for data reconciliation decision, comprising:

the data acquisition module is connected with the central control and processing module and is used for acquiring customer information and consumption data of users in the big data network;

the preliminary classification module is connected with the central control and processing module and is used for preliminarily classifying the acquired data to form required large-class data information;

the data preprocessing module is connected with the central control and processing module and is used for preprocessing the large-scale data information, deleting redundant data and generating a consumption data record set meeting the classification mining requirement;

the central control and processing module is connected with the data acquisition module, the primary classification module, the data preprocessing module, the secondary classification module, the model establishing module and the data generating module and is used for processing the acquired data and performing coordination control on each module according to a processing result and preset parameters;

the secondary classification module is connected with the central control and processing module and is used for storing the required big data in blocks in the big data server and distinguishing the subclasses again;

the model building module is connected with the central control and processing module and used for constructing a user consumption classification model and mining consumption characteristics of different types of consumption groups;

and the data generation module is connected with the central control and processing module and is used for constructing a personalized bill generation model of the credit card client and generating a personalized statement bill.

Further, the preliminary classification module includes:

the key information extraction unit is used for extracting different keywords as classification bases;

the data block dividing unit is used for dividing the data block of the current data object by using different standards in each of the plurality of logical channels;

and the classified storage unit is used for respectively packaging and storing the divided data blocks and naming the keywords.

Further, the data preprocessing module comprises:

a first redundant data block determining unit, configured to find, in each logical channel, one or more first redundant data blocks of a current data object based on data blocks divided by the current data object in the logical channel, respectively;

the data de-duplication unit is used for de-duplicating the current data object for all the first redundant data blocks found by the first redundant data block determining unit;

and the overlap elimination unit is used for eliminating the overlap existing between two or more first redundant data blocks based on the offset and the length of the first redundant data blocks found in each of the plurality of logical channels.

Further, the central control and processing module comprises:

the parameter presetting unit is used for presetting and inputting control parameters through external input equipment;

the data processing unit is used for processing and analyzing the acquired data according to the parameters preset by the user;

and the control instruction generating unit is used for generating a control instruction according to the processing result and sending the control instruction to different controlled modules.

By combining all of the above technical schemes, the invention has the following advantages and positive effects: the invention takes the personalized financial statement as its research object and provides a personalized service solution based on data mining technology. The results can be applied to the processing of credit card statements and to statement processing in service industries such as telecommunications and networks, and open up a new approach to providing personalized service in other industries. The invention also addresses the problem of duplicate records in credit card consumption data. The algorithm mainly selects important attributes and performs multiple independent basic neighbor sorting passes: for N specified important attributes, N basic neighbor sorting passes must be run independently, and the window size must be adjusted continuously according to the relation between the similarity and the threshold, so the operation takes longer and the matching efficiency of the algorithm is lower.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a flowchart of a method for data reconciliation decision according to an embodiment of the present invention.

Fig. 2 is a flowchart of a method for eliminating duplicate records according to an embodiment of the present invention.

Fig. 3 is a flowchart for classifying the amount of the fee according to the embodiment of the present invention.

Fig. 4 is a flowchart of a specific method for firstly performing parameter optimization on a classification model after constructing a user consumption classification model according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of a server structure provided by an embodiment of the present invention.

In fig. 5: 1. data acquisition module; 2. preliminary classification module; 3. data preprocessing module; 4. central control and processing module; 5. secondary classification module; 6. model building module; 7. data generation module.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In view of the problems in the prior art, the present invention provides a method, an apparatus, a server and a storage medium for data reconciliation decision, which are described in detail below with reference to the accompanying drawings.

As shown in fig. 1, a method for data reconciliation decision provided in the embodiment of the present invention includes the following steps:

S101, collecting and storing consumption data of users in a big data network through a data collection module;

S102, preliminarily classifying the acquired data to obtain required large-class data information;

S103, data preprocessing: solving the problems of fragmentation and incompleteness by adopting methods such as aggregation and mean-value replacement, deleting redundant data, and generating a consumption data record set meeting the classification mining requirement;

S104, receiving the required big data formed by redundant data cleaning or temporarily stored by a big data management module according to a preset management strategy;

S105, uploading the required big data to a big data server in a storage module, storing the required big data in blocks in the big data server, and distinguishing the subclasses again;

S106, establishing a user classification model: constructing a user consumption classification model by taking a decision tree algorithm as a core, and mining consumption characteristics of different types of consumption groups;

S107, generating a personalized statement: analyzing the consumption condition of each cardholder by using the established classification model;

and S108, according to the relation between different client groups and advertisement types, the ranking of cardholders in group classification, the advertisement distribution number and time and other comprehensive information, targeted advertisement distribution is carried out on the users through a personalized recommendation algorithm, a personalized bill generation model of credit card clients is constructed, and personalized statement bills are generated for each client.
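Step S108 combines three inputs: the relation between customer groups and advertisement types, the cardholder's ranking within the group, and the number of advertisement copies available. The text does not name the personalized recommendation algorithm, so the sketch below swaps in a simple greedy allocation purely as an illustration. The function name `assign_ads`, the affinity scores and the data layout are all hypothetical.

```python
def assign_ads(cardholders, affinity, copies):
    """Greedy ad allocation (an illustrative stand-in, not the patent's algorithm).

    cardholders: list of (cardholder_id, group, rank); rank 1 is the top rank.
    affinity:    {group: {ad_type: score}}, higher score means a closer relation.
    copies:      {ad_type: number of copies available for distribution}.
    Returns {cardholder_id: ad_type or None}.
    """
    remaining = dict(copies)
    assignment = {}
    # Higher-ranked cardholders choose first while copies last.
    for cardholder_id, group, _rank in sorted(cardholders, key=lambda c: c[2]):
        scores = affinity.get(group, {})
        candidates = [ad for ad in scores if remaining.get(ad, 0) > 0]
        if not candidates:
            assignment[cardholder_id] = None
            continue
        best = max(candidates, key=lambda ad: scores[ad])
        remaining[best] -= 1
        assignment[cardholder_id] = best
    return assignment
```

A statement generator could then print the assigned advertisement on the back or blank area of each cardholder's statement.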

As shown in fig. 2, in step S103 provided in the embodiment of the present invention, a specific method for deleting redundant data includes:

S201, dividing the acquired data objects into data blocks in each logical channel using a fixed-length algorithm, with a different data block length in each channel;

S202, finding the first redundant data blocks of the current data object, for the data preliminarily classified in step S102, based on the standard used in each logical channel and the data blocks into which the current data object is divided in that logical channel using the standard;

S203, eliminating the overlapping part existing between two or more first redundant data blocks based on the offset and the length of the first redundant data blocks found in each logic channel of the plurality of logic channels;

and S204, deleting the second redundant data block to delete the repeated data of the current data object, wherein the second redundant data block comprises the first redundant data block with the overlapped part eliminated.
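Steps S201 to S204 can be sketched as follows. This is a minimal illustration under stated assumptions: each logical channel is modeled as one fixed block length, a block counts as redundant when an identical block appeared earlier in the same channel, and deletion simply drops the bytes (a production de-duplicator would normally keep a reference instead). The function names and the use of SHA-256 for comparison are hypothetical choices, not taken from the text.

```python
import hashlib


def find_redundant_blocks(data: bytes, block_len: int):
    """S201/S202: fixed-length chunking in one channel; return (offset, length)
    of every block whose content already occurred earlier in that channel."""
    seen = set()
    redundant = []
    for offset in range(0, len(data), block_len):
        block = data[offset:offset + block_len]
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            redundant.append((offset, len(block)))
        else:
            seen.add(digest)
    return redundant


def merge_overlaps(blocks):
    """S203: merge overlapping or adjacent (offset, length) ranges."""
    merged = []
    for offset, length in sorted(blocks):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            prev_off, prev_len = merged[-1]
            end = max(prev_off + prev_len, offset + length)
            merged[-1] = (prev_off, end - prev_off)
        else:
            merged.append((offset, length))
    return merged


def deduplicate(data: bytes, block_lengths=(4, 8)):
    """S204: delete the merged ("second") redundant blocks from the data."""
    first = []
    for block_len in block_lengths:  # one logical channel per block length
        first.extend(find_redundant_blocks(data, block_len))
    second = merge_overlaps(first)
    out = bytearray()
    cursor = 0
    for offset, length in second:
        out += data[cursor:offset]
        cursor = offset + length
    out += data[cursor:]
    return bytes(out), second
```

Merging by offset and length before deletion matters because the same bytes can be flagged in more than one channel; without the merge they would be removed twice.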

In step S101, the collecting consumption data of users in a big data network by a data collection module according to the embodiment of the present invention includes:

(1.5) constructing a regression model based on the processed consumption sample data, and then determining whether each of all known second class target objects potentially belongs to the first class target object using the constructed regression model.
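The z-score normalization, discretization and regression modeling named in the sub-steps of step S101 can be sketched in a few lines. The text does not say which regression model is used, so a one-feature logistic regression trained by gradient descent is assumed here. The bin edges, learning rate, function names and the single annual-spend feature are likewise hypothetical.

```python
import math
from statistics import mean, pstdev


def z_score(values):
    """Normalize a field to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def discretize(z_values, edges=(-1.0, 0.0, 1.0)):
    """Map each normalized value to a bin index (number of edges it exceeds)."""
    return [sum(z > e for e in edges) for z in z_values]


def predict_proba(x, w, b):
    """Probability that an object with feature x belongs to the first class."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit a one-feature logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = predict_proba(x, w, b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b
```

Under these assumptions, a second-class object would be treated as potentially first-class when `predict_proba` for its normalized feature exceeds a chosen cut-off such as 0.5.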

In step S101, when the consumption data of users in the big data network are collected according to the embodiment of the present invention, the customers are classified according to geographic location, customer income, differences in knowledge background and education level, and the differences in financial management and lifestyle among age groups.

In step S102, the preliminary classification of the collected data provided by the embodiment of the present invention includes:

(2.7) using the individual with the optimal fitness value as a clustering center to classify the data to be classified; and eliminating data in sterile line individuals to obtain required large-scale data information.
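The population-based search behind the preliminary classification in step S102 can be illustrated with a heavily simplified sketch. It follows only the outline given in the text: a random population is ranked by fitness, split into maintainer, restorer and sterile lines, improved by hybridization and selfing, and the fittest individual is used as the clustering centers. The crossover and selfing formulas, the one-dimensional data, the fitness function (sum of squared distances to the nearest center) and all names are assumptions. How data in sterile-line individuals are eliminated is not specified in the text, so that part is left out.

```python
import random


def fitness(centers, data):
    """Sum of squared distances from each point to its nearest center (lower is better)."""
    return sum(min((x - c) ** 2 for c in centers) for x in data)


def hybrid_rice_cluster(data, k=2, pop_size=12, generations=50, seed=0):
    """Return k one-dimensional cluster centers found by a simplified search."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    # Each rice individual encodes k candidate cluster centers.
    pop = [[rng.uniform(lo, hi) for _ in range(k)] for _ in range(pop_size)]
    third = pop_size // 3
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, data))
        maintainers = pop[:third]                 # best third, kept unchanged
        restorers = pop[third:pop_size - third]   # middle of the ranking
        steriles = pop[pop_size - third:]         # worst third
        best = pop[0]

        # Hybridization: cross each sterile individual with a random maintainer.
        new_steriles = []
        for s in steriles:
            m = rng.choice(maintainers)
            child = []
            for gene_s, gene_m in zip(s, m):
                r = rng.random()
                child.append(r * gene_s + (1 - r) * gene_m)
            better = fitness(child, data) < fitness(s, data)
            new_steriles.append(child if better else s)

        # Selfing: move each restorer individual toward the current best.
        new_restorers = []
        for ind in restorers:
            child = [g + rng.random() * (b - g) for g, b in zip(ind, best)]
            better = fitness(child, data) < fitness(ind, data)
            new_restorers.append(child if better else ind)

        pop = maintainers + new_restorers + new_steriles
    return min(pop, key=lambda ind: fitness(ind, data))
```

Because the maintainer line is carried over unchanged and offspring replace a parent only when they are fitter, the best fitness in the population never gets worse from one generation to the next.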

As shown in fig. 3, in step S103, after the data preprocessing, the method provided by the embodiment of the present invention for classifying the consumption patterns of credit card customers with a decision tree according to the cardholders' consumption information includes:

S301, selecting a classification keyword: the customer's total annual consumption is selected as the keyword for classifying customers, and the total amount the customer spends through the credit card in one year is taken as the standard for judging the customer's consumption capacity;

S302, identifying the customer category;

S303, selecting training samples to construct a classification model;

S304, analyzing the classification result of the decision tree.
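The core of a decision tree built on the annual consumption total is the choice of a split threshold. The sketch below shows a single split selected by weighted Gini impurity; a full tree would apply the same search recursively. The text does not state which impurity measure or tree algorithm is used, so Gini impurity is an assumption, and the category labels and function names are hypothetical.

```python
def annual_totals(transactions):
    """S301: sum each customer's credit card spending over the year.

    transactions: iterable of (customer_id, amount) pairs (hypothetical layout).
    """
    totals = {}
    for customer, amount in transactions:
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(totals, labels):
    """S303: find the threshold on annual total that minimizes weighted Gini.

    Returns (threshold, weighted_impurity); threshold is None when every
    total is identical and no split is possible.
    """
    pairs = sorted(zip(totals, labels))
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal totals
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score
```

A weighted impurity of 0.0 means the threshold separates the labeled training samples perfectly; analyzing such scores per split is one way to carry out the result analysis of step S304.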

As shown in fig. 4, in step S106, after constructing the user consumption classification model, the method according to the embodiment of the present invention first performs parameter optimization on the classification model, and the specific method adopted includes:

S401, determining the number of construction parameters in the classification model, and generating a set number of parameter-related vectors whose dimension equals the number of parameters;

S402, initializing each parameter-related vector to obtain the set number of initial parameter-related vectors containing initial component information;

S403, iteratively updating each initial parameter-related vector according to a set updating strategy to obtain a target parameter-related vector containing globally optimal component information;

S404, determining the optimal parameter value of each construction parameter according to the globally optimal component information.
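Steps S401 to S404 describe a population of parameter vectors that is initialized, updated iteratively, and read out at the global optimum. The text does not name the updating strategy; the sketch below assumes a standard particle swarm update as one possible instance. The inertia and acceleration coefficients, the function name `optimize_parameters` and the idea of tuning two tree parameters are hypothetical.

```python
import random


def optimize_parameters(objective, bounds, n_vectors=10, iterations=60, seed=0):
    """Minimize `objective` over box-bounded parameters with a particle swarm.

    bounds: one (low, high) pair per construction parameter (S401).
    Returns (best_vector, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # S402: initialize the parameter vectors and their velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_vectors)]
    vel = [[0.0] * dim for _ in range(n_vectors)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_vectors), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    # S403: iteratively update every vector toward its own and the global best.
    for _ in range(iterations):
        for i in range(n_vectors):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            value = objective(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value

    # S404: the global best vector holds the optimal value of each parameter.
    return gbest, gbest_val
```

In practice `objective` would be something like the validation error of the classification model for a given maximum depth and minimum leaf size; any cheap stand-in function works for trying the routine out.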

As shown in fig. 5, an apparatus for data reconciliation decision provided in an embodiment of the present invention includes:

the data acquisition module 1 is connected with the central control and processing module and is used for acquiring customer information and consumption data of users in the big data network;

the preliminary classification module 2 is connected with the central control and processing module and is used for preliminarily classifying the acquired data to form required large-class data information;

the data preprocessing module 3 is connected with the central control and processing module and is used for preprocessing the large-scale data information, deleting redundant data and generating a consumption data record set meeting the classification mining requirement;

the central control and processing module 4 is connected with the data acquisition module, the primary classification module, the data preprocessing module, the secondary classification module, the model establishing module and the data generating module, and is used for processing the acquired data and performing coordination control on each module according to a processing result and preset parameters;

the secondary classification module 5 is connected with the central control and processing module and is used for storing the required big data in blocks in the big data server and distinguishing the subclasses again;

the model building module 6 is connected with the central control and processing module and used for constructing a user consumption classification model and mining consumption characteristics of different types of consumption groups;

and the data generation module 7 is connected with the central control and processing module and is used for constructing a personalized bill generation model of the credit card client and generating a personalized statement bill.
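The hub-and-spoke arrangement above, in which every module is connected to the central control and processing module, can be sketched as a small coordinator. The class name `CentralControl`, the handler signature and the sequential run order are assumptions for illustration; the text only states that the central module coordinates the others according to processing results and preset parameters.

```python
class CentralControl:
    """Minimal coordinator: runs registered modules in order with preset parameters."""

    def __init__(self, preset_params):
        self.params = preset_params  # parameters preset through external input
        self.modules = []            # (name, handler) pairs in execution order

    def register(self, name, handler):
        """Connect a module; handler(data, params) must return the processed data."""
        self.modules.append((name, handler))

    def run(self, data):
        """Pass the data through every module and record which ones ran."""
        log = []
        for name, handler in self.modules:
            data = handler(data, self.params)
            log.append(name)
        return data, log
```

Each of the acquisition, classification, preprocessing, model building and data generation modules would be registered as one handler in this arrangement.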

The preliminary classification module provided by the embodiment of the invention comprises:

The data preprocessing module provided by the embodiment of the invention comprises:

The central control and processing module provided by the embodiment of the invention comprises:

The above description is only for the purpose of illustrating the preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, and any modification, equivalent replacement, and improvement made by those skilled in the art within the technical scope of the present invention disclosed herein, which is within the spirit and principle of the present invention, should be covered by the present invention.

Claims

1. A method for data reconciliation decision, characterized in that the method for data reconciliation decision comprises the following steps:

the preliminary classifying the collected data comprises:

s12, finding the first redundant data block of the current data object for the data blocks preliminarily classified in the step two based on the standard used in each logical channel and the data blocks divided for the current data object by using the standard in the logical channel;

2. The method for data reconciliation decision according to claim 1, wherein in step one, when the consumption data of users in the big data network are collected, the customers are classified according to geographic location, customer income, differences in knowledge background and education level, and the differences in financial management and lifestyle among age groups.

3. The method for data reconciliation decision according to claim 1, wherein in step three, after the data preprocessing, the consumption patterns of credit card customers are classified with a decision tree according to the cardholders' consumption information, and the specific classification method comprises:

s22, identifying the client category;

s23, selecting training samples to construct a classification model;

and S24, analyzing the decision tree classification result.

4. The method for data reconciliation decision-making according to claim 1, wherein in step six, after constructing the user consumption classification model, parameter optimization is firstly carried out on the classification model, and the adopted specific method comprises the following steps:

5. An apparatus for performing a data reconciliation decision making method according to any one of claims 1 to 4, wherein the apparatus for performing a data reconciliation decision making method comprises:

6. The apparatus of data reconciliation decision of claim 5 wherein the preliminary classification module comprises:

7. The apparatus of data reconciliation decision of claim 5 wherein the data preprocessing module comprises:

a data de-duplication unit for de-duplicating the current data object for all the first redundant data blocks found by the first redundant data block determination unit;

8. The apparatus for data reconciliation decision of claim 5 wherein the central control and processing module comprises:

9. A server, comprising:

one or more processors;

storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of data reconciliation decision according to any of claims 1-6.

10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out a method of data reconciliation decision according to one of the claims 1 to 6.