CN114722935A - Credit card customer data subdivision method and device - Google Patents
Credit card customer data subdivision method and device
- Publication number
- CN114722935A (application CN202210355525.1A)
- Authority
- CN
- China
- Prior art keywords
- credit card
- data
- card client
- client data
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2323—Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/22—Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/02—Banking, e.g. interest calculation or account maintenance
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Business, Economics & Management (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computational Linguistics (AREA)
- Development Economics (AREA)
- Discrete Mathematics (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention provides a credit card client data subdivision method and device. Before a target credit card client data set is subdivided, a credit card client data KD-tree is built from that data set, and the data set is then subdivided using a spatial search algorithm that combines the credit card client data KD-tree with the DBSCAN algorithm. Because of the KD-tree structure, a core point in the DBSCAN algorithm can be determined by traversing only a limited number of search paths, so the time complexity of the final algorithm is O(n log n). Compared with the O(n^2) time complexity of using the DBSCAN algorithm alone, this saves the time spent traversing irrelevant data, effectively reduces the credit card client data subdivision time, and greatly improves subdivision efficiency.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a credit card client data subdivision method and a credit card client data subdivision device.
Background
With the development of big data technology, the operating model of banks' personal credit card business is gradually shifting to an internationally advanced model that is centered on customers and data and based on information. Scientific customer relationship management is therefore very important to the quality of a commercial bank's credit card business.
At present, banks generally subdivide credit card customer data either with a traditional experience-based method or with a simple cluster analysis method. The experience-based method is inaccurate and inefficient. The simple cluster analysis method must traverse all credit card customer data repeatedly during subdivision; for a bank with millions of credit card customers, its time cost is too high and its efficiency too low to meet the efficiency requirements of the bank's credit card business.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for subdividing credit card client data, which effectively reduce the time for subdividing the credit card client data and greatly improve the efficiency for subdividing the credit card client data.
In order to achieve the above purpose, the invention provides the following specific technical scheme:
in a first aspect, an embodiment of the present invention discloses a credit card customer data subdivision method, including:
preprocessing original credit card client data according to subdivision requirements to obtain a target credit card client data set;
creating a credit card client data KD-tree using the target credit card client data set;
subdividing the target credit card customer data set using a spatial search algorithm that combines the credit card customer data KD-tree with a DBSCAN algorithm.
In some embodiments, the pre-processing the original credit card customer data according to the subdivision requirements to obtain the target credit card customer data set comprises:
determining a plurality of credit card customer attributes according to the subdivision requirements;
extracting said target credit card customer data set comprising a plurality of said credit card customer attributes from said original credit card customer data.
In some embodiments, creating a credit card client data KD-tree using the target credit card client data set comprises:
calculating the data variance corresponding to each credit card client attribute in the data set to be divided, determining the credit card client attribute corresponding to the maximum data variance as a division dimension, and taking the data set to be divided as the target credit card client data set in an initial state;
ranking the credit card client data in the target credit card client data set in the segmentation dimension and determining a median value in the ranking;
dividing credit card client data not greater than the median into a left sub-tree and dividing credit card client data greater than the median into a right sub-tree by taking the credit card client data corresponding to the median as a node;
and respectively determining credit card client data corresponding to the left sub-tree and the right sub-tree as a data set to be divided, and returning to calculate the data variance corresponding to each credit card client attribute in the data set to be divided until the left sub-tree and the right sub-tree only contain one credit card client data.
In some embodiments, subdividing the target credit card client data set using a spatial search algorithm in which the credit card client data KD-tree is combined with a DBSCAN algorithm comprises:
setting the neighborhood radius and density of the DBSCAN algorithm;
searching a data set in a neighborhood radius of target data p in the credit card client data KD-tree through a KD-tree space search algorithm, wherein the target data p is any credit card client data in the target credit card client data set in an initial state;
if the number of the credit card customer data in the searched data set is smaller than the density, re-selecting the target data p;
if the number of the credit card customer data in the searched data set is not smaller than the density, marking the target data p as a core point, and starting a new subdivision object;
traversing each credit card client data in the neighborhood radius of the target data p, and adding the credit card client data of which the number of the credit card data in the data set in the neighborhood radius is not less than the density into the subdivision object;
and selecting any one credit card client data from the unprocessed credit card client data in the target credit card client data set as target data p, returning to execute the KD-tree space search algorithm to search the data set in the neighborhood radius of the target data p in the KD-tree of the credit card client data until all the credit card client data in the target credit card client data set are processed, and finishing subdivision.
In a second aspect, an embodiment of the present invention discloses a credit card customer data subdivision device, including:
the data preprocessing unit is used for preprocessing the original credit card client data according to subdivision requirements to obtain a target credit card client data set;
a KD-Tree creating unit for creating a credit card client data KD-tree using the target credit card client data set;
and the data subdivision unit is used for subdividing the target credit card client data set by using a space search algorithm combining the credit card client data KD-tree and the DBSCAN algorithm.
In some embodiments, the data preprocessing unit is specifically configured to:
determining a plurality of credit card customer attributes according to the subdivision requirements;
extracting said target credit card customer data set comprising a plurality of said credit card customer attributes from said original credit card customer data.
In some embodiments, the KD-tree creation unit is specifically configured to:
calculating the data variance corresponding to each credit card client attribute in the data set to be divided, determining the credit card client attribute corresponding to the maximum data variance as a division dimension, and taking the data set to be divided as the target credit card client data set in an initial state;
ranking the credit card client data in the target credit card client data set in the segmentation dimension and determining a median value in the ranking;
dividing credit card client data not greater than the median into a left sub-tree and dividing credit card client data greater than the median into a right sub-tree by taking the credit card client data corresponding to the median as a node;
and respectively determining credit card client data corresponding to the left sub-tree and the right sub-tree as a data set to be divided, and returning to the step of calculating the data variance corresponding to the attribute of each credit card client in the data set to be divided until the left sub-tree and the right sub-tree only contain one credit card client data.
In some embodiments, the data subdivision unit is specifically configured to:
setting the neighborhood radius and density of the DBSCAN algorithm;
searching a data set in a neighborhood radius of target data p in the credit card client data KD-tree through a KD-tree space search algorithm, wherein the target data p is any credit card client data in the target credit card client data set in an initial state;
if the number of the credit card customer data in the searched data set is smaller than the density, re-selecting the target data p;
if the number of the credit card customer data in the searched data set is not smaller than the density, marking the target data p as a core point, and starting a new subdivision object;
traversing each credit card client data in the neighborhood radius of the target data p, and adding the credit card client data of which the number of the credit card data in the data set in the neighborhood radius is not less than the density into the subdivision object;
and selecting any one credit card client data from the unprocessed credit card client data in the target credit card client data set as target data p, returning to execute the KD-tree space search algorithm to search the data set in the neighborhood radius of the target data p in the KD-tree of the credit card client data until all the credit card client data in the target credit card client data set are processed, and finishing subdivision.
In a third aspect, the embodiment of the invention discloses a storage medium, and the storage medium comprises a stored program, wherein the program executes the credit card customer data subdivision method described in any implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present invention discloses an electronic device, where the electronic device includes at least one processor, and at least one memory and a bus connected to the processor;
the processor and the memory complete mutual communication through the bus;
the processor is configured to invoke program instructions in the memory to perform a credit card customer data subdivision method as described in any implementation of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
Before the target credit card client data set is subdivided, a credit card client data KD-tree is built from that data set, and the data set is then subdivided using a spatial search algorithm that combines the credit card client data KD-tree with the DBSCAN algorithm. Because of the KD-tree structure, a core point in the DBSCAN algorithm can be determined by traversing only a limited number of search paths, so the time complexity of the final algorithm is O(n log n). Compared with the O(n^2) time complexity of using the DBSCAN algorithm alone, this saves the time spent traversing irrelevant data, effectively reduces the credit card client data subdivision time, and greatly improves subdivision efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart illustrating a method for subdividing credit card customer data according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a part of a method for subdividing credit card customer data according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart illustrating a part of a credit card customer data subdivision method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a credit card customer data subdividing device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The inventor has found through research that banks mainly subdivide credit card customer data with two methods: the K-means algorithm and the DBSCAN algorithm. The K-means algorithm is the most classical partition-based clustering algorithm. Its idea is as follows: clustering is centered on k points in space, and each object is assigned to the center closest to it; the center of each cluster is then updated through successive iterations until an optimal clustering result is obtained. When K-means is applied to bank credit card customer data, the initial cluster centers are chosen at random and the number of clusters k has a great influence on the clustering result, so a good result is difficult to obtain in a single run; the result has to be adjusted, and the optimal partition is reached only after multiple iterations. During clustering, the Euclidean distance between every data point and every cluster center is calculated, so the efficiency is low.
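The K-means idea described above can be illustrated with a minimal sketch. The function name `kmeans`, the sample points, and the decision to pass the initial centers in explicitly (rather than drawing them at random, as the text describes) are assumptions made here so that the example is reproducible; this is not code from the patent.

```python
from math import dist

def kmeans(points, centers, iterations=10):
    """Assign each point to its nearest center, then move each center to its cluster mean."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Euclidean distance from p to every center: the cost the text refers to.
            nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(v) / len(members) for v in zip(*members)) if members else centers[c]
            for c, members in enumerate(clusters)
        ]
        if new_centers == centers:  # centers stopped moving, so stop iterating
            break
        centers = new_centers
    return centers, clusters

# Hypothetical two-attribute customer points and hand-picked starting centers.
sample = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
final_centers, final_clusters = kmeans(sample, [(0.0, 0.0), (10.0, 10.0)])
```

Every iteration compares all points with all k centers, and a different choice of k or of starting centers can change the result, which is the sensitivity the paragraph above points out.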
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a representative spatial clustering algorithm that partitions a data set based on density. Its idea is as follows: for each object in a cluster, the number of objects contained within a given radius of it must be equal to or greater than a predetermined threshold. The traditional DBSCAN clustering algorithm, also called the brute-force algorithm, subdivides a data set by exhaustive search. The bank credit card customer data must be traversed repeatedly during partitioning, and many of the data points visited are redundant, so the time complexity of the algorithm is very high, reaching O(n^2).
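To make the quadratic cost of the exhaustive search concrete, the following sketch counts the distance computations of a brute-force neighborhood query. The function `region_query`, the counter, and the synthetic points on a line are hypothetical and used only for illustration.

```python
from math import dist

def region_query(points, p, eps, counter):
    """Brute-force neighborhood search: compare point p with every point in the set."""
    found = []
    for i, q in enumerate(points):
        counter[0] += 1  # one distance computation per visited point
        if dist(points[p], q) <= eps:
            found.append(i)
    return found

# 100 synthetic points spaced one unit apart on a line.
points = [(float(i), 0.0) for i in range(100)]
counter = [0]
# A point is a core point if at least 3 points (itself included) lie within eps.
core = [p for p in range(len(points)) if len(region_query(points, p, 1.0, counter)) >= 3]
```

Checking all n points costs n queries of n comparisons each, which is where the O(n^2) figure comes from.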
Therefore, the current method for subdividing the credit card customer data by the bank is high in complexity, long in time consumption and low in efficiency.
In order to solve the technical problem, the invention uses the KD-tree in the DBSCAN algorithm, provides a bank credit card client data subdivision method with higher time efficiency, reduces the bank credit card client data subdivision time, and greatly improves the credit card client data subdivision efficiency.
Specifically, referring to fig. 1, the method for subdividing the credit card customer data disclosed in this embodiment includes the following steps:
S101: preprocessing original credit card client data according to subdivision requirements to obtain a target credit card client data set;
It can be understood that the bank system stores a large amount of bank card customer data, and that the original credit card customer data contain many attributes, such as customer age, income level, consumption amount, and personal credit. Several attributes can be selected according to the subdivision requirements. Different selected attributes yield different subdivision results, so that different marketing strategies can be targeted at different customer groups.
Each attribute corresponds to one dimension, so selecting several attributes means subdividing the credit card customer data in several dimensions: for example, in two dimensions (customer income level and consumption amount) or in three dimensions (customer age, gender, and personal credit).
Thus, prior to subdividing the credit card customer data, the original credit card customer data is preprocessed according to the subdivision requirements, i.e. a plurality of credit card customer attributes are determined according to the subdivision requirements, and a target credit card customer data set comprising said plurality of credit card customer attributes is extracted from the original credit card customer data.
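As an illustration of this preprocessing step, the sketch below projects raw customer records onto the attributes chosen for a subdivision requirement. The record layout, the attribute names, and the rule of dropping records with a missing value are assumptions made for this example; the text does not specify them.

```python
from typing import Dict, List, Optional, Sequence, Tuple

def extract_target_dataset(
    raw_records: List[Dict[str, Optional[float]]],
    attributes: Sequence[str],
) -> List[Tuple[float, ...]]:
    """Keep only the selected attributes, one tuple (one point) per customer."""
    target = []
    for record in raw_records:
        # Assumed cleaning rule: skip records that lack any selected attribute.
        if all(record.get(a) is not None for a in attributes):
            target.append(tuple(float(record[a]) for a in attributes))
    return target

# Hypothetical raw data with four attributes per customer.
raw = [
    {"age": 31, "income": 8000, "spend": 2500, "credit": 720},
    {"age": 45, "income": 15000, "spend": 4000, "credit": 690},
    {"age": 28, "income": None, "spend": 900, "credit": 650},
]
# Two-dimensional subdivision on income level and consumption amount.
target = extract_target_dataset(raw, ["income", "spend"])
```

Each resulting tuple is one point in the chosen attribute space, which is the form the KD-tree and the DBSCAN steps operate on.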
S102: creating a credit card customer data KD-tree using the target credit card customer data set;
In the process of creating the credit card client data KD-tree, this embodiment divides the credit card client data along the dimension with the largest variance in order to obtain better data discrimination and balance, and takes the median point of the selected dimension as the dividing point in order to minimize the tree building time.
Referring to fig. 2, an implementation manner of S102 disclosed in this embodiment is as follows:
S201: calculating the data variance corresponding to each credit card client attribute in the data set to be divided, and determining the credit card client attribute corresponding to the maximum data variance as a division dimension;
It should be noted that, in the initial state, the data set to be divided is the target credit card client data set.
S202: sorting the credit card client data in the target credit card client data set in the segmentation dimension, and determining a median value in the sorting;
That is, the credit card client data are sorted by their values in the division dimension, and the median of the sorted values is determined.
S203: dividing the credit card client data not greater than the median into a left sub-tree and dividing the credit card client data greater than the median into a right sub-tree by taking the credit card client data corresponding to the median as nodes;
S204: judging whether the left subtree and the right subtree only contain one credit card client data;
if yes, go to S205: completing creating a KD-tree of the credit card client data;
if not, executing S206: credit card client data corresponding to a subtree containing more than one credit card client data is determined as a data set to be divided, and execution returns to S201.
The following describes the steps of constructing a KD-tree of credit card client data, taking as an example that the target credit card client data set includes two credit card client attributes of client income level and consumption amount:
1) inputting a credit card customer data set;
2) if the input data is empty, returning empty credit card client data KD-tree;
3) if the input data is not null, respectively calculating the data variance of the credit card client data in two dimensions of client income and consumption amount, and selecting the maximum variance value;
4) selecting the dimension of the maximum variance value as a segmentation dimension;
5) sorting the credit card customer data set in the division dimension, and taking the median value as the threshold;
6) dividing the remaining bank credit card client data into a left sub-tree and a right sub-tree, where every point in the left sub-tree has a value in the division dimension less than or equal to that of the median point, and every point in the right sub-tree has a greater value;
7) repeating steps 3) to 6) on the left sub-tree and the right sub-tree until each contains only one datum, at which point the construction of the credit card client data KD-tree is finished.
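The construction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `KDNode` class, the function `build_kdtree`, and the sample points are hypothetical, and ties on the median value are sent to the left sub-tree to match the "not greater than the median" rule.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class KDNode:
    point: Point                      # credit card client data stored at this node
    axis: int                         # division dimension used at this node
    left: Optional["KDNode"] = None   # values <= median in the division dimension
    right: Optional["KDNode"] = None  # values > median in the division dimension

def build_kdtree(points: List[Point]) -> Optional[KDNode]:
    if not points:                    # step 2: empty input gives an empty tree
        return None
    dims = len(points[0])
    # Steps 3-4: the dimension with the largest variance becomes the division dimension.
    axis = max(range(dims), key=lambda d: pvariance([p[d] for p in points]))
    # Step 5: sort in that dimension and locate the median.
    ordered = sorted(points, key=lambda p: p[axis])
    mid = (len(ordered) - 1) // 2
    # Move past duplicates so that every value equal to the median stays on the left.
    while mid + 1 < len(ordered) and ordered[mid + 1][axis] == ordered[mid][axis]:
        mid += 1
    # Steps 6-7: the median becomes the node; recurse on both halves.
    return KDNode(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid]),
        right=build_kdtree(ordered[mid + 1:]),
    )

# Hypothetical (income, consumption amount) pairs.
customers = [(8000.0, 2500.0), (15000.0, 4000.0), (5000.0, 1000.0),
             (12000.0, 3500.0), (9000.0, 800.0)]
root = build_kdtree(customers)
```

In this sample the income values vary far more than the consumption amounts, so income (axis 0) is chosen at the root and the customer with the median income becomes the root node. The recursion here runs until a subset is empty, which yields the same tree as stopping when a sub-tree holds a single datum.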
The preliminary subdivision of the credit card client data is realized by creating a KD-tree of the credit card client data.
S103: the target credit card customer data set is subdivided using a spatial search algorithm combining the credit card customer data KD-tree and DBSCAN algorithm.
Referring to fig. 3, an optional implementation manner of S103 disclosed in this embodiment is as follows:
S301: setting the neighborhood radius and density of the DBSCAN algorithm;
S302: searching a data set in the neighborhood radius of target data p in the KD-tree of the credit card client data through a KD-tree space search algorithm;
In the initial state, the target data p is any one of the credit card client data in the target credit card client data set.
S303: judging whether the number of the credit card customer data in the searched data set is smaller than the density;
if the number of the credit card customer data in the searched data set is less than the density, executing S304: selecting any one of the unprocessed credit card client data in the target credit card client data set as target data p, and returning to execute S302;
if the number of the credit card client data in the searched data set is not less than the density, executing S305: marking the target data p as a core point, and starting a new subdivision object;
S306: traversing each credit card client data in the neighborhood radius of the target data p, and adding the credit card client data of which the number of the credit card data in the data set in the neighborhood radius is not less than the density into the subdivision object;
S307: judging whether the credit card customer data in the target credit card customer data set are processed completely;
if all processing is completed, executing S308: the subdivision is finished;
if not, executing S304: any one of the unprocessed credit card client data in the target credit card client data set is selected as target data p, and the process returns to step S302.
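The flow of S301 to S308 can be sketched as follows. This is an illustrative sketch, not the patented implementation: the names `build`, `radius_search`, and `dbscan_kdtree` are hypothetical, the tree is stored as nested tuples for brevity, and the handling of border points and the transitive expansion of a cluster follow the common formulation of DBSCAN, which the steps above do not spell out.

```python
from math import dist
from statistics import pvariance

NOISE = -1

def build(points, ids=None):
    """Compact KD-tree of nested tuples (index, axis, left, right); max-variance median split."""
    if ids is None:
        ids = list(range(len(points)))
    if not ids:
        return None
    axis = max(range(len(points[0])), key=lambda d: pvariance([points[i][d] for i in ids]))
    ids = sorted(ids, key=lambda i: points[i][axis])
    mid = (len(ids) - 1) // 2
    while mid + 1 < len(ids) and points[ids[mid + 1]][axis] == points[ids[mid]][axis]:
        mid += 1
    return (ids[mid], axis, build(points, ids[:mid]), build(points, ids[mid + 1:]))

def radius_search(node, points, q, eps, out):
    """S302: collect indices of all points within eps of q, pruning sub-trees that cannot match."""
    if node is None:
        return out
    i, axis, left, right = node
    if dist(points[i], q) <= eps:
        out.append(i)
    diff = q[axis] - points[i][axis]
    if diff <= eps:        # the ball around q reaches the left side (values <= split)
        radius_search(left, points, q, eps, out)
    if diff >= -eps:       # the ball around q reaches the right side (values > split)
        radius_search(right, points, q, eps, out)
    return out

def dbscan_kdtree(points, eps, min_pts):
    tree = build(points)
    labels = [None] * len(points)      # None means not yet processed
    cluster = -1
    for p in range(len(points)):       # S304: pick the next unprocessed point
        if labels[p] is not None:
            continue
        neighbors = radius_search(tree, points, points[p], eps, [])
        if len(neighbors) < min_pts:   # S303: too few neighbors, so p is not a core point
            labels[p] = NOISE
            continue
        cluster += 1                   # S305: p is a core point, start a new subdivision object
        labels[p] = cluster
        queue = list(neighbors)
        while queue:                   # S306: traverse the neighborhood
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster    # border point reached from a core point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = radius_search(tree, points, points[q], eps, [])
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)
    return labels                      # S308: every point has been processed

# Hypothetical normalized two-attribute customer points.
customers = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
             (8.0, 8.0), (8.1, 7.9), (7.9, 8.2), (4.5, 20.0)]
labels = dbscan_kdtree(customers, eps=0.5, min_pts=3)
```

With these sample values the first three customers form one subdivision object, the next three form a second, and the isolated last point is labeled as noise. Only `radius_search` differs from the brute-force variant: it skips any sub-tree whose side of the split lies farther than the neighborhood radius from the query point.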
It can be understood that, in this embodiment, the tree structure of the credit card client data KD-tree allows a core point to be determined by traversing only a limited number of search paths, which greatly reduces the time spent traversing irrelevant data. The time complexity of the final algorithm is O(n log n), compared with O(n^2) when only the DBSCAN algorithm is used, so the larger the value of n, the greater the advantage of the algorithm.
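A rough way to see where the two complexity figures come from is to count the work of the n neighborhood queries. The estimate below is a sketch under assumptions that the text does not state: the tree is balanced, the number of attributes is small, each neighborhood holds few points, and a radius query therefore visits on the order of log n nodes (with c an unspecified constant). Tree construction cost is left out.

```latex
T_{\text{brute}}(n) = \sum_{p=1}^{n} n = n^{2} \in O(n^{2}),
\qquad
T_{\text{KD}}(n) \approx \sum_{p=1}^{n} c \log_{2} n = c\, n \log_{2} n \in O(n \log n),
\qquad
\frac{T_{\text{brute}}(n)}{T_{\text{KD}}(n)} \approx \frac{n}{c \log_{2} n}
```

For n = 10^6 customers, log_2 n is about 20, so under these assumptions the ratio is on the order of 5 x 10^4 / c, and it keeps growing as n grows.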
Corresponding to the credit card client data subdivision method disclosed in the above embodiment, this embodiment discloses a credit card client data subdivision device. Referring to FIG. 4, the device includes:
a data preprocessing unit 401, configured to preprocess the original credit card client data according to the subdivision requirements, to obtain a target credit card client data set;
a KD-tree creation unit 402 for creating a credit card client data KD-tree using said target credit card client data set;
a data subdividing unit 403 for subdividing the target credit card client data set using a space search algorithm in which the credit card client data KD-tree is combined with the DBSCAN algorithm.
In some embodiments, the data preprocessing unit 401 is specifically configured to:
determining a plurality of credit card customer attributes according to the subdivision requirements;
extracting said target credit card customer data set comprising a plurality of said credit card customer attributes from said original credit card customer data.
In some embodiments, the KD-tree creation unit 402 is specifically configured to:
calculating the data variance corresponding to each credit card client attribute in the data set to be divided, determining the credit card client attribute corresponding to the maximum data variance as a division dimension, and taking the data set to be divided as the target credit card client data set in an initial state;
ranking the credit card client data in the target credit card client data set in the segmentation dimension and determining a median value in the ranking;
dividing credit card client data not greater than the median into a left sub-tree and dividing credit card client data greater than the median into a right sub-tree by taking the credit card client data corresponding to the median as a node;
and respectively determining credit card client data corresponding to the left sub-tree and the right sub-tree as a data set to be divided, and returning to the step of calculating the data variance corresponding to the attribute of each credit card client in the data set to be divided until the left sub-tree and the right sub-tree only contain one credit card client data.
In some embodiments, the data subdivision unit 403 is specifically configured to:
setting the neighborhood radius and density of the DBSCAN algorithm;
searching a data set in a neighborhood radius of target data p in the credit card client data KD-tree through a KD-tree space search algorithm, wherein the target data p is any credit card client data in the target credit card client data set in an initial state;
if the number of the credit card customer data in the searched data set is smaller than the density, re-selecting the target data p;
if the number of the credit card customer data in the searched data set is not smaller than the density, marking the target data p as a core point, and starting a new subdivision object;
traversing each credit card client data in the neighborhood radius of the target data p, and adding the credit card client data of which the number of the credit card data in the data set in the neighborhood radius is not less than the density into the subdivision object;
and selecting any one credit card client data from the unprocessed credit card client data in the target credit card client data set as target data p, returning to execute the KD-tree space search algorithm to search the data set in the neighborhood radius of the target data p in the KD-tree of the credit card client data until all the credit card client data in the target credit card client data set are processed, and finishing subdivision.
In the credit card client data subdivision device disclosed in this embodiment, before the target credit card client data set is subdivided, a credit card client data KD-tree is built from that data set, and the data set is then subdivided using a spatial search algorithm that combines the credit card client data KD-tree with the DBSCAN algorithm. Because of the KD-tree structure, a core point in the DBSCAN algorithm can be determined by traversing only a limited number of search paths, so the time complexity of the final algorithm is O(n log n). Compared with the O(n^2) time complexity of using the DBSCAN algorithm alone, this saves the time spent traversing irrelevant data, effectively reduces the credit card client data subdivision time, and greatly improves subdivision efficiency.
The invention also discloses a storage medium, which comprises a stored program, wherein the program executes the following credit card customer data subdivision method:
preprocessing original credit card client data according to subdivision requirements to obtain a target credit card client data set;
creating a credit card client data KD-tree using the target credit card client data set;
subdividing the target credit card client data set using a spatial search algorithm that combines the credit card client data KD-tree with a DBSCAN algorithm.
Further, the preprocessing the original credit card customer data according to the subdivision requirements to obtain a target credit card customer data set includes:
determining a plurality of credit card customer attributes according to the subdivision requirements;
extracting said target credit card customer data set comprising a plurality of said credit card customer attributes from said original credit card customer data.
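The two preprocessing steps above can be sketched as follows. This is a minimal sketch only: the attribute names (`monthly_spend`, `credit_limit`), the dictionary record layout and the decision to skip incomplete records are assumptions made for illustration, since the text does not specify them.

```python
def extract_target_dataset(raw_records, attributes):
    """Keep only the chosen attributes, in a fixed order, as numeric tuples."""
    dataset = []
    for record in raw_records:
        # Assumption: records missing a required attribute are skipped;
        # the text does not say how incomplete records are handled.
        if all(name in record for name in attributes):
            dataset.append(tuple(float(record[name]) for name in attributes))
    return dataset

# Hypothetical raw customer records.
raw = [
    {"customer_id": "A1", "monthly_spend": 1200, "credit_limit": 8000, "age": 34},
    {"customer_id": "B2", "monthly_spend": 300, "credit_limit": 5000, "age": 51},
    {"customer_id": "C3", "credit_limit": 2000, "age": 27},  # no monthly_spend
]

# Attributes assumed to follow from the subdivision requirements.
attributes = ["monthly_spend", "credit_limit"]
target = extract_target_dataset(raw, attributes)
print(target)  # [(1200.0, 8000.0), (300.0, 5000.0)]
```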
Further, creating a credit card client data KD-tree using the target credit card client data set, comprising:
calculating the data variance corresponding to each credit card client attribute in the data set to be divided, and determining the credit card client attribute with the maximum data variance as the division dimension, wherein in the initial state the data set to be divided is the target credit card client data set;
sorting the credit card client data in the data set to be divided along the division dimension and determining the median of the sorted values;
taking the credit card client data corresponding to the median as a node, dividing the credit card client data not greater than the median into a left sub-tree, and dividing the credit card client data greater than the median into a right sub-tree;
and determining the credit card client data in the left sub-tree and in the right sub-tree each as a data set to be divided, and returning to the step of calculating the data variance corresponding to each credit card client attribute in the data set to be divided, until the left sub-tree and the right sub-tree each contain only one credit card client data.
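The KD-tree creation steps above can be sketched as a short recursive function. This is a minimal sketch under stated assumptions: the names `KDNode` and `build_kdtree` are hypothetical, population variance is used, ties at the median are placed in the left sub-tree so that the right sub-tree holds only strictly greater values, and the recursion simply runs until a subset is empty.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class KDNode:
    point: Point                      # record stored at this node
    axis: int                         # index of the division dimension
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def build_kdtree(points: Sequence[Point]) -> Optional[KDNode]:
    if not points:
        return None
    dims = len(points[0])
    # Division dimension: the attribute with the largest variance.
    axis = max(range(dims), key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    # Advance past ties so that every record equal to the median value
    # ends up at the node or in the left sub-tree.
    while mid + 1 < len(ordered) and ordered[mid + 1][axis] == ordered[mid][axis]:
        mid += 1
    return KDNode(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid]),        # values not greater than the median
        right=build_kdtree(ordered[mid + 1:]),   # values greater than the median
    )

# Hypothetical two-attribute records.
sample = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
root = build_kdtree(sample)
print(root.point, root.axis)  # (7.0, 2.0) 0
```

Choosing the maximum-variance attribute at every level, rather than cycling through the attributes in a fixed order, tends to split the data along the direction in which it is most spread out.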
Further, subdividing the target credit card client data set using a spatial search algorithm of the credit card client data KD-tree in combination with a DBSCAN algorithm, comprising:
setting the neighborhood radius and density of the DBSCAN algorithm;
searching a data set in a neighborhood radius of target data p in the credit card client data KD-tree through a KD-tree space search algorithm, wherein the target data p is any credit card client data in the target credit card client data set in an initial state;
if the number of the credit card customer data in the searched data set is smaller than the density, re-selecting the target data p;
if the number of the credit card customer data in the searched data set is not smaller than the density, marking the target data p as a core point, and starting a new subdivision object;
traversing each credit card client data within the neighborhood radius of the target data p, and adding to the subdivision object each credit card client data whose own neighborhood-radius data set contains a number of credit card client data not less than the density;
and selecting any one credit card client data from the unprocessed credit card client data in the target credit card client data set as target data p, returning to execute the KD-tree space search algorithm to search the data set in the neighborhood radius of the target data p in the KD-tree of the credit card client data until all the credit card client data in the target credit card client data set are processed, and finishing subdivision.
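The subdivision loop described in the steps above can be sketched as follows. Several points are assumptions rather than statements of the text: the sketch follows the textbook DBSCAN expansion, which grows a subdivision object transitively through core points and also attaches border points; the neighborhood lookup is passed in as a callable, and a linear scan stands in for the KD-tree search so that the block stays self-contained; all names and sample values are hypothetical.

```python
import math
from collections import deque

NOISE = -1  # label for records that end up in no subdivision object

def linear_scan_query(points, eps):
    """Stand-in neighborhood lookup; a KD-tree radius search would replace it."""
    def region_query(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
    return region_query

def dbscan(n, min_pts, region_query):
    """Return one label per record: 0, 1, ... for subdivision objects, or NOISE."""
    labels = [None] * n               # None means "not yet processed"
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = region_query(p)
        if len(neighbors) < min_pts:
            labels[p] = NOISE         # not a core point; pick another target
            continue
        cluster += 1                  # p is a core point: start a new object
        labels[p] = cluster
        queue = deque(neighbors)
        while queue:
            q = queue.popleft()
            if labels[q] == NOISE:
                labels[q] = cluster   # border point reached from a core point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = region_query(q)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)   # q is also core: keep growing
    return labels

# Hypothetical two-attribute records: two dense groups and one outlier.
customers = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
             (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
             (50.0, 50.0)]
labels = dbscan(len(customers), 3, linear_scan_query(customers, 1.5))
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Here the neighborhood radius is 1.5 and the density is 3, with each record counted as a member of its own neighborhood.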
The embodiment of the invention also discloses electronic equipment, which comprises at least one processor, at least one memory and a bus, wherein the memory and the bus are connected with the processor;
the processor and the memory communicate with each other through the bus;
the processor is configured to invoke program instructions in the memory to perform the following credit card customer data subdivision method:
preprocessing original credit card client data according to subdivision requirements to obtain a target credit card client data set;
creating a credit card client data KD-tree using the target credit card client data set;
subdividing the target credit card client data set using a spatial search algorithm that combines the credit card client data KD-tree with a DBSCAN algorithm.
Further, preprocessing the original credit card customer data according to the subdivision requirements to obtain a target credit card customer data set includes:
determining a plurality of credit card customer attributes according to the subdivision requirements;
extracting said target credit card customer data set comprising a plurality of said credit card customer attributes from said original credit card customer data.
Further, creating a credit card client data KD-tree using the target credit card client data set, comprising:
calculating the data variance corresponding to each credit card client attribute in the data set to be divided, and determining the credit card client attribute with the maximum data variance as the division dimension, wherein in the initial state the data set to be divided is the target credit card client data set;
sorting the credit card client data in the data set to be divided along the division dimension and determining the median of the sorted values;
taking the credit card client data corresponding to the median as a node, dividing the credit card client data not greater than the median into a left sub-tree, and dividing the credit card client data greater than the median into a right sub-tree;
and determining the credit card client data in the left sub-tree and in the right sub-tree each as a data set to be divided, and returning to the step of calculating the data variance corresponding to each credit card client attribute in the data set to be divided, until the left sub-tree and the right sub-tree each contain only one credit card client data.
Further, subdividing the target credit card client data set using a spatial search algorithm that combines the credit card client data KD-tree with a DBSCAN algorithm, comprises:
setting the neighborhood radius and density of the DBSCAN algorithm;
searching a data set in a neighborhood radius of target data p in the credit card client data KD-tree through a KD-tree space search algorithm, wherein the target data p is any credit card client data in the target credit card client data set in an initial state;
if the number of the credit card customer data in the searched data set is smaller than the density, re-selecting the target data p;
if the number of the credit card customer data in the searched data set is not smaller than the density, marking the target data p as a core point, and starting a new subdivision object;
traversing each credit card client data within the neighborhood radius of the target data p, and adding to the subdivision object each credit card client data whose own neighborhood-radius data set contains a number of credit card client data not less than the density;
and selecting any one credit card client data from the unprocessed credit card client data in the target credit card client data set as target data p, returning to execute the KD-tree space search algorithm to search the data set in the neighborhood radius of the target data p in the KD-tree of the credit card client data until all the credit card client data in the target credit card client data set are processed, and finishing subdivision.
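The KD-tree space search named in the steps above, which returns the data set within the neighborhood radius of the target data p, might look like the following minimal sketch. The dictionary-based node layout, the function names `build` and `radius_search`, the Euclidean distance and the sample values are all assumptions for illustration. A compact tree build is included only to keep the block self-contained.

```python
import math
from statistics import pvariance

def build(points):
    """Compact KD-tree build: max-variance attribute, split at the median."""
    if not points:
        return None
    axis = max(range(len(points[0])),
               key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {"point": ordered[mid], "axis": axis,
            "left": build(ordered[:mid]), "right": build(ordered[mid + 1:])}

def radius_search(node, target, eps, found=None):
    """Collect every stored point within distance eps of target."""
    if found is None:
        found = []
    if node is None:
        return found
    if math.dist(node["point"], target) <= eps:
        found.append(node["point"])
    axis = node["axis"]
    diff = target[axis] - node["point"][axis]
    # Descend into a sub-tree only if the eps-ball around the target can
    # reach that side of the splitting plane; otherwise prune it.
    if diff - eps <= 0:
        radius_search(node["left"], target, eps, found)
    if diff + eps >= 0:
        radius_search(node["right"], target, eps, found)
    return found

# Hypothetical two-attribute records and query.
customers = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build(customers)
print(sorted(radius_search(tree, (6.0, 3.0), 2.0)))  # [(5.0, 4.0), (7.0, 2.0)]
```

The pruning test is where the savings come from: a sub-tree lying entirely outside the neighborhood radius along the division dimension is never visited.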
In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, it is described only briefly; for the relevant details, refer to the description of the method.
It is further noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above embodiments may be combined arbitrarily, and the features described in the embodiments of this specification may be substituted for or combined with one another, so that those skilled in the art can implement or use the present application.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A credit card customer data subdivision method, comprising:
preprocessing original credit card client data according to subdivision requirements to obtain a target credit card client data set;
creating a credit card client data KD-tree using the target credit card client data set;
subdividing the target credit card client data set using a spatial search algorithm that combines the credit card client data KD-tree with a DBSCAN algorithm.
2. The method of claim 1, wherein pre-processing the raw credit card customer data according to subdivision requirements to obtain a target credit card customer data set comprises:
determining a plurality of credit card customer attributes according to the subdivision requirements;
extracting said target credit card customer data set comprising a plurality of said credit card customer attributes from said original credit card customer data.
3. The method of claim 2, wherein creating a credit card client data KD-tree using the target credit card client data set comprises:
calculating the data variance corresponding to each credit card client attribute in the data set to be divided, and determining the credit card client attribute corresponding to the maximum data variance as a division dimension, wherein the data set to be divided is the target credit card client data set in an initial state;
sorting the credit card client data in the target credit card client data set along the division dimension and determining a median value of the sorted data;
dividing credit card client data not greater than the median into a left sub-tree and dividing credit card client data greater than the median into a right sub-tree by taking the credit card client data corresponding to the median as a node;
and respectively determining credit card client data corresponding to the left sub-tree and the right sub-tree as a data set to be divided, and returning to calculate the data variance corresponding to each credit card client attribute in the data set to be divided until the left sub-tree and the right sub-tree only contain one credit card client data.
4. The method of claim 1, wherein subdividing the target credit card client data set using a spatial search algorithm of the credit card client data KD-tree in combination with a DBSCAN algorithm comprises:
setting the neighborhood radius and density of the DBSCAN algorithm;
searching a data set in a neighborhood radius of target data p in the credit card client data KD-tree through a KD-tree space search algorithm, wherein the target data p is any credit card client data in the target credit card client data set in an initial state;
if the number of the credit card customer data in the searched data set is smaller than the density, re-selecting the target data p;
if the number of the credit card customer data in the searched data set is not smaller than the density, marking the target data p as a core point, and starting a new subdivision object;
traversing each credit card client data within the neighborhood radius of the target data p, and adding to the subdivision object each credit card client data whose own neighborhood-radius data set contains a number of credit card client data not less than the density;
and selecting any one credit card client data from the unprocessed credit card client data in the target credit card client data set as target data p, returning to execute the KD-tree space search algorithm to search the data set in the neighborhood radius of the target data p in the KD-tree of the credit card client data until all the credit card client data in the target credit card client data set are processed, and finishing subdivision.
5. A credit card customer data subdivision apparatus, comprising:
the data preprocessing unit is used for preprocessing the original credit card client data according to subdivision requirements to obtain a target credit card client data set;
a KD-Tree creation unit for creating a credit card client data KD-Tree using said target credit card client data set;
and the data subdivision unit is used for subdividing the target credit card client data set by using a space search algorithm combining the credit card client data KD-tree and the DBSCAN algorithm.
6. The apparatus according to claim 5, wherein the data preprocessing unit is specifically configured to:
determining a plurality of credit card customer attributes according to the subdivision requirements;
extracting said target credit card customer data set comprising a plurality of said credit card customer attributes from said original credit card customer data.
7. The apparatus according to claim 6, wherein the KD-tree creation unit is specifically configured to:
calculating the data variance corresponding to each credit card client attribute in the data set to be divided, and determining the credit card client attribute corresponding to the maximum data variance as a division dimension, wherein the data set to be divided is the target credit card client data set in an initial state;
sorting the credit card client data in the target credit card client data set along the division dimension and determining a median value of the sorted data;
taking the credit card client data corresponding to the median value as nodes, dividing the credit card client data not greater than the median value into a left sub-tree, and dividing the credit card client data greater than the median value into a right sub-tree;
and respectively determining credit card client data corresponding to the left sub-tree and the right sub-tree as a data set to be divided, and returning to calculate the data variance corresponding to each credit card client attribute in the data set to be divided until the left sub-tree and the right sub-tree only contain one credit card client data.
8. The apparatus according to claim 5, wherein the data subdivision unit is specifically configured to:
setting the neighborhood radius and density of the DBSCAN algorithm;
searching a data set in the neighborhood radius of target data p in the KD-tree of the credit card client data through a KD-tree space search algorithm, wherein the target data p is any credit card client data in the target credit card client data set in an initial state;
if the number of the credit card customer data in the searched data set is smaller than the density, re-selecting the target data p;
if the number of the credit card customer data in the searched data set is not smaller than the density, marking the target data p as a core point, and starting a new subdivision object;
traversing each credit card client data within the neighborhood radius of the target data p, and adding to the subdivision object each credit card client data whose own neighborhood-radius data set contains a number of credit card client data not less than the density;
and selecting any one credit card client data from the unprocessed credit card client data in the target credit card client data set as target data p, returning to execute the KD-tree space search algorithm to search the data set in the neighborhood radius of the target data p in the KD-tree of the credit card client data until all the credit card client data in the target credit card client data set are processed, and finishing subdivision.
9. A storage medium, characterized in that the storage medium includes a stored program, wherein the program executes the credit card customer data subdivision method of any one of claims 1 to 4.
10. An electronic device, comprising at least one processor, at least one memory, and a bus, wherein the memory and the bus are connected to the processor;
the processor and the memory communicate with each other through the bus;
the processor is configured to invoke program instructions in the memory to perform the credit card customer data subdivision method of any of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210355525.1A CN114722935A (en) | 2022-04-06 | 2022-04-06 | Credit card customer data subdivision method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210355525.1A CN114722935A (en) | 2022-04-06 | 2022-04-06 | Credit card customer data subdivision method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114722935A (en) | 2022-07-08 |
Family
ID=82242203
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210355525.1A Pending CN114722935A (en) | 2022-04-06 | 2022-04-06 | Credit card customer data subdivision method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114722935A (en) |
- 2022-04-06 CN CN202210355525.1A patent/CN114722935A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||