CN116881333A

CN116881333A - Potential customer mining method and device, electronic device and readable storage medium

Info

Publication number: CN116881333A
Application number: CN202310897801.1A
Authority: CN
Inventors: 辛治运; 鹿群; 陈海雯; 黎豪; 陈圣松; 徐秋石
Original assignee: GF Securities Co., Ltd.
Current assignee: GF Securities Co., Ltd.
Priority date: 2023-07-20
Filing date: 2023-07-20
Publication date: 2023-10-13

Abstract

The application provides a potential customer mining method and device, an electronic device and a readable storage medium. The method comprises the following steps: generating a first training sample set according to historical product purchase records and a customer information set; training an initial mining model based on the first training sample set to obtain a first intermediate model; and obtaining a second training sample set based on the prediction result of the first intermediate model and product clustering information. The first intermediate model is then trained based on the second training sample set to obtain a second intermediate model, and a third training sample set is obtained based on the prediction result of the second intermediate model. The second intermediate model is trained based on the third training sample set to obtain a target potential customer mining model, and the third positive sample set is then input into the target potential customer mining model for prediction to obtain potential customer information. According to the application, the model is trained multiple times on successive training sample sets to obtain the target potential customer mining model, from which the potential customer information is obtained, so that the accuracy of the result is ensured.

Description

Potential customer mining method and device, electronic device and readable storage medium

Technical Field

The application relates to the technical field of potential customer mining, and in particular to a potential customer mining method, a potential customer mining device, an electronic device and a readable storage medium.

Background

With the continued development of computer technology, more and more businesses begin to utilize computer technology to mine potential customers of the business.

Currently, potential customers of an enterprise are commonly mined in two ways. One way is to use a single general-purpose intelligent model to mine potential customers as a whole and apply the result to all products, without distinguishing between specific products. The other way is to divide customer groups by business rules and recommend products according to the divided customer groups. For example, products are divided into themes, and customers meeting the rules are manually designated as the customer group of each theme.

However, both of the current methods of mining potential customers suffer from insufficient accuracy.

Disclosure of Invention

The application aims to provide a method, a device, electronic equipment and a readable storage medium for mining a potential customer, aiming at the defects in the prior art, so as to solve the problem of insufficient accuracy of mining results in the prior art.

In order to achieve the above purpose, the technical scheme adopted by the embodiment of the application is as follows:

In a first aspect, an embodiment of the present application provides a potential customer mining method, including:

generating a first training sample set according to historical product purchase records and a customer information set, wherein the first training sample set comprises: a first positive sample set including customer information of customers who have purchased any product, and a first negative sample set including customer information of customers who have not purchased any product;

training an initial mining model based on the first training sample set to obtain a first intermediate model, and obtaining a second training sample set based on a prediction result of the first intermediate model and product clustering information, wherein the second training sample set comprises: a second positive sample set including customer information of customers who have purchased each product class, and a second negative sample set including customer information of customers who have not purchased each product class;

training the first intermediate model based on the second training sample set to obtain a second intermediate model, and obtaining a third training sample set based on a prediction result of the second intermediate model, wherein the third training sample set comprises: a third positive sample set including customer information and product information for each purchased product, and a third negative sample set including customer information and product information for each product not purchased;

training the second intermediate model based on the third training sample set to obtain a target potential customer mining model;

and inputting the third positive sample set into the target potential customer mining model for prediction to obtain potential customer information.

Optionally, the generating a first training sample set according to the historical product purchase record and the customer information set includes:

determining the purchased customers according to the historical product purchase records;

reading the customer information of the purchased customers from the customer information set according to the identification of the purchased customers, and adding the customer information of the purchased customers to the first positive sample set;

reading, from the customer information set, the customer information that does not belong to the purchased customers, to obtain candidate non-purchasing customer information;

and undersampling the candidate non-purchasing customer information to obtain the first negative sample set.
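The undersampling step can be illustrated with a minimal sketch. The text only states that the candidate non-purchasing customers are undersampled; uniform random sampling, the function name `undersample_negatives`, the `n_keep` parameter and the fixed seed are all assumptions made for illustration.

```python
import random


def undersample_negatives(candidates, n_keep, seed=0):
    """Randomly keep at most n_keep candidate non-purchasing customers.

    Uniform random sampling is an assumption; the source only says that
    the candidates are undersampled.
    """
    candidates = list(candidates)
    if n_keep >= len(candidates):
        return candidates
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(candidates, n_keep)


# Hypothetical customer identifiers.
candidate_negatives = [f"c{i}" for i in range(100)]
first_negative_set = undersample_negatives(candidate_negatives, n_keep=10)
```

Undersampling of this kind keeps the negative set from overwhelming the much smaller set of purchasing customers during training.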

Optionally, the obtaining a second training sample set based on the prediction result of the first intermediate model and the product clustering information includes:

inputting the client information in the first positive sample set into the first intermediate model to obtain a prediction result of the first intermediate model, wherein the prediction result of the first intermediate model is used for indicating the probability of each client purchasing any product;

determining a plurality of target client information in the client information set according to the prediction result of the first intermediate model;

clustering products in the product set to obtain a plurality of product classes;

and obtaining the second training sample set according to the target client information and the product classes.

Optionally, the determining a plurality of target client information in the client information set according to the prediction result of the first intermediate model includes:

ordering all the client information in the client information set according to the probability indicated by the prediction result to obtain a client information sequence;

and selecting, according to the order of the client information in the client information sequence, a first preset number of client information from the client information sequence as the plurality of target client information.

Optionally, the obtaining the second training sample set according to the target client information and the product classes includes:

determining at least one product class corresponding to each client according to each target client information;

generating the second positive sample set according to the information of each target customer and each product class corresponding to each customer;

reading negative customer information which does not belong to the target customer information from the customer information set;

and generating the second negative sample set according to the negative client information.
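A minimal sketch of how the second training sample set might be assembled is given below. The text does not say how the product classes corresponding to a customer are determined; here it is assumed that they are the classes of products the customer has purchased. The function name, the `(customer_id, product_id)` record layout and the `product_to_class` mapping are hypothetical.

```python
def build_second_sample_set(target_ids, all_ids, purchases, product_to_class):
    """Return (positive customer/product-class pairs, negative customer ids).

    Assumption: a customer's product classes are the classes of the
    products found in that customer's purchase records.
    """
    targets = set(target_ids)
    positives = set()
    for customer_id, product_id in purchases:
        if customer_id in targets and product_id in product_to_class:
            positives.add((customer_id, product_to_class[product_id]))
    # Negative customers are those outside the target customer information.
    negatives = [cid for cid in all_ids if cid not in targets]
    return sorted(positives), negatives


purchases = [("c1", "p1"), ("c1", "p2"), ("c2", "p3"), ("c4", "p1")]
product_to_class = {"p1": "A", "p2": "A", "p3": "B"}
positives, negatives = build_second_sample_set(
    ["c1", "c2"], ["c1", "c2", "c3", "c4"], purchases, product_to_class
)
```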

Optionally, the obtaining a third training sample set based on the prediction result of the second intermediate model includes:

inputting the information of each customer in the second positive sample set into the second intermediate model to obtain a prediction result of the second intermediate model, wherein the prediction result of the second intermediate model is used for indicating the probability of each customer purchasing any one of the products;

determining a second preset number of customer information to be selected according to the prediction result of the second intermediate model;

and obtaining the third training sample set according to the information of each client to be selected.

Optionally, the obtaining the third training sample set according to the client information to be selected includes:

determining purchased products corresponding to the customer information to be selected according to the customer information to be selected and the historical product purchase records;

adding the purchased products corresponding to the customer information to be selected as a positive sample to the third positive sample set;

and reading the client information which does not belong to the client information to be selected from the client information set, and generating the third negative sample set according to the client information which does not belong to the client information to be selected.

In a second aspect, another embodiment of the present application provides a potential customer mining device, including:

the generation module is used for generating a first training sample set according to historical product purchase records and a customer information set, wherein the first training sample set comprises: a first positive sample set including customer information of customers who have purchased any product, and a first negative sample set including customer information of customers who have not purchased any product;

the first training module is configured to train an initial mining model based on the first training sample set to obtain a first intermediate model, and obtain a second training sample set based on a prediction result of the first intermediate model and product clustering information, where the second training sample set includes: a second positive sample set including customer information of customers who have purchased each product class, and a second negative sample set including customer information of customers who have not purchased each product class;

the second training module is configured to train the first intermediate model based on the second training sample set to obtain a second intermediate model, and obtain a third training sample set based on a prediction result of the second intermediate model, where the third training sample set includes: a third positive sample set including customer information and product information for each purchased product, and a third negative sample set including customer information and product information for each product not purchased;

the third training module is used for training the second intermediate model based on the third training sample set to obtain a target potential customer mining model;

and the prediction module is used for inputting the third positive sample set into the target potential customer mining model for prediction to obtain potential customer information.

Optionally, the generating module is specifically configured to:

Optionally, the first training module is specifically configured to:

Optionally, the second training module is specifically configured to:

optionally, the third training sample set is obtained according to the information of each client to be selected.

The second training module is specifically configured to:

In a third aspect, another embodiment of the present application provides an electronic device, including: a processor, a storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over a bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the method as described in the first aspect above.

In a fourth aspect, another embodiment of the application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to the first aspect described above.

The beneficial effects of the application are as follows: a first training sample set is generated according to the historical product purchase records and the customer information set, and the initial mining model is trained based on the first training sample set to obtain a first intermediate model, from which a prediction result of the first intermediate model is obtained. Since the first positive sample set in the first training sample set comprises customer information of customers who have purchased any product, and the first negative sample set comprises customer information of customers who have not purchased any product, the prediction result of the first intermediate model can represent the customer information in the customer information set that resembles that of customers who have purchased any product, so that preliminary potential customer mining is performed.

Based on the prediction result of the first intermediate model and the product clustering information, a second training sample set is obtained, the first intermediate model is trained based on the second training sample set, and the second intermediate model is obtained, so that the prediction result of the second intermediate model is obtained.

A third training sample set is obtained based on the prediction result of the second intermediate model, and the second intermediate model is trained based on the third training sample set to obtain a target potential customer mining model. The third positive sample set in the third training sample set comprises customer information and product information for each purchased product, and the third negative sample set comprises customer information and product information for each product not purchased, so that the final prediction result of the target potential customer mining model can represent the potential customers of each product. Potential customers are thereby mined further, product by product, to obtain the potential customer information of each product.

The target potential customer mining model is obtained by successively training the initial mining model, the first intermediate model and the second intermediate model. Training with the first training sample set and predicting with the resulting first intermediate model can be regarded as an overall coarse ranking. Generating the second training sample set based on the prediction result of the first intermediate model and predicting with the second intermediate model can be regarded as a category-level coarse ranking. Generating the third training sample set based on the prediction result of the second intermediate model and predicting with the target potential customer mining model can be regarded as a product-level fine ranking. Through these three progressive steps of overall coarse ranking, category-level coarse ranking and product-level fine ranking, the customer group is narrowed step by step, so that computing resources are reduced as far as possible with essentially no loss of prediction accuracy. The potential customer information of each product is thus obtained, improving accuracy while reducing the computational burden.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of an application scenario of a potential customer mining method according to an embodiment of the present application;

FIG. 2 is a schematic flow chart of a potential customer mining method according to an embodiment of the present application;

FIG. 3 is a flowchart of a method for obtaining a first training sample set according to an embodiment of the present application;

FIG. 4 is a flowchart of a method for obtaining a second training sample set according to an embodiment of the present application;

FIG. 5 is a flowchart of a method for obtaining information of a plurality of target clients according to an embodiment of the present application;

FIG. 6 is a flowchart of a method for obtaining a second training sample set according to an embodiment of the present application;

FIG. 7 is a flowchart of a method for obtaining a third training sample set according to an embodiment of the present application;

FIG. 8 is a flowchart illustrating a method for obtaining a third training sample set according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a potential customer mining device according to an embodiment of the present application;

FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for the purpose of illustration and description only and are not intended to limit the scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this disclosure, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the present disclosure.

In addition, the described embodiments are only some, but not all, embodiments of the application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.

It should be noted that the term "comprising" will be used in embodiments of the application to indicate the presence of the features stated hereafter, but not to exclude the addition of other features.

Currently, potential customers of an enterprise are commonly mined in two ways. One way is to use a single general-purpose intelligent model to mine potential customers as a whole and apply the result to all products, without distinguishing between specific products. The other way is to divide customer groups by business rules and recommend products according to the divided customer groups. For example, products are divided into themes, and customers meeting the rules are manually designated as the customer group of each theme. Both of these methods suffer from insufficient accuracy.

To improve accuracy, a general-purpose targeted potential customer mining model could be built to mine potential customers for each product. However, this approach incurs a significant computational cost. For example, mining across millions of customers and thousands of products requires hundreds of thousands of computations per prediction, and with limited computing resources the approach may be infeasible.

Based on the above problems, the present application generates a first training sample set according to the historical product purchase records and the customer information set, trains the initial mining model based on the first training sample set to obtain a first intermediate model, and obtains a second training sample set based on the prediction result of the first intermediate model and the product clustering information. The first intermediate model is then trained based on the second training sample set to obtain a second intermediate model, and a third training sample set is obtained based on the prediction result of the second intermediate model. The second intermediate model is trained based on the third training sample set to obtain a target potential customer mining model, and the third positive sample set is then input into the target potential customer mining model for prediction to obtain potential customer information. In this method, the model is trained multiple times on different samples to obtain the final target potential customer mining model, from which the potential customer information is obtained, improving accuracy while reducing the computational difficulty.

Next, an application scenario of the potential customer mining method of the present application will be described. Referring to FIG. 1, a database may be used to store customer information and product information. When performing potential customer mining, the server may obtain customer information and product information from the database and generate a first training sample set. The server then trains the initial mining model based on the first training sample set to obtain a first intermediate model, and obtains a second training sample set based on the prediction result of the first intermediate model and on product clustering information derived from the product information in the database. The server then trains the first intermediate model based on the second training sample set to obtain a second intermediate model, and obtains a third training sample set based on the prediction result of the second intermediate model. The second intermediate model is trained based on the third training sample set to obtain a target potential customer mining model, and the third positive sample set is then input into the target potential customer mining model for prediction to obtain potential customer information.

The following describes the potential customer mining method with reference to FIG. 2:

S201, generating a first training sample set according to historical product purchase records and a customer information set, wherein the first training sample set comprises a first positive sample set and a first negative sample set, the first positive sample set comprises customer information of customers who have purchased any product, and the first negative sample set comprises customer information of customers who have not purchased any product.

Optionally, the client information set may include massive client information, where the client information may include client basic information, client asset information, transaction holding information, and quotation information.

By way of example, the customer base information may include customer gender, customer age, location province, city, affiliated branch, business department, duration of opening an account, customer type, customer level, etc., and the customer base information is used to represent the customer's base attributes.

The customer asset information may include customer equity, asset balance, held product market value, on-site product market value, off-site product market value, held market value for different types of products, etc., and is used to characterize the customer's purchasing capabilities and product preferences, etc.

The transaction holding information may include information on the customer's historical transactions and on held products, stocks and funds. For example, the product may be an ETF product. Taking the transaction information of ETF products as an example, it includes, for the last half year, the number of times the customer purchased ETF products, the average amount, the average risk level, the standard deviation, the number of ETF purchases for each product type, the number of ETF purchases for each asset type, the average yield of similar funds, the average Sharpe ratio of the products, the average maximum drawdown of the products, the average alpha of the products, the average beta of the products, the average unit net value of the products, and the like. The transaction holding information is used to characterize the customer's product preferences and risk preferences.

It should be noted that the customer information may also include relevant information such as customer stocks, funds, and customer holding data.

Optionally, according to the historical product purchase record, the client information of all clients who have purchased any product in the client information set is used as a first positive sample set, and the client information of all clients who have not purchased any product in the historical product purchase record is used as a first negative sample set.

S202, training an initial mining model based on the first training sample set to obtain a first intermediate model, and obtaining a second training sample set based on a prediction result of the first intermediate model and product clustering information, wherein the second training sample set comprises a second positive sample set and a second negative sample set, the second positive sample set comprises customer information of customers who have purchased each product class, and the second negative sample set comprises customer information of customers who have not purchased each product class.

Optionally, before training the initial mining model based on the first training sample set, the customer information in the first training sample set may also be processed through feature engineering. By way of example, feature engineering may include missing value filling, one-hot encoding of discrete variables, and the like. Missing value filling handles the case where some features are absent from the customer information and helps keep the training results stable. One-hot encoding converts discrete customer information into numerical form and helps ensure the accuracy of the training results.
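The two feature-engineering operations named above can be sketched as follows. This is an illustrative sketch only: the field names (`age`, `customer_type`), the default fill values and the helper names `fill_missing` and `one_hot` are assumptions, and the source does not specify which fill strategy is used.

```python
def fill_missing(rows, defaults):
    """Replace absent or None feature values with a per-feature default."""
    return [
        {key: (row.get(key) if row.get(key) is not None else defaults[key])
         for key in defaults}
        for row in rows
    ]


def one_hot(rows, column):
    """Replace one discrete column with 0/1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical customer records with missing values.
customers = [
    {"age": 35, "customer_type": "individual"},
    {"age": None, "customer_type": "institution"},
    {"age": 52, "customer_type": None},
]
defaults = {"age": 40, "customer_type": "unknown"}  # assumed fill values
features = one_hot(fill_missing(customers, defaults), "customer_type")
```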

Optionally, the initial mining model may be a random forest model. Taking advantage of the high accuracy and parallelizability of the random forest model, the first training sample set is input into the random forest model for training to obtain the first intermediate model. The first intermediate model can obtain the probability of each customer purchasing any product and sort the probabilities from high to low; the prediction result of the first intermediate model can indicate a preset number of top-ranked customers, which are screened to obtain the second training sample set. By way of example, the prediction result of the first intermediate model may indicate 500,000 customers.
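The ranking-and-screening step can be sketched in a few lines. The purchase probabilities below are hard-coded, hypothetical values standing in for the output of the first intermediate model; the function name `top_n_customers` is likewise an assumption.

```python
def top_n_customers(purchase_probability, n):
    """Sort customers by predicted purchase probability (high to low)
    and keep the first n customer ids."""
    ranked = sorted(purchase_probability.items(),
                    key=lambda item: item[1], reverse=True)
    return [customer_id for customer_id, _ in ranked[:n]]


# Hypothetical model outputs: customer id -> probability of buying any product.
probs = {"c1": 0.91, "c2": 0.15, "c3": 0.67, "c4": 0.40}
targets = top_n_customers(probs, 2)
```

In the example given in the text, `n` would be on the order of 500,000 rather than 2.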

Optionally, the parameters of the first intermediate model may be tuned by random grid search, adjusting the depth, the maximum number of features, the minimum number of samples per leaf node, and the like, thereby reducing the computational complexity of the first intermediate model.

Optionally, the second training sample set is obtained by combining the customer information with the prediction result of the first intermediate model and the product clustering information.

Alternatively, the product cluster information may be obtained by classifying a plurality of products. In particular, since some products have strong similarity among a plurality of products, the products having strong similarity can be classified into one product class, and the products in the product class have a certain commonality.
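One simple way to group similar products into product classes is sketched below. The source does not name a clustering algorithm; greedy threshold clustering on cosine similarity, the 0.9 threshold and the three-dimensional product feature vectors are all assumptions chosen to keep the example short.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_products(product_features, threshold=0.9):
    """Assign each product to the first class whose representative
    (its first member) is at least `threshold` similar; otherwise
    start a new class."""
    classes = []
    for product_id, vector in product_features.items():
        for members in classes:
            if cosine(vector, product_features[members[0]]) >= threshold:
                members.append(product_id)
                break
        else:
            classes.append([product_id])
    return classes


# Hypothetical product feature vectors.
product_features = {
    "etf_a": [1.0, 0.1, 0.0],
    "etf_b": [0.9, 0.2, 0.0],
    "bond_c": [0.0, 0.1, 1.0],
}
product_classes = cluster_products(product_features)
```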

S203, training the first intermediate model based on the second training sample set to obtain a second intermediate model, and obtaining a third training sample set based on a prediction result of the second intermediate model, wherein the third training sample set comprises: a third positive sample set including customer information and product information for each product purchased, and a third negative sample set including customer and product information for each product not purchased.

Optionally, the first intermediate model can be used as a second intermediate model after training, and correspondingly, the second intermediate model can also be a random forest model. The second intermediate model can obtain the probability of each customer purchasing any product in each product class, order the probability from high to low, obtain the prediction result of the second intermediate model according to the probability of each customer purchasing each product class, and then determine the customer information of the potential customers of each product in each product class according to the prediction result.

Optionally, a third training sample set is determined based on the prediction result of the second intermediate model, where the third training sample set includes a third positive sample set and a third negative sample set, and the third positive sample set includes customer information and product information of each product purchased, and it is noted that the customer information and the product information of each product in the third positive sample set may refer to a customer-product pair having a transaction record in a preset time. For example, the preset time may be three months. The third negative set of samples includes a customer-product pair for each customer in the set of customer information with any one of the non-purchased products.
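The construction of customer-product pairs can be sketched as follows. The three-month window is approximated as 90 days, and "not purchased" is taken to mean that the pair never appears in the transaction records; both are assumptions, as are the function name and the `(customer_id, product_id, date)` record layout.

```python
from datetime import date, timedelta


def build_third_sample_set(customer_ids, product_ids, transactions,
                           as_of, window_days=90):
    """Return (positive pairs, negative pairs) of (customer_id, product_id).

    Positives: pairs with a transaction inside the preset time window.
    Negatives: pairs of a customer with a product that customer has no
    transaction record for (assumption).
    """
    customers = set(customer_ids)
    cutoff = as_of - timedelta(days=window_days)
    positives = {
        (c, p) for c, p, d in transactions
        if c in customers and cutoff <= d <= as_of
    }
    purchased = {(c, p) for c, p, _ in transactions}
    negatives = [
        (c, p) for c in customer_ids for p in product_ids
        if (c, p) not in purchased
    ]
    return sorted(positives), negatives


transactions = [
    ("c1", "p1", date(2023, 6, 1)),
    ("c1", "p2", date(2022, 1, 1)),   # outside the window
    ("c2", "p2", date(2023, 7, 1)),
]
third_positive, third_negative = build_third_sample_set(
    ["c1", "c2"], ["p1", "p2"], transactions, as_of=date(2023, 7, 20)
)
```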

S204, training the second intermediate model based on the third training sample set to obtain a target potential customer mining model.

Alternatively, a new customer-product pair may be obtained by performing an inner product calculation of the plurality of customer information and the corresponding product information in the prediction result of the second intermediate model, the new customer-product pair being capable of characterizing the relationship between the customer and the product. The client-product pair can be used as information in a third training sample set to train the second intermediate model, and accuracy of a model output result is improved.
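The inner product calculation can be illustrated with a short sketch. The text does not define how customer information and product information are represented; here both are assumed to be numeric vectors of the same length, and the resulting scalar is treated as one feature describing the customer-product pair. The vectors and the function name are hypothetical.

```python
def inner_product(customer_vector, product_vector):
    """Inner product of a customer vector and a product vector."""
    if len(customer_vector) != len(product_vector):
        raise ValueError("vectors must have the same length")
    return sum(c * p for c, p in zip(customer_vector, product_vector))


# Hypothetical aligned feature vectors (same dimensions, same meaning).
customer_vector = [0.5, 0.25, 1.0]
product_vector = [1.0, 0.0, 0.5]
pair_feature = inner_product(customer_vector, product_vector)
```

A larger value indicates that the customer's features align more closely with those of the product under this assumed representation.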

Optionally, the target potential customer mining model may be a random forest model.

S205, inputting the third positive sample set into the target potential customer mining model for prediction to obtain potential customer information.

Optionally, before the third positive sample set is input into the target potential customer mining model for prediction, the customer information in the third positive sample set may be screened so that features related to the product are retained and features unrelated to the product are removed. The screened third positive sample set is then input into the target potential customer mining model.

Optionally, the target potential customer mining model may obtain the probability of each customer purchasing each product, sort the probabilities from high to low, determine its prediction result from a preset number of top-ranked customers, and use the customer information in the prediction result as the potential customer information. By way of example, the customer information of the top 10,000 ranked customers may be used as the customer information included in the prediction result of the target potential customer mining model.
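The final selection can be sketched as a per-product ranking. Whether the preset number applies per product or across all products is not stated in the text; the sketch assumes a per-product cut-off. The scores are hypothetical stand-ins for the model's predicted probabilities.

```python
from collections import defaultdict


def top_customers_per_product(scores, n):
    """For each product, rank customers by predicted purchase probability
    (high to low) and keep the first n customer ids."""
    by_product = defaultdict(list)
    for (customer_id, product_id), probability in scores.items():
        by_product[product_id].append((probability, customer_id))
    return {
        product_id: [cid for _, cid in sorted(pairs, reverse=True)[:n]]
        for product_id, pairs in by_product.items()
    }


# Hypothetical outputs: (customer id, product id) -> purchase probability.
scores = {
    ("c1", "p1"): 0.9, ("c2", "p1"): 0.4, ("c3", "p1"): 0.7,
    ("c1", "p2"): 0.2, ("c2", "p2"): 0.8,
}
potential_customers = top_customers_per_product(scores, 2)
```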

In this embodiment, according to the historical product purchase record and the customer information set, a first training sample set is generated, and the initial mining model is trained based on the first training sample set to obtain a first intermediate model, so as to obtain a prediction result of the first intermediate model.

The following describes in detail the generation of the first training sample set according to the historical product purchase record and the customer information set in the step S201 with reference to fig. 3:

S301, determining the purchased customers according to the historical product purchase records.

Optionally, historical product purchase records may be obtained from a database. Specifically, if a customer has purchased any product, the identifiers of the customer and the product are stored in the historical product purchase record; if a customer has not purchased any product, no identifiers for that customer and a product are stored in the historical product purchase record, or the product entry corresponding to that customer is empty.

S302, according to the identification of the purchased customers, the customer information of the purchased customers is read from the customer information set, and the customer information of the purchased customers is added to the first positive sample set.

Optionally, according to the identifiers corresponding to the customers and the products in the historical product purchase records, the customer information of the purchased customers is read from the customer information set. The customer information set contains the information of all customers in the database. Illustratively, if potential customers are mined for the various products of enterprise A, the customer information set includes the customer information of all customers of enterprise A.

S303, reading the customer information which does not belong to the purchased customers from the customer information set to obtain candidate non-purchased customer information.

Optionally, customer information is selected for customers whose customer and product identifiers are not stored in the historical product purchase records, or whose corresponding product entry is empty, thereby obtaining the candidate non-purchased customer information.

S304, undersampling the candidate non-purchased customer information to obtain a first negative sample set.

Optionally, the candidate non-purchased customer information may far exceed the first positive sample set, because customers who have not purchased any product may be far more numerous than customers who have purchased a product. In this case, the candidate non-purchased customer information needs to be undersampled to obtain the first negative sample set. Specifically, undersampling balances the class distribution of a classification data set by discarding part of the data. For example, if the first positive sample set includes 100 pieces of customer information and the candidate non-purchased customer information includes 10000 pieces, the candidate non-purchased customer information is undersampled down to 1000 pieces, which keeps the ratio of the first positive sample set to the first negative sample set reasonable and reduces the computational workload.
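The undersampling in step S304 can be sketched with simple random sampling. This is only one possible undersampling strategy, chosen here for illustration; the application does not state which strategy is used, and the function name, the fixed seed, and the 10:1 target ratio (taken from the example above) are assumptions.

```python
import random
from typing import List


def undersample(candidates: List[str], target_size: int, seed: int = 0) -> List[str]:
    """Randomly keep at most target_size items, without replacement."""
    if target_size >= len(candidates):
        return list(candidates)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(candidates, target_size)


# Sizes from the example above: 100 positives, 10000 candidate negatives.
first_positive_set = [f"buyer_{i}" for i in range(100)]
candidate_non_purchased = [f"non_buyer_{i}" for i in range(10000)]

# Assumed target ratio of 10 negatives per positive, giving 1000 negatives.
first_negative_set = undersample(candidate_non_purchased, target_size=10 * len(first_positive_set))
```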

In this embodiment, the purchased customers are determined according to the historical product purchase records; the customer information of the purchased customers is read from the customer information set according to their identifiers and added to the first positive sample set; the customer information which does not belong to the purchased customers is read from the customer information set to obtain the candidate non-purchased customer information; and the candidate non-purchased customer information is undersampled to obtain the first negative sample set. This keeps the ratio of the first positive sample set to the first negative sample set reasonable and reduces the computational workload.

Having described how the first positive sample set and the first negative sample set are obtained, the following describes, with reference to fig. 4, obtaining the second training sample set based on the prediction result of the first intermediate model and the product clustering information in step S202:
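Steps S301 to S303 can be sketched as follows. This is a minimal sketch that assumes purchase records are (customer identifier, product identifier) tuples, with `None` standing for an empty product entry; the function and variable names are hypothetical and not taken from the application.

```python
from typing import Dict, List, Optional, Tuple


def split_customers(
    purchase_records: List[Tuple[str, Optional[str]]],
    customer_info: Dict[str, dict],
) -> Tuple[Dict[str, dict], Dict[str, dict]]:
    """Return (first positive sample set, candidate non-purchased customer information)."""
    # S301: a customer counts as purchased if any record pairs them with a product.
    purchased_ids = {cid for cid, pid in purchase_records if pid}
    # S302: read the purchased customers' information into the positive set.
    positives = {cid: info for cid, info in customer_info.items() if cid in purchased_ids}
    # S303: everyone else is a candidate non-purchased customer.
    candidates = {cid: info for cid, info in customer_info.items() if cid not in purchased_ids}
    return positives, candidates


# Hypothetical data, for illustration only.
purchase_records = [("c1", "p1"), ("c2", None), ("c1", "p2")]
customer_info = {"c1": {"age": 40}, "c2": {"age": 31}, "c3": {"age": 55}}

first_positive_set, candidate_non_purchased = split_customers(purchase_records, customer_info)
```

Customer `c3` has no record at all and customer `c2` has an empty product entry, so both end up among the candidates.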

S401, inputting information of each customer in the first positive sample set into a first intermediate model to obtain a prediction result of the first intermediate model, wherein the prediction result of the first intermediate model is used for indicating the probability that each customer purchases any product.

Optionally, the customer information in the first positive sample set may be the customer information of customers who have purchased any product.

Optionally, the prediction result of the first intermediate model may indicate the probability that each customer purchases any product. Illustratively, the probability of customer A purchasing any product is 90%, the probability of customer B purchasing any product is 80%, the probability of customer C purchasing any product is 40%, and the probability of customer D purchasing any product is 60%.

S402, determining a plurality of target client information in the client information set according to the prediction result of the first intermediate model.

Optionally, according to the prediction result of the first intermediate model, a preset number of customers with the highest probability of purchasing any product are taken as potential customers who may purchase a product. Illustratively, referring to the example in step S401 above, if 2 potential customers need to be acquired, the target customers may be customer A and customer B.

Optionally, the larger the preset number, the heavier the computation; the smaller the preset number, the lighter the computation.

S403, clustering the products in the product set to obtain a plurality of product classes.

Optionally, the products are clustered according to the similarity of each product in the product set, so as to obtain a plurality of product classes.

As one possible implementation, the products may be clustered using the K-means clustering method. Based on the intrinsic relationships in the product information, the K-means clustering method iteratively searches for a partition into K clusters that minimizes the loss function of the clustering result, where each cluster in the clustering result contains all the products of one product class.

For example, the K-means clustering method may group products according to the following features in the product information: product risk level, asset type, average rate of return (daily/monthly/yearly), standard deviation of the rate of return over the past year, average rate of return of similar products over the past year, average Sharpe ratio over the past year, average net value over the past year, management fee rate, issue date, listing date, etc.
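The K-means clustering of products can be illustrated with a minimal standard-library sketch of Lloyd's algorithm. It assumes each product has already been encoded as a numeric feature vector; the two features used here (risk level and average yearly rate of return), the values, and all names are hypothetical. In practice, features on different scales would normally be standardized first, and a library implementation could be used instead.

```python
import random
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    return [sum(column) / len(points) for column in zip(*points)]


def k_means(points: List[Vector], k: int, iterations: int = 100, seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm: alternate assignment and centroid update until labels stop changing."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # initialize from k distinct points
    labels = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids


# Hypothetical product features: [risk level, average yearly rate of return].
product_features = [[1.0, 0.02], [1.0, 0.03], [5.0, 0.12], [5.0, 0.15]]
labels, centroids = k_means(product_features, k=2)
```

With these values, the two low-risk products end up in one product class and the two high-risk products in the other, regardless of which points are chosen as initial centroids.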

S404, obtaining a second training sample set according to the plurality of target client information and the plurality of product classes.

Optionally, based on the customer information of the customers with a high probability of purchasing a product and the plurality of product classes, the customer information of customers who have purchased any product in a product class is taken as the second positive sample set of the second training sample set, and the customer information of customers who have not purchased any product in that product class is taken as the second negative sample set.

Optionally, whether a customer has purchased a product class may be determined according to the transaction records of historical products in the customer information; the customer information of customers who have purchased each product class is used as the second positive sample set, and the customer information of customers who have not purchased each product class is used as the second negative sample set.

Illustratively, customer A has purchased product class a and has not purchased product classes b and c; customer B has purchased product classes a and c and has not purchased product class b; customer C has purchased product class c and has not purchased product classes a and b. Thus, the second positive sample set includes: customer A and customer B for product class a, no customer for product class b, and customer B and customer C for product class c. The second negative sample set may include: customer C for product class a; customer A, customer B and customer C for product class b; and customer A for product class c.
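The example above can be reproduced with a short sketch. It assumes that the product classes each target customer has purchased are already known from the transaction records; the function name and data layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_class_samples(
    customers: List[str],
    product_classes: List[str],
    purchased_classes: Dict[str, Set[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """For each product class, split customers into purchased (positive) and not purchased (negative)."""
    positive: Dict[str, List[str]] = {cls: [] for cls in product_classes}
    negative: Dict[str, List[str]] = {cls: [] for cls in product_classes}
    for customer in customers:
        bought = purchased_classes.get(customer, set())
        for cls in product_classes:
            if cls in bought:
                positive[cls].append(customer)
            else:
                negative[cls].append(customer)
    return positive, negative


# Data from the example: customers A, B, C and product classes a, b, c.
customers = ["A", "B", "C"]
product_classes = ["a", "b", "c"]
purchased_classes = {"A": {"a"}, "B": {"a", "c"}, "C": {"c"}}

second_positive, second_negative = build_class_samples(customers, product_classes, purchased_classes)
```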

It should be noted that, the customer information in the second negative sample set may be far more than the customer information in the second positive sample set, and the undersampling process may be performed on the second negative sample set, so that the second negative sample set and the second positive sample set are balanced.

In this embodiment, the prediction result of the first intermediate model is obtained by inputting the client information in the first positive sample set into the first intermediate model, determining a plurality of target client information in the client information set according to the prediction result of the first intermediate model, clustering the products in the product set to obtain a plurality of product classes, and obtaining the second training sample set according to the plurality of target client information and the plurality of product classes, thereby realizing further mining of potential clients who may purchase any product, and obtaining potential clients who may purchase each product class, and further improving accuracy while reducing calculation load.

The following describes in detail the determination of the plurality of target client information in the client information set according to the prediction result of the first intermediate model in step S402 with reference to fig. 5:

S501, sorting all the client information in the client information set according to the probability indicated by the prediction result to obtain a client information sequence.

Optionally, sorting the client information of each client in the client information set according to the probability that each client purchases any product indicated by the prediction result of the first intermediate model, so as to obtain a client information sequence.

Optionally, the customers may be ordered from high to low according to the probability indicated by the prediction result, so that in the resulting customer information sequence the customers nearer the front are more likely to purchase a product.

S502, screening the first preset number of pieces of customer information in the customer information sequence as the plurality of target customer information, according to the order of the customer information in the sequence.

Optionally, the larger the first preset number, the more target customer information there is and the more potential customers are preliminarily screened out, but the greater the computation required by the model. The smaller the first preset number, the less target customer information there is and the fewer potential customers are preliminarily screened out, and the smaller the computation required by the model.

Illustratively, with reference to the example in step S401, if the first preset number is set to 3, the target customers may be customer A, customer B, and customer D.

In this embodiment, the customer information in the customer information set is sorted according to the probability indicated by the prediction result to obtain a customer information sequence, and the first preset number of pieces of customer information are then screened from the sequence, in order, as the plurality of target customer information. A small set of target customer information is thus screened from a much larger customer information set, so that the target customers have a high probability of purchasing a product, which reduces the burden of subsequent computation while maintaining screening accuracy.
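Steps S501 and S502 can be sketched with the probabilities from the example in step S401. The function name is hypothetical; the sketch assumes the prediction result is available as a mapping from customer to probability.

```python
from typing import Dict, List


def select_target_customers(probabilities: Dict[str, float], first_preset_number: int) -> List[str]:
    """S501: sort customers by probability, high to low. S502: keep the first preset number."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [customer for customer, _ in ranked[:first_preset_number]]


# Probabilities from the example in step S401.
probabilities = {"A": 0.90, "B": 0.80, "C": 0.40, "D": 0.60}

targets_top3 = select_target_customers(probabilities, 3)
targets_top2 = select_target_customers(probabilities, 2)
```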

Next, obtaining the second training sample set according to the plurality of target customer information and the plurality of product classes in step S404 is described with reference to fig. 6:

S601, determining at least one product class corresponding to each client according to the information of each target client.

Optionally, determining a product class to which at least one product historically purchased by each client belongs according to the transaction records of the historical products of the clients in the information of each target client.

S602, generating a second positive sample set according to the target client information and the product classes corresponding to the clients.

Optionally, in the second positive sample set, the same customer may or may not appear under multiple product classes, and a product class for which no customer qualifies is empty.

S603, negative client information which does not belong to the target client information is read from the client information set.

Optionally, based on the transaction records of historical products in the customer information set, the customer information of customers who have not purchased a product class may be used as negative customer information, or the customer information in the customer information set that does not belong to the target customer information may be selected as negative customer information.

S604, generating a second negative sample set according to the negative client information.

Optionally, all negative customer information is aggregated to obtain the second negative sample set. In the second negative sample set, the same customer may or may not appear under multiple product classes, and a product class for which no customer qualifies is empty.

In this embodiment, at least one product class corresponding to each customer is determined according to each target customer information, a second positive sample set is generated according to each target customer information and each product class corresponding to each customer, the second positive sample set includes customer information of purchased each product class, then negative customer information which does not belong to the target customer information is read from the customer information set, and a second negative sample set is generated according to the negative customer information, wherein the second negative sample set includes customer information of not purchased each product class, so that a range of potential customers corresponding to each product class is determined, a foundation is laid for future potential customer excavation for each product class, and prediction accuracy is improved.

Next, obtaining the third training sample set based on the prediction result of the second intermediate model in step S203 is described with reference to fig. 7:

S701, inputting the information of each customer in the second positive sample set into a second intermediate model to obtain a prediction result of the second intermediate model, wherein the prediction result of the second intermediate model is used for indicating the probability that each customer purchases any one of the products.

Optionally, the customer information in the second positive sample set may be the customer information of customers who have purchased any product in each product class.

Optionally, the prediction result of the second intermediate model may indicate the probability that each customer purchases each product class.

Illustratively, the probability that customer E purchases product class a is 20%, the probability that customer F purchases product class a is 35%, and the probability that customer G purchases product class a is 26%.

S702, determining a second preset number of customer information to be selected according to the prediction result of the second intermediate model.

Optionally, the probabilities indicated by the prediction results of the second intermediate model are ranked according to the prediction results of the second intermediate model, and in each product class, a second preset number of clients with high probability are used as potential clients, and client information of the potential clients is used as candidate client information.

Illustratively, if the second preset number is 2, then the potential customers in product class a are customer F and customer G.
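The per-class selection in step S702 can be sketched with the probabilities from the example above. The sketch assumes the prediction result is organized as one probability mapping per product class; the function name and data layout are hypothetical.

```python
from typing import Dict, List


def select_candidates_per_class(
    class_probabilities: Dict[str, Dict[str, float]],
    second_preset_number: int,
) -> Dict[str, List[str]]:
    """Within each product class, keep the customers with the highest purchase probability."""
    selected: Dict[str, List[str]] = {}
    for product_class, probs in class_probabilities.items():
        ranked = sorted(probs, key=probs.get, reverse=True)
        selected[product_class] = ranked[:second_preset_number]
    return selected


# Probabilities from the example: customers E, F, G for product class a.
class_probabilities = {"a": {"E": 0.20, "F": 0.35, "G": 0.26}}

candidates_by_class = select_candidates_per_class(class_probabilities, second_preset_number=2)
```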

S703, obtaining a third training sample set according to the information of each client to be selected.

Optionally, according to the historical transaction record in each piece of customer information to be selected, judging whether a record for purchasing the product in each product class exists in each piece of customer information to be selected, thereby obtaining a third training sample set.

In this embodiment, the customer information in the second positive sample set is input into the second intermediate model to obtain the prediction result of the second intermediate model, which indicates the probability that each customer purchases any one of the products; a second preset number of pieces of candidate customer information are determined according to the prediction result of the second intermediate model; and the third training sample set is obtained according to the candidate customer information. The potential customers of each product are thus determined according to the probability that each customer purchases each product, which further improves the accuracy of potential customer mining.

In the following, obtaining the third training sample set according to each piece of candidate customer information in step S703 is described with reference to fig. 8:

S801, determining the purchased products corresponding to each piece of candidate customer information according to the candidate customer information and the historical product purchase records.

Optionally, the candidate customer information includes the customer information of customers who may have purchased any of the product classes.

S802, adding each piece of candidate customer information and its corresponding purchased product to a third positive sample set as one positive sample.

Optionally, each piece of candidate customer information and its corresponding purchased product are taken as a customer-product pair, and each such pair is one positive sample. Within each product class, the candidate customer information and the products purchased by those customers form customer-product pairs, and these pairs together form the third positive sample set.

Illustratively, the candidate customers corresponding to product class a include customer P, customer Q and customer I, and product class a includes product X, product Y and product Z. Customer P purchased product X, customer Q purchased product Y, and customer I purchased product Z. Three positive samples may then be generated: customer P-product X, customer Q-product Y, and customer I-product Z. These three positive samples form the positive sample set of product class a, and the positive sample sets of all product classes are combined to form the third positive sample set.
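The construction of positive customer-product pairs in steps S801 and S802 can be sketched with the data from the example above. The function name and the dictionary-based data layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_third_positive_set(
    candidates_by_class: Dict[str, List[str]],
    products_by_class: Dict[str, List[str]],
    purchases: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Pair each candidate customer with every product of the class that they have purchased."""
    positives: List[Tuple[str, str]] = []
    for product_class, customers in candidates_by_class.items():
        for customer in customers:
            bought = purchases.get(customer, set())
            for product in products_by_class.get(product_class, []):
                if product in bought:
                    positives.append((customer, product))
    return positives


# Data from the example: product class a with customers P, Q, I and products X, Y, Z.
candidates_by_class = {"a": ["P", "Q", "I"]}
products_by_class = {"a": ["X", "Y", "Z"]}
purchases = {"P": {"X"}, "Q": {"Y"}, "I": {"Z"}}

third_positive_set = build_third_positive_set(candidates_by_class, products_by_class, purchases)
```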

S803, reading the client information which does not belong to the client information to be selected from the client information set, and generating a third negative sample set according to the client information which does not belong to the client information to be selected.

Optionally, for each customer in the customer information set that does not belong to the candidate customer information, a product that the customer has not purchased may be randomly selected to form a customer-product pair as a negative sample, thereby generating the third negative sample set.

It should be noted that, since there may be far more negative samples than positive samples in the customer information set, the negative samples may be undersampled, and a third negative sample set may be generated according to the processed negative samples.
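The negative sampling in step S803, followed by undersampling, can be sketched as follows. The sketch assumes simple random selection for both steps; the function name, the `max_samples` cap, the fixed seed, and the data are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


def build_third_negative_set(
    customer_ids: List[str],
    candidate_ids: Set[str],
    all_products: List[str],
    purchases: Dict[str, Set[str]],
    max_samples: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Pair each non-candidate customer with one random unpurchased product, then undersample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    negatives: List[Tuple[str, str]] = []
    for customer in customer_ids:
        if customer in candidate_ids:
            continue  # candidates belong to the positive side
        unpurchased = [p for p in all_products if p not in purchases.get(customer, set())]
        if unpurchased:
            negatives.append((customer, rng.choice(unpurchased)))
    if len(negatives) > max_samples:
        negatives = rng.sample(negatives, max_samples)  # undersample the negatives
    return negatives


# Hypothetical data, for illustration only.
customer_ids = ["P", "Q", "R", "S", "T"]
candidate_ids = {"P", "Q"}
all_products = ["X", "Y", "Z"]
purchases = {"R": {"X"}, "S": set(), "T": {"X", "Y"}}

third_negative_set = build_third_negative_set(
    customer_ids, candidate_ids, all_products, purchases, max_samples=2
)
```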

In this embodiment, according to each piece of customer information to be selected and the historical product purchase record, the purchased product corresponding to each piece of customer information to be selected is determined, each piece of customer information to be selected and the purchased product corresponding to each piece of customer information to be selected are added as a positive sample to a third positive sample set, then the customer information which does not belong to the customer information to be selected is read from the customer information set, and a third negative sample set is generated according to the customer information which does not belong to the customer information to be selected, so as to obtain a third training sample set, and a foundation is laid for training a second intermediate model, so that potential customers corresponding to each product are obtained.

Based on the same inventive concept, the embodiment of the application also provides a potential customer mining device corresponding to the potential customer mining method. Because the principle by which the device solves the problem is similar to that of the potential customer mining method in the embodiment of the application, the implementation of the device can refer to the implementation of the method, and repeated description is omitted.

Referring to fig. 9, a schematic diagram of a potential customer mining device provided by the present application is shown. The device includes: a generation module 901, a first training module 902, a second training module 903, a third training module 904, and a prediction module 905; wherein:

the generating module 901 is configured to generate a first training sample set according to a historical product purchase record and a customer information set, where the first training sample set includes: a first positive sample set including customer information of any product purchased, and a first negative sample set including customer information of any product not purchased;

the first training module 902 is configured to train the initial mining model based on the first training sample set to obtain a first intermediate model, and obtain a second training sample set based on a prediction result of the first intermediate model and product cluster information, where the second training sample set includes: a second positive sample set including customer information of each product class purchased, and a second negative sample set including customer information of each product class not purchased;

The second training module 903 is configured to train the first intermediate model based on the second training sample set to obtain a second intermediate model, and obtain a third training sample set based on a prediction result of the second intermediate model, where the third training sample set includes: a third positive sample set including customer information and product information of each product purchased, and a third negative sample set including customer and product information of each product not purchased;

a third training module 904, configured to train the second intermediate model based on the third training sample set, to obtain a target potential customer mining model;

and a prediction module 905, configured to input the third positive sample set into the target potential customer mining model to perform prediction, so as to obtain potential customer information.

The generating module 901 is specifically configured to:

The first training module 902 is specifically configured to:

The second training module 903 is specifically configured to:

The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.

The embodiment of the application also provides an electronic device. Fig. 10 is a schematic structural diagram of the electronic device provided by the embodiment of the application, which includes a processor 1001 and a memory 1002, and optionally may also include a bus. The memory 1002 stores machine-readable instructions executable by the processor 1001 (e.g., execution instructions corresponding to the generation module 901, the first training module 902, the second training module 903, the third training module 904, and the prediction module 905 in the apparatus of fig. 9, etc.). When the electronic device is running, the processor 1001 communicates with the memory 1002 through the bus, and the machine-readable instructions are executed by the processor 1001 to perform the aforementioned potential customer mining method.

The embodiment of the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by the processor 1001, performs the steps of the aforementioned method for mining a potential customer.

It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the above-described system and apparatus may refer to the corresponding procedures in the method embodiments, and are not repeated in the present disclosure. In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative; the division of the modules is merely a division by logical function, and other divisions are possible in actual implementation. For example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections between devices or modules through some communication interfaces, and may be electrical, mechanical, or in other forms.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily appreciate variations or alternatives within the scope of the present application.

Claims

1. A potential customer mining method, comprising:

2. The method of claim 1, wherein generating a first training sample set from the historical product purchase record and the set of customer information comprises:

3. The method of claim 1, wherein the obtaining a second training sample set based on the prediction result of the first intermediate model and product cluster information comprises:

4. A method according to claim 3, wherein said determining a plurality of target customer information in said set of customer information based on the prediction of said first intermediate model comprises:

5. The method of claim 3, wherein the obtaining the second training sample set based on the plurality of target customer information and the plurality of product classes comprises:

6. The method according to claim 1, wherein the deriving a third training sample set based on the prediction result of the second intermediate model comprises:

7. The method of claim 6, wherein the obtaining the third training sample set according to each candidate client information comprises:

8. A potential customer mining apparatus, comprising:

9. An electronic device, comprising: a processor and a memory storing machine readable instructions executable by the processor to perform the steps of the potential customer mining method according to any of claims 1 to 7 when the electronic device is running.

10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the potential customer mining method according to any of claims 1 to 7.