CN115357802B

CN115357802B - Multi-business-state enterprise potential customer identification method

Info

Publication number: CN115357802B
Application number: CN202211279020.8A
Authority: CN
Inventors: 王强; 娄海凤; 王文雯
Original assignee: Shandong Commercial Group Co ltd
Current assignee: Shandong Commercial Group Co ltd
Priority date: 2022-10-19
Filing date: 2022-10-19
Publication date: 2023-04-07
Anticipated expiration: 2042-10-19
Also published as: CN115357802A

Abstract

The invention provides a method for identifying potential customers of a multi-business-state enterprise, which belongs to the technical field of data processing and comprises the following steps: performing data fusion on the member data of each business state; identifying fused members with potential value for a target business state based on machine learning; predicting the potential value of single members in the target business state based on deep learning; establishing a recommendation model and sending recommendation information to members in the target business state according to their value levels; updating each member's value level in the target business state according to the content and quantity of the user's feedback on the recommendation information; and analyzing the content similarity of the pieces of recommendation information that received feedback according to the change in value level. The invention provides a feasible method for identifying members with potential value for a target business state and helps enterprises carry out more accurate member marketing.

Description

Multi-business-state enterprise potential customer identification method

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to a method for identifying potential customers of a multi-business-state enterprise.

Background

With the rapid development of internet information technology, computer processing capacity has improved greatly and data storage costs have fallen sharply. Enterprises can more easily collect and store large amounts of customer data, and can compute and store richer customer portrait data. To raise the enterprise's input-output ratio and improve customer satisfaction, the most effective approach is to analyze customer value on the basis of portrait information such as consumption behavior and then carry out targeted, precise marketing. At present, however, such segmentation is performed only within a single business state: the customers of that business state are subdivided according to value-attribute labels, so that higher-value and lower-value customer groups are identified. For enterprises that have gained a firm foothold in the market, joint operation of products across multiple business states has become an important way to enlarge asset scale. For a fused member group, for example the members shared by several business states of the same enterprise, how to judge whether a customer has potential value in one business state from the customer's value performance in the other business states has become a key research problem.

Disclosure of Invention

In view of the above-mentioned deficiencies of the prior art, the present invention provides a method for identifying potential customers of a multi-business enterprise to solve the above-mentioned technical problems.

The invention provides a method for identifying potential customers of a multi-business enterprise, which comprises the following steps:

performing data fusion on the member data of each business state, comprising: fusing the member data of each business state by using a unique user identification technology, and determining fused members, which exist in at least two business states, and single members, which exist in only one business state; unifying labels by using a text recognition technology, and constructing an all-round, all-business-state member portrait from the member labels of the fused members;

identifying fused members with potential value for the target business state based on machine learning, comprising: assigning a value grade to each fused member from the member's all-round portrait by using a K-Means clustering algorithm; and determining the fused members with potential value in the target business state according to the assigned value grades;

predicting the potential value of a single member in a target state based on deep learning, comprising:

taking the fused member group as the training set and test set, the member labels of the target business state as feature variables, and the value grade of the fused members in the target business state as the response variable; establishing a stacking-based multi-classification prediction model on the training set, wherein the XGBoost, LightGBM and GBDT algorithms are modeled, fitted and used for prediction on the training set to complete the first-layer model; taking the prediction results of the first-layer model, namely its predicted value grades for the target business state, as input feature variables and the value grade as the response variable, and training a Bayes classifier as the meta classifier to obtain the complete multi-classification prediction model; verifying the built prediction model on the test set; and applying the verified prediction model to single members in the target business state, with the member labels in the target business state as feature variables and the value grade of the single member in the target business state as the response variable, to predict that value grade;

establishing a recommendation model, and sending recommendation information to members in the target business state according to their value grades; updating each member's value grade in the target business state according to the content and quantity of the user's feedback on the recommendation information; and analyzing the similarity of the pieces of recommendation information that received feedback according to the change in value grade, and updating the all-round member portrait according to the similar recommendation information.
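The two-layer stacking structure of the prediction step above can be sketched in plain Python as follows. This is only an illustration of the data flow, under stated assumptions: the hypothetical `CentroidClassifier` is a toy stand-in for the XGBoost, LightGBM and GBDT first-layer learners, `CategoricalNaiveBayes` is a minimal Bayes meta classifier, and all names are illustrative rather than taken from the source.

```python
import math
from collections import Counter, defaultdict


class CentroidClassifier:
    """Toy first-layer learner (stand-in for XGBoost / LightGBM / GBDT):
    predicts the class whose centroid is nearest on the chosen features."""

    def __init__(self, feature_idx):
        self.feature_idx = feature_idx

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append([row[i] for i in self.feature_idx])
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def nearest(row):
            point = [row[i] for i in self.feature_idx]
            return min(
                self.centroids,
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(point, self.centroids[c])),
            )
        return [nearest(row) for row in X]


class CategoricalNaiveBayes:
    """Meta classifier over the first-layer predicted grades,
    with add-one (Laplace) smoothing."""

    def fit(self, Z, y):
        self.classes = sorted(set(y))
        self.prior = Counter(y)
        self.counts = defaultdict(Counter)  # (position, class) -> grade counts
        for z, label in zip(Z, y):
            for pos, value in enumerate(z):
                self.counts[(pos, label)][value] += 1
        return self

    def predict(self, Z):
        n_values = len(self.classes)

        def score(z, c):
            s = math.log(self.prior[c])
            for pos, value in enumerate(z):
                s += math.log((self.counts[(pos, c)][value] + 1)
                              / (self.prior[c] + n_values))
            return s

        return [max(self.classes, key=lambda c: score(z, c)) for z in Z]


def fit_stacking(X, y, base_learners, meta):
    """Fit the first layer, feed its predictions to the meta classifier,
    and return a function that predicts value grades for new members."""
    for learner in base_learners:
        learner.fit(X, y)
    Z = list(zip(*(learner.predict(X) for learner in base_learners)))
    meta.fit(Z, y)
    return lambda X_new: meta.predict(
        list(zip(*(learner.predict(X_new) for learner in base_learners))))
```

In practice the first-layer learners would be replaced by real gradient-boosting models, and the fitted model would be checked on a held-out test set before being applied to single members, as the method requires.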

Further, unifying labels by using a text recognition technology and constructing the all-round, all-business-state member portrait from the member labels of the fused members includes:

constructing an all-round member portrait based on the data of all business states, and setting the member labels to comprise fact labels and analysis labels, wherein the fact labels comprise attribute labels and behavior labels; the attribute labels comprise gender and age; the behavior labels comprise consumption frequency and customer unit price (average spend per customer); and the analysis labels comprise category preference and activity sensitivity;

and performing text recognition on the databases that store member information in each business state, extracting the content that matches each member label by character feature extraction, and filling that content into the corresponding member label.

Further, the assigning the value grade to the fusion member by using the K-Means clustering algorithm includes:

Step 1, acquiring the data set $Data_1$ of any one business state and normalizing the data with the normalization formula:

$$m = \frac{X - \min(X)}{\max(X) - \min(X)};$$

wherein $X$ is the index value of any one of the $t$ labels in $Data_1$, and $n$ is the number of samples in $Data_1$; after normalization the samples are denoted $m_1, m_2, \dots, m_n$, wherein $m_i = (m_{i1}, m_{i2}, \dots, m_{it})$;

Step 2, acquiring the preset number of clustering categories $k$;

Step 3, randomly selecting a sample point as the first clustering center $c_1$, its value being denoted $m_1$, and calculating the Euclidean distances from $c_1$ to the remaining $n-1$ sample points:

$$D(c_1, m_i) = \sqrt{\sum_{j=1}^{t} (m_{1j} - m_{ij})^2}, \quad i = 2, 3, \dots, n;$$

selecting, from the $n-1$ distances, the sample point corresponding to the maximum distance $\max_i D(c_1, m_i)$ as the second clustering center $c_2$, its value being denoted $m_2$, and calculating the Euclidean distances $D(c_2, m_i)$, $i = 3, 4, \dots, n$, between $c_2$ and the $n-2$ samples $m_3, \dots, m_n$;

comparing the distances from each of $m_3, \dots, m_n$ to $c_1$ and $c_2$, and taking the sample point corresponding to the maximum of the minimum distances, $\max_i \min\{D(c_1, m_i), D(c_2, m_i)\}$, as the third clustering center $c_3$; proceeding in this way until $k$ clustering centers $\{c_1, c_2, \dots, c_k\}$ are selected, the classes corresponding to the $k$ clustering centers being denoted $\{C_1, C_2, \dots, C_k\}$;

Step 4, calculating the Euclidean distance $D(c_j, m_i)$, $j = 1, 2, \dots, k$, between each sample point and each of the $k$ clustering centers, and assigning the sample point to the class with the minimum distance, i.e. the class corresponding to $\min_j D(c_j, m_i)$; the reassigned classes are still denoted $\{C_1, C_2, \dots, C_k\}$, and the number of samples in each class is denoted $\{n_1, n_2, \dots, n_k\}$;

Step 5, calculating the mean of all sample points in each class,

$$\frac{1}{n_j} \sum_{i=1}^{n_j} m_i^{(j)}, \quad j = 1, 2, \dots, k,$$

and taking it as the new clustering center, still denoted $\{c_1, c_2, \dots, c_k\}$;

Step 6, giving a tolerance threshold $\varepsilon$ and calculating the cost function value:

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left\lVert m_i^{(j)} - c_j \right\rVert^2;$$

wherein the superscript $(j)$ of $m_i^{(j)}$ denotes the $j$-th class; calculating the reduction of the cost function value between two successive iterations; if the reduction is below the tolerance threshold $\varepsilon$, the algorithm is judged to have converged and the clustering ends; otherwise the algorithm has not converged, and the procedure returns to step 4 and iterates until the clustering algorithm converges; the finally obtained $\{C_1, C_2, \dots, C_k\}$ are taken as the $k$ value grades of the fused members.
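Steps 1 to 6 can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: it assumes min-max scaling for the normalization step, adds a `max_iter` safety cap that the source does not mention, and uses illustrative function names.

```python
import math
import random


def min_max_normalize(rows):
    """Step 1 (assumed min-max form): scale every label (column) to [0, 1];
    constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((x - a) / (b - a) if b > a else 0.0 for x, a, b in zip(row, lo, hi))
        for row in rows
    ]


def dist(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_centers(points, k, rng):
    """Step 3: random first center, then repeatedly take the sample whose
    minimum distance to the centers chosen so far is largest."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, eps=1e-9, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = farthest_point_centers(points, k, rng)
    prev_cost = float("inf")
    for _ in range(max_iter):  # max_iter is a safety cap (assumption)
        # Step 4: assign each sample to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Step 5: move each center to the mean of its class.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # Step 6: cost = total squared distance to the assigned center;
        # stop once the reduction between iterations falls below eps.
        cost = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
        if prev_cost - cost < eps:
            break
        prev_cost = cost
    return labels, centers, cost


# Usage with made-up two-label data (e.g. consumption frequency and amount).
raw = [(1, 10), (2, 12), (1, 11), (50, 200), (52, 210), (51, 205)]
labels, centers, cost = kmeans(min_max_normalize(raw), k=2)
```

Choosing the initial centers far apart addresses the sensitivity of K-Means to random initialization; the clusters returned still have to be ordered and named as value grades by inspecting their centers.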

Further, the establishing of the recommendation model, sending recommendation information to the member in the target business state according to the value rating of the member, includes:

establishing a recommendation model, and determining the sending channel and sending period of the recommendation information according to the value grade of the members in the target business state, wherein the recommendation model is S = W × O + U × P, S is the value grade, W is a preset weight of the sending channel, U is a preset weight of the sending period, O is the output preset value of the sending channel, P is the output preset value of the sending period, and O and P are positive integers;

and obtaining at least one combined solution (O, P), and, when there are multiple combined solutions, selecting the one with the smallest difference between O and P as the final output.
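One way to read the selection rule above is as a small search over positive integers. The sketch below is an illustration under stated assumptions: the search bound `max_value` and the tolerance `tol` are not in the source, and the function name is hypothetical.

```python
def solve_channel_and_period(s, w, u, max_value=100, tol=1e-9):
    """Enumerate positive integers O, P with w*O + u*P == s (within tol) and
    return the pair whose O and P differ least; None if no pair is found."""
    solutions = [
        (o, p)
        for o in range(1, max_value + 1)
        for p in range(1, max_value + 1)
        if abs(w * o + u * p - s) <= tol
    ]
    if not solutions:
        return None
    return min(solutions, key=lambda op: abs(op[0] - op[1]))


# With S = 10, W = 2, U = 1 the candidates are (1, 8), (2, 6), (3, 4), (4, 2);
# the pair with the smallest |O - P| is (3, 4).
best = solve_channel_and_period(10, 2, 1)
```

When two pairs have the same difference, this sketch keeps the first one found; the source does not say how such ties should be resolved.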

Further, the updating the value level of the member in the target business state according to the feedback content and the feedback quantity of the user for the recommendation information includes:

setting at least one feedback grade R for the feedback content, and setting a sign a for the feedback grade R, wherein a = +1 indicates that the recommendation was accepted and a = -1 indicates that the recommendation was not accepted;

periodically receiving the feedback content and the number of feedbacks F of the user on the recommendation information of the previous period, determining the feedback grade R from the feedback content, and inputting them into the feedback model T = [(a × R × F)/e], where [ ] denotes the integer function (rounding to an integer), e is the grade-change threshold, and T is the change in the user's value grade in the target business state for the period.
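The feedback model can be illustrated with a few lines of Python. The source does not say which rounding the bracket denotes; this sketch assumes the floor function, which is the conventional reading of [x], and the function and parameter names are illustrative.

```python
import math


def value_grade_change(accepted, feedback_grade, feedback_count, threshold):
    """T = [(a * R * F) / e], reading [ ] as floor (an assumption).
    a is +1 when the recommendation was accepted and -1 otherwise."""
    a = 1 if accepted else -1
    return math.floor(a * feedback_grade * feedback_count / threshold)


# R = 2, F = 3, e = 4: accepted feedback gives floor(1.5),
# rejected feedback gives floor(-1.5).
up = value_grade_change(True, 2, 3, 4)
down = value_grade_change(False, 2, 3, 4)
```

Under the floor reading, negative feedback lowers the grade slightly more readily than positive feedback raises it; truncation toward zero would make the two directions symmetric.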

Further, the analyzing the content similarity of the fed back pieces of recommendation information according to the value level change condition includes:

recording the value grade variation, and analyzing the target recommendation information of which the variation or the variation times exceeds a preset range;

the content similarity comprises the title similarity of the recommendation information, based on an n-gram language model and one-hot coding, and the body-content similarity of the recommendation information, based on an LSTM model for part-of-speech analysis;

the title similarity calculation method comprises: in view of the short-text character of the title of the target recommendation information, segmenting the title with an n-gram language model, and obtaining sparse word vectors based on a one-hot coding semantic dictionary; and calculating the title similarity of the target recommendation information as the cosine similarity between the word vectors;

the body-content similarity calculation method comprises: segmenting the content of the target recommendation information into terms by part of speech with a syntactic analysis tool, generating term vectors for the segmented terms and their syntactic positions based on a one-hot coding semantic dictionary, and inputting the term vectors into an LSTM model, which judges the part of speech of each term vector; and generating word-sense vectors from the term vectors and their part-of-speech results, and calculating the similarity between the word-sense vectors as the content similarity.
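The title similarity calculation can be sketched as follows. As a simplifying assumption, the sketch segments titles into overlapping character bigrams rather than using a trained n-gram language model, builds the one-hot dictionary from the two titles being compared, and uses illustrative function names.

```python
import math
from collections import Counter


def ngrams(text, n=2):
    """Split a short title into overlapping character n-grams."""
    text = text.replace(" ", "")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def title_vector(title, vocab, n=2):
    """Sum the one-hot vectors of the title's n-grams into a sparse
    {dictionary index: count} mapping."""
    return Counter(vocab[g] for g in ngrams(title, n) if g in vocab)


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[i] * v[i] for i in u if i in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def title_similarity(a, b, n=2):
    # One-hot dictionary: each distinct n-gram gets its own index.
    grams = sorted(set(ngrams(a, n)) | set(ngrams(b, n)))
    vocab = {g: i for i, g in enumerate(grams)}
    return cosine(title_vector(a, vocab, n), title_vector(b, vocab, n))


score = title_similarity("summer sale", "summer hotel sale")
```

Titles that share no n-gram score 0 and identical titles score 1; the part-of-speech and LSTM steps of the body-content similarity are not covered by this sketch.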

Further, fusing the member data of each business state by using the unique user identification technology includes: performing ID-Mapping with a OneID technique, based on the business-entity identifiers of each business state (mobile phone number, identity card number, email address, and mobile and PC device IDs) and combining business rules, machine learning and graph computation algorithms; mapping the unique IDs of each business state to one unified ID; and associating the data of the separate data islands through the unified ID.

The invention has the following advantages. For enterprises whose multiple business states operate independently, the invention first uses a unique user identification technology to break down data islands, fuses the data of the business states, and constructs an all-round member portrait from the member labels of each business state. Second, it identifies or predicts members with potential value for the target business state based on machine learning, in two ways. On the one hand, for the fused member group, a clustering algorithm classifies the members in each business state by value grade, and members with potential value for the target business state are screened out according to their value performance in each business state. On the other hand, a multi-classification prediction model is established that takes the value grade of the fused members in the target business state as the response variable and the member labels of the fused members in the other business states as feature variables, and predicts the member grade in the target business state. The invention thus provides the industry with a feasible method for identifying members with potential value for a target business state and helps it carry out more accurate member marketing. In addition, the design principle of the invention is reliable, its structure is simple, and it has very broad application prospects.

Drawings

In order to illustrate the embodiments of the present invention or the prior-art solutions more clearly, the drawings used in describing them are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic flow chart diagram of a method of one embodiment of the present invention;

FIG. 2 is a schematic diagram of a distribution of value classes for one embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention takes retail, tourism and hotel industries as examples, and explains the technical scheme provided by the embodiment of the invention.

As shown in fig. 1, an embodiment of the present invention provides a method for identifying a potential customer of a multi-business enterprise, including:

S1, performing data fusion on the member data of each business state.

S2, identifying the fused members with potential value for the target business state based on machine learning.

S3, predicting the potential value of single members in the target business state based on deep learning.

S4, establishing a recommendation model, and sending recommendation information to members in the target business state according to their value grades.

S5, updating each member's value grade in the target business state according to the content and quantity of the user's feedback on the recommendation information.

S6, analyzing the similarity of the pieces of recommendation information that received feedback according to the change in value grade, and updating the all-round member portrait according to the similar recommendation information.

Optionally, as an embodiment of the present invention, S1 specifically includes: fusing the member data of each business state by using a unique user identification technology, and determining fused members, which exist in at least two business states, and single members, which exist in only one business state; and unifying labels by using a text recognition technology, and constructing the all-round, all-business-state member portrait from the member labels of the fused members.

Fusing the member data of each business state by using the unique user identification technology includes: performing ID-Mapping with a OneID technique, based on the business-entity identifiers of each business state (mobile phone number, identity card number, email address, and mobile and PC device IDs) and combining business rules, machine learning and graph computation algorithms; mapping the unique IDs of each business state to one unified ID; and associating the data of the separate data islands through the unified ID.

For example, in a scenario with a retail business state and a travel business state, exploration of the business system data shows that the unique IDs of retail members comprise the member name, member card number, identity card number, mobile phone number, landline number, email address, home address and license plate number; the unique IDs of travel members comprise the member name, member card number, member profile number, identity card number, mobile phone number, landline number, email address, QQ number, home address and driving license. After the data of the two business states is uploaded to the cloud, the OneID module of Alibaba's Dataphin computes the unified ID, taking into account the importance of each unique ID. The retail data and the travel data can then be associated through the unified ID, so that the fused member group, whose members are both retail members and travel members, can be identified.
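The core idea of mapping several business-state IDs to one unified ID can be sketched with a union-find structure. This is a simplified illustration only: it links two records whenever they share any identifier value and ignores the business rules, identifier importance weights and machine learning mentioned above. It does not represent the Dataphin OneID module, and all names and sample values are hypothetical.

```python
def unify_ids(records):
    """records: list of (business_state, member_id, {identifier_type: value}).
    Returns {(business_state, member_id): unified_id}."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    owner_of = {}  # (identifier_type, value) -> first record seen with it
    for state, member_id, identifiers in records:
        key = (state, member_id)
        parent.setdefault(key, key)
        for id_type, value in identifiers.items():
            if not value:
                continue  # skip missing identifiers
            token = (id_type, value)
            if token in owner_of:
                union(key, owner_of[token])
            else:
                owner_of[token] = key

    roots = {}
    return {key: roots.setdefault(find(key), len(roots) + 1) for key in parent}


def split_fused_and_single(unified):
    """Fused members appear in at least two business states, single members in one."""
    states = {}
    for (state, _), uid in unified.items():
        states.setdefault(uid, set()).add(state)
    fused = {uid for uid, s in states.items() if len(s) >= 2}
    return fused, set(states) - fused


records = [
    ("retail", "R001", {"phone": "13800000001", "email": "a@example.com"}),
    ("travel", "T917", {"phone": "13800000001", "id_card": "X1"}),
    ("retail", "R002", {"phone": "13800000002"}),
]
unified = unify_ids(records)
fused, single = split_fused_and_single(unified)
```

Here the retail record R001 and the travel record T917 share a phone number and therefore receive the same unified ID, which marks that member as fused.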

Optionally, as an embodiment of the present invention, unifying labels by using a text recognition technology and constructing the all-round, all-business-state member portrait from the member labels of the fused members includes: constructing an all-round member portrait based on the data of all business states, and setting the member labels to comprise fact labels and analysis labels, wherein the fact labels comprise attribute labels and behavior labels; the attribute labels comprise gender and age; the behavior labels comprise consumption frequency and customer unit price (average spend per customer); and the analysis labels comprise category preference and activity sensitivity; and performing text recognition on the databases that store member information in each business state, extracting the content that matches each member label by character feature extraction, and filling that content into the corresponding member label.

After the data of the business states has been connected, an all-round member portrait can be constructed from the data of all business states. S2 specifically comprises: assigning a value grade to each fused member from the member's all-round portrait by using a K-Means clustering algorithm; and determining the fused members with potential value in the target business state according to the assigned value grades.

Generally, member tags fall into three types. The first type is fact tags, divided into two categories: attribute tags, such as gender and age, and behavior tags, such as consumption frequency and customer unit price. The second type is analysis tags, such as category preference and activity sensitivity. The third type is prediction tags, such as the next purchase time and the goods likely to be purchased. Because the aim of the invention is to identify members of potential value, only tags that describe objective facts are considered, namely the first-type fact tags and the second-type analysis tags. The tags finally determined are as follows:

Attribute tags of the retail fact tags: age Rt₁, gender Rt₂, enrollment channel Rt₃, enrollment mode Rt₄, enrollment time Rt₅, and member rank Rt₆.

Behavior tags of the retail fact tags: [tag list missing from the source text]

Retail analysis tags: preferred supermarket category and preferred department-store category over each of the last month, the last three months, the last six months and the last year.

Attribute tags of the travel fact tags: gender, age group, membership grade, enrollment time, year of registration.

Behavior tags of the travel fact tags: member active status, annual financial contribution level, annual stay frequency, cumulative room nights, number of reviews, number of negative reviews, number of complaints, whether the member has logged in to the APP, available stored value, available points, member value (last 24 months), last consumption time interval, whether the member follows the WeChat account.

Travel analysis tags: brand preference, consumption price band preference, booking channel preference, consumption price zone preference, usual room type.

The scheme takes two business states, Com₁ and Com₂, as an example. For the fused member group that exists in both, value analysis is performed separately in Com₁ and in Com₂. The member categories in Com₁ are labeled, from low value to high value, L₁₁, L₁₂, L₁₃, L₁₄ and L₁₅; the member categories in Com₂ are labeled, from low value to high value, L₂₁, L₂₂, L₂₃, L₂₄ and L₂₅.

Each individual in the fused member group thus has a first grade C₁ among L₁₁, L₁₂, L₁₃, L₁₄ and L₁₅ and a second grade C₂ among L₂₁, L₂₂, L₂₃, L₂₄ and L₂₅. Groups of members with potential value are finally identified by comparing C₁ and C₂. For example, if a member m₁ has C₁ = L₁₅ and C₂ = L₂₁, then C₁ > C₂, so m₁ is a potential-value member for Com₂. As shown in FIG. 2, the 10 member groups corresponding to L₁₂, L₁₃, L₁₄ and L₁₅ have potential value for Com₂.
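The comparison of the two grades can be illustrated as follows. In this sketch the grades are encoded as integers 1 to 5 (1 for the lowest level, 5 for the highest), which is an encoding chosen for illustration; the function names and member IDs are hypothetical.

```python
def potential_value_members(grades_com1, grades_com2):
    """Each argument maps member id -> value grade 1..5.
    Returns the ids whose Com1 grade exceeds their Com2 grade,
    i.e. the members with potential value for Com2."""
    shared = grades_com1.keys() & grades_com2.keys()
    return sorted(m for m in shared if grades_com1[m] > grades_com2[m])


def potential_cells(levels=5):
    """All (C1, C2) grade combinations with C1 > C2 on a levels x levels grid."""
    return [(c1, c2)
            for c1 in range(1, levels + 1)
            for c2 in range(1, levels + 1)
            if c1 > c2]


com1 = {"m1": 5, "m2": 2, "m3": 3}
com2 = {"m1": 1, "m2": 4, "m3": 3}
candidates = potential_value_members(com1, com2)
cells = potential_cells()
```

With five grades per business state there are 4 + 3 + 2 + 1 = 10 grade combinations in which the first grade exceeds the second, which matches the 10 member groups mentioned for FIG. 2.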

The strategy for classifying members in Com₁ and Com₂ is elaborated below. In the scenario where Com₁ is the retail business state and Com₂ is the travel business state, 6 tags measuring the value performance of retail members are extracted from the retail portrait tag system: last consumption time interval R₁, consumption amount in the last year M₁, consumption frequency in the last year F₁, maximum single consumption amount in the last year P₁, share of consumption on special-price goods in the last year S₁, and share of consumption on high-contribution goods in the last year D₁. 4 tags measuring the value performance of travel members are extracted from the travel portrait tag system: annual stay frequency F₂, cumulative room nights H₂, last consumption time interval R₂, and member consumption amount in the last 24 months M₂. The data set of the fused member group is exported; the data relating to the retail business state is recorded as Data₁ and the data relating to the travel business state as Data₂.

Conventional customer value analysis generally uses only three tags: consumption time interval, consumption frequency and consumption amount. These three tags are limited in measuring member value and cannot describe a member's value attributes comprehensively; moreover, the business differs between business states. The tags of conventional customer value analysis are therefore adjusted, finally giving the 6 retail tags R₁, F₁, M₁, P₁, S₁, D₁ and the 4 travel tags R₂, F₂, M₂, H₂ described above. Conventional customer segmentation usually classifies customers by dividing each tag into value intervals, but this yields too many classes: the number of classes grows exponentially with the number of tags analyzed, which makes the results harder to analyze. In addition, the interval thresholds are chosen subjectively and the boundaries between customers are fuzzy, so the approach offers little guidance for practical enterprise use. The embodiment of the invention therefore uses a clustering algorithm to divide customers into value grades. Classes are formed from the structure of the data itself and from how close or far apart the data points are, the distance between classes is clear, and the result is relatively simple, which overcomes the problems of the conventional customer classification method. The K-Means algorithm may be chosen as the clustering algorithm. The steps of the conventional K-Means algorithm are:

Step (1), initialization: randomly selecting k sample points as the initial clustering centers;

Step (2), updating the partition: for each data point, calculating its distance to each clustering center and assigning it to the nearest class;

Step (3), updating the clustering centers: calculating the coordinate mean of all points in each class and taking the mean as the new clustering center;

Step (4), judging convergence: if the change in the clustering centers or the change in the cost function does not exceed a preset threshold, the algorithm has converged; otherwise, return to step (2).

The cost function is defined as follows. Let the data set $D = \{x_1, x_2, \dots, x_m\}$ be divided into $k$ classes $\{C_i, i = 1, 2, \dots, k\}$ containing $n_1, n_2, \dots, n_k$ samples respectively, with $\sum_{i=1}^{k} n_i = m$. The clustering center of each class is

$$Q_i = \frac{1}{n_i} \sum_{x_j \in C_i} x_j, \quad i = 1, 2, \dots, k,$$

and the cost function is

$$J = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - Q_i \rVert^2 .$$

The cost function is therefore the total sum of squared errors; the smaller its value, the smaller the error and the better the clustering.

The K-Means algorithm has notable advantages: it is simple, its principle is easy to understand, and its computational cost is low; it converges quickly; it is efficient and scalable on large data sets; and it clusters well when the data set is spherical or convex in distribution. It also has limitations. It is highly sensitive to the initial clustering centers: different initial centers may give different clustering results, and poorly chosen ones greatly degrade the clustering. It is sensitive to noise and outliers: because each iteration takes the mean of the points in a class as the new clustering center, even a few abnormal points strongly affect the mean and make the result unstable. It requires the number of clusters k to be set in advance, yet in an unsupervised task the number of categories in the data set is unknown, so k is difficult to determine.

Considering the limitations of the K-Means algorithm and the particularity of the research scenario, k points that are as far apart from one another as possible are selected as the initial clustering centers. Because the ultimate goal of customer segmentation is to compare the value categories of the segmented groups in the two business states and thereby identify customer groups with potential value, the number of categories k is given in advance so that the comparison is operable and interpretable. In addition, in the scenario where the retail and travel business states are fused, the retail tags R₁, F₁, M₁, P₁, S₁, D₁ and the travel tags R₂, F₂, M₂, H₂ have different scales, so the data needs to be normalized and mapped to the range [0, 1].

In summary, the specific steps of customer segmentation for the retail business state Com₁ are given below. The steps for the travel business state Com₂ are similar.

Optionally, as an embodiment of the present invention, the assigning the value rank to the converged member by using the K-Means clustering algorithm includes:

step 1, acquiring Data set Data of any one state ₁ And (3) carrying out normalization processing on the data by using a normalization formula, wherein the normalization formula is as follows:

；

wherein X is Data ₁ Index value of any one of t tags, n is Data in Data set ₁ The number of the medium samples is m after normalization processing ₁ ,m ₂ ,…,m _n Wherein m is _i =(m _i1 ,m _i2 ,...,m _it )；

Step 2, acquiring the number k =5 of preset clustering categories;

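Step 3, selecting the initial clustering centers: for the first clustering center c₁, the value of which is denoted m₁, calculating the Euclidean distances D(c₁, m_i), i = 2, 3, ..., n, between c₁ and the other n-1 samples m₂, …, m_n; the sample point corresponding to $\max_i D(c_1, m_i)$ is used as the second clustering center c₂, the value of which is denoted m₂; calculating the Euclidean distances D(c₂, m_i), i = 3, 4, ..., n, between c₂ and the n-2 samples m₃, …, m_n; the sample point corresponding to $\max_i D(c_2, m_i)$ is used as the third clustering center c₃; and so on until 5 clustering centers {c₁, c₂, …, c₅} are selected, the classes corresponding to the 5 clustering centers being marked as {C₁, C₂, …, C₅};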

Step 4, calculating the Euclidean distance D(c_k, m_i) between each sample point and each of the 5 clustering centers, and assigning each sample point to the category with the minimum distance, i.e. to the class corresponding to $\min_k D(c_k, m_i)$; the reassigned classes are still marked as {C₁, C₂, …, C₅}, and the number of samples in each class is recorded as {n₁, n₂, …, n₅};

Step 5, calculating the average value of all sample points in each category,

$c_k = \dfrac{1}{n_k} \sum_{i=1}^{n_k} m_i^{(k)}$,

and using it as the new clustering center, the new clustering centers still being marked as {c₁, c₂, …, c₅};

wherein the superscript (k) of m_i^(k) denotes the k-th class. The reduction of the cost function value between two successive iterations is then calculated. If the reduction is lower than a tolerance threshold ε, the algorithm is judged to have converged and the clustering algorithm ends; if it is not lower than the tolerance threshold ε, the algorithm has not converged, and the process returns to step 4 and iterates until the clustering algorithm converges. The finally obtained {C₁, C₂, …, C₅} are used as the 5 value grades of the converged members.
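The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation of the invention: min-max scaling is assumed for the normalization, the first clustering center is assumed to be the first sample, the cost function is assumed to be the sum of squared Euclidean distances, and the function names `min_max_normalize`, `farthest_point_centers` and `k_means` are hypothetical. The toy data use k = 2 only to keep the example small; the embodiment uses k = 5.

```python
import math


def min_max_normalize(rows):
    """Step 1 (assumed min-max scaling): map every label column to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0
              for x, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def farthest_point_centers(samples, k, first_index=0):
    """Step 3: each new center is the remaining sample farthest from the previous center."""
    centers = [samples[first_index]]  # assumption: the first sample is c1
    remaining = [s for i, s in enumerate(samples) if i != first_index]
    while len(centers) < k:
        last = centers[-1]
        farthest = max(remaining, key=lambda s: math.dist(last, s))
        centers.append(farthest)
        remaining.remove(farthest)
    return centers


def k_means(samples, k=5, eps=1e-6, max_iter=100):
    """Steps 4 and 5: assign to the nearest center, recompute means, stop on small cost reduction."""
    centers = farthest_point_centers(samples, k)
    labels = []
    prev_cost = None
    for _ in range(max_iter):
        # Step 4: index of the nearest center for every sample.
        labels = [min(range(k), key=lambda j: math.dist(centers[j], s)) for s in samples]
        # Step 5: the mean of each class becomes the new center.
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
        # Assumed cost function: sum of squared distances to the assigned center.
        cost = sum(math.dist(centers[lab], s) ** 2 for s, lab in zip(samples, labels))
        if prev_cost is not None and prev_cost - cost < eps:
            break
        prev_cost = cost
    return labels, centers


raw = [(1, 10), (2, 20), (3, 30), (97, 970), (98, 980), (99, 990)]
normalized = min_max_normalize(raw)
labels, centers = k_means(normalized, k=2)
print(labels)
```

On this toy data the two well-separated groups end up in different classes; with real member labels the resulting classes would still need to be inspected (for example with radar charts) before value marks are attached.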

Finally, the categories obtained by clustering can be analyzed in detail with radar charts and given value marks, which completes the classification of the potential value of customers in the retail business state Com₁.

Optionally, as an embodiment of the present invention, S3 includes: taking the fused member group as the training set and the test set, taking the member labels of the target state as the feature variables, and taking the value grade of the fused member in the target state as the response variable; establishing a multi-class prediction model on the training set based on stacking, in which the XGBoost, LightGBM and GBDT algorithms are each modeled, fitted and used for prediction on the training set to complete the first-layer model; taking the prediction results of the first-layer model, which are predictions of the value grade in the target state, as the input feature variables and the value grade as the response variable, and training a Bayes classifier as the meta-classifier to obtain the complete multi-class prediction model; verifying the built prediction model on the test set; and applying the verified prediction model to a single member of the target state, predicting the value grade of the single member in the target state with the value grade of the single member in the target state as the response variable and the member labels of the target state as the feature variables.

For example, a multi-class prediction model is established based on stacking, with the member labels of the converged members in the retail business state as the feature variables and the value grade of the converged members in the hotel business state as the response variable. During modeling, the data set is divided into a training set and a test set; the multi-class prediction model is trained on the training set and then verified on the test set. Finally, the established multi-class prediction model is applied to the single-member group whose members belong only to the retail business state and not to the hotel business state, the value grade of each single member in the hotel business state is predicted, and customers with potential value are identified.

Optionally, as an embodiment of the present invention, S4 includes: establishing a recommendation model, and determining the sending channel and the sending period of the recommendation information according to the value grade of the member in the target state, wherein the recommendation model is S = W × O + U × P, S is the value grade, W is the preset weight of the sending channel, and U is the weight of the sending period; O is the preset value of the output sending channel, P is the preset value corresponding to the output sending period, and O and P are positive integers; at least one combined solution of O and P is obtained, and among multiple combined solutions the one with the smallest difference between O and P is selected and output as the final OP. When recommending information, different recommendation schemes are generated for different customers; for example, customers with high value grades need more important sending channels and more frequent sending periods, so that they receive the recommendation information in time, which helps maintain the customer relationship. A recommendation scheme generally comprises a sending channel and a sending period, and with the recommendation model of this embodiment the channel and period for sending recommendation information can be allocated directly and automatically according to the value grade; even if a new sending channel is introduced through technical innovation or company development, a weight can be set for it without extensive adjustment of the scheme for sending recommendation information.

For example, the preset values of the sending channels include: O = 1, short message; O = 2, WeChat; O = 3, artificial-intelligence telephone; O = 4, manual telephone; and O = 5, mailing. The preset values corresponding to the sending period include P = 1, 2, 3, 4, 5, 6, 7, where P = 1 corresponds to 7 days and P = 2 corresponds to 6 days. Assume the value grade S = 5, W = 0.4 and U = 0.6, so that 0.4 × O + 0.6 × P = 5. Let O = 1, 2, 3, ..., n, where n is a non-zero natural number and O < S/W, or let P = 1, 2, 3, ..., n, where n is a non-zero natural number and P < S/U; the different combinations of O and P values that are output include O = 5, P = 5 and O = 2, P = 7. The combination O = 5, P = 5 has the smallest difference, and no preset value corresponds to O = 7, so O = 5, P = 5 is output as the final OP.
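The worked example above can be reproduced with a short enumeration. The sketch below is illustrative only: the function name `solve_channel_and_period` is hypothetical, the search is restricted to the preset channel and period values listed in the example, and a small floating-point tolerance is assumed when checking S = W × O + U × P.

```python
import math


def solve_channel_and_period(s, w, u, channels, periods):
    """Return the (O, P) pair satisfying W*O + U*P = S with the smallest |O - P|, or None."""
    solutions = [
        (o, p)
        for o in channels
        for p in periods
        if math.isclose(w * o + u * p, s, abs_tol=1e-9)
    ]
    if not solutions:
        return None
    return min(solutions, key=lambda op: abs(op[0] - op[1]))


# Preset values taken from the example in the text.
channels = {1: "short message", 2: "WeChat", 3: "AI telephone", 4: "manual telephone", 5: "mailing"}
periods = range(1, 8)

best = solve_channel_and_period(5, 0.4, 0.6, channels, periods)
print(best, channels[best[0]])
```

Within the preset ranges the equation has the two solutions named in the text, (2, 7) and (5, 5), and the smallest-difference rule selects (5, 5), i.e. mailing with the period preset value 5.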

Optionally, as an embodiment of the present invention, S5 includes: setting at least one feedback level R for the feedback content, and setting a mark a for the feedback level R, wherein a = +1 indicates that the recommendation is accepted and a = -1 indicates that the recommendation is not accepted; periodically receiving the feedback content and the number of feedbacks F of the user for the recommendation information in the previous period, determining the feedback level R according to the feedback content, and inputting them into the feedback model T = [(a × R × F)/e], where [ ] denotes the integer (rounding) function, e is the grade change threshold, and T is the amount of change of the user's value grade in the target state for the period.

In this embodiment, the feedback content may be information indicating acceptance or non-acceptance. For a short message, the user may reply "unsubscribe" to indicate non-acceptance; the feedback content may also be the user's keypad feedback during a return-visit call, the three options "like", "general" and "dislike" of a questionnaire placed in an official-account post, or other user feedback that can be obtained without violating the user's privacy. The relation between the feedback content and the feedback level R needs to be set in advance, and the way of determining the feedback level R from the feedback content is set according to the form of the feedback content; for example, the three options "like", "general" and "dislike" correspond to three feedback levels.
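The feedback model T = [(a × R × F)/e] can be illustrated with a few lines of Python. This is a sketch under an assumption: the source does not say how the integer function treats negative values, so truncation toward zero (`math.trunc`) is assumed here so that acceptance and non-acceptance are handled symmetrically; `math.floor` would be the alternative reading. The function name and the numeric inputs are hypothetical.

```python
import math


def value_grade_change(a, r, f, e):
    """Change T of the value grade for one period, T = [(a * R * F) / e].

    a: +1 if the recommendation was accepted, -1 if it was not
    r: feedback level R determined from the feedback content
    f: number of feedbacks F in the previous period
    e: grade change threshold
    """
    if a not in (1, -1):
        raise ValueError("a must be +1 (accepted) or -1 (not accepted)")
    # Assumption: [ ] truncates toward zero.
    return math.trunc(a * r * f / e)


print(value_grade_change(1, 3, 4, 5))    # accepted: 12 / 5 = 2.4
print(value_grade_change(-1, 2, 3, 4))   # not accepted: -6 / 4 = -1.5
```

With these illustrative inputs the grade rises by 2 in the first case and falls by 1 in the second; a weak or infrequent response whose magnitude stays below e leaves the grade unchanged.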

Optionally, as an embodiment of the present invention, S6 includes: recording the amount of change of the value grade, and analyzing the target recommendation information for which the amount of change or the number of changes exceeds a preset range.

In this embodiment, the target recommendation information is analyzed further, so that the points of the recommendation information that the customer pays attention to can be located accurately and the customer portrait is enriched. For recommendation information whose feedback content indicates acceptance, the common content that the customer prefers is analyzed; for recommendation information whose feedback content indicates non-acceptance, the common content that the customer dislikes is analyzed. For example, the first target recommendation information is "this Sunday, members get 20% off in the supermarket daily-goods area, and xx goods are buy one get one free", and the second target recommendation information is "this Saturday, members get 30% off in the food area, and one lottery draw for every hundred yuan spent". The feedback content for both pieces of recommendation information indicates acceptance by the user, and their common content is the discount offer, so the user is interested in discount offers.

In this embodiment, the recommendation information is divided into a title and content. Since the title is generally more noticeable to the customer, the title and the content differ in importance, and similarity is used to evaluate them when extracting the common content. The similarity therefore includes: the title similarity of the recommendation information based on an n-gram language model and one-hot coding, and the content similarity of the recommendation information based on part-of-speech analysis with an LSTM model.

On the one hand, the method for calculating the title similarity of the target recommendation information comprises the following steps: based on the short-text characteristic of the title of the target recommendation information, performing text segmentation on the title through an n-gram language model, and obtaining sparse word vectors based on a one-hot coding semantic dictionary; and calculating the cosine similarity between the word vectors as the title similarity of any two pieces of target recommendation information.
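The title-similarity calculation can be sketched as follows. The sketch makes several assumptions that are not in the source: titles are split on whitespace before n-grams are formed (a Chinese title would more likely use character n-grams), the sparse vector of a title is taken as the sum of the one-hot vectors of its n-grams, the dictionary is built from the two titles being compared, and all function names and example titles are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Split a token sequence into overlapping n-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sparse_vector(grams, dictionary):
    """Sum of the one-hot vectors of the n-grams, indexed by the dictionary order."""
    counts = Counter(grams)
    return [counts.get(g, 0) for g in dictionary]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def title_similarity(title_a, title_b, n=2):
    grams_a = ngrams(title_a.split(), n)
    grams_b = ngrams(title_b.split(), n)
    dictionary = sorted(set(grams_a) | set(grams_b))  # shared one-hot dictionary
    return cosine(sparse_vector(grams_a, dictionary), sparse_vector(grams_b, dictionary))


sim = title_similarity("weekend supermarket discount", "weekend food discount", n=1)
print(round(sim, 3))
```

With unigrams the two example titles share two of their three terms, so the similarity is 2/3; larger n makes the comparison stricter because whole term sequences must match.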

On the other hand, the method for calculating the content similarity of the target recommendation information comprises the following steps: using a syntactic analysis tool to segment the content of the target recommendation information into terms, obtaining the segmented terms and their syntactic positions; generating term vectors from the segmented terms and their syntactic positions based on a one-hot coding semantic dictionary; inputting the term vectors into an LSTM model, which outputs the part-of-speech judgment result of each term vector; and generating word sense vectors based on one-hot coding from the term vectors and part-of-speech judgment results of the full text, and calculating the cosine similarity between the word sense vectors as the content similarity of the recommendation information. For example, the target recommendation information "members in the food area this Saturday enjoy a 'seven' discount (30% off), and members who spend one hundred yuan can draw a lottery once" is segmented into "this Saturday / food area / member / seven / discount / , / member / consumption / full / one hundred / , / available / lucky draw / one time". The syntactic positions are subject, predicate, object, attributive, adverbial and complement, and can be obtained directly from the syntactic analysis tool; for example, "this Saturday" is an attributive, "food area" is the subject, and "seven discount" is an adverbial. The parts of speech are the conventional ones, such as noun, verb, numeral and quantifier. If the term vectors are a₁, a₂, a₃ and the LSTM model outputs the parts of speech "a₁: numeral", "a₂: quantifier", "a₃: noun", the word sense vectors A₁, A₂, A₃ are derived from them, and all target recommendation information is converted in the word vector space to obtain its word sense vectors.
The word sense vector set of the first target recommendation information is obtained as A = {A₁, A₂, A₃, ..., A_n}, and the word sense vector set of the second target recommendation information is obtained in the same way as B = {B₁, B₂, B₃, ..., B_n}; the content similarity of the two pieces of recommendation information is calculated according to the word vector cosine formula:

$\cos(A, B) = \dfrac{\sum_{i=1}^{n} A_i \cdot B_i}{\sqrt{\sum_{i=1}^{n} \lVert A_i \rVert^{2}} \; \sqrt{\sum_{i=1}^{n} \lVert B_i \rVert^{2}}}$.
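The content-similarity pipeline can be sketched in a simplified form. In this sketch a hypothetical lookup table `POS_LOOKUP` stands in for the trained LSTM part-of-speech output, the word sense of a term is represented by the pair (term, part of speech), syntactic positions are omitted, and each piece of content is encoded as the sum of the one-hot vectors of its word senses; the term lists are illustrative and none of these simplifications come from the source.

```python
import math

# Hypothetical stand-in for the LSTM part-of-speech judgment (illustrative values only).
POS_LOOKUP = {
    "saturday": "noun", "member": "noun", "discount": "noun", "food": "noun",
    "daily": "adjective", "seven": "numeral", "eight": "numeral",
}


def word_senses(terms):
    """Pair every segmented term with its part of speech."""
    return [(t, POS_LOOKUP.get(t, "unknown")) for t in terms]


def content_similarity(terms_a, terms_b):
    senses_a, senses_b = word_senses(terms_a), word_senses(terms_b)
    dictionary = sorted(set(senses_a) | set(senses_b))   # shared one-hot dictionary
    vec_a = [senses_a.count(s) for s in dictionary]
    vec_b = [senses_b.count(s) for s in dictionary]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm = math.sqrt(sum(x * x for x in vec_a)) * math.sqrt(sum(y * y for y in vec_b))
    return dot / norm if norm else 0.0


first = ["saturday", "daily", "member", "eight", "discount"]
second = ["saturday", "food", "member", "seven", "discount"]
print(round(content_similarity(first, second), 3))
```

The two term lists share three of their five word senses, which gives a cosine similarity of 0.6; in practice the result would be compared against the preset similarity threshold.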

Optionally, as an embodiment of the present invention, the LSTM model is trained as follows: the content of historical recommendation information is segmented into terms with a syntactic analysis tool, term vectors are generated from the segmented terms and their syntactic positions based on the one-hot coding semantic dictionary, and the term vectors are input into the LSTM (long short-term memory) model, whose response variable is the part-of-speech judgment result.

After the similarity of any two pieces of recommended content is determined, the target recommendation information whose content similarity is higher than the preset similarity is fed back to the enterprise, so that staff can manually analyze the highly similar content; the common points found by manual analysis are converted into customer characteristics and marked on the all-round customer portrait. For example, if the similarities among three pieces of recommendation information A, B and C sent to a certain customer are all higher than the preset similarity, and manual analysis finds that all of them contain information about discounts and new product launches, the characteristics of the customer are "likes discounts" and "follows new products", which makes it convenient to send recommendation information according to the characteristics of the customer.

Although the present invention has been described in detail by referring to the drawings in connection with the preferred embodiments, the present invention is not limited thereto. Various equivalent modifications or alterations to the embodiments of the present invention may be made by those skilled in the art without departing from the spirit and scope of the present invention, and such modifications or alterations should be considered as being within the scope of the present invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A multi-business enterprise potential customer identification method is characterized by comprising the following steps:

carrying out data fusion on the member data of each business state, comprising the following steps: performing data fusion on the member data of each business state by using a user unique identification technology, and determining the fusion members that coexist in at least any two business states and the single members that exist in only one business state; realizing label unification by using a text recognition technology, and constructing all-round portraits of all-business-state members according to the member labels of the converged members;

for the converged member group existing in the two business states Com₁ and Com₂, performing value analysis on the members in Com₁ and in Com₂ respectively, marking the members in Com₁ from low to high according to value grade, and marking the members in Com₂ from low to high according to value grade; whereby each individual in the converged member group can find the corresponding first grade C₁ among the value grades of Com₁ and the corresponding second grade C₂ among the value grades of Com₂; for a member individual m₁ with the first grade C₁ in Com₁ and the second grade C₂ in Com₂, if C₁ > C₂, then for Com₂, m₁ is a potential-value member;

clustering fusion members in different industries, directly inputting member characteristics into a clustering algorithm, or determining the fusion members with potential values in target states by adopting grade comparison of respective industries;

taking the fused member group as the training set and the test set, taking the member labels of the target state as the feature variables, and taking the value grade of the fused member in the target state as the response variable; establishing a multi-class prediction model on the training set based on stacking, in which the XGBoost, LightGBM and GBDT algorithms are each modeled, fitted and used for prediction on the training set to complete the first-layer model; taking the prediction results of the first-layer model, which are predictions of the value grade in the target state, as the input feature variables and the value grade as the response variable, and training a Bayes classifier as the meta-classifier to obtain the complete multi-class prediction model; verifying the built prediction model on the test set; and applying the verified prediction model to a single member of the target state, predicting the value grade of the single member in the target state with the value grade of the single member in the target state as the response variable and the member labels of the target state as the feature variables;

establishing a recommendation model, and sending recommendation information to the members in the target business state according to the value grades of the members; updating the value grade of the member in the target state according to the feedback content and the feedback quantity of the user for the recommendation information; and analyzing the similarity of the pieces of recommendation information that received feedback according to the change of the value grade, and updating the all-round member portrait according to the common points of the similar recommendation information.

2. The method of claim 1, wherein the tag unification is realized by using a text recognition technology, and the full-industry member full-range portrait is constructed according to the member tags of the converged members, and the method comprises the following steps:

constructing an all-round member portrait based on all-business-state data, and setting the member labels to comprise fact labels and analysis labels, wherein the fact labels comprise attribute labels and behavior labels;

3. The method of claim 1, wherein assigning value ratings to converged members using a K-Means clustering algorithm comprises:

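Step 1, acquiring the data set Data₁ of any one business state and normalizing the data with a normalization formula, the samples after normalization being m₁, m₂, …, m_n, wherein m_i = (m_i1, m_i2, ..., m_it), t is the number of labels and n is the number of samples in the data set Data₁;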

Step 2, acquiring the preset number k of clustering categories;

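Step 3, selecting the initial clustering centers: for the first clustering center c₁, the value of which is denoted m₁, calculating the Euclidean distances D(c₁, m_i), i = 2, 3, ..., n, between c₁ and the other n-1 samples m₂, …, m_n; the sample point corresponding to $\max_i D(c_1, m_i)$ is used as the second clustering center c₂, the value of which is denoted m₂; calculating the Euclidean distances D(c₂, m_i), i = 3, 4, ..., n, between c₂ and the n-2 samples m₃, …, m_n; the sample point corresponding to $\max_i D(c_2, m_i)$ is used as the third clustering center c₃; and so on until k clustering centers {c₁, c₂, …, c_k} are selected, the classes corresponding to the k clustering centers being marked as {C₁, C₂, …, C_k};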

Step 4, calculating the Euclidean distance D(c_k, m_i) between each sample point and each of the k clustering centers, and assigning each sample point to the category with the minimum distance, i.e. to the class corresponding to $\min_k D(c_k, m_i)$; the reassigned classes are still marked as {C₁, C₂, …, C_k}, and the number of samples in each class is recorded as {n₁, n₂, …, n_k};

Step 5, calculating the average value of all sample points in each category,

$c_k = \dfrac{1}{n_k} \sum_{i=1}^{n_k} m_i^{(k)}$,

and using it as the new clustering center, the new clustering centers still being marked as {c₁, c₂, …, c_k};

4. The method of claim 1, wherein the establishing a recommendation model for sending recommendation information to the member in the target business state according to the value rating of the member comprises:

establishing a recommendation model, and determining the sending channel and the sending period of the recommendation information according to the value grade of the member in the target state, wherein the recommendation model is S = W × O + U × P, S is the value grade, W is the preset weight of the sending channel, and U is the weight of the sending period; O is the preset value of the output sending channel, P is the preset value corresponding to the output sending period, and O and P are positive integers;

5. The method of claim 1, wherein updating the value rating of the member in the target business state according to the feedback content and the feedback amount of the user for the recommendation information comprises:

periodically receiving the feedback content and the number of feedbacks F of the user for the recommendation information in the previous period, determining the feedback level R according to the feedback content, and inputting them into the feedback model T = [(a × R × F)/e], where [ ] denotes the integer (rounding) function, e is the grade change threshold, and T is the amount of change of the user's value grade in the target state for the period.

6. The method according to claim 1, wherein the analyzing the content similarity of the fed-back pieces of recommendation information according to the value level variation comprises:

utilizing a syntactic analysis tool to segment terms in the content of the target recommendation information, generating term vectors of the segmented terms and the syntactic positions thereof based on a one-hot coding semantic dictionary, inputting the term vectors into an LSTM model, and outputting a part-of-speech judgment result of the term vectors by the LSTM model; and generating a word sense vector according to the term vector and the part-of-speech judgment result thereof, and calculating the cosine similarity of the word sense vector between any two pieces of target recommendation information as a result of the content similarity.

7. The method of claim 1, wherein fusing the member data of each business state by using the user unique identification technology comprises: performing ID-Mapping with OneID technology, based on the mobile phone numbers, identity card numbers, e-mail addresses, and mobile-terminal and PC-terminal device IDs of the business entities of each business state, in combination with business rules, machine learning and graph computation algorithms; mapping the unique ID of each business state to a unified ID; and associating the data of each data island through the unified ID.