CN113763032A - Commodity purchase intention identification method and device - Google Patents
- Publication number
- CN113763032A (application number CN202110885513.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
Abstract
The invention provides a commodity purchase intention identification method and device. The method comprises the following steps: determining multi-dimensional features related to commodity purchase intention by analyzing on-site customer communication records; screening the features based on the correlations between them; establishing a recognition model that takes the screened features as input and the purchase intention as output, and optimizing the model by evaluating, one feature at a time, the influence of each input feature on the model; and predicting the customer's commodity purchase intention with the trained recognition model. Because the recognition model is built from the on-site communication records between salespersons and customers, and the multi-dimensional features influencing purchase intention are screened and the model is optimized, the customer's purchase intention can be identified accurately without any external data source. This makes the communication between salesperson and customer more targeted, deeper and more efficient, thereby promoting deals and improving sales performance.
Description
Technical Field
The invention belongs to the technical field of intelligent identification, and particularly relates to a commodity purchase intention identification method and device.
Background
Purchase intention refers to a consumer's tendency to purchase a commodity and is an indicator of the consumer's actual shopping behavior. It is typically measured by the speed, direction and magnitude of changes in the consumer's probability of purchasing a particular commodity over a period of time. Existing methods for identifying purchase intention rely mainly on subjective judgment, as follows. While communicating with consumers, the salesperson identifies their purchase needs by answering and analyzing their questions and by observing their product choices, number of visits, price negotiation, payment methods, loan needs and similar factors; after several rounds of communication, consumers with higher purchase intention are identified and followed up with regular return visits to promote a deal. This approach, which identifies higher purchase intention through repeated exchanges between salesperson and consumer, is mainly applied to commodities such as real estate, automobiles, jewelry and large household appliances, and to consumer products with long communication chains such as wealth management and insurance.
The existing methods based mainly on subjective judgment have the following problems: judgments depend on the individual, so they are highly arbitrary, hard to standardize and lack stability; subjective judgments are often not timely, so sales opportunities are easily missed; the judgments give weak guidance on which measures would promote a deal; and consumer behavior must additionally be assessed using other external information, which makes data collection and analysis difficult and may raise privacy concerns.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a method and an apparatus for identifying a purchase intention of a product.
In order to achieve the above object, the present invention adopts the following technical solutions.
In a first aspect, the present invention provides a method for identifying a purchase intention of a commodity, comprising the steps of:
determining multi-dimensional features related to commodity purchase intention based on analysis of on-site customer communication records;
screening the features based on the correlations between them;
establishing a recognition model that takes the screened features as input and the purchase intention as output, and optimizing the model by evaluating, one by one, the influence of each input feature on the model;
and predicting the customer's commodity purchase intention using the trained recognition model.
Further, the method also comprises a step of constructing a database, wherein each data record of the database corresponds to one valid customer communication record and comprises the feature values and a result; each feature value takes integers from a contiguous range, and its magnitude represents how favorable the feature is to a deal (the larger the value, the more favorable); the result takes only the values 0 and 1, which respectively indicate that no deal and a deal was closed within a set time threshold after the communication.
Further, the recognition model is a naive Bayes classifier, whose output y is:
$$y = \arg\max_{y_k} P(Y = y_k) \prod_{i=1}^{N} P(X_i = x_i \mid Y = y_k)$$
where P(·) denotes probability, X and Y denote the input and output variables, X_i is the i-th component of X, x_i is the i-th feature value, k = 1, 2 with y_1 = 0 and y_2 = 1, i = 1, 2, …, N, and N is the number of features.
Further, the method of screening the features based on the correlations between them comprises:
first, ranking all features as follows:
selecting equal numbers of positive and negative samples from the database, where positive samples are records whose result is 1 and negative samples are records whose result is 0;
building a recognition model for each single feature, whose outputs 1 and 0 represent purchase intention and no purchase intention respectively, and counting the model's predictions on the positive and negative samples;
calculating the evaluation index F1 from these counts according to the following formulas:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2PR}{P + R}$$
where P is the precision, R is the recall, TP is the number of positive samples predicted to have purchase intention, FN is the number of positive samples predicted to have no purchase intention, and FP is the number of negative samples predicted to have purchase intention;
repeating the above steps several times and calculating the F1 mean and the F1 coefficient of variation of each feature;
ranking the features in descending order of F1 mean, with features of equal F1 mean ranked in ascending order of F1 coefficient of variation, to obtain the final feature ranking;
then, calculating the correlation coefficient of every pair of features and, for any two features whose correlation coefficient has an absolute value greater than a set threshold, deleting the lower-ranked of the two.
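The ranking step above can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the function names are hypothetical, the coefficient of variation is assumed to be the population standard deviation divided by the mean (the text does not say which standard deviation is used), and the per-feature F1 scores from the repeated runs are assumed to have been computed already by single-feature models.

```python
from statistics import mean, pstdev
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int]:
    """Return (TP, FP, FN), treating 1 ("purchase intention") as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), with precision P = TP/(TP+FP) and recall R = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_and_cv(scores: Sequence[float]) -> tuple[float, float]:
    """Mean and coefficient of variation (assumed: population std / mean) of repeated F1 scores."""
    m = mean(scores)
    cv = pstdev(scores) / m if m else float("inf")
    return m, cv


def rank_features(f1_runs: dict[str, list[float]]) -> list[str]:
    """Sort features by F1 mean (descending); equal means are ordered by CV (ascending)."""
    stats = {name: mean_and_cv(runs) for name, runs in f1_runs.items()}
    return sorted(stats, key=lambda name: (-stats[name][0], stats[name][1]))
```

For example, with repeated F1 scores `{"a": [0.4, 0.6], "b": [0.5, 0.5], "c": [0.3, 0.3]}`, features a and b share the mean 0.5, but b has zero variation, so the ranking is b, a, c.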
Still further, the method of optimizing the recognition model comprises:
calculating the F1 mean J_0 and the F1 coefficient of variation B_0 of the original recognition model with N input features;
deleting each of the N input features in turn, building a recognition model with the remaining N-1 features as input, and calculating its F1 mean J_1 and F1 coefficient of variation B_1; if deleting feature A yields the largest J_1 (or, when several deletions tie for the largest J_1, the smallest B_1), and J_1 > J_0, the recognition model without feature A is the optimized model with N-1 input features;
applying the same operation to the optimized model with N-1 input features to obtain the optimized model with N-2 input features, whose maximum F1 mean satisfies J_2 > J_1;
repeating these steps to obtain the optimized model with N-m input features, whose maximum F1 mean satisfies J_m > J_(m-1); if the maximum F1 mean obtained after deleting a further feature is smaller than J_m, the optimized model with N-m input features is the final optimized model, where 0 ≤ m < N.
In a second aspect, the present invention provides a commodity purchase intention identification device, comprising:
a feature determination module, configured to determine multi-dimensional features related to commodity purchase intention based on analysis of on-site customer communication records;
a feature screening module, configured to screen the features based on the correlations between them;
a model optimization module, configured to establish a recognition model that takes the screened features as input and the purchase intention as output, and to optimize the model by evaluating, one by one, the influence of each input feature on the model;
and an intention recognition module, configured to predict the customer's commodity purchase intention using the trained recognition model.
Further, the device also comprises a database construction module for constructing a database, wherein each data record of the database corresponds to one valid customer communication record and comprises the feature values and a result; each feature value takes integers from a contiguous range, and its magnitude represents how favorable the feature is to a deal (the larger the value, the more favorable); the result takes only the values 0 and 1, which respectively indicate that no deal and a deal was closed within a set time threshold after the communication.
Further, the recognition model is a naive Bayes classifier, whose output y is:
$$y = \arg\max_{y_k} P(Y = y_k) \prod_{i=1}^{N} P(X_i = x_i \mid Y = y_k)$$
where P(·) denotes probability, X and Y denote the input and output variables, X_i is the i-th component of X, x_i is the i-th feature value, k = 1, 2 with y_1 = 0 and y_2 = 1, i = 1, 2, …, N, and N is the number of features.
Further, the method of screening the features based on the correlations between them comprises:
first, ranking all features as follows:
selecting equal numbers of positive and negative samples from the database, where positive samples are records whose result is 1 and negative samples are records whose result is 0;
building a recognition model for each single feature, whose outputs 1 and 0 represent purchase intention and no purchase intention respectively, and counting the model's predictions on the positive and negative samples;
calculating the evaluation index F1 from these counts according to the following formulas:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2PR}{P + R}$$
where P is the precision, R is the recall, TP is the number of positive samples predicted to have purchase intention, FN is the number of positive samples predicted to have no purchase intention, and FP is the number of negative samples predicted to have purchase intention;
repeating the above steps several times and calculating the F1 mean and the F1 coefficient of variation of each feature;
ranking the features in descending order of F1 mean, with features of equal F1 mean ranked in ascending order of F1 coefficient of variation, to obtain the final feature ranking;
then, calculating the correlation coefficient of every pair of features and, for any two features whose correlation coefficient has an absolute value greater than a set threshold, deleting the lower-ranked of the two.
Still further, the method of optimizing the recognition model comprises:
calculating the F1 mean J_0 and the F1 coefficient of variation B_0 of the original recognition model with N input features;
deleting each of the N input features in turn, building a recognition model with the remaining N-1 features as input, and calculating its F1 mean J_1 and F1 coefficient of variation B_1; if deleting feature A yields the largest J_1 (or, when several deletions tie for the largest J_1, the smallest B_1), and J_1 > J_0, the recognition model without feature A is the optimized model with N-1 input features;
applying the same operation to the optimized model with N-1 input features to obtain the optimized model with N-2 input features, whose maximum F1 mean satisfies J_2 > J_1;
repeating these steps to obtain the optimized model with N-m input features, whose maximum F1 mean satisfies J_m > J_(m-1); if the maximum F1 mean obtained after deleting a further feature is smaller than J_m, the optimized model with N-m input features is the final optimized model, where 0 ≤ m < N.
Compared with the prior art, the invention has the following beneficial effects.
The invention determines multi-dimensional features related to commodity purchase intention by analyzing on-site customer communication records, screens the features based on the correlations between them, establishes a recognition model that takes the screened features as input and the purchase intention as output, optimizes the model by evaluating, one feature at a time, the influence of each input feature on the model, and predicts the customer's commodity purchase intention with the trained recognition model, thereby achieving automatic, multi-dimensional recognition of the customer's commodity purchase intention. Because the recognition model is built from the on-site communication records between salespersons and customers, and the multi-dimensional features influencing purchase intention are screened and the model is optimized, the customer's purchase intention can be identified accurately without any external data source. This makes the communication between salesperson and customer more targeted, deeper and more efficient, thereby promoting deals and improving sales performance.
Drawings
Fig. 1 is a flowchart of a commodity purchase intention identification method according to an embodiment of the present invention.
Fig. 2 is a block diagram of a commodity purchase intention identification device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and more obvious, the present invention is further described below with reference to the accompanying drawings and the detailed description. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a commodity purchase intention identification method according to an embodiment of the present invention, comprising the following steps:
Step 101, determining multi-dimensional features related to commodity purchase intention based on analysis of on-site customer communication records;
Step 102, screening the features based on the correlations between them;
Step 103, establishing a recognition model that takes the screened features as input and the purchase intention as output, and optimizing the model by evaluating, one by one, the influence of each input feature on the model;
and Step 104, predicting the customer's commodity purchase intention using the trained recognition model.
In this embodiment, step 101 is mainly used to determine the features related to commodity purchase intention. Purchase intention is a subjective intention that objectively exists in the customer within the context of a commodity sale. By organizing the content of communications with customers in sales scenarios, a multi-level, multi-dimensional index framework is constructed, laying a foundation for data modeling and quantitative analysis. Taking automobile sales as an example, the communication content with customers in automobile sales scenarios is analyzed as follows. First, 5 primary indexes (price, procedure, service, configuration and personal needs) and 12 subordinate secondary indexes are identified. Next, three core dimensions are defined: efficiency, pertinence and depth. Finally, by cross-analyzing the indexes against the dimensions, 27 feature indexes are selected. There are 4 efficiency indexes: overall efficiency, customer efficiency, advisor efficiency and relative efficiency. There are 13 pertinence indexes: price pertinence, offer pertinence, other price pertinence, procedure pertinence, payment procedure pertinence, other procedure pertinence, service pertinence, pre-sale service pertinence, post-sale service pertinence, other service pertinence, configuration pertinence, customer pertinence. There are 10 depth indexes: price depth, offer depth, other price depth, procedure payment depth, other procedure depth, service depth, configuration depth, customer depth. All of these feature indexes are derived from the content and manner of the communication between salesperson and customer, without involving any other external information or data, which provides a sound basis for building and optimizing the model.
In this embodiment, step 102 is mainly used to screen the features. To avoid missing important features, the preliminary set of purchase-related features is usually large, such as the 27 features above; using all of them indiscriminately as inputs to the recognition model would significantly affect the model's accuracy and complexity, so they must be screened first. In fact, many of these features are strongly, or even very strongly, correlated with each other, and when two features are significantly correlated, one of them can be deleted. This embodiment screens features mainly based on the correlations between them; a specific technical solution for feature screening is given in a later embodiment.
In this embodiment, step 103 is mainly used to establish and optimize the recognition model. Depending on the application scenario, commodity purchase intention can be divided into different numbers of categories (grades or levels), for example the common three levels of high, medium and low, or two levels of high and low. This embodiment divides purchase intention into two categories. Since there is no precise definition of what counts as "low" and what counts as "high", to avoid conceptual ambiguity this embodiment calls the two categories "purchase intention" and "no purchase intention"; these are effectively equivalent to high and low purchase intention, but the terminology is more precise. In this embodiment, the recognition model is built with the screened features as input and the presence or absence of purchase intention as output. The recognition model is therefore a binary classifier; since many techniques can implement binary classification, this embodiment does not restrict the specific model, and a specific recognition model is given in a later embodiment. To improve the accuracy of the recognition model, the model is further optimized after it is established: the input features are screened one by one, the influence of each on model accuracy is measured, and features that do not noticeably improve accuracy are deleted, so that the model achieves the best recognition performance. A specific model optimization scheme is given in a later embodiment.
In this embodiment, step 104 is mainly used to predict (identify) the customer's commodity purchase intention. Using the trained recognition model, with the feature values obtained from the customer's communication records as input, the customer's commodity purchase intention can be obtained conveniently. The data set for training the model comes from a database built in advance from customer communication records; an embodiment for creating the database is given later.
As an optional embodiment, the method further comprises a step of constructing a database, wherein each data record of the database corresponds to one valid customer communication record and comprises the feature values and a result; each feature value takes integers from a contiguous range, and its magnitude represents how favorable the feature is to a deal (the larger the value, the more favorable); the result takes only the values 0 and 1, which respectively indicate that no deal and a deal was closed within a set time threshold after the communication.
This embodiment provides a technical solution for constructing the database. The database is mainly used to build the model training data set, the model test data set and so on. As mentioned above, the data in the database come from communication records between salespersons and customers. To build a high-quality database, the inventors compiled about 3,000 on-site automobile sales communication records covering luxury brands, mid-to-high-end brands, mainstream joint-venture brands and some independent brands, converted them to text using speech-to-text and semantic recognition technology, and, according to the 27 feature indexes determined above, built a database consisting of the feature values and the communication result (whether a deal was closed within two weeks), i.e. the label. The label takes only the values 0 and 1: 0 means no deal was closed within 2 weeks, and 1 means a deal was closed within 2 weeks. Each feature value takes integers from a contiguous range, such as 1 to 5; its magnitude represents how favorable the feature is to a deal, and the larger the value, the more likely a deal.
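The record structure described above can be sketched as a small Python data class. The class and function names, the 1 to 5 bounds (given in the text only as an example range) and the `window_days` parameter are illustrative assumptions, not part of the described method; the sketch only checks that each record has integer feature scores in range and a 0/1 label derived from the two-week window.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed score range; the text gives 1-5 only as an example.
MIN_SCORE, MAX_SCORE = 1, 5


@dataclass(frozen=True)
class CommunicationRecord:
    """One valid customer communication record: feature scores plus a 0/1 label (hypothetical)."""
    features: dict[str, int]  # feature name -> integer score; larger = more favorable to a deal
    label: int                # 1 = deal closed within the window, 0 = no deal

    def __post_init__(self) -> None:
        if self.label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {self.label}")
        for name, value in self.features.items():
            if not isinstance(value, int) or not MIN_SCORE <= value <= MAX_SCORE:
                raise ValueError(f"feature {name!r} must be an integer in [{MIN_SCORE}, {MAX_SCORE}]")


def make_label(days_until_deal: Optional[int], window_days: int = 14) -> int:
    """Return 1 if a deal closed within window_days after the conversation, else 0."""
    return int(days_until_deal is not None and days_until_deal <= window_days)
```

For instance, a conversation followed by a purchase 10 days later would be stored as `CommunicationRecord({"price_depth": 3}, make_label(10))`, with label 1.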
As an alternative embodiment, the recognition model is a naive Bayes classifier, whose output y is:
$$y = \arg\max_{y_k} P(Y = y_k) \prod_{i=1}^{N} P(X_i = x_i \mid Y = y_k)$$
where P(·) denotes probability, X and Y denote the input and output variables, X_i is the i-th component of X, x_i is the i-th feature value, k = 1, 2 with y_1 = 0 and y_2 = 1, i = 1, 2, …, N, and N is the number of features.
This embodiment provides a technical solution for the recognition model. As described above, a naive Bayes classifier is used as the recognition model. Discussion and validation against real sales scenarios found that the naive Bayes algorithm has several advantages: it can take more diverse factors into account; the factors influence the result simultaneously rather than in a layer-by-layer progressive manner; the output is obtained by comparing the probabilities contributed by the factors; and the computation is relatively simple and efficient. It is therefore well suited to judging a customer's purchase intention in a sales scenario.
The naive Bayes algorithm is a widely recognized, simple and efficient classification method based on Bayes' theorem and the assumption that the features are conditionally independent. For a given training data set, it first learns the joint probability distribution of the input and output under the conditional independence assumption; then, based on this model, it uses Bayes' theorem to find, for a given input x, the output y with the largest posterior probability. Bayes' theorem relates the conditional probabilities of random events Y and X, where P(Y | X) is the probability that Y occurs given that X has occurred. It is expressed as:
$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$
When classifying with the naive Bayes method, for a given input x the learned model is used to compute the posterior probability distribution P(Y = y_k | X = x), and the class with the largest posterior probability is output as the class of x. The posterior probability is computed according to Bayes' theorem:
$$P(Y = y_k \mid X = x) = \frac{P(X = x \mid Y = y_k)\,P(Y = y_k)}{\sum_{j} P(X = x \mid Y = y_j)\,P(Y = y_j)}$$
where y_k denotes the k-th category of the target variable Y. Since the denominator is the same for all y_k, and the conditional probability distribution is assumed to be conditionally independent, i.e. P(X = x | Y = y_k) = ∏_{i=1}^{N} P(X_i = x_i | Y = y_k), the naive Bayes classifier can be expressed as:
$$y = \arg\max_{y_k} P(Y = y_k) \prod_{i=1}^{N} P(X_i = x_i \mid Y = y_k)$$
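The classifier above can be sketched for integer-coded features as follows. This is a minimal illustration, not the inventors' code: the class name is hypothetical, and, like the worked example in this description, it estimates every probability by raw frequency counts without smoothing (Laplace smoothing is a common addition but is not mentioned in the text), so an unseen feature value drives a class score to zero.

```python
from collections import Counter, defaultdict
from typing import Sequence


class CategoricalNaiveBayes:
    """Naive Bayes over integer-coded features using raw frequency estimates (no smoothing)."""

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[int]) -> "CategoricalNaiveBayes":
        self.n = len(y)
        self.class_counts = Counter(y)  # number of records per class y_k
        # counts[(i, k)][v] = number of class-k records whose i-th feature equals v
        self.counts: dict[tuple[int, int], Counter] = defaultdict(Counter)
        for row, label in zip(X, y):
            for i, value in enumerate(row):
                self.counts[(i, label)][value] += 1
        return self

    def joint_scores(self, x: Sequence[int]) -> dict[int, float]:
        """Unnormalized P(Y=y_k) * prod_i P(X_i=x_i | Y=y_k) for each class y_k."""
        scores = {}
        for k, n_k in self.class_counts.items():
            score = n_k / self.n  # prior P(Y = y_k)
            for i, value in enumerate(x):
                score *= self.counts[(i, k)][value] / n_k  # P(X_i = x_i | Y = y_k)
            scores[k] = score
        return scores

    def predict(self, x: Sequence[int]) -> int:
        """Return the class with the largest score (the argmax in the formula above)."""
        scores = self.joint_scores(x)
        return max(scores, key=scores.get)
```

Because the shared denominator is dropped, `joint_scores` returns values proportional to the posteriors; dividing each by their sum recovers P(Y = y_k | X = x) if actual probabilities are needed.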
The application of the naive Bayes algorithm is briefly illustrated below using the data of 3 features and the labels as training samples. The features X of the training samples are {a4, B12, C6}, where a4, B12 and C6 each denote one feature; the output variable Y takes two values {0, 1}, representing no purchase intention and purchase intention respectively. The training sample data are shown in Tables 1, 2 and 3.
TABLE 1
TABLE 2
TABLE 3
For a test sample with X1 = 2, X2 = 3 and X3 = 4, the following probabilities are calculated from the training sample data tables:
P(Y=0)=2112/2970, P(Y=1)=858/2970
P(X1=2|Y=0)=506/2112, P(X2=3|Y=0)=440/2112, P(X3=4|Y=0)=484/2112
P(X1=2|Y=1)=88/858, P(X2=3|Y=1)=198/858, P(X3=4|Y=1)=374/858
Therefore, omitting the common denominator P(X):
P(Y=0|X) ∝ P(Y=0)*P(X1=2|Y=0)*P(X2=3|Y=0)*P(X3=4|Y=0)
=(2112/2970)*(506/2112)*(440/2112)*(484/2112) ≈ 0.008
P(Y=1|X) ∝ P(Y=1)*P(X1=2|Y=1)*P(X2=3|Y=1)*P(X3=4|Y=1)
=(858/2970)*(88/858)*(198/858)*(374/858) ≈ 0.003
Since 0.008 > 0.003, i.e. P(Y=0|X) > P(Y=1|X), the predicted value of the test sample's target variable is 0, which represents no purchase intention.
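The arithmetic of the worked example can be checked with exact fractions. The sketch below uses only the counts quoted above; the final normalization step is not in the original example but follows directly from the posterior formula, since dividing each score by their sum restores the omitted denominator P(X).

```python
from fractions import Fraction as F

# Counts quoted in the worked example: 2970 records, 2112 with Y=0 and 858 with Y=1;
# the test sample is X1=2, X2=3, X3=4.
n_total, n0, n1 = 2970, 2112, 858
score0 = F(n0, n_total) * F(506, n0) * F(440, n0) * F(484, n0)  # proportional to P(Y=0|X)
score1 = F(n1, n_total) * F(88, n1) * F(198, n1) * F(374, n1)   # proportional to P(Y=1|X)

print(round(float(score0), 3), round(float(score1), 3))  # 0.008 0.003
prediction = 0 if score0 > score1 else 1

# Normalizing the two scores gives the actual posterior probabilities.
posterior0 = score0 / (score0 + score1)
print(prediction, round(float(posterior0), 2))  # 0 0.73
```

Normalizing shows that the model assigns this sample roughly a 73% probability of no purchase intention, which is more informative than the raw scores 0.008 and 0.003.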
As an alternative embodiment, the method of screening the features based on the correlations between them comprises:
first, ranking all features as follows:
selecting equal numbers of positive and negative samples from the database, where positive samples are records whose result is 1 and negative samples are records whose result is 0;
building a recognition model for each single feature, whose outputs 1 and 0 represent purchase intention and no purchase intention respectively, and counting the model's predictions on the positive and negative samples;
calculating the evaluation index F1 from these counts according to the following formulas:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2PR}{P + R}$$
where P is the precision, R is the recall, TP is the number of positive samples predicted to have purchase intention, FN is the number of positive samples predicted to have no purchase intention, and FP is the number of negative samples predicted to have purchase intention;
repeating the above steps several times and calculating the F1 mean and the F1 coefficient of variation of each feature;
ranking the features in descending order of F1 mean, with features of equal F1 mean ranked in ascending order of F1 coefficient of variation, to obtain the final feature ranking;
then, calculating the correlation coefficient of every pair of features and, for any two features whose correlation coefficient has an absolute value greater than a set threshold, deleting the lower-ranked of the two.
This embodiment provides a technical solution for screening features based on the correlations between them. The solution is as follows: first, all features are ranked; then the correlation coefficient of every pair of features is calculated, and if the absolute value of the correlation coefficient of two features exceeds the set threshold, the two features are significantly correlated and the lower-ranked one is deleted. Since a correlation coefficient can be positive (positive correlation) or negative (negative correlation), the strength of the correlation is judged by its absolute value. The threshold can be set empirically, for example 0.6. Features are ranked by the F1 value of a recognition model that uses only that one feature as input; the larger the F1, the higher the rank. To improve reliability, F1 is taken as the mean of several repeated calculations. When F1 means are equal, the coefficient of variation of F1 is considered, and the feature with the smaller coefficient of variation ranks higher. F1 is calculated with the same formula as above and is not described again here.
As an alternative embodiment, the method of optimizing the recognition model comprises:
calculating the F1 mean J_0 and the F1 coefficient of variation B_0 of the original recognition model with N input features;
deleting each of the N input features in turn, building a recognition model with the remaining N-1 features as input, and calculating its F1 mean J_1 and F1 coefficient of variation B_1; if deleting feature A yields the largest J_1 (or, when several deletions tie for the largest J_1, the smallest B_1), and J_1 > J_0, the recognition model without feature A is the optimized model with N-1 input features;
applying the same operation to the optimized model with N-1 input features to obtain the optimized model with N-2 input features, whose maximum F1 mean satisfies J_2 > J_1;
repeating these steps to obtain the optimized model with N-m input features, whose maximum F1 mean satisfies J_m > J_(m-1); if the maximum F1 mean obtained after deleting a further feature is smaller than J_m, the optimized model with N-m input features is the final optimized model, where 0 ≤ m < N.
This embodiment provides a technical solution for optimizing the recognition model. Here the recognition model is optimized by screening the input features one by one, as follows: delete one input feature and calculate the F1 mean and F1 coefficient of variation of the model that takes the remaining features as input; restore that feature, delete another one, and calculate the F1 mean and F1 coefficient of variation again; repeat until each input feature has been deleted once. If deleting a certain feature gives the largest F1 mean (or, among deletions tied for the largest F1 mean, the smallest F1 coefficient of variation), and that F1 mean is larger than the F1 mean before optimization (with no feature deleted), then deleting this feature improves the model, and improves it more than deleting any other feature, so the model without this feature is taken as the first-round optimized model. The same method is then applied to the first-round optimized model for a second, third, and further round, until the maximum F1 mean obtained after deleting one or more further features is smaller than the F1 mean of the previous model; the model at that point is the final optimized model.
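The one-by-one deletion procedure amounts to greedy backward feature elimination. The sketch below assumes a hypothetical `evaluate` callback that trains the recognition model on a given feature subset and returns its (F1 mean, F1 coefficient of variation); ties on the F1 mean are resolved by the smaller coefficient of variation, and the search stops as soon as no single deletion raises the F1 mean above that of the current model. Treating an equal F1 mean as "no improvement" is an assumption, since the source only covers the strictly larger and strictly smaller cases.

```python
from typing import Callable, FrozenSet, Tuple

# Hypothetical evaluator: trains the model on a feature subset, returns (F1 mean, F1 CV).
Evaluator = Callable[[FrozenSet[str]], Tuple[float, float]]


def backward_eliminate(features: list[str], evaluate: Evaluator) -> list[str]:
    """Greedy backward elimination following the J_0, J_1, ..., J_m procedure."""
    current = list(features)
    j_prev, _ = evaluate(frozenset(current))  # J_0 of the original N-feature model
    while len(current) > 1:  # 0 <= m < N: at least one feature always remains
        best = None  # (deleted feature, F1 mean, F1 CV)
        for f in current:
            j, b = evaluate(frozenset(current) - {f})
            # Prefer the larger F1 mean; on a tie, prefer the smaller CV.
            if best is None or (j, -b) > (best[1], -best[2]):
                best = (f, j, b)
        if best[1] <= j_prev:  # no deletion raises the F1 mean: stop
            break
        current.remove(best[0])
        j_prev = best[1]
    return current


# Toy evaluator: each useful feature adds 1 to the score, each noise feature costs 0.1.
useful = {"x", "y"}

def toy_evaluate(subset):
    return len(subset & useful) - 0.1 * len(subset - useful), 0.0

print(backward_eliminate(["x", "y", "n1", "n2"], toy_evaluate))  # ['x', 'y']
```

In practice `evaluate` would itself run the repeated balanced-sample training described for feature ranking, so each call is relatively expensive; the loop makes at most N + (N-1) + ... evaluations.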
Fig. 2 is a schematic diagram of the composition of a commodity purchase intention recognition apparatus according to an embodiment of the present invention. The apparatus includes:
a feature determining module 11, configured to determine multi-dimensional features related to the willingness to purchase a commodity, based on analysis of on-site customer communication records;
a feature screening module 12, configured to screen the features based on the correlations between them;
a model optimization module 13, configured to establish a recognition model that takes the screened features as input and the purchase intention as output, and to optimize the model by screening, one by one, the degree of influence of each input feature on the model;
and an intention recognition module 14, configured to predict the customer's commodity purchase intention using the trained recognition model.
The apparatus of this embodiment can be used to implement the technical solution of the method embodiment shown in Fig. 1; its implementation principle and technical effect are similar and are not repeated here. The same applies to the following embodiments.
As an optional embodiment, the apparatus further includes a database construction module configured to construct a database. Each data record in the database corresponds to one valid customer communication record and contains feature values and a result. Each feature takes values from a set of consecutive integers whose magnitude indicates how favorable the feature is to a deal: the larger the value, the more favorable. The result takes only the values 0 and 1, which indicate, respectively, that no deal and a deal was concluded within a set time threshold after the communication.
As an alternative embodiment, the recognition model is a naive Bayes classifier, and the classifier output y is:
$$y=\arg\max_{y_k}P(Y=y_k)\prod_{i=1}^{N}P\left(X^{(i)}=x_i\mid Y=y_k\right)$$
where P(·) denotes an estimated probability, X and Y denote the input and output variables (X^{(i)} being the i-th feature of X), x_i is the i-th feature value, k = 1, 2, y_1 = 0, y_2 = 1, i = 1, 2, …, N, and N is the number of features.
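The classifier output can be sketched as a categorical naive Bayes model over the integer-coded features, computed in log space to avoid numerical underflow when N is large. Estimating each probability by frequency counting with Laplace smoothing (`alpha`) is an assumption; the source does not state how P(·) is estimated. The class name `CategoricalNaiveBayes` is hypothetical.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over integer-valued features with labels in {0, 1} (illustrative sketch)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumption, not from the source)

    def fit(self, X: list[list[int]], y: list[int]) -> "CategoricalNaiveBayes":
        self.classes = sorted(set(y))
        self.n_samples = len(y)
        self.class_count = Counter(y)
        n_features = len(X[0])
        # Number of distinct values seen for each feature, used in the smoothing denominator.
        self.n_values = [len({row[i] for row in X}) for i in range(n_features)]
        # counts[(k, i)][v]: number of class-k records whose feature i equals v.
        self.counts: dict = defaultdict(Counter)
        for row, label in zip(X, y):
            for i, v in enumerate(row):
                self.counts[(label, i)][v] += 1
        return self

    def predict(self, x: list[int]) -> int:
        """Return the y_k maximizing log P(Y=y_k) + sum_i log P(X_i = x_i | Y=y_k)."""
        best_label, best_score = self.classes[0], -math.inf
        for k in self.classes:
            score = math.log(self.class_count[k] / self.n_samples)  # log prior
            for i, v in enumerate(x):
                num = self.counts[(k, i)][v] + self.alpha
                den = self.class_count[k] + self.alpha * self.n_values[i]
                score += math.log(num / den)  # log class-conditional likelihood
            if score > best_score:
                best_label, best_score = k, score
        return best_label


X = [[3, 3], [3, 2], [2, 3], [1, 1], [1, 2], [2, 1]]
y = [1, 1, 1, 0, 0, 0]
model = CategoricalNaiveBayes().fit(X, y)
print(model.predict([3, 3]), model.predict([1, 1]))  # 1 0
```

With `alpha = 0` the sketch reduces to plain frequency estimates, but then any feature value never seen with a class would make the log-probability undefined, which is why a positive default is used here.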
As an alternative embodiment, the method of screening features based on correlations between features includes:
first, all features are sorted as follows:
selecting equal numbers of positive and negative samples from the database, where positive samples are records whose result is 1 and negative samples are records whose result is 0;
establishing a recognition model for each single feature, whose output is 1 or 0, representing purchase intention and no purchase intention respectively, and counting the model's prediction results on the positive and negative samples;
the evaluation index F1 is calculated from the counted results according to the following formulas:
$$P=\frac{TP}{TP+FP},\qquad R=\frac{TP}{TP+FN},\qquad F1=\frac{2PR}{P+R}$$
where P is the precision, R is the recall, TP is the number of positive samples predicted to have purchase intention, FN is the number of positive samples predicted to have no purchase intention, and FP is the number of negative samples predicted to have purchase intention;
repeating the above steps several times, and calculating the F1 mean and the F1 coefficient of variation of each feature;
sorting the features in descending order of F1 mean, and sorting features with equal F1 means in ascending order of F1 coefficient of variation, to obtain the final feature ranking;
then, the correlation coefficient between every two features is calculated, and for any two features whose correlation coefficient exceeds a set threshold in absolute value, the lower-ranked of the two is deleted.
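The per-feature evaluation loop in the steps above (balanced sampling, counting TP, FP and FN, computing F1, then the mean and coefficient of variation over repetitions) can be sketched with the helpers below. Records are assumed to be dictionaries with a hypothetical `result` key; returning F1 = 0 when TP = 0 and using the sample standard deviation for the coefficient of variation are assumptions, since the source specifies neither.

```python
import random
import statistics


def balanced_sample(records: list[dict], n_per_class: int, rng: random.Random) -> list[dict]:
    """Draw n_per_class positive (result 1) and n_per_class negative (result 0) records."""
    pos = [r for r in records if r["result"] == 1]
    neg = [r for r in records if r["result"] == 0]
    return rng.sample(pos, n_per_class) + rng.sample(neg, n_per_class)


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 = 2PR / (P + R), with P = TP / (TP + FP) and R = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # assumption: define F1 as 0 when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_and_cv(scores: list[float]) -> tuple[float, float]:
    """F1 mean and coefficient of variation (sample std / mean) over repeated runs."""
    mean = statistics.mean(scores)
    return mean, statistics.stdev(scores) / mean


# TP = 3, FP = 1, FN = 1, so P = R = 0.75 and F1 = 0.75.
print(f1_score([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0]))  # 0.75
```

For each feature, one would repeatedly draw a balanced sample, train and test a single-feature model on it, score the predictions with `f1_score`, and pass the collected scores to `mean_and_cv` to obtain the ranking keys.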
As an alternative embodiment, the method for optimizing the recognition model includes:
calculating the F1 mean J_0 and the F1 coefficient of variation B_0 of the original recognition model with N input features;
deleting the N input features one at a time, establishing each time a recognition model that takes the remaining N-1 features as input, and calculating its F1 mean J_1 and F1 coefficient of variation B_1; if deleting feature A yields the largest J_1 (or, when several deletions tie for the largest J_1, the smallest B_1 among them), and this J_1 is larger than J_0, the recognition model with feature A deleted is the optimized model with N-1 input features;
performing the same operation on the optimized model with N-1 input features to obtain the optimized model with N-2 input features, whose maximum F1 mean J_2 satisfies J_2 > J_1;
repeating the above steps to obtain the optimized model with N-m input features, whose maximum F1 mean J_m satisfies J_m > J_(m-1); if the maximum F1 mean obtained after deleting one or more further features is smaller than J_m, the optimized model with N-m input features is the final optimized model, where 0 ≤ m < N.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A commodity purchase intention identification method, characterized by comprising the following steps:
determining multi-dimensional features related to the willingness to purchase a commodity, based on analysis of on-site customer communication records;
screening the features based on the correlations between them;
establishing a recognition model that takes the screened features as input and the purchase intention as output, and optimizing the model by screening, one by one, the degree of influence of each input feature on the model;
and predicting the customer's commodity purchase intention using the trained recognition model.
2. The method according to claim 1, further comprising the step of constructing a database, wherein each data record in the database corresponds to one valid customer communication record and comprises feature values and a result; each feature takes values from a set of consecutive integers whose magnitude represents how favorable it is to a deal, a larger feature value being more favorable to the deal; and the result takes only the values 0 and 1, which represent, respectively, that no deal and a deal was concluded within a set time threshold after the communication.
3. The method according to claim 2, wherein the recognition model is a naive Bayes classifier, and the classifier output y is:
$$y=\arg\max_{y_k}P(Y=y_k)\prod_{i=1}^{N}P\left(X^{(i)}=x_i\mid Y=y_k\right)$$
where P(·) denotes an estimated probability, X and Y denote the input and output variables (X^{(i)} being the i-th feature of X), x_i is the i-th feature value, k = 1, 2, y_1 = 0, y_2 = 1, i = 1, 2, …, N, and N is the number of features.
4. The commodity purchase intention identification method according to claim 3, wherein screening the features based on the correlations between them comprises:
first, all features are sorted as follows:
selecting equal numbers of positive and negative samples from the database, where positive samples are records whose result is 1 and negative samples are records whose result is 0;
establishing a recognition model for each single feature, whose output is 1 or 0, representing purchase intention and no purchase intention respectively, and counting the model's prediction results on the positive and negative samples;
the evaluation index F1 is calculated from the counted results according to the following formulas:
$$P=\frac{TP}{TP+FP},\qquad R=\frac{TP}{TP+FN},\qquad F1=\frac{2PR}{P+R}$$
where P is the precision, R is the recall, TP is the number of positive samples predicted to have purchase intention, FN is the number of positive samples predicted to have no purchase intention, and FP is the number of negative samples predicted to have purchase intention;
repeating the above steps several times, and calculating the F1 mean and the F1 coefficient of variation of each feature;
sorting the features in descending order of F1 mean, and sorting features with equal F1 means in ascending order of F1 coefficient of variation, to obtain the final feature ranking;
then, the correlation coefficient between every two features is calculated, and for any two features whose correlation coefficient exceeds a set threshold in absolute value, the lower-ranked of the two is deleted.
5. The method as claimed in claim 4, wherein the method for optimizing the recognition model comprises:
calculating the F1 mean J_0 and the F1 coefficient of variation B_0 of the original recognition model with N input features;
deleting the N input features one at a time, establishing each time a recognition model that takes the remaining N-1 features as input, and calculating its F1 mean J_1 and F1 coefficient of variation B_1; if deleting feature A yields the largest J_1 (or, when several deletions tie for the largest J_1, the smallest B_1 among them), and this J_1 is larger than J_0, the recognition model with feature A deleted is the optimized model with N-1 input features;
performing the same operation on the optimized model with N-1 input features to obtain the optimized model with N-2 input features, whose maximum F1 mean J_2 satisfies J_2 > J_1;
repeating the above steps to obtain the optimized model with N-m input features, whose maximum F1 mean J_m satisfies J_m > J_(m-1); if the maximum F1 mean obtained after deleting one or more further features is smaller than J_m, the optimized model with N-m input features is the final optimized model, where 0 ≤ m < N.
6. A commodity purchase intention recognition apparatus, characterized by comprising:
a feature determining module, configured to determine multi-dimensional features related to the willingness to purchase a commodity, based on analysis of on-site customer communication records;
a feature screening module, configured to screen the features based on the correlations between them;
a model optimization module, configured to establish a recognition model that takes the screened features as input and the purchase intention as output, and to optimize the model by screening, one by one, the degree of influence of each input feature on the model;
and an intention recognition module, configured to predict the customer's commodity purchase intention using the trained recognition model.
7. The apparatus according to claim 6, further comprising a database construction module configured to construct a database, wherein each data record in the database corresponds to one valid customer communication record and comprises feature values and a result; each feature takes values from a set of consecutive integers whose magnitude represents how favorable it is to a deal, a larger feature value being more favorable to the deal; and the result takes only the values 0 and 1, which represent, respectively, that no deal and a deal was concluded within a set time threshold after the communication.
8. The apparatus according to claim 7, wherein the recognition model is a naive Bayes classifier, and the classifier output y is:
$$y=\arg\max_{y_k}P(Y=y_k)\prod_{i=1}^{N}P\left(X^{(i)}=x_i\mid Y=y_k\right)$$
where P(·) denotes an estimated probability, X and Y denote the input and output variables (X^{(i)} being the i-th feature of X), x_i is the i-th feature value, k = 1, 2, y_1 = 0, y_2 = 1, i = 1, 2, …, N, and N is the number of features.
9. The apparatus according to claim 8, wherein screening the features based on the correlations between them comprises:
first, all features are sorted as follows:
selecting equal numbers of positive and negative samples from the database, where positive samples are records whose result is 1 and negative samples are records whose result is 0;
establishing a recognition model for each single feature, whose output is 1 or 0, representing purchase intention and no purchase intention respectively, and counting the model's prediction results on the positive and negative samples;
the evaluation index F1 is calculated from the counted results according to the following formulas:
$$P=\frac{TP}{TP+FP},\qquad R=\frac{TP}{TP+FN},\qquad F1=\frac{2PR}{P+R}$$
where P is the precision, R is the recall, TP is the number of positive samples predicted to have purchase intention, FN is the number of positive samples predicted to have no purchase intention, and FP is the number of negative samples predicted to have purchase intention;
repeating the above steps several times, and calculating the F1 mean and the F1 coefficient of variation of each feature;
sorting the features in descending order of F1 mean, and sorting features with equal F1 means in ascending order of F1 coefficient of variation, to obtain the final feature ranking;
then, the correlation coefficient between every two features is calculated, and for any two features whose correlation coefficient exceeds a set threshold in absolute value, the lower-ranked of the two is deleted.
10. The apparatus of claim 9, wherein the method for optimizing the recognition model comprises:
calculating the F1 mean J_0 and the F1 coefficient of variation B_0 of the original recognition model with N input features;
deleting the N input features one at a time, establishing each time a recognition model that takes the remaining N-1 features as input, and calculating its F1 mean J_1 and F1 coefficient of variation B_1; if deleting feature A yields the largest J_1 (or, when several deletions tie for the largest J_1, the smallest B_1 among them), and this J_1 is larger than J_0, the recognition model with feature A deleted is the optimized model with N-1 input features;
performing the same operation on the optimized model with N-1 input features to obtain the optimized model with N-2 input features, whose maximum F1 mean J_2 satisfies J_2 > J_1;
repeating the above steps to obtain the optimized model with N-m input features, whose maximum F1 mean J_m satisfies J_m > J_(m-1); if the maximum F1 mean obtained after deleting one or more further features is smaller than J_m, the optimized model with N-m input features is the final optimized model, where 0 ≤ m < N.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110885513.5A CN113763032B (en) | 2021-08-03 | 2021-08-03 | Commodity purchase intention recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113763032A true CN113763032A (en) | 2021-12-07 |
CN113763032B CN113763032B (en) | 2023-08-04 |
Family
ID=78788422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110885513.5A Active CN113763032B (en) | 2021-08-03 | 2021-08-03 | Commodity purchase intention recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113763032B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106127546A (en) * | 2016-06-20 | 2016-11-16 | 重庆房慧科技有限公司 | A kind of Method of Commodity Recommendation based on the big data in intelligence community |
CN106447384A (en) * | 2016-08-31 | 2017-02-22 | 五八同城信息技术有限公司 | Method and apparatus for determining object user |
CN110555717A (en) * | 2019-07-29 | 2019-12-10 | 华南理工大学 | method for mining potential purchased goods and categories of users based on user behavior characteristics |
CN111681051A (en) * | 2020-06-08 | 2020-09-18 | 上海汽车集团股份有限公司 | Purchasing intention degree prediction method, device, storage medium and terminal |
CN113095861A (en) * | 2020-01-08 | 2021-07-09 | 浙江大搜车软件技术有限公司 | Method, device and equipment for predicting target object transaction probability and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113763032B (en) | 2023-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Cho et al. | A hybrid approach based on the combination of variable selection using decision trees and case-based reasoning using the Mahalanobis distance: For bankruptcy prediction | |
Oh et al. | Analyzing stock market tick data using piecewise nonlinear model | |
US10552735B1 (en) | Applied artificial intelligence technology for processing trade data to detect patterns indicative of potential trade spoofing | |
CN108921602B (en) | User purchasing behavior prediction method based on integrated neural network | |
CN110852881B (en) | Risk account identification method and device, electronic equipment and medium | |
CN113469730A (en) | Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene | |
CN112700324A (en) | User loan default prediction method based on combination of Catboost and restricted Boltzmann machine | |
CN114612251A (en) | Risk assessment method, device, equipment and storage medium | |
CA3165582A1 (en) | Data processing method and system based on similarity model | |
CN111754317A (en) | Financial investment data evaluation method and system | |
CN111882420A (en) | Generation method of response rate, marketing method, model training method and device | |
CN115526652A (en) | Client loss early warning method and system based on machine learning | |
CN112561320A (en) | Training method of mechanism risk prediction model, mechanism risk prediction method and device | |
CN107133862A (en) | Dynamic produces the method and system of the detailed transaction payment experience of enhancing credit evaluation | |
CN115545886A (en) | Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium | |
Kumar et al. | Credit score prediction system using deep learning and k-means algorithms | |
Kim et al. | Predicting corporate defaults using machine learning with geometric-lag variables | |
CN114118793A (en) | Local exchange risk early warning method, device and equipment | |
Cao et al. | Financial crisis forecasting via coupled market state analysis | |
CN113763032A (en) | Commodity purchase intention identification method and device | |
CN116502813A (en) | Abnormal order detection method based on ensemble learning | |
EP3493082A1 (en) | A method of exploring databases of time-stamped data in order to discover dependencies between the data and predict future trends | |
Ragapriya et al. | Machine Learning Based House Price Prediction Using Modified Extreme Boosting | |
Niknya et al. | Financial distress prediction of Tehran Stock Exchange companies using support vector machine | |
CN112232945A (en) | Method and device for determining personal customer credit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||