CN113763032A - Commodity purchase intention identification method and device - Google Patents

Commodity purchase intention identification method and device

Info

Publication number
CN113763032A
Authority
CN
China
Prior art keywords
features
input
model
recognition model
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110885513.5A
Other languages
Chinese (zh)
Other versions
CN113763032B (en
Inventor
邱琰
许青江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zebra C Data Technology Co ltd
Original Assignee
Beijing Zebra C Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zebra C Data Technology Co ltd filed Critical Beijing Zebra C Data Technology Co ltd
Priority to CN202110885513.5A priority Critical patent/CN113763032B/en
Publication of CN113763032A publication Critical patent/CN113763032A/en
Application granted granted Critical
Publication of CN113763032B publication Critical patent/CN113763032B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Abstract

The invention provides a commodity purchase intention identification method and device. The method comprises the following steps: determining multi-dimensional features related to commodity purchase intention based on analysis of customer on-site communication records; screening the features based on the correlation between the features; establishing a recognition model which takes the screened features as input and the purchase intention as output, and optimizing the model by examining one by one the degree of influence of each input feature on the model; and predicting the commodity purchase intention of the customer by using the trained recognition model. Because the recognition model is built from the on-site communication records between salesperson and customer, and the multi-dimensional features influencing purchase intention are screened and the model is optimized, the purchase intention of the customer can be accurately identified without other external data sources, and the communication between salesperson and customer becomes more targeted, deeper and more efficient, thereby promoting deals and improving sales performance.

Description

Commodity purchase intention identification method and device
Technical Field
The invention belongs to the technical field of intelligent identification, and particularly relates to a commodity purchase intention identification method and device.
Background
Purchase intention refers to the tendency of a consumer to purchase goods and serves as an indicator of the consumer's actual shopping behavior. It is typically measured by the speed, direction and magnitude of the change in the consumer's purchase probability for a particular good over a period of time. Existing purchase intention identification methods rely mainly on subjective judgment and generally proceed as follows. While communicating with consumers, the salesperson identifies their purchase demands by answering and analyzing questions about matters such as the content of their inquiries, product selection, number of visits, bargaining and counter-offers, payment methods and loans; then, over several rounds of communication, the consumers with higher purchase intention are identified, and follow-up visits are made regularly to promote the transaction. This way of identifying higher purchase intention depends on repeated communication between salesperson and consumer, and is mainly suitable for goods such as real estate, automobiles, jewelry and large household appliances, or for consumer products with long communication chains such as financial management and insurance.
Existing identification methods based mainly on subjective judgment have the following problems: being based on human subjective judgment, they are highly arbitrary, difficult to standardize and lack stability; because of timeliness problems, subjective judgment makes it hard to seize sales opportunities effectively; the judgment offers only weak guidance for measures to promote the transaction; and consumer behavior must additionally be assessed with other external data, which is difficult to acquire, aggregate and analyze and may infringe privacy.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a method and an apparatus for identifying a purchase intention of a product.
In order to achieve the above object, the present invention adopts the following technical solutions.
In a first aspect, the present invention provides a method for identifying a purchase intention of a commodity, comprising the steps of:
determining multi-dimensional characteristics related to commodity purchasing willingness based on analysis of customer site communication records;
screening the features based on the correlation between the features;
establishing a recognition model which takes the screened features as input and the purchasing intention as output, and performing model optimization by screening the influence degree of each input feature on the model one by one;
and predicting the commodity purchasing intention of the customer by using the trained recognition model.
Further, the method also comprises the step of constructing a database, wherein each data record of the database corresponds to one valid customer communication record and comprises feature values and a result. Each feature takes its value from a set of consecutive integers, and the magnitude of the value represents how favorable it is to a deal: the larger the feature value, the more favorable to a deal. The result takes only the two values 0 and 1, which respectively represent that no deal and a deal was made within a set time threshold after the communication.
Further, the recognition model is a naive bayes classifier, and the classifier output y is:
$$y = \arg\max_{y_k} P(Y = y_k) \prod_{i=1}^{N} P(X_i = x_i \mid Y = y_k)$$
wherein P(·) denotes probability, X and Y denote the input and output variables, x_i is the value of the i-th feature, k = 1, 2, y_1 = 0, y_2 = 1, i = 1, 2, …, N, and N is the number of features.
Further, a method of screening features based on correlations between features includes:
first, all features are sorted as follows:
selecting positive samples and negative samples with the same quantity from a database, wherein the positive samples are data with a result of 1, and the negative samples are data with a result of 0;
establishing a recognition model for each single feature, wherein the outputs of the recognition model are 1 and 0, respectively representing purchase intention and no purchase intention, and counting the prediction results of the model on the positive samples and the negative samples;
calculating the evaluation index F1 from the statistical results according to the following formulas:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times P \times R}{P + R}$$
in the formulas, P is the precision, R is the recall, TP is the number of positive samples predicted to have purchase intention, FN is the number of positive samples predicted to have no purchase intention, and FP is the number of negative samples predicted to have purchase intention;
repeatedly executing the steps, and calculating the F1 mean value and the F1 coefficient of variation of each feature;
sorting each feature according to the sequence from large to small of the F1 mean value, and sorting the features with the same F1 mean value according to the sequence from small to large of the F1 coefficient of variation to obtain the final sorting of the features;
then, the correlation coefficient of every pair of features is calculated, and for any two features whose correlation coefficient has an absolute value larger than a set threshold, the feature ranked lower in the sorting is deleted.
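The ranking step described above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the coefficient of variation is assumed to be the population standard deviation divided by the mean, and returning 0.0 when precision or recall is undefined is an assumption the text does not address.

```python
from statistics import mean, pstdev


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion counts; returns 0.0 when precision or recall is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_mean_and_cv(runs):
    """runs: list of (tp, fp, fn) tuples, one tuple per repeated trial."""
    scores = [f1_score(*run) for run in runs]
    m = mean(scores)
    # Coefficient of variation, assumed here to be population std / mean.
    cv = pstdev(scores) / m if m else float("inf")
    return m, cv


def rank_features(stats):
    """stats: {feature_name: (f1_mean, f1_cv)}.

    Larger F1 mean first; equal means are ordered by smaller coefficient of variation.
    """
    return sorted(stats, key=lambda name: (-stats[name][0], stats[name][1]))
```

In this sketch each single-feature model is evaluated several times on balanced samples, and the resulting (mean, coefficient of variation) pairs are passed to `rank_features`.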
Still further, the method for optimizing the recognition model comprises:
calculating the F1 mean J_0 and the F1 coefficient of variation B_0 of the original recognition model with N input features;
deleting one feature at a time from the N input features, establishing a recognition model that takes the remaining N-1 features as input, and calculating its F1 mean J_1 and F1 coefficient of variation B_1; if deleting feature A yields the largest J_1 (or, when several deletions tie for the largest J_1, the smallest B_1), and J_1 is larger than J_0, the recognition model without feature A is the optimized model with N-1 input features;
carrying out the same operation on the optimized model with N-1 input features to obtain the optimized model with N-2 input features, whose maximum F1 mean satisfies J_2 > J_1;
repeating the above steps to obtain the optimized model with N-m input features, whose maximum F1 mean satisfies J_m > J_(m-1); if the maximum F1 mean obtained after deleting a further feature is smaller than J_m, the optimized model with N-m input features is the final optimized model, where 0 ≤ m < N.
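The optimization procedure above is a greedy backward elimination, and one possible sketch is given below. The `evaluate` callback is hypothetical: it is assumed to train and repeatedly test a recognition model on the given feature subset and to return its (F1 mean, F1 coefficient of variation). Treating an equal F1 mean as "no improvement" is also an assumption, chosen because each step in the text requires a strict increase.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback: feature subset -> (F1 mean, F1 coefficient of variation).
Evaluate = Callable[[Tuple[str, ...]], Tuple[float, float]]


def backward_eliminate(features: Sequence[str], evaluate: Evaluate) -> Tuple[List[str], float]:
    """Delete one feature per round while the best F1 mean strictly improves."""
    current = tuple(features)
    best_mean, _ = evaluate(current)  # J_0 (B_0 is not needed for the stop test)
    while len(current) > 1:  # keep at least one feature, i.e. m < N
        candidates = []
        for feature in current:
            subset = tuple(f for f in current if f != feature)
            j, b = evaluate(subset)
            candidates.append((j, b, subset))
        # Largest F1 mean wins; ties are broken by the smallest coefficient of variation.
        j, b, subset = min(candidates, key=lambda c: (-c[0], c[1]))
        if j <= best_mean:
            break  # no deletion improves the model, so the current model is final
        current, best_mean = subset, j
    return list(current), best_mean
```

Each round costs one model evaluation per remaining feature, so the whole search needs at most on the order of N² evaluations.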
In a second aspect, the present invention provides an article purchase intention identifying device, comprising:
the characteristic determining module is used for determining multi-dimensional characteristics related to commodity purchasing willingness based on analysis of the customer site communication records;
the characteristic screening module is used for screening the characteristics based on the correlation among the characteristics;
the model optimization module is used for establishing a recognition model which takes the screened features as input and the purchase intention as output, and performing model optimization by screening the influence degree of each input feature on the model one by one;
and the intention recognition module is used for predicting the commodity purchasing intention of the customer by using the trained recognition model.
The device further comprises a database construction module for constructing a database, wherein each data record of the database corresponds to one valid customer communication record and comprises feature values and a result. Each feature takes its value from a set of consecutive integers, and the magnitude of the value represents how favorable it is to a deal: the larger the feature value, the more favorable to a deal. The result takes only the two values 0 and 1, which respectively represent that no deal and a deal was made within a set time threshold after the communication.
Further, the recognition model is a naive bayes classifier, and the classifier output y is:
$$y = \arg\max_{y_k} P(Y = y_k) \prod_{i=1}^{N} P(X_i = x_i \mid Y = y_k)$$
wherein P(·) denotes probability, X and Y denote the input and output variables, x_i is the value of the i-th feature, k = 1, 2, y_1 = 0, y_2 = 1, i = 1, 2, …, N, and N is the number of features.
Further, a method of screening features based on correlations between features includes:
first, all features are sorted as follows:
selecting positive samples and negative samples with the same quantity from a database, wherein the positive samples are data with a result of 1, and the negative samples are data with a result of 0;
establishing a recognition model for each single feature, wherein the outputs of the recognition model are 1 and 0, respectively representing purchase intention and no purchase intention, and counting the prediction results of the model on the positive samples and the negative samples;
calculating the evaluation index F1 from the statistical results according to the following formulas:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times P \times R}{P + R}$$
in the formulas, P is the precision, R is the recall, TP is the number of positive samples predicted to have purchase intention, FN is the number of positive samples predicted to have no purchase intention, and FP is the number of negative samples predicted to have purchase intention;
repeatedly executing the steps, and calculating the F1 mean value and the F1 coefficient of variation of each feature;
sorting each feature according to the sequence from large to small of the F1 mean value, and sorting the features with the same F1 mean value according to the sequence from small to large of the F1 coefficient of variation to obtain the final sorting of the features;
then, the correlation coefficient of every pair of features is calculated, and for any two features whose correlation coefficient has an absolute value larger than a set threshold, the feature ranked lower in the sorting is deleted.
Still further, the method for optimizing the recognition model comprises:
calculating the F1 mean J_0 and the F1 coefficient of variation B_0 of the original recognition model with N input features;
deleting one feature at a time from the N input features, establishing a recognition model that takes the remaining N-1 features as input, and calculating its F1 mean J_1 and F1 coefficient of variation B_1; if deleting feature A yields the largest J_1 (or, when several deletions tie for the largest J_1, the smallest B_1), and J_1 is larger than J_0, the recognition model without feature A is the optimized model with N-1 input features;
carrying out the same operation on the optimized model with N-1 input features to obtain the optimized model with N-2 input features, whose maximum F1 mean satisfies J_2 > J_1;
repeating the above steps to obtain the optimized model with N-m input features, whose maximum F1 mean satisfies J_m > J_(m-1); if the maximum F1 mean obtained after deleting a further feature is smaller than J_m, the optimized model with N-m input features is the final optimized model, where 0 ≤ m < N.
Compared with the prior art, the invention has the following beneficial effects.
The invention determines multi-dimensional features related to commodity purchase intention by analyzing customer on-site communication records, screens the features based on the correlation among them, establishes a recognition model that takes the screened features as input and the purchase intention as output, optimizes the model by examining one by one the degree of influence of each input feature on the model, and predicts the customer's commodity purchase intention with the trained recognition model, thereby achieving multi-dimensional automatic recognition of the customer's commodity purchase intention. Because the recognition model is built from the on-site communication records between salesperson and customer, and the multi-dimensional features influencing purchase intention are screened and the model is optimized, the purchase intention of the customer can be accurately identified without other external data sources, and the communication between salesperson and customer becomes more targeted, deeper and more efficient, thereby promoting deals and improving sales performance.
Drawings
Fig. 1 is a flowchart of a method for identifying a purchase intention of a product according to an embodiment of the present invention.
Fig. 2 is a block diagram of an apparatus for recognizing a purchase intention of an article according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and more obvious, the present invention is further described below with reference to the accompanying drawings and the detailed description. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a method for identifying a purchase intention of a product according to an embodiment of the present invention, including the following steps:
step 101, determining multi-dimensional characteristics related to commodity purchase intention based on analysis of customer site communication records;
step 102, screening the features based on the correlation among the features;
step 103, establishing a recognition model which takes the screened features as input and the purchase intention as output, and performing model optimization by examining one by one the degree of influence of each input feature on the model;
and step 104, predicting the commodity purchasing intention of the customer by using the trained recognition model.
In this embodiment, step 101 is mainly used to determine the features related to commodity purchase intention. Commodity purchase intention is a subjective intention of the customer that objectively exists in the commodity sales scenario. By sorting through the content of communication with customers in the sales scenario, a multi-level, multi-dimensional index framework and system is constructed, laying a foundation for data modeling and quantitative analysis. Taking automobile sales as an example, by analyzing the content of communication with customers in an automobile sales scenario, 5 primary indexes (price, procedure, service, configuration and personal demand) and 12 subordinate secondary indexes are first sorted out and selected; three core dimensions, namely efficiency, pertinence and depth, are then sorted out and selected; finally, through further cross analysis of the indexes and the dimensions, 27 feature indexes are selected. The efficiency indexes are 4: overall efficiency, customer efficiency, advisor efficiency and relative efficiency. The pertinence indexes are 13: price pertinence, offer pertinence, other price pertinence, procedure pertinence, payment procedure pertinence, other procedure pertinence, service pertinence, pre-sale service pertinence, post-sale service pertinence, service other pertinence, configuration pertinence, customer pertinence. The depth indexes are 10: price depth, offer depth, other price depth, procedure payment depth, other procedure depth, service depth, configuration depth, customer depth. All of the resulting feature indexes are derived from the content and manner of the communication between salesperson and customer and involve no other external information or data, which provides a sound basis for establishing and optimizing the model.
In this embodiment, step 102 is mainly used for feature screening. To avoid missing important features, the preliminarily determined features related to purchase intention are generally numerous, such as the aforementioned 27 features. If all of them were used indiscriminately as input features of the recognition model, the accuracy and complexity of the model would be greatly affected, so the features must be screened first. In fact, many of these features are strongly or even very strongly correlated with one another, and among features with significant correlation some can be deleted. The present embodiment screens features mainly based on the correlation between features. A later embodiment provides a specific technical solution for feature screening.
In this embodiment, step 103 is mainly used to establish the recognition model and optimize it. Commodity purchase intention can be divided into different numbers of categories (grades or levels) according to the specific application scenario, such as the common three categories of high, medium and low, or the two categories of low and high. This embodiment divides purchase intention into two categories. Since there is no specific definition of what counts as "low" and what counts as "high", and to avoid conceptual ambiguity, this embodiment calls the two categories "no purchase intention" and "purchase intention"; these are effectively equivalent to low and high purchase intention, but the terminology is more rigorous. In this embodiment, the recognition model is constructed with the screened features as input and the presence or absence of purchase intention as output. The recognition model is in effect a binary classifier. Many technical schemes can implement binary classification, and this embodiment does not limit the specific recognition model; a later embodiment provides a specific recognition model. To improve the accuracy of the recognition model, the model is further optimized after it is established. The optimization method examines the input features one by one, measures the influence of each on model accuracy, and deletes the features that do not noticeably improve model accuracy, so that the recognition performance of the model is optimal. A later embodiment gives a specific model optimization scheme.
In this embodiment, step 104 is mainly used to predict (identify) the purchase intention of the product of the customer. In the embodiment, the trained recognition model is used, and the characteristic value obtained based on the communication record of the customer is used as the input of the recognition model, so that the commodity purchasing intention of the customer can be conveniently obtained. The data set of the training model is from a database previously built based on customer communication records. An embodiment of creating a database will be given later.
As an optional embodiment, the method further comprises the step of constructing a database, wherein each data record of the database corresponds to one valid customer communication record and comprises feature values and a result. Each feature takes its value from a set of consecutive integers, and the magnitude of the value represents how favorable it is to a deal: the larger the feature value, the more favorable to a deal. The result takes only the two values 0 and 1, which respectively represent that no deal and a deal was made within a set time threshold after the communication.
This embodiment provides a technical scheme for constructing the database. The database is mainly used to build the model training data set, the model testing data set and the like. As mentioned above, the data of the database come from the communication records between sales personnel and customers. To construct a high-quality database, the inventors collected about 3,000 on-site automobile sales communication records covering luxury brands, mid- to high-end brands, mainstream joint-venture brands and some independent brands, converted the recordings to text using speech-to-text conversion and semantic recognition technology, and established, according to the 27 determined feature indexes, a database consisting of the value of each feature and the communication result (whether a deal was made within two weeks), i.e. the label. The label takes only the values 0 and 1: 0 means that no deal was made within 2 weeks, and 1 means that a deal was made within 2 weeks. Each feature takes its value from a set of consecutive integers, such as 1 to 5, and the magnitude of the value represents how favorable it is to a deal: the larger the feature value, the higher the likelihood of a deal.
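A single database record of this kind might be represented as in the sketch below. The class name, field names and the example feature key are hypothetical; the 1 to 5 range follows the example given in the text, and the validation rules are one possible reading of "a set of consecutive integers" and a 0/1 label.

```python
from dataclasses import dataclass
from typing import Dict

FEATURE_MIN = 1  # example range from the text: feature values 1 to 5
FEATURE_MAX = 5


@dataclass(frozen=True)
class CommunicationRecord:
    """One valid customer communication record: feature values plus a 0/1 label."""

    features: Dict[str, int]  # feature name -> integer score, larger is more favorable
    deal_closed: int          # 1 if a deal was made within the time threshold, else 0

    def __post_init__(self) -> None:
        if self.deal_closed not in (0, 1):
            raise ValueError("label must be 0 or 1")
        for name, value in self.features.items():
            if not isinstance(value, int) or not FEATURE_MIN <= value <= FEATURE_MAX:
                raise ValueError(f"feature {name!r} out of range: {value!r}")
```

For example, `CommunicationRecord({"price_depth": 3}, deal_closed=1)` would describe a conversation scored 3 on a hypothetical price depth feature that ended in a deal within the threshold.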
As an alternative embodiment, the recognition model is a naive bayes classifier, and the classifier output y is:
$$y = \arg\max_{y_k} P(Y = y_k) \prod_{i=1}^{N} P(X_i = x_i \mid Y = y_k)$$
wherein P(·) denotes probability, X and Y denote the input and output variables, x_i is the value of the i-th feature, k = 1, 2, y_1 = 0, y_2 = 1, i = 1, 2, …, N, and N is the number of features.
This embodiment provides a technical scheme for the recognition model. As described above, a naive Bayes classifier is adopted as the recognition model in this embodiment. Comparison and discussion against real sales scenarios showed that the naive Bayes algorithm considers a diverse set of factors, treats them as influencing the result simultaneously rather than layer by layer, derives its output by comparing the probabilities contributed by the factors, and is relatively simple and efficient to compute; it is therefore well suited to judging a customer's purchase intention in a sales scenario.
The naive Bayes algorithm is a widely recognized, simple and efficient classification algorithm based on Bayes' theorem and the assumption that the features are conditionally independent given the class. For a given training data set, it first learns the joint probability distribution of input and output under the conditional independence assumption; based on this model, it then uses Bayes' theorem to find, for a given input x, the output y with the highest posterior probability. Bayes' theorem relates the conditional probabilities of random events Y and X, where P(Y|X) is the probability that Y occurs given that X has occurred. Bayes' theorem is expressed as:
$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$
When classifying with the naive Bayes method, for a given input x the learned model computes the posterior probability distribution P(Y = y_k | X = x), and the class with the highest posterior probability is output as the class of x. The posterior probability is calculated according to Bayes' theorem:
$$P(Y = y_k \mid X = x) = \frac{P(X = x \mid Y = y_k)\,P(Y = y_k)}{\sum_{j} P(X = x \mid Y = y_j)\,P(Y = y_j)}$$
where y_k denotes the k-th category of the target variable Y. Since the denominator is the same for all y_k, and the conditional independence assumption is applied to the conditional probability distribution, the naive Bayes classifier can be expressed as:
$$y = \arg\max_{y_k} P(Y = y_k) \prod_{i=1}^{N} P(X_i = x_i \mid Y = y_k)$$
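The classifier expressed above can be sketched for integer-valued features as follows. This is an illustrative count-based implementation, not the patent's own code: the function names are hypothetical, and no smoothing is applied because the text does not mention any, so a feature value never seen with a class gives that class a score of zero.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """rows: sequences of discrete feature values; labels: class of each row."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] = number of matching training rows
    value_counts = defaultdict(Counter)
    for row, y in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, y)][value] += 1
    return class_counts, value_counts


def predict(model, row):
    """Return (best_label, scores), where scores[y] = P(Y=y) * prod_i P(X_i=x_i | Y=y)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = n_y / total  # prior P(Y = y)
        for i, value in enumerate(row):
            score *= value_counts[(i, y)][value] / n_y  # likelihood P(X_i = x_i | Y = y)
        scores[y] = score
    return max(scores, key=scores.get), scores
```

The scores are the numerators of the posterior only; because the denominator is shared by all classes, comparing numerators is enough to pick the class.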
In the following, the application of the naive Bayes algorithm is briefly illustrated using 3 features and their labels as training samples. The features of the training samples are X = {A4, B12, C6}, where A4, B12 and C6 each denote one feature; the output variable Y takes the two values {0, 1}, which respectively represent no purchase intention and purchase intention. The training sample data are shown in Tables 1, 2 and 3.
TABLE 1
Figure BDA0003193939980000094
TABLE 2
Figure BDA0003193939980000101
TABLE 3
Figure BDA0003193939980000102
According to the training sample data table, the following probabilities are calculated:
P(Y=0)=2112/2970,P(Y=1)=858/2970
P(X1=2|Y=0)=506/2112,P(X2=3|Y=0)=440/2112,P(X3=4|Y=0)=484/2112
P(X1=2|Y=1)=88/858,P(X2=3|Y=1)=198/858,P(X3=4|Y=1)=374/858
Therefore, for a test sample with X1=2, X2=3 and X3=4, the posterior probabilities are proportional to the following products (the common denominator is omitted):
P(Y=0|X)∝P(Y=0)*P(X1=2|Y=0)*P(X2=3|Y=0)*P(X3=4|Y=0)
=(2112/2970)*(506/2112)*(440/2112)*(484/2112)≈0.008
P(Y=1|X)∝P(Y=1)*P(X1=2|Y=1)*P(X2=3|Y=1)*P(X3=4|Y=1)
=(858/2970)*(88/858)*(198/858)*(374/858)≈0.003
Since the value for Y=0 (about 0.008) is greater than the value for Y=1 (about 0.003), the predicted value of the target variable for the test sample is 0, representing no purchase intention.
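The arithmetic of this worked example can be checked with exact fractions, as in the sketch below. The counts are taken from the probabilities listed above; the variable and function names are illustrative.

```python
from fractions import Fraction as F

# Priors and per-feature likelihoods as listed in the worked example.
prior = {0: F(2112, 2970), 1: F(858, 2970)}
likelihood = {
    0: [F(506, 2112), F(440, 2112), F(484, 2112)],  # P(X1=2|Y=0), P(X2=3|Y=0), P(X3=4|Y=0)
    1: [F(88, 858), F(198, 858), F(374, 858)],      # P(X1=2|Y=1), P(X2=3|Y=1), P(X3=4|Y=1)
}


def unnormalized_posterior(label):
    """P(Y=label) times the product of the three conditional probabilities."""
    score = prior[label]
    for p in likelihood[label]:
        score *= p
    return score


scores = {label: unnormalized_posterior(label) for label in prior}
predicted = max(scores, key=scores.get)

print({label: round(float(s), 3) for label, s in scores.items()})  # {0: 0.008, 1: 0.003}
print(predicted)  # 0, i.e. no purchase intention
```

Dividing each score by the sum of both scores would give normalized posterior probabilities, but the predicted class is the same either way.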
As an alternative embodiment, the method of screening features based on correlations between features includes:
first, all features are sorted as follows:
selecting positive samples and negative samples with the same quantity from a database, wherein the positive samples are data with a result of 1, and the negative samples are data with a result of 0;
establishing a recognition model for each single feature, wherein the outputs of the recognition model are 1 and 0, respectively representing purchase intention and no purchase intention, and counting the prediction results of the model on the positive samples and the negative samples;
calculating the evaluation index F1 from the statistical results according to the following formulas:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times P \times R}{P + R}$$
in the formulas, P is the precision, R is the recall, TP is the number of positive samples predicted to have purchase intention, FN is the number of positive samples predicted to have no purchase intention, and FP is the number of negative samples predicted to have purchase intention;
repeating the above steps, and calculating the F1 mean and the F1 coefficient of variation for each feature;
sorting the features in descending order of F1 mean, and sorting features with the same F1 mean in ascending order of F1 coefficient of variation, to obtain the final ordering of the features;
then, calculating the correlation coefficient of every pair of features, and, for any two features whose correlation coefficient has an absolute value greater than a set threshold, deleting the feature ranked lower in the ordering.
This embodiment provides a technical solution for screening features based on the correlations between them. All features are first ranked; the correlation coefficient between every pair of features is then calculated, and if its absolute value exceeds a set threshold, the two features are significantly correlated and the lower-ranked one is deleted. Because a correlation coefficient can be positive (positive correlation) or negative (negative correlation), the strength of the correlation is judged by its absolute value. The threshold may be determined empirically and may be, for example, 0.6. Features are ranked by the F1 value of a recognition model that takes only that one feature as input; a larger F1 ranks higher. To improve accuracy, the mean of F1 over multiple calculations is used. When the F1 means are equal, the F1 coefficient of variation is considered, and the feature with the smaller coefficient of variation ranks first. F1 is calculated with the same formula as given earlier, which is not repeated here.
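The ranking and screening steps can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: the correlation coefficient is taken to be Pearson's, the coefficient of variation is the population standard deviation divided by the mean, and a feature is deleted if it is correlated above the threshold with any higher-ranked feature (one possible reading of the pairwise rule). The function names and the input layout are hypothetical.

```python
import numpy as np


def f1_score(tp, fp, fn):
    """F1 from the counts TP, FP, FN defined in the text."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rank_features(f1_runs):
    """Rank features by F1 mean (descending), then F1 CV (ascending).

    f1_runs maps a feature name to the F1 values of its repeated runs.
    """
    stats = {}
    for name, runs in f1_runs.items():
        mean = float(np.mean(runs))
        cv = float(np.std(runs) / mean) if mean else float("inf")
        stats[name] = (mean, cv)
    return sorted(stats, key=lambda n: (-stats[n][0], stats[n][1]))


def drop_correlated(ranked, columns, threshold=0.6):
    """Delete every feature correlated with a higher-ranked feature.

    columns maps a feature name to its values over the same records.
    """
    kept = []
    for pos, name in enumerate(ranked):
        correlated = any(
            abs(np.corrcoef(columns[name], columns[other])[0, 1]) > threshold
            for other in ranked[:pos]
        )
        if not correlated:
            kept.append(name)
    return kept
```

With floating-point F1 means, exact ties are rare in practice, so an implementation might round the means before sorting; the text does not say whether it does.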
As an alternative embodiment, the method for optimizing the recognition model includes:
calculating the F1 mean J_0 and the F1 coefficient of variation B_0 of the original recognition model with N input features;
deleting each of the N input features in turn, establishing a recognition model that takes the remaining N-1 features as input, and calculating its F1 mean J_1 and F1 coefficient of variation B_1; if J_1 is largest after feature A is deleted, or B_1 is smallest among the cases tied for the largest J_1, and J_1 > J_0, the recognition model with feature A deleted is the optimized model with N-1 input features;
performing the same operation on the optimized model with N-1 input features to obtain the optimized model with N-2 input features, whose maximum F1 mean satisfies J_2 > J_1;
repeating the above steps to obtain the optimized model with N-m input features, whose maximum F1 mean satisfies J_m > J_(m-1); if the maximum F1 mean after any further feature is deleted is smaller than J_m, the optimized model with N-m input features is the final optimized model, where 0 ≤ m < N.
This embodiment provides a technical solution for optimizing the recognition model. The model is optimized by screening the input features one at a time, as follows: delete one input feature and calculate the F1 mean and the F1 coefficient of variation of the model that takes the remaining features as input; restore the deleted feature, delete another feature, and calculate the F1 mean and the F1 coefficient of variation again; repeat until this operation has been performed once for every input feature. If deleting a certain feature yields the largest F1 mean (or, among features tied for the largest F1 mean, the smallest F1 coefficient of variation), and that F1 mean is larger than the F1 mean before optimization (with no feature deleted), then deleting that feature improves the model more than deleting any other feature, and the model with that feature deleted is taken as the first-round optimized model. The same method is then applied to the first-round optimized model for a second round, a third round, and so on, until the maximum F1 mean after deleting any further feature is smaller than the F1 mean of the previous model, which gives the final optimized model.
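The round-by-round deletion described above amounts to a greedy backward elimination, which might be sketched as below. The `evaluate` callback is hypothetical: it is assumed to train and score a model on the given feature subset and return the pair (F1 mean, F1 coefficient of variation). The text only covers the strictly-larger and strictly-smaller cases; this sketch assumes that the search also stops when the best candidate merely equals the current F1 mean.

```python
def backward_eliminate(features, evaluate):
    """Greedy one-at-a-time feature deletion driven by the F1 mean.

    features: list of feature names (the N inputs of the original model).
    evaluate: hypothetical callback, subset -> (f1_mean, f1_cv).
    Returns the final feature list with its F1 mean and F1 CV.
    """
    current = list(features)
    best_mean, best_cv = evaluate(current)        # J_0, B_0
    while len(current) > 1:                       # keeps m < N
        candidates = []
        for feature in current:
            subset = [f for f in current if f != feature]
            mean, cv = evaluate(subset)
            candidates.append((mean, cv, subset))
        # Largest F1 mean first; ties broken by the smallest CV.
        mean, cv, subset = min(candidates, key=lambda c: (-c[0], c[1]))
        if mean <= best_mean:                     # no improvement: stop
            break
        current, best_mean, best_cv = subset, mean, cv
    return current, best_mean, best_cv
```

Each round costs one model evaluation per remaining feature, so the whole search needs on the order of N² evaluations in the worst case.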
Fig. 2 is a schematic diagram of the composition of a commodity purchase intention recognition apparatus according to an embodiment of the present invention, the apparatus including:
a feature determining module 11, configured to determine multi-dimensional features related to commodity purchase intention based on analysis of customer site communication records;
a feature screening module 12, configured to screen the features based on the correlations between the features;
a model optimization module 13, configured to establish a recognition model that takes the screened features as input and the purchase intention as output, and to optimize the model by examining, one by one, the degree of influence of each input feature on the model;
and an intention recognition module 14, configured to predict the customer's commodity purchase intention using the trained recognition model.
The apparatus of this embodiment may be used to implement the technical solution of the method embodiment shown in Fig. 1; its implementation principle and technical effect are similar and are not repeated here. The same applies to the following embodiments.
As an optional embodiment, the apparatus further includes a database construction module configured to construct a database, where each data record of the database corresponds to one valid customer communication record and includes feature values and a result; each feature takes its values from a set of consecutive integers, the magnitude of a feature value indicates how favorable it is to a deal, and a larger feature value is more favorable to a deal; the result takes only the two values 0 and 1, which respectively represent that no deal and a deal was concluded within the set time threshold after the communication.
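A record of such a database could be represented roughly as follows. This is only a sketch: the class name `CommunicationRecord`, the helper `result_label`, and the choice of days as the unit of the time threshold are assumptions made for illustration and are not specified in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CommunicationRecord:
    """One valid customer communication record (hypothetical layout)."""
    features: Tuple[int, ...]   # integer feature values; larger favors a deal
    result: int                 # 1 = deal within the threshold, 0 = no deal

    def __post_init__(self):
        if self.result not in (0, 1):
            raise ValueError("result must be 0 or 1")
        if not all(isinstance(v, int) for v in self.features):
            raise ValueError("feature values must be integers")


def result_label(days_until_deal: Optional[float], threshold_days: float) -> int:
    """Return 1 if a deal happened within the threshold, otherwise 0.

    days_until_deal is None when no deal was ever recorded (assumption).
    """
    if days_until_deal is not None and days_until_deal <= threshold_days:
        return 1
    return 0
```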
As an alternative embodiment, the recognition model is a naive bayes classifier, and the classifier output y is:
y = argmax_{y_k} P(Y=y_k) * ∏_{i=1}^{N} P(X_i=x_i|Y=y_k)
where P(·) denotes probability, X and Y denote the input and output variables, x_i is the i-th feature value, k = 1, 2, y_1 = 0, y_2 = 1, i = 1, 2, …, N, and N is the number of features.
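A classifier of this form can be estimated from the database by simple counting, as in the sketch below. It assumes that probabilities are plain relative frequencies with no smoothing (the text does not mention any) and that a tie is resolved in favor of the smaller label; the function names `train` and `predict` are illustrative.

```python
from collections import Counter, defaultdict


def train(records):
    """Count class frequencies and per-class feature-value frequencies.

    records: iterable of (feature_tuple, result) pairs with result in {0, 1}.
    """
    records = list(records)
    class_counts = Counter(result for _, result in records)
    value_counts = defaultdict(Counter)   # (label, feature index) -> value counts
    for features, result in records:
        for i, value in enumerate(features):
            value_counts[(result, i)][value] += 1
    return class_counts, value_counts


def predict(model, features):
    """Return the label y_k maximizing P(Y=y_k) * prod_i P(X_i=x_i | Y=y_k)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label in sorted(class_counts):
        score = class_counts[label] / total
        for i, value in enumerate(features):
            score *= value_counts[(label, i)][value] / class_counts[label]
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Without smoothing, a feature value never seen for a class drives that class's score to zero, which may or may not be the intended behavior.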
As an alternative embodiment, the method of screening features based on correlations between features includes:
first, all features are sorted as follows:
selecting equal numbers of positive samples and negative samples from a database, wherein the positive samples are records with a result of 1 and the negative samples are records with a result of 0;
establishing, for each single feature, a recognition model whose output is 1 or 0, representing purchase intention and no purchase intention respectively, and counting the model's predictions on the positive samples and the negative samples;
calculating the evaluation index F1 from the counts according to the following formulas:
F1 = 2*P*R/(P+R)
P = TP/(TP+FP)
R = TP/(TP+FN)
in the formulas, P is the precision, R is the recall, TP is the number of positive samples predicted to have purchase intention, FN is the number of positive samples predicted to have no purchase intention, and FP is the number of negative samples predicted to have purchase intention;
repeating the above steps, and calculating the F1 mean and the F1 coefficient of variation for each feature;
sorting the features in descending order of F1 mean, and sorting features with the same F1 mean in ascending order of F1 coefficient of variation, to obtain the final ordering of the features;
then, calculating the correlation coefficient of every pair of features, and, for any two features whose correlation coefficient has an absolute value greater than a set threshold, deleting the feature ranked lower in the ordering.
As an alternative embodiment, the method for optimizing the recognition model includes:
calculating the F1 mean J_0 and the F1 coefficient of variation B_0 of the original recognition model with N input features;
deleting each of the N input features in turn, establishing a recognition model that takes the remaining N-1 features as input, and calculating its F1 mean J_1 and F1 coefficient of variation B_1; if J_1 is largest after feature A is deleted, or B_1 is smallest among the cases tied for the largest J_1, and J_1 > J_0, the recognition model with feature A deleted is the optimized model with N-1 input features;
performing the same operation on the optimized model with N-1 input features to obtain the optimized model with N-2 input features, whose maximum F1 mean satisfies J_2 > J_1;
repeating the above steps to obtain the optimized model with N-m input features, whose maximum F1 mean satisfies J_m > J_(m-1); if the maximum F1 mean after any further feature is deleted is smaller than J_m, the optimized model with N-m input features is the final optimized model, where 0 ≤ m < N.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A commodity purchase intention identification method is characterized by comprising the following steps:
determining multi-dimensional characteristics related to commodity purchasing willingness based on analysis of customer site communication records;
screening the features based on the correlation between the features;
establishing a recognition model which takes the screened features as input and the purchasing intention as output, and performing model optimization by screening the influence degree of each input feature on the model one by one;
and predicting the commodity purchasing intention of the customer by using the trained recognition model.
2. The method according to claim 1, further comprising the step of constructing a database, wherein each data record of the database corresponds to one valid customer communication record and comprises feature values and a result; each feature takes its values from a set of consecutive integers, the magnitude of a feature value indicates how favorable it is to a deal, and a larger feature value is more favorable to a deal; the result takes only the two values 0 and 1, which respectively represent that no deal and a deal was concluded within the set time threshold after the communication.
3. The method according to claim 2, wherein the recognition model is a naive bayes classifier, and the classifier output y is:
y = argmax_{y_k} P(Y=y_k) * ∏_{i=1}^{N} P(X_i=x_i|Y=y_k)
where P(·) denotes probability, X and Y denote the input and output variables, x_i is the i-th feature value, k = 1, 2, y_1 = 0, y_2 = 1, i = 1, 2, …, N, and N is the number of features.
4. The method of recognizing a purchase intention of a commodity according to claim 3, wherein the method of screening the features based on the correlation between the features comprises:
first, all features are sorted as follows:
selecting equal numbers of positive samples and negative samples from a database, wherein the positive samples are records with a result of 1 and the negative samples are records with a result of 0;
establishing, for each single feature, a recognition model whose output is 1 or 0, representing purchase intention and no purchase intention respectively, and counting the model's predictions on the positive samples and the negative samples;
calculating the evaluation index F1 from the counts according to the following formulas:
F1 = 2*P*R/(P+R)
P = TP/(TP+FP)
R = TP/(TP+FN)
in the formulas, P is the precision, R is the recall, TP is the number of positive samples predicted to have purchase intention, FN is the number of positive samples predicted to have no purchase intention, and FP is the number of negative samples predicted to have purchase intention;
repeating the above steps, and calculating the F1 mean and the F1 coefficient of variation for each feature;
sorting the features in descending order of F1 mean, and sorting features with the same F1 mean in ascending order of F1 coefficient of variation, to obtain the final ordering of the features;
then, calculating the correlation coefficient of every pair of features, and, for any two features whose correlation coefficient has an absolute value greater than a set threshold, deleting the feature ranked lower in the ordering.
5. The method as claimed in claim 4, wherein the method for optimizing the recognition model comprises:
calculating the F1 mean J_0 and the F1 coefficient of variation B_0 of the original recognition model with N input features;
deleting each of the N input features in turn, establishing a recognition model that takes the remaining N-1 features as input, and calculating its F1 mean J_1 and F1 coefficient of variation B_1; if J_1 is largest after feature A is deleted, or B_1 is smallest among the cases tied for the largest J_1, and J_1 > J_0, the recognition model with feature A deleted is the optimized model with N-1 input features;
performing the same operation on the optimized model with N-1 input features to obtain the optimized model with N-2 input features, whose maximum F1 mean satisfies J_2 > J_1;
repeating the above steps to obtain the optimized model with N-m input features, whose maximum F1 mean satisfies J_m > J_(m-1); if the maximum F1 mean after any further feature is deleted is smaller than J_m, the optimized model with N-m input features is the final optimized model, where 0 ≤ m < N.
6. A commodity purchase intention recognition apparatus, comprising:
a feature determining module, configured to determine multi-dimensional features related to commodity purchase intention based on analysis of customer site communication records;
a feature screening module, configured to screen the features based on the correlations between the features;
a model optimization module, configured to establish a recognition model that takes the screened features as input and the purchase intention as output, and to optimize the model by examining, one by one, the degree of influence of each input feature on the model;
and an intention recognition module, configured to predict the customer's commodity purchase intention using the trained recognition model.
7. The apparatus according to claim 6, further comprising a database construction module configured to construct a database, wherein each data record of the database corresponds to one valid customer communication record and comprises feature values and a result; each feature takes its values from a set of consecutive integers, the magnitude of a feature value indicates how favorable it is to a deal, and a larger feature value is more favorable to a deal; the result takes only the two values 0 and 1, which respectively represent that no deal and a deal was concluded within the set time threshold after the communication.
8. The apparatus according to claim 7, wherein the recognition model is a naive bayes classifier, and the classifier output y is:
y = argmax_{y_k} P(Y=y_k) * ∏_{i=1}^{N} P(X_i=x_i|Y=y_k)
where P(·) denotes probability, X and Y denote the input and output variables, x_i is the i-th feature value, k = 1, 2, y_1 = 0, y_2 = 1, i = 1, 2, …, N, and N is the number of features.
9. The apparatus of claim 8, wherein the method of screening the features based on the correlations between the features comprises:
first, all features are sorted as follows:
selecting equal numbers of positive samples and negative samples from a database, wherein the positive samples are records with a result of 1 and the negative samples are records with a result of 0;
establishing, for each single feature, a recognition model whose output is 1 or 0, representing purchase intention and no purchase intention respectively, and counting the model's predictions on the positive samples and the negative samples;
calculating the evaluation index F1 from the counts according to the following formulas:
F1 = 2*P*R/(P+R)
P = TP/(TP+FP)
R = TP/(TP+FN)
in the formulas, P is the precision, R is the recall, TP is the number of positive samples predicted to have purchase intention, FN is the number of positive samples predicted to have no purchase intention, and FP is the number of negative samples predicted to have purchase intention;
repeating the above steps, and calculating the F1 mean and the F1 coefficient of variation for each feature;
sorting the features in descending order of F1 mean, and sorting features with the same F1 mean in ascending order of F1 coefficient of variation, to obtain the final ordering of the features;
then, calculating the correlation coefficient of every pair of features, and, for any two features whose correlation coefficient has an absolute value greater than a set threshold, deleting the feature ranked lower in the ordering.
10. The apparatus of claim 9, wherein the method for optimizing the recognition model comprises:
calculating the F1 mean J_0 and the F1 coefficient of variation B_0 of the original recognition model with N input features;
deleting each of the N input features in turn, establishing a recognition model that takes the remaining N-1 features as input, and calculating its F1 mean J_1 and F1 coefficient of variation B_1; if J_1 is largest after feature A is deleted, or B_1 is smallest among the cases tied for the largest J_1, and J_1 > J_0, the recognition model with feature A deleted is the optimized model with N-1 input features;
performing the same operation on the optimized model with N-1 input features to obtain the optimized model with N-2 input features, whose maximum F1 mean satisfies J_2 > J_1;
repeating the above steps to obtain the optimized model with N-m input features, whose maximum F1 mean satisfies J_m > J_(m-1); if the maximum F1 mean after any further feature is deleted is smaller than J_m, the optimized model with N-m input features is the final optimized model, where 0 ≤ m < N.
CN202110885513.5A 2021-08-03 2021-08-03 Commodity purchase intention recognition method and device Active CN113763032B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110885513.5A CN113763032B (en) 2021-08-03 2021-08-03 Commodity purchase intention recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110885513.5A CN113763032B (en) 2021-08-03 2021-08-03 Commodity purchase intention recognition method and device

Publications (2)

Publication Number Publication Date
CN113763032A true CN113763032A (en) 2021-12-07
CN113763032B CN113763032B (en) 2023-08-04

Family

ID=78788422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110885513.5A Active CN113763032B (en) 2021-08-03 2021-08-03 Commodity purchase intention recognition method and device

Country Status (1)

Country Link
CN (1) CN113763032B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127546A (en) * 2016-06-20 2016-11-16 重庆房慧科技有限公司 A kind of Method of Commodity Recommendation based on the big data in intelligence community
CN106447384A (en) * 2016-08-31 2017-02-22 五八同城信息技术有限公司 Method and apparatus for determining object user
CN110555717A (en) * 2019-07-29 2019-12-10 华南理工大学 method for mining potential purchased goods and categories of users based on user behavior characteristics
CN111681051A (en) * 2020-06-08 2020-09-18 上海汽车集团股份有限公司 Purchasing intention degree prediction method, device, storage medium and terminal
CN113095861A (en) * 2020-01-08 2021-07-09 浙江大搜车软件技术有限公司 Method, device and equipment for predicting target object transaction probability and storage medium

Also Published As

Publication number Publication date
CN113763032B (en) 2023-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant