CN111275459A

CN111275459A - Cigarette brand recommendation algorithm based on consumer modeling

Info

Publication number: CN111275459A
Application number: CN202010064585.9A
Authority: CN
Inventors: 韩冬; 刘培江; 韩慧健; 贾可亮; 张锐; 刘峥; 韩凤
Original assignee: Shandong Tobacco Institute Co ltd
Current assignee: Shandong Tobacco Institute Co ltd
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2020-06-12

Abstract

A cigarette brand recommendation algorithm based on consumer modeling comprises the following steps in sequence. First, consumer consumption records are associated with cigarette information and cigarette brand information; features reflecting consumer preferences are counted and extracted; the weight of each feature value of each feature is calculated following the idea of the TF-IDF (term frequency-inverse document frequency) algorithm; and a consumer consumption preference feature model is constructed. Second, the same features are extracted from the cigarette brand and cigarette information to build a cigarette brand feature vector. Finally, a method for calculating consumer consumption preference similarity is defined, and cigarette brands are recommended according to the similarity between the cigarette brand feature vector and the consumer consumption preference features.

Description

Cigarette brand recommendation algorithm based on consumer modeling

Technical Field

The invention relates to the field of commodity brand recommendation research, and in particular to consumer consumption preference feature modeling and to cigarette brand recommendation and promotion research in the tobacco industry.

Background

Because tobacco is sold under a monopoly system, account managers in the tobacco industry can visit only retail customers, and tobacco sales enterprises can obtain sales information only from those retailers. Information about end consumers is hard to obtain, which makes it difficult to analyze consumers and formulate more targeted sales strategies. In the invention, a two-dimensional code is placed on the outer packaging of cigarettes; a consumer scans the code with a mobile phone to obtain information about the purchased cigarettes and verify their authenticity, and at the same time the tobacco sales company obtains the consumer's consumption record. Consumption records reflect consumers' preferences and interests. Analyzing these records and mining the features they contain yields a user profile of each consumer's consumption preferences, which is of significant research value for cigarette brand recommendation and for the market promotion of new brands.

The basic idea of the classic content-based recommendation algorithm is to perform personalized recommendation by matching the feature tags of the items to be recommended against the user's preference attributes. Specifically, the similarity between the features of the user's personal preference information and the descriptive features of each candidate item is calculated, and items with high similarity are recommended to the user.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a method that obtains cigarette consumer information and builds a consumer consumption preference model in order to recommend cigarette brands. The method studies and analyzes consumer consumption records in the tobacco industry and links them with cigarette information. It extracts 14 features that reflect consumer consumption preferences, calculates the weights of the different feature values in each consumer's preference features by borrowing from the TF-IDF algorithm, and builds a consumer consumption preference vector model. It then builds a cigarette brand feature vector model from the same preference features and calculates the similarity between consumer vectors and between consumer vectors and cigarette brand vectors. Finally, a minimum spanning tree algorithm is used to recommend cigarette brands to consumers, improving the acceptance rate of the recommendations.

The invention provides a cigarette brand recommendation method based on consumer modeling, which is characterized by sequentially comprising the following steps of:

step 1: two-dimensional code information is designed for the outer packaging of cigarettes; a consumer scans the two-dimensional code with a mobile phone to obtain information about the purchased cigarettes and verify their authenticity, and at the same time the tobacco sales company obtains the consumer's consumption record;

step 2: consumer modeling based on TF-IDF (term frequency-inverse document frequency): the consumer set is treated as a document set, the consumption record of each consumer as one document in that set, and the different feature values of each feature in the consumption record as the words of the document, so that a consumer consumption preference feature vector is constructed;

step 3: cigarette brand feature vector modeling: the same features as in the consumer feature vector are extracted to construct the cigarette brand feature vector;

step 4: the similarity of two consumer feature vectors is calculated as the mean of the similarities of the corresponding features of the two consumers' preference vectors; the similarity between a consumer feature vector and a cigarette brand feature vector is calculated in the same way;

step 5: a cigarette brand recommendation algorithm based on consumer consumption preference is realized from the similarity between the cigarette brand feature vector and the consumer consumption preference features. The algorithm first constructs a consumer consumption preference model from consumer consumption record data, performs cluster analysis on the consumer population to divide consumers into several groups, and calculates the cluster centroids. It then constructs a cigarette brand feature vector for the brand to be promoted, calculates the similarity between that vector and each cluster centroid, and takes the consumer cluster with the greatest similarity as the consumer group to be recommended.

Preferably, the step 2 specifically comprises:

step 2.1: the consumer set is treated as the document set and the consumption record of each consumer as one document in that set; 14 features, such as cigarette production place, enterprise name, trademark, aroma type and grade, are extracted from the consumption record to construct the consumer consumption preference feature vector, as in formula (1):

$$U = \{F_i \mid 1 \le i \le 14\} \qquad (1)$$

step 2.2: each feature $F_i$ in the consumer consumption record contains a plurality of feature values; these feature values are analogous to the words in a document and themselves form a feature value vector, as in formula (2):

$$F_i = \{(f_{ij}, w_{ij}) \mid 1 \le j \le m\} \qquad (2)$$

step 2.3: in formula (2) of step 2.2, $f_{ij}$ denotes the j-th feature value of feature $F_i$ and $w_{ij}$ denotes the weight of that feature value, calculated as:

$$w_{ij} = \frac{tf_{ij}}{\max tf_{ij}} \times \log\frac{N}{n_{ij}} \qquad (3)$$

where $tf_{ij}$ is the number of times feature value $f_{ij}$ occurs in the consumption record of consumer $U$; the larger this value, the better the feature value reflects the consumer's preference. $\max tf_{ij}$ is the maximum of $tf_{ij}$ over all consumers and is used to normalize the weight $w_{ij}$. $N$ is the total number of consumers and $n_{ij}$ is the number of consumers whose records contain feature value $f_{ij}$; the larger $n_{ij}$, the more popular the feature value and the less it expresses an individual consumer's preference.
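The weighting in step 2.3 can be sketched in Python for a single feature. This is a minimal illustration, not the patented implementation: the function name `feature_value_weights`, the sample records, the use of the natural logarithm, and the reading of $\max tf_{ij}$ as the largest count of a value over all consumers are assumptions.

```python
import math
from collections import Counter

def feature_value_weights(records):
    """records maps a consumer id to the list of values observed for ONE feature
    (for example the aroma type of every pack the consumer scanned).
    Returns consumer id -> {feature value: weight}."""
    n_total = len(records)                                   # N: total number of consumers
    tf = {c: Counter(vals) for c, vals in records.items()}   # tf_ij for each consumer
    # n_ij: number of consumers whose record contains the value
    holders = Counter(v for counts in tf.values() for v in counts)
    # max tf_ij: largest count of the value over all consumers (assumed reading)
    max_tf = {}
    for counts in tf.values():
        for v, k in counts.items():
            max_tf[v] = max(max_tf.get(v, 0), k)
    # normalized term frequency times log inverse consumer frequency (natural log assumed)
    return {
        c: {v: (k / max_tf[v]) * math.log(n_total / holders[v]) for v, k in counts.items()}
        for c, counts in tf.items()
    }

# Hypothetical records for the aroma-type feature of three consumers.
records = {
    "u1": ["faint scent", "faint scent", "strong scent"],
    "u2": ["faint scent"],
    "u3": ["mixed", "mixed"],
}
weights = feature_value_weights(records)
```

A value held by every consumer gets weight 0 under this form, because log(N / N) = 0; whether the source applies any smoothing to avoid that is not stated.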

Preferably, the step 3 specifically comprises:

step 3.1: for a cigarette brand, the same 14 features, such as production place, enterprise name, commodity name, specification and model, aroma type and grade, are selected to construct the cigarette brand feature vector $B = \{F_i \mid 1 \le i \le 14\}$;

step 3.2: for each feature $F_i$ of a cigarette brand, the feature value is a single determined value with no statistical character, so its weight is 1; all other feature values are absent from the brand and their weights are 0. For example, Yuxi (soft) is of the faint-scent type, so the weight of the faint-scent value in the aroma type feature is 1, and the other aroma type values do not occur for this brand. To make it possible to calculate the similarity between the cigarette brand feature vector and a consumer feature vector, the feature value vector of each feature of the cigarette brand is expanded to the same dimension as the corresponding feature value vector of the consumer preference features, with the weight of every feature value absent from the brand set to 0.
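The expansion in step 3.2 amounts to a one-hot encoding over the feature values seen in the consumer vectors. A minimal sketch for one feature follows; the function name and the aroma-type vocabulary other than the faint-scent value are hypothetical.

```python
def brand_feature_vector(brand_value, vocabulary):
    """Return one weight per feature value in `vocabulary` (the value order used
    for the consumer vectors): 1.0 for the brand's own value, 0.0 for the rest."""
    return [1.0 if value == brand_value else 0.0 for value in vocabulary]

# Hypothetical aroma-type vocabulary shared with the consumer preference vectors.
aroma_vocabulary = ["faint scent", "strong scent", "mixed"]
yuxi_soft_aroma = brand_feature_vector("faint scent", aroma_vocabulary)
```

Keeping the brand vector aligned with the consumer vocabulary is what allows the same cosine formula to be applied to brand-consumer pairs.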

Preferably, the step 4 specifically includes:

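step 4.1: let the feature vectors of two consumers $U^i$ and $U^j$ be $U^i = \{F_k^i \mid 1 \le k \le 14\}$ and $U^j = \{F_k^j \mid 1 \le k \le 14\}$; then the similarity of $U^i$ and $U^j$ is the mean of the similarities of the corresponding features of the two consumer consumption preference vectors, namely:

$$Sim(U^i, U^j) = \frac{1}{14} \sum_{k=1}^{14} sim(F_k^i, F_k^j) \qquad (4)$$

step 4.2: the similarity of a pair of corresponding features $F_1$ and $F_2$ is calculated as the cosine of the angle between their feature value vectors:

$$sim(F_1, F_2) = \frac{\sum_{i=1}^{m} w_{1i}\, w_{2i}}{\sqrt{\sum_{i=1}^{m} w_{1i}^2}\,\sqrt{\sum_{i=1}^{m} w_{2i}^2}} \qquad (5)$$

where $w_{1i}$ and $w_{2i}$ are the weights of the corresponding feature values of features $F_1$ and $F_2$.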
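Steps 4.1 and 4.2 can be sketched as two small functions. The sketch assumes each consumer is stored as a list of aligned feature-value weight vectors and, as a labeled assumption, treats an all-zero vector as having similarity 0; the example uses two features rather than 14 to stay short.

```python
import math

def feature_similarity(w1, w2):
    """Cosine of the angle between two aligned feature-value weight vectors."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm = math.sqrt(sum(a * a for a in w1)) * math.sqrt(sum(b * b for b in w2))
    return dot / norm if norm else 0.0   # assumption: all-zero vector -> similarity 0

def consumer_similarity(u1, u2):
    """Mean of the per-feature cosine similarities of two consumers."""
    sims = [feature_similarity(f1, f2) for f1, f2 in zip(u1, u2)]
    return sum(sims) / len(sims)

# Two hypothetical consumers described by two features each.
u_a = [[1.0, 0.0], [0.5, 0.5, 0.0]]
u_b = [[1.0, 0.0], [0.0, 0.0, 1.0]]
similarity_ab = consumer_similarity(u_a, u_b)   # identical first feature, disjoint second
```

Because a brand vector has the same layout as a consumer vector, `consumer_similarity` can also be called with a brand vector as one argument.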

Preferably, the step 5 specifically comprises:

step 5.1: consumer clusters are obtained with a minimum spanning tree algorithm, as follows:

step 5.1.1: in the consumer vector space, each consumer forms a vertex of a graph and the similarity value of two consumer vectors forms the weight of the edge between them, so that all consumers form a weighted undirected graph $G(V, E)$; in the initial state, $G$ contains $n$ independent consumers with no edges between them;

step 5.1.2: the similarity between every pair of consumer vectors is calculated, and the pairs are sorted in descending order of similarity, that is, in descending order of the weight of the edge between the two consumers;

step 5.1.3: the edge with the largest similarity value is dequeued from the sorted queue obtained in step 5.1.2; if the two consumer vertices it connects lie in different connected components, the edge links and merges the two components into a new connected component; otherwise the edge is ignored;

step 5.1.4: step 5.1.3 is repeated until all consumer vertices are connected by $n-1$ edges, that is, until a minimum spanning tree has been generated;

step 5.1.5: the edges with the smallest weights in the minimum spanning tree are deleted one by one (that is, the connections between the least similar consumers are removed) until $M-1$ edges have been deleted; the remaining vertices and edges then form $M$ connected components, which are the $M$ initial consumer clusters;

step 5.1.6: for each initial consumer cluster, the consumer vectors of all vertices in the connected component are averaged to obtain the centroids of the $M$ initial consumer clusters:

$$c_i = \frac{1}{|C_i|} \sum_{U \in C_i} U \qquad (6)$$

where $|C_i|$ is the number of consumers in cluster $C_i$;

step 5.1.7: the distance between each consumer vector and the centroid vector of each cluster is recalculated, and each consumer is assigned to the closest consumer cluster;

step 5.1.8: after all consumer clusters have been adjusted, the centroid of each consumer cluster is recalculated;

step 5.1.9: steps 5.1.7 and 5.1.8 are repeated until the error function converges;

step 5.2: the similarity between the cigarette brand feature vector and each consumer cluster centroid is calculated;

step 5.2.1: for a pair of corresponding features in the cigarette brand feature vector and a consumer cluster centroid vector, the similarity of the two features is calculated with formula (5);

step 5.2.2: step 5.2.1 is repeated to obtain the similarity values of all corresponding features, and the similarity between the cigarette brand and the consumer cluster centroid is calculated as the mean of the similarities of the corresponding features of the cigarette brand feature vector and the cluster centroid feature vector, as in formula (4);

step 5.2.3: step 5.2.2 is repeated to obtain the similarity values between the cigarette brand feature vector and all consumer cluster centroids, that is, the similarity between the cigarette brand and the consumption preference features of every consumer cluster;

step 5.3: the consumer cluster with the largest similarity value is taken as the promotion group for the new brand.
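Steps 5.1.1 to 5.1.5 describe Kruskal's algorithm run on edges sorted by decreasing similarity, followed by removal of the weakest tree edges. The sketch below is an illustration under stated assumptions: the function name `mst_clusters`, the union-find helper, and the sample similarity matrix are hypothetical. Because edges are taken in order of highest similarity, the tree built here is a maximum-similarity spanning tree, which is the same as a minimum spanning tree if distance is defined as decreasing in similarity.

```python
def mst_clusters(n, similarity, m):
    """Split n consumers (indexed 0..n-1) into m initial clusters.
    similarity(i, j) returns the similarity of consumers i and j."""
    # Step 5.1.2: all pairwise edges, sorted by descending similarity.
    edges = sorted(((similarity(i, j), i, j)
                    for i in range(n) for j in range(i + 1, n)), reverse=True)
    parent = list(range(n))            # union-find forest: every consumer starts alone

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Steps 5.1.3 and 5.1.4: keep an edge only if it joins two different components.
    tree = []
    for s, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((s, i, j))
        if len(tree) == n - 1:
            break
    # Step 5.1.5: drop the m-1 weakest tree edges, keeping the n-m strongest.
    tree.sort(reverse=True)
    kept = tree[: n - m]
    parent = list(range(n))            # rebuild components from the kept edges only
    for _, i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Hypothetical pairwise similarities for four consumers.
sim = [[1.0, 0.9, 0.1, 0.2],
       [0.9, 1.0, 0.15, 0.1],
       [0.1, 0.15, 1.0, 0.8],
       [0.2, 0.1, 0.8, 1.0]]
initial_clusters = mst_clusters(4, lambda i, j: sim[i][j], 2)
```

Enumerating all pairs costs O(n²) similarity evaluations, so for large consumer sets a practical system would likely need sampling or an approximate neighbor search; the source does not address this.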

Preferably, the extracted consumer consumption preference features are: cigarette production place, enterprise name, trademark, commodity name, specification and model, aroma type, grade, draw resistance, length, hardness, circumference, tar content, smoke nicotine content, and smoke carbon monoxide content.

Preferably, the two-dimensional code scanning device is a mobile phone, and the identification software is a dedicated App or a WeChat applet.

Detailed Description

Reference will now be made in detail to the embodiments of the present invention, the following examples of which are intended to be illustrative only and are not to be construed as limiting the scope of the invention.

1. Modeling consumer consumption preferences

The consumption record links the consumer, the cigarette brand information and the cigarette information, and implies the consumer's preference for a certain brand, a certain model, or cigarettes with certain characteristics. Accordingly, 14 features are extracted from the cigarette brand information and the cigarette information to form the consumer's cigarette preference feature vector.

Each feature in the consumer preference feature vector has a plurality of feature values. For example, the commodity name may be Yuxi (soft), Hademen (pure fragrance), Taishan (White General), Taishan (Red General), Taishan (Wangyue), and so on. Each feature value occurs a different number of times in different consumers' consumption records, and the number of occurrences represents how strongly a consumer prefers cigarette brands that have that feature value.

To describe accurately how much each feature value matters to a consumer's preference, the invention borrows the idea of the classic TF-IDF algorithm from text similarity calculation to compute the weights of the feature values in the consumer preference feature vector. The consumer set is treated as the document set, the consumption record of each consumer as one document in that set, and the different feature values of each feature in the consumption record as the words of the document. The consumer consumption preference feature vector is then constructed as in formula (1):

$$U = \{F_i \mid 1 \le i \le 14\} \qquad (1)$$

where each feature $F_i$ contains a plurality of feature values, which in turn form a feature value vector, as in formula (2):

$$F_i = \{(f_{ij}, w_{ij}) \mid 1 \le j \le m\} \qquad (2)$$

where $f_{ij}$ denotes the j-th feature value of feature $F_i$ and $w_{ij}$ denotes the weight of that feature value, calculated as:

$$w_{ij} = \frac{tf_{ij}}{\max tf_{ij}} \times \log\frac{N}{n_{ij}} \qquad (3)$$

where $tf_{ij}$ is the number of times feature value $f_{ij}$ occurs in the consumption record of consumer $U$; the larger this value, the better the feature value reflects the consumer's preference. $\max tf_{ij}$ is the maximum of $tf_{ij}$ over all consumers and is used to normalize the weight $w_{ij}$. $N$ is the total number of consumers and $n_{ij}$ is the number of consumers whose records contain feature value $f_{ij}$; the larger $n_{ij}$, the more popular the feature value and the less it expresses an individual consumer's preference.
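As a worked illustration of the weight calculation, assume the weight is the normalized occurrence count multiplied by the logarithm of the inverse consumer frequency, with a natural logarithm. The logarithm base and all numbers below are hypothetical and chosen only to show how popularity lowers the weight.

```latex
% Hypothetical values: N = 1000 consumers, tf_{ij} = 6, \max tf_{ij} = 12.
% Rare feature value, present in the records of n_{ij} = 100 consumers:
w_{ij} = \frac{6}{12} \times \ln\frac{1000}{100} = 0.5 \times 2.3026 \approx 1.151
% Popular feature value, present in the records of n_{ij} = 900 consumers:
w_{ij} = \frac{6}{12} \times \ln\frac{1000}{900} = 0.5 \times 0.1054 \approx 0.053
```

With the same purchase count, the widely held feature value receives a weight roughly twenty times smaller, which matches the statement that popular values say little about an individual consumer's preference.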

2. Cigarette brand modeling

For a cigarette brand, the same 14 features, such as production place, enterprise name, commodity name, specification and model, aroma type and grade, are selected to construct the cigarette brand feature vector $B = \{F_i \mid 1 \le i \le 14\}$. For each feature $F_i$ of a cigarette brand, the feature value is a single determined value with no statistical character, so its weight is 1; all other feature values are absent from the brand and their weights are 0. For example, Yuxi (soft) is of the faint-scent type, so the weight of the faint-scent value in the aroma type feature is 1, and the other aroma type values do not occur for this brand. To make it possible to calculate the similarity between the cigarette brand feature vector and a consumer feature vector, the feature value vector of each feature of the cigarette brand is expanded to the same dimension as the corresponding feature value vector of the consumer preference features, with the weight of every feature value absent from the brand set to 0.

3. Consumer consumption preference similarity calculation

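Let the feature vectors of two consumers $U^i$ and $U^j$ be $U^i = \{F_k^i \mid 1 \le k \le 14\}$ and $U^j = \{F_k^j \mid 1 \le k \le 14\}$. Then the similarity of $U^i$ and $U^j$ is the mean of the similarities of the corresponding features of the two consumer consumption preference vectors:

$$Sim(U^i, U^j) = \frac{1}{14} \sum_{k=1}^{14} sim(F_k^i, F_k^j) \qquad (4)$$

where the similarity of a pair of corresponding features $F_1$ and $F_2$ is the cosine of the angle between their feature value vectors:

$$sim(F_1, F_2) = \frac{\sum_{i=1}^{m} w_{1i}\, w_{2i}}{\sqrt{\sum_{i=1}^{m} w_{1i}^2}\,\sqrt{\sum_{i=1}^{m} w_{2i}^2}} \qquad (5)$$

where $w_{1i}$ and $w_{2i}$ are the weights of the corresponding feature values of features $F_1$ and $F_2$.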

4. Consumer group recommendation algorithm

The consumer group recommendation algorithm first constructs a consumer consumption preference model from consumer consumption record data, performs cluster analysis on the consumer population to divide consumers into several groups, and calculates the cluster centroids. It then constructs a cigarette brand feature vector for the brand to be promoted, calculates the similarity between that vector and each cluster centroid, and takes the consumer cluster with the greatest similarity as the consumer group to be recommended. The specific steps are as follows:

step 1: consumer clusters are obtained with a minimum spanning tree algorithm, as follows:

step 1.1: in the consumer vector space, each consumer forms a vertex of a graph and the similarity value of two consumer vectors forms the weight of the edge between them, so that all consumers form a weighted undirected graph $G(V, E)$; in the initial state, $G$ contains $n$ independent consumers with no edges between them;

step 1.2: the similarity between every pair of consumer vectors is calculated, and the pairs are sorted in descending order of similarity, that is, in descending order of the weight of the edge between the two consumers;

step 1.3: the edge with the largest similarity value is dequeued from the sorted queue obtained in step 1.2; if the two consumer vertices it connects lie in different connected components, the edge links and merges the two components into a new connected component; otherwise the edge is ignored;

step 1.4: step 1.3 is repeated until all consumer vertices are connected by $n-1$ edges, that is, until a minimum spanning tree has been generated;

step 1.5: the edges with the smallest weights in the minimum spanning tree are deleted one by one (that is, the connections between the least similar consumers are removed) until $M-1$ edges have been deleted; the remaining vertices and edges then form $M$ connected components, which are the $M$ initial consumer clusters;

step 1.6: for each initial consumer cluster, the consumer vectors of all vertices in the connected component are averaged to obtain the centroids of the $M$ initial consumer clusters:

$$c_i = \frac{1}{|C_i|} \sum_{U \in C_i} U \qquad (6)$$

where $|C_i|$ is the number of consumers in cluster $C_i$;

step 1.7: the distance between each consumer vector and the centroid vector of each cluster is recalculated, and each consumer is assigned to the closest consumer cluster;

step 1.8: after all consumer clusters have been adjusted, the centroid of each consumer cluster is recalculated;

step 1.9: steps 1.7 and 1.8 are repeated until the error function converges;

step 2: the similarity between the cigarette brand feature vector and each consumer cluster centroid is calculated;

step 2.1: for a pair of corresponding features in the cigarette brand feature vector and a consumer cluster centroid vector, the similarity of the two features is calculated with formula (5);

step 2.2: step 2.1 is repeated to obtain the similarity values of all corresponding features, and the similarity between the cigarette brand and the consumer cluster centroid is calculated as the mean of the similarities of the corresponding features of the cigarette brand feature vector and the cluster centroid feature vector, as in formula (4);

step 2.3: step 2.2 is repeated to obtain the similarity values between the cigarette brand feature vector and all consumer cluster centroids, that is, the similarity between the cigarette brand and the consumption preference features of every consumer cluster;

step 3: the consumer cluster with the largest similarity value is taken as the promotion group for the new brand.

With this algorithm, a suitable consumer group can be recommended for the promotion of a cigarette brand, providing a reference for brand promotion planning.
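The centroid refinement and the final selection of the promotion group can be sketched as follows. This is a minimal illustration under several assumptions that are not in the source: consumers are lists of aligned feature-value weight vectors, "closest" is read as "most similar" under the mean per-feature cosine, and the loop stops when cluster assignments no longer change, which stands in for the unspecified error function. All function names and sample data are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity(u, v):
    """Mean per-feature cosine similarity of two preference vectors."""
    return sum(cosine(f, g) for f, g in zip(u, v)) / len(u)

def centroid(members):
    """Element-wise mean of the members' feature-value weights."""
    k = len(members)
    return [[sum(col) / k for col in zip(*feats)] for feats in zip(*members)]

def refine(consumers, clusters, max_iter=100):
    """Reassign consumers to the most similar centroid until assignments are stable."""
    for _ in range(max_iter):
        cents = [centroid([consumers[i] for i in c]) for c in clusters]
        new = [[] for _ in cents]
        for i, u in enumerate(consumers):
            best = max(range(len(cents)), key=lambda c: similarity(u, cents[c]))
            new[best].append(i)
        new = [c for c in new if c]        # drop clusters that lost all members
        if new == clusters:                # assumption: stable assignment = converged
            break
        clusters = new
    return clusters, [centroid([consumers[i] for i in c]) for c in clusters]

def target_cluster(brand, centroids):
    """Index of the cluster whose centroid is most similar to the brand vector."""
    return max(range(len(centroids)), key=lambda c: similarity(brand, centroids[c]))

# Four hypothetical consumers with a single two-valued feature, and one brand.
consumers = [[[1.0, 0.0]], [[0.9, 0.1]], [[0.0, 1.0]], [[0.1, 0.9]]]
clusters, centroids = refine(consumers, [[0, 1], [2, 3]])
brand = [[0.0, 1.0]]
promotion_group = clusters[target_cluster(brand, centroids)]
```

In this toy data the brand matches the second cluster, so consumers 2 and 3 form the promotion group.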

Although exemplary embodiments of the present invention have been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions, substitutions and the like can be made in form and detail without departing from the scope and spirit of the invention as disclosed in the accompanying claims, all of which are intended to fall within the scope of the claims, and that various steps in the various sections and methods of the claimed product can be combined together in any combination. Therefore, the description of the embodiments disclosed in the present invention is not intended to limit the scope of the present invention, but to describe the present invention. Accordingly, the scope of the present invention is not limited by the above embodiments, but is defined by the claims or their equivalents.

Claims

1. A cigarette brand recommendation algorithm based on consumer modeling, characterized by sequentially comprising the following steps:

step 5: a cigarette brand recommendation algorithm based on consumer preference is realized from the similarity between the cigarette brand feature vector and the consumer preference features: a consumer preference model is first constructed from consumer consumption record data; cluster analysis is performed on the consumer population, dividing the consumers into several groups; the cluster centroids are calculated; a cigarette brand feature vector is constructed for the cigarette brand to be promoted; the similarity between the cigarette brand feature vector and each cluster centroid is calculated; and the consumer cluster with the greatest similarity is taken as the consumer group to be recommended.

2. The cigarette brand recommendation algorithm based on consumer modeling according to claim 1, wherein: the specific steps of the step 2 are as follows:

$$U = \{F_i \mid 1 \le i \le 14\} \qquad (1)$$

step 2.2: each feature $F_i$ in the consumer consumption record contains a plurality of feature values; these feature values are analogous to the words in a document and themselves form a feature value vector, as in formula (2):

$$F_i = \{(f_{ij}, w_{ij}) \mid 1 \le j \le m\} \qquad (2)$$

step 2.3: in formula (2) of step 2.2, $f_{ij}$ denotes the j-th feature value of feature $F_i$ and $w_{ij}$ denotes the weight of that feature value, calculated as:

$$w_{ij} = \frac{tf_{ij}}{\max tf_{ij}} \times \log\frac{N}{n_{ij}} \qquad (3)$$

where $tf_{ij}$ is the number of times feature value $f_{ij}$ occurs in the consumption record of consumer $U$; the larger this value, the better the feature value reflects the consumer's preference. $\max tf_{ij}$ is the maximum of $tf_{ij}$ over all consumers and is used to normalize the weight $w_{ij}$. $N$ is the total number of consumers and $n_{ij}$ is the number of consumers whose records contain feature value $f_{ij}$; the larger $n_{ij}$, the more popular the feature value and the less it expresses an individual consumer's preference.

3. The cigarette brand recommendation algorithm based on consumer modeling according to claim 1, wherein: the specific steps of the step 3 are as follows:

step 3.1: for a cigarette brand, the same 14 features, such as production place, enterprise name, commodity name, specification and model, aroma type and grade, are selected to construct the cigarette brand feature vector $B = \{F_i \mid 1 \le i \le 14\}$;

step 3.2: for each feature $F_i$ of a cigarette brand, the feature value is a single determined value with no statistical character, so its weight is 1; all other feature values are absent from the brand and their weights are 0. For example, Yuxi (soft) is of the faint-scent type, so the weight of the faint-scent value in the aroma type feature is 1, and the other aroma type values do not occur for this brand. To make it possible to calculate the similarity between the cigarette brand feature vector and a consumer feature vector, the feature value vector of each feature of the cigarette brand is expanded to the same dimension as the corresponding feature value vector of the consumer preference features, with the weight of every feature value absent from the brand set to 0.

4. The cigarette brand recommendation algorithm based on consumer modeling according to claim 1, wherein: the specific steps of the step 4 are as follows:

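step 4.1: let the feature vectors of two consumers $U^i$ and $U^j$ be $U^i = \{F_k^i \mid 1 \le k \le 14\}$ and $U^j = \{F_k^j \mid 1 \le k \le 14\}$; then the similarity of $U^i$ and $U^j$ is the mean of the similarities of the corresponding features of the two consumer consumption preference vectors, namely:

$$Sim(U^i, U^j) = \frac{1}{14} \sum_{k=1}^{14} sim(F_k^i, F_k^j) \qquad (4)$$

step 4.2: the similarity of a pair of corresponding features $F_1$ and $F_2$ is calculated as the cosine of the angle between their feature value vectors:

$$sim(F_1, F_2) = \frac{\sum_{i=1}^{m} w_{1i}\, w_{2i}}{\sqrt{\sum_{i=1}^{m} w_{1i}^2}\,\sqrt{\sum_{i=1}^{m} w_{2i}^2}} \qquad (5)$$

where $w_{1i}$ and $w_{2i}$ are the weights of the corresponding feature values of features $F_1$ and $F_2$.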

5. The cigarette brand recommendation algorithm based on consumer modeling according to claim 1, wherein: the specific steps of the step 5 are as follows:

step 5.1.1: in the consumer vector space, each consumer forms a vertex of a graph and the similarity value of two consumer vectors forms the weight of the edge between them, so that all consumers form a weighted undirected graph $G(V, E)$; in the initial state, $G$ contains $n$ independent consumers with no edges between them;

step 5.1.4: step 5.1.3 is repeated until all consumer vertices are connected by $n-1$ edges, that is, until a minimum spanning tree has been generated;

step 5.1.5: the edges with the smallest weights in the minimum spanning tree are deleted one by one (that is, the connections between the least similar consumers are removed) until $M-1$ edges have been deleted; the remaining vertices and edges then form $M$ connected components, which are the $M$ initial consumer clusters;

step 5.1.6: for each initial consumer cluster, the consumer vectors of all vertices in the connected component are averaged to obtain the centroids of the $M$ initial consumer clusters:

$$c_i = \frac{1}{|C_i|} \sum_{U \in C_i} U \qquad (6)$$

where $|C_i|$ is the number of consumers in cluster $C_i$;

step 5.2.3: step 5.2.2 is repeated to obtain the similarity values between the cigarette brand feature vector and all consumer cluster centroids, that is, the similarity between the cigarette brand and the consumption preference features of every consumer cluster;

6. The cigarette brand recommendation algorithm based on consumer modeling of claim 1, further characterized by: a consumer consumption preference feature vector is constructed from the consumer consumption record in combination with the cigarette brand information and the cigarette information, specifically a vector of 14 features: cigarette production place, enterprise name, trademark, commodity name, specification and model, aroma type, grade, draw resistance, length, hardness, circumference, tar content, smoke nicotine content, and smoke carbon monoxide content.

7. The cigarette brand recommendation algorithm based on consumer modeling of claim 1, further characterized by: the two-dimensional code scanning device is a mobile phone, and the identification software is a dedicated App or a WeChat applet.