CN111275459A - Cigarette brand recommendation algorithm based on consumer modeling - Google Patents

Publication number
CN111275459A
Authority
CN
China
Prior art keywords
consumer
cigarette
similarity
characteristic
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010064585.9A
Other languages
Chinese (zh)
Inventor
韩冬
刘培江
韩慧健
贾可亮
张锐
刘峥
韩凤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Tobacco Institute Co ltd
Original Assignee
Shandong Tobacco Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Tobacco Institute Co ltd
Priority to CN202010064585.9A
Publication of CN111275459A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Databases & Information Systems (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A cigarette brand recommendation algorithm based on consumer modeling comprises the following steps in sequence: according to consumer consumption records, cigarette information and cigarette brand information are associated, and the features reflecting consumer preference are counted and extracted; the weights of the different feature values of each feature are calculated based on the idea of the TF-IDF (term frequency-inverse document frequency) algorithm, and a consumer consumption preference feature model is constructed; the same features are extracted from the cigarette brand and cigarette information to establish a cigarette brand feature vector; a consumer consumption preference similarity calculation method is then defined, and a cigarette brand recommendation algorithm based on consumer consumption preference is realized from the similarity between the cigarette brand feature vector and the consumer consumption preference features.

Description

Cigarette brand recommendation algorithm based on consumer modeling
Technical Field
The invention relates to the field of commodity brand recommendation research, in particular to consumer consumption preference feature modeling and cigarette brand recommendation and popularization research in the tobacco industry.
Background
Because tobacco is sold under a monopoly system, customer managers in the tobacco industry can visit only retail customers, and tobacco sales enterprises can obtain only the sales information of retail customers. Information about final consumers is difficult to obtain, so it is difficult to analyze consumers and formulate a more targeted sales strategy. In the present invention, a QR code (two-dimensional code) is placed on the outer package of the cigarettes; a consumer can scan the code with a mobile phone to obtain information on the purchased cigarettes and verify their authenticity, and the tobacco sales company can thereby obtain the consumer's consumption record. Consumption records reflect the consumption preferences and interests of consumers, so analyzing these records and mining the features they contain yields a user profile of each consumer's consumption preference, which is of significant research value for cigarette brand recommendation and new brand market promotion.
The basic idea of the classic content-based recommendation algorithm is to complete personalized recommendation by matching the feature tags of the items to be recommended against the preference attributes of the user. Specifically, the similarity between the user's personal preference features and the descriptive features of each item to be recommended is calculated, and items with high similarity are recommended to the user.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for obtaining cigarette consumer information and establishing a consumer consumption preference model to realize cigarette brand recommendation. In this method, the consumption records of consumers in the tobacco industry are studied and analyzed and are linked with cigarette information; 14 features capable of reflecting consumer consumption preference are extracted; the weights of the different feature values in each consumer's preference features are calculated by drawing on the TF-IDF algorithm; and a consumer consumption preference vector model is established. A cigarette brand feature vector model is established from the same features as the consumer preference features, and similarity is calculated both between consumer vectors and between consumer vectors and cigarette brand vectors. A minimum spanning tree algorithm is then used to cluster consumers and realize cigarette brand recommendation, which improves the acceptance rate of the recommendations.
The invention provides a cigarette brand recommendation method based on consumer modeling, which is characterized by sequentially comprising the following steps of:
step 1: designing the QR code (two-dimensional code) information on the outer package of the cigarettes, wherein a consumer can scan the QR code with a mobile phone to obtain information on the purchased cigarettes and verify their authenticity, and the tobacco sales company can thereby obtain consumer consumption records;
step 2: consumer modeling based on TF-IDF (term frequency-inverse document frequency), wherein the consumer set is treated as a document set, the consumption record of each consumer in the consumer set is treated as a document in the document set, and the different feature values of each feature in the consumption record are likewise treated as the words in a document, so that a consumer consumption preference feature vector is constructed;
step 3: modeling the cigarette brand feature vector, wherein the same features as in the consumer feature vector are extracted to construct the cigarette brand feature vector;
step 4: calculating similarity, wherein the similarity of two consumers' feature vectors is the average of the similarities of the corresponding features of the two consumers' preference vectors, and the similarity between a consumer feature vector and a cigarette brand feature vector is calculated in the same way as the similarity between two consumer feature vectors;
step 5: realizing a cigarette brand recommendation algorithm based on consumer consumption preference from the similarity between the cigarette brand feature vector and the consumer consumption preference features. The algorithm first constructs a consumer consumption preference model from the consumer consumption record data, performs cluster analysis on the consumers, divides them into several groups, and calculates the cluster centroids; it then constructs the cigarette brand feature vector for the brand to be promoted, calculates the similarity between that vector and each cluster centroid, and takes the consumer cluster with the maximum similarity as the consumer group to be recommended.
Preferably, the step 2 specifically comprises:
step 2.1: the consumer set is treated as a document set, the consumption record of each consumer in the consumer set is treated as a document in the document set, and 14 features in the consumption record, such as cigarette production place, enterprise name, trademark, aroma type, and grade, are extracted to construct the consumer consumption preference feature vector, as in formula (1):
$U = \{F_i \mid 1 \le i \le 14\}$ (1)
step 2.2: each feature $F_i$ in the consumer consumption record contains a plurality of feature values; these feature values are treated as the words in a document and themselves form a feature value vector, as in formula (2):
$F_i = \{(f_{ij}, w_{ij}) \mid 1 \le j \le m\}$ (2)
step 2.3: in formula (2) of step 2.2, $f_{ij}$ denotes the j-th feature value of feature $F_i$, and $w_{ij}$ denotes the weight of that feature value, calculated by formula (3):
[Formula (3): equation image not recovered]
where $tf_{ij}$ is the number of times feature value $f_{ij}$ occurs in the consumption record of consumer $U$; the larger this value, the better the feature value reflects the consumer's preference. $\max tf_{ij}$ is the maximum of $tf_{ij}$ for feature value $f_{ij}$ among all consumers and is used to normalize the weight $w_{ij}$. $N$ is the total number of consumers, and $n_{ij}$ is the number of consumers whose records contain feature value $f_{ij}$; the larger $n_{ij}$, the more popular the feature value and the less it expresses an individual consumer's preference.
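The weighting in step 2.3 can be sketched in Python. The exact form of formula (3) is not available in this text, so the sketch assumes a common max-normalized TF-IDF form, w = (tf / max tf) * log(N / n), which is consistent with the symbol definitions above but is not confirmed by the source. The function name, the consumer ids, and the aroma values other than "faint scent" are hypothetical.

```python
import math
from collections import Counter

def feature_value_weights(records_by_consumer):
    """Weight the values of ONE feature (e.g. aroma type) for every consumer.

    records_by_consumer maps a consumer id to the list of feature values
    found in that consumer's consumption records.
    ASSUMED form (not confirmed by the source): w = tf / max_tf * log(N / n).
    """
    # tf[c][v]: how often value v occurs in consumer c's records
    tf = {c: Counter(vals) for c, vals in records_by_consumer.items()}
    total_consumers = len(tf)  # N
    # n[v]: number of consumers whose records contain value v
    n = Counter(v for counts in tf.values() for v in counts)
    # max_tf[v]: largest count of value v over all consumers
    max_tf = Counter()
    for counts in tf.values():
        for v, k in counts.items():
            max_tf[v] = max(max_tf[v], k)
    return {
        c: {v: k / max_tf[v] * math.log(total_consumers / n[v])
            for v, k in counts.items()}
        for c, counts in tf.items()
    }

# Hypothetical consumption records for the aroma type feature.
records = {
    "u1": ["faint scent", "faint scent", "strong scent"],
    "u2": ["faint scent"],
    "u3": ["mixed"],
}
weights = feature_value_weights(records)
```

In this toy data, "faint scent" is bought by two of the three consumers, so its inverse-frequency factor is small, while "strong scent" appears for only one consumer and therefore receives a larger weight for that consumer.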
Preferably, the step 3 specifically comprises:
step 3.1: for cigarette brands, 14 features such as production place, enterprise name, commodity name, specification and model, aroma type, and grade are selected to construct the cigarette brand feature vector
$B = \{F_i \mid 1 \le i \le 14\}$
step 3.2: for each feature $F_i$ of a cigarette brand, the feature value is a uniquely determined value with no statistical character, so its weight is 1; the remaining feature values do not occur for the brand, so their weights are 0. For example, Yuxi (soft) has the faint scent aroma type, so the weight of the faint scent feature value in the aroma type feature is 1, and the other aroma type feature values do not occur for this brand. To facilitate similarity calculation between a cigarette brand feature vector and a consumer feature vector, the feature value vector of each feature of the cigarette brand is expanded to the same dimension as the corresponding feature value vector of the consumer preference feature, and the weight of every feature value that does not occur in that feature of the cigarette brand is set to 0.
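The expansion described in step 3.2 amounts to a one-hot encoding over the feature values seen on the consumer side. A minimal Python sketch follows; the function name, the `vocabulary` structure, and the grade values are hypothetical and serve only as illustration.

```python
def brand_feature_vector(brand_features, vocabulary):
    """Expand each brand feature to the consumer-side dimension.

    brand_features: feature name -> the brand's single value for that feature.
    vocabulary: feature name -> ordered list of all values seen among consumers.
    Returns feature name -> list of weights (1.0 for the brand's value, else 0.0).
    """
    return {
        feature: [1.0 if value == brand_features.get(feature) else 0.0
                  for value in values]
        for feature, values in vocabulary.items()
    }

# Hypothetical vocabulary for two of the 14 features.
vocabulary = {
    "aroma_type": ["faint scent", "strong scent", "mixed"],
    "grade": ["first", "second"],
}
# The aroma type of Yuxi (soft) comes from the text; the grade is made up.
yuxi_soft = {"aroma_type": "faint scent", "grade": "first"}
vector = brand_feature_vector(yuxi_soft, vocabulary)
```

Because both vectors share the same value ordering, the brand vector can be compared position by position with any consumer's weight vector for the same feature.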
Preferably, the step 4 specifically includes:
step 4.1: let the feature vectors of two consumers $U_a$ and $U_b$ be
$U_a = \{F_i^a \mid 1 \le i \le 14\}$ and $U_b = \{F_i^b \mid 1 \le i \le 14\}$;
then the similarity of $U_a$ and $U_b$ is the average of the similarity values of the corresponding features of the two consumer consumption preference vectors, namely:
$sim(U_a, U_b) = \frac{1}{14} \sum_{i=1}^{14} sim(F_i^a, F_i^b)$ (4)
step 4.2: $sim(F_i^a, F_i^b)$, the similarity of a pair of corresponding features, is calculated as the cosine of the angle between their feature value vectors:
$sim(F_i^a, F_i^b) = \frac{\sum_{j=1}^{m} w_{ij}^a w_{ij}^b}{\sqrt{\sum_{j=1}^{m} (w_{ij}^a)^2} \, \sqrt{\sum_{j=1}^{m} (w_{ij}^b)^2}}$ (5)
where $w_{ij}^a$ and $w_{ij}^b$ are the weights of the corresponding feature values in features $F_i^a$ and $F_i^b$.
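The two-level similarity in step 4 (cosine per feature, then the mean over features) can be sketched in Python as follows. The function names and the toy weight vectors are hypothetical, and returning 0 for an all-zero feature is an assumption that the source does not address.

```python
import math

def feature_similarity(w_a, w_b):
    """Cosine of the angle between two feature value weight vectors."""
    dot = sum(x * y for x, y in zip(w_a, w_b))
    norm_a = math.sqrt(sum(x * x for x in w_a))
    norm_b = math.sqrt(sum(y * y for y in w_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero feature contributes no similarity
    return dot / (norm_a * norm_b)

def preference_similarity(u_a, u_b):
    """Mean of the per-feature cosine similarities over the features of u_a."""
    features = list(u_a)
    return sum(feature_similarity(u_a[f], u_b[f]) for f in features) / len(features)

# Two hypothetical consumers described by two of the 14 features.
u_a = {"aroma_type": [1.0, 0.0, 0.0], "grade": [0.6, 0.8]}
u_b = {"aroma_type": [1.0, 0.0, 0.0], "grade": [0.8, 0.6]}
similarity = preference_similarity(u_a, u_b)  # (1.0 + 0.96) / 2
```

The same function applies unchanged when one argument is a cigarette brand vector, since the brand vector has been expanded to the consumer-side dimension.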
Preferably, the step 5 specifically includes:
step 5.1: obtaining consumer clusters by a minimum spanning tree algorithm, comprising the following steps:
step 5.1.1: in the consumer vector space, each consumer forms a vertex of a graph and the similarity value of two consumer vectors forms the weight of the edge between them, so that all consumers are contained in a weighted undirected graph $G$; in the initial state, the $n$ consumers are independent of one another and no edges connect them;
step 5.1.2: for all consumer vectors, the similarity between every two vectors is calculated and the pairs are sorted in descending order of similarity, that is, in descending order of the weight of the edge between the two consumers;
step 5.1.3: the edge with the maximum similarity value is selected from the sorted queue obtained in step 5.1.2 and dequeued; if the two consumer vertices connected by the edge lie in different connected components, the edge links and merges the two components into a new connected component; otherwise the edge is ignored;
step 5.1.4: the process in step 5.1.3 is iterated until $n-1$ edges connect all the consumer vertices, that is, until a minimum spanning tree has been generated;
step 5.1.5: the edges with the smallest weights in the minimum spanning tree are deleted in turn (that is, the connections between the least similar consumers are removed) until $k-1$ edges have been deleted; the remaining vertices and edges then form $k$ connected components, and these $k$ connected components are the $k$ initial consumer clusters;
step 5.1.6: for each initial consumer cluster, the consumer vectors of all vertices in the connected component are averaged to obtain the centroids of the $k$ initial consumer clusters:
$c_t = \frac{1}{n_t} \sum_{U \in C_t} U, \quad 1 \le t \le k$ (6)
where $n_t$ denotes the number of consumers in cluster $C_t$;
step 5.1.7: the distance between each consumer's vector and the centroid vector of each cluster is recalculated, and the consumer is assigned to the closest consumer cluster;
step 5.1.8: after all consumer clusters have been adjusted, the centroid of each consumer cluster is recalculated;
step 5.1.9: steps 5.1.7 and 5.1.8 are repeated until the error function converges;
step 5.2: calculating the similarity between the cigarette brand feature vector and each consumer cluster centroid;
step 5.2.1: for a pair of corresponding features in the cigarette brand feature vector and a consumer cluster centroid vector, the similarity of the two features is calculated by formula (5);
step 5.2.2: step 5.2.1 is repeated to obtain the similarity values of all corresponding features, and the similarity between the cigarette brand and the consumer cluster centroid is calculated as the average of the similarities of the corresponding features of the cigarette brand feature vector and the centroid feature vector, as in formula (4);
step 5.2.3: step 5.2.2 is repeated to obtain the similarity values between the cigarette brand feature vector and all consumer cluster centroids, that is, the similarity between the cigarette brand and the consumption preference features of every consumer cluster.
step 5.3: the consumer cluster with the maximum similarity value is taken as the promotion group for the new brand.
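Steps 5.1.1 to 5.1.5 follow the pattern of Kruskal's algorithm with edges taken in descending order of similarity. The resulting tree is a maximum spanning tree with respect to similarity, which is the same as a minimum spanning tree with respect to dissimilarity. The Python sketch below illustrates only this initial clustering; the function name `mst_clusters`, the union-find helper, and the toy similarity matrix are assumptions and do not come from the source.

```python
def mst_clusters(similarity, k):
    """Split consumers into k initial clusters via a Kruskal-style spanning tree.

    similarity: symmetric n x n matrix (list of lists) of pairwise similarities.
    Returns a sorted list of k sorted lists of consumer indices.
    """
    n = len(similarity)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All edges, most similar first (steps 5.1.2 and 5.1.3).
    edges = sorted(((similarity[i][j], i, j)
                    for i in range(n) for j in range(i + 1, n)), reverse=True)
    tree = []
    for weight, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:              # the edge joins two different components
            parent[ri] = rj
            tree.append((weight, i, j))
        if len(tree) == n - 1:    # spanning tree complete (step 5.1.4)
            break

    # Drop the k - 1 weakest tree edges (step 5.1.5), then rebuild components.
    kept = sorted(tree, reverse=True)[:n - k]
    parent = list(range(n))
    for _, i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Hypothetical pairwise similarities of four consumers.
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
clusters = mst_clusters(similarity, 2)
```

With k = 2, the weakest tree edge (similarity 0.3 between consumers 1 and 2) is removed, leaving the groups [0, 1] and [2, 3] as the initial clusters.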
Preferably, the extracted consumer consumption preference features are cigarette production place, enterprise name, trademark, commodity name, specification and model, aroma type, grade, draw resistance, length, hardness, circumference, tar content, smoke nicotine content, and smoke carbon monoxide content.
Preferably, the QR code scanning device is a mobile phone, and the recognition software is a dedicated app or a WeChat mini program.
Detailed Description
Reference will now be made in detail to the embodiments of the present invention, the following examples of which are intended to be illustrative only and are not to be construed as limiting the scope of the invention.
1. Modeling consumer consumption preferences
The consumption record links the consumer, the cigarette brand information, and the cigarette information together, and implies the consumer's preference for a certain brand, a certain model, or cigarettes with certain features. Therefore, 14 features are extracted from the cigarette brand information and the cigarette information to form the consumer cigarette preference feature vector.
Each feature in the consumer preference feature vector includes a plurality of feature values. For example, the commodity name may be Yuxi (soft), Hadenmen (pure incense), Taishan (white general), Taishan (red general), Taishan (hope Yue), and so on. Each feature value occurs a different number of times in the consumption records of different consumers, and the number of occurrences represents the consumer's preference for cigarette brands having that feature value.
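As an illustration of how such occurrence counts might be gathered, the following Python sketch joins scan records to a cigarette catalogue and counts feature values per consumer. The catalogue contents, the consumer ids, and the aroma type assigned to Taishan (white general) are hypothetical; only the product names and the aroma type of Yuxi (soft) appear in the text.

```python
from collections import Counter, defaultdict

# Hypothetical catalogue: commodity name -> cigarette attributes.
catalogue = {
    "Yuxi (soft)": {"aroma_type": "faint scent"},
    "Taishan (white general)": {"aroma_type": "strong scent"},  # made-up value
}
# Hypothetical scan records: (consumer id, commodity name).
scans = [
    ("u1", "Yuxi (soft)"),
    ("u1", "Yuxi (soft)"),
    ("u1", "Taishan (white general)"),
    ("u2", "Yuxi (soft)"),
]

# counts[consumer][feature][value] = number of occurrences
counts = defaultdict(lambda: defaultdict(Counter))
for consumer, product in scans:
    counts[consumer]["commodity_name"][product] += 1
    for feature, value in catalogue[product].items():
        counts[consumer][feature][value] += 1
```

Here consumer u1 has two occurrences of "faint scent" and one of "strong scent", which is the raw count that the weighting step later normalizes.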
To describe accurately how important each feature value is to a consumer's preference, the invention calculates the weights of the feature values in the consumer preference feature vector using the idea of the classic TF-IDF algorithm from text similarity calculation. The consumer set is treated as a document set, the consumption record of each consumer in the consumer set is treated as a document in the document set, and the different feature values of each feature in the consumption record are likewise treated as the words in a document, so that the consumer consumption preference feature vector is constructed, as in formula (1):
$U = \{F_i \mid 1 \le i \le 14\}$ (1)
Each feature $F_i$ contains a plurality of feature values, which in turn form a feature value vector, as in formula (2):
$F_i = \{(f_{ij}, w_{ij}) \mid 1 \le j \le m\}$ (2)
Here $f_{ij}$ denotes the j-th feature value of feature $F_i$, and $w_{ij}$ denotes the weight of that feature value, calculated by formula (3):
[Formula (3): equation image not recovered]
where $tf_{ij}$ is the number of times feature value $f_{ij}$ occurs in the consumption record of consumer $U$; the larger this value, the better the feature value reflects the consumer's preference. $\max tf_{ij}$ is the maximum of $tf_{ij}$ for feature value $f_{ij}$ among all consumers and is used to normalize the weight $w_{ij}$. $N$ is the total number of consumers, and $n_{ij}$ is the number of consumers whose records contain feature value $f_{ij}$; the larger $n_{ij}$, the more popular the feature value and the less it expresses an individual consumer's preference.
2. Cigarette brand modeling
For cigarette brands, 14 features such as production place, enterprise name, commodity name, specification and model, aroma type, and grade are selected to construct the cigarette brand feature vector
$B = \{F_i \mid 1 \le i \le 14\}$
For each feature $F_i$ of a cigarette brand, the feature value is a uniquely determined value with no statistical character, so its weight is 1; the remaining feature values do not occur for the brand, so their weights are 0. For example, Yuxi (soft) has the faint scent aroma type, so the weight of the faint scent feature value in the aroma type feature is 1, and the other aroma type feature values do not occur for this brand. To facilitate similarity calculation between a cigarette brand feature vector and a consumer feature vector, the feature value vector of each feature of the cigarette brand is expanded to the same dimension as the corresponding feature value vector of the consumer preference feature, and the weight of every feature value that does not occur in that feature of the cigarette brand is set to 0.
3. Consumer consumption preference similarity calculation
Let the feature vectors of two consumers $U_a$ and $U_b$ be
$U_a = \{F_i^a \mid 1 \le i \le 14\}$ and $U_b = \{F_i^b \mid 1 \le i \le 14\}$;
then the similarity of $U_a$ and $U_b$ is the average of the similarity values of the corresponding features of the two consumer consumption preference vectors, namely:
$sim(U_a, U_b) = \frac{1}{14} \sum_{i=1}^{14} sim(F_i^a, F_i^b)$ (4)
where $sim(F_i^a, F_i^b)$, the similarity of a pair of corresponding features, is calculated as the cosine of the angle between their feature value vectors:
$sim(F_i^a, F_i^b) = \frac{\sum_{j=1}^{m} w_{ij}^a w_{ij}^b}{\sqrt{\sum_{j=1}^{m} (w_{ij}^a)^2} \, \sqrt{\sum_{j=1}^{m} (w_{ij}^b)^2}}$ (5)
where $w_{ij}^a$ and $w_{ij}^b$ are the weights of the corresponding feature values in features $F_i^a$ and $F_i^b$.
4. Consumer group recommendation algorithm
The consumer group recommendation algorithm first constructs a consumer consumption preference model from the consumer consumption record data, performs cluster analysis on the consumers, divides them into several groups, and calculates the cluster centroids; it then constructs the cigarette brand feature vector for the brand to be promoted, calculates the similarity between that vector and each cluster centroid, and takes the consumer cluster with the maximum similarity as the consumer group to be recommended. The specific steps of the algorithm are as follows:
step 1: obtaining consumer clusters by a minimum spanning tree algorithm, comprising the following steps:
step 1.1: in the consumer vector space, each consumer forms a vertex of a graph and the similarity value of two consumer vectors forms the weight of the edge between them, so that all consumers are contained in a weighted undirected graph $G$; in the initial state, the $n$ consumers are independent of one another and no edges connect them;
step 1.2: for all consumer vectors, the similarity between every two vectors is calculated and the pairs are sorted in descending order of similarity, that is, in descending order of the weight of the edge between the two consumers;
step 1.3: the edge with the maximum similarity value is selected from the sorted queue obtained in step 1.2 and dequeued; if the two consumer vertices connected by the edge lie in different connected components, the edge links and merges the two components into a new connected component; otherwise the edge is ignored;
step 1.4: the process in step 1.3 is iterated until $n-1$ edges connect all the consumer vertices, that is, until a minimum spanning tree has been generated;
step 1.5: the edges with the smallest weights in the minimum spanning tree are deleted in turn (that is, the connections between the least similar consumers are removed) until $k-1$ edges have been deleted; the remaining vertices and edges then form $k$ connected components, and these $k$ connected components are the $k$ initial consumer clusters;
step 1.6: for each initial consumer cluster, the consumer vectors of all vertices in the connected component are averaged to obtain the centroids of the $k$ initial consumer clusters:
$c_t = \frac{1}{n_t} \sum_{U \in C_t} U, \quad 1 \le t \le k$ (6)
where $n_t$ denotes the number of consumers in cluster $C_t$;
step 1.7: the distance between each consumer's vector and the centroid vector of each cluster is recalculated, and the consumer is assigned to the closest consumer cluster;
step 1.8: after all consumer clusters have been adjusted, the centroid of each consumer cluster is recalculated;
step 1.9: steps 1.7 and 1.8 are repeated until the error function converges;
step 2: calculating the similarity between the cigarette brand feature vector and each consumer cluster centroid;
step 2.1: for a pair of corresponding features in the cigarette brand feature vector and a consumer cluster centroid vector, the similarity of the two features is calculated by formula (5);
step 2.2: step 2.1 is repeated to obtain the similarity values of all corresponding features, and the similarity between the cigarette brand and the consumer cluster centroid is calculated as the average of the similarities of the corresponding features of the cigarette brand feature vector and the centroid feature vector, as in formula (4);
step 2.3: step 2.2 is repeated to obtain the similarity values between the cigarette brand feature vector and all consumer cluster centroids, that is, the similarity between the cigarette brand and the consumption preference features of every consumer cluster.
step 3: the consumer cluster with the maximum similarity value is taken as the promotion group for the new brand.
With this algorithm, a suitable consumer group can be recommended for cigarette brand promotion, providing a reference basis for brand promotion planning.
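The centroid refinement (steps 1.6 to 1.9) and the selection of the target group (steps 2 and 3) can be sketched in Python as below. This is an illustration under stated assumptions, not the patented implementation: "closest" is interpreted as "highest similarity under formulas (4) and (5)", convergence is approximated by "assignments no longer change", and all names and toy data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two weight vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def similarity(u, v):
    """Mean per-feature cosine similarity over the features of u."""
    return sum(cosine(u[f], v[f]) for f in u) / len(u)

def centroid(members):
    """Element-wise mean of the members' weight vectors, feature by feature."""
    return {f: [sum(col) / len(members) for col in zip(*(m[f] for m in members))]
            for f in members[0]}

def refine(consumers, clusters, max_rounds=100):
    """Reassign consumers to the most similar centroid until nothing changes."""
    for _ in range(max_rounds):
        centroids = [centroid([consumers[i] for i in c]) for c in clusters]
        new = [[] for _ in clusters]
        for i, u in enumerate(consumers):
            best = max(range(len(centroids)),
                       key=lambda t: similarity(u, centroids[t]))
            new[best].append(i)
        new = [c for c in new if c]  # drop clusters that became empty
        if new == clusters:
            break
        clusters = new
    return clusters, [centroid([consumers[i] for i in c]) for c in clusters]

def target_group(brand, centroids):
    """Index of the cluster whose centroid is most similar to the brand."""
    return max(range(len(centroids)), key=lambda t: similarity(brand, centroids[t]))

# Hypothetical consumers with a single two-valued feature.
consumers = [
    {"aroma_type": [0.9, 0.1]},
    {"aroma_type": [0.8, 0.2]},
    {"aroma_type": [0.1, 0.9]},
    {"aroma_type": [0.2, 0.7]},
]
brand = {"aroma_type": [0.0, 1.0]}  # one-hot brand vector
final_clusters, centroids = refine(consumers, [[0, 1], [2, 3]])
best = target_group(brand, centroids)
```

In this toy example the brand's one-hot vector points at the second aroma value, so the cluster containing consumers 2 and 3 is selected as the promotion group.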
Although exemplary embodiments of the present invention have been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions, substitutions and the like can be made in form and detail without departing from the scope and spirit of the invention as disclosed in the accompanying claims, all of which are intended to fall within the scope of the claims, and that various steps in the various sections and methods of the claimed product can be combined together in any combination. Therefore, the description of the embodiments disclosed in the present invention is not intended to limit the scope of the present invention, but to describe the present invention. Accordingly, the scope of the present invention is not limited by the above embodiments, but is defined by the claims or their equivalents.

Claims (7)

1. A cigarette brand recommendation algorithm based on consumer modeling, characterized by comprising the following steps in sequence:
step 1: designing the QR code (two-dimensional code) information on the outer package of the cigarettes, wherein a consumer can scan the QR code with a mobile phone to obtain information on the purchased cigarettes and verify their authenticity, and the tobacco sales company can thereby obtain consumer consumption records;
step 2: consumer modeling based on TF-IDF (term frequency-inverse document frequency), wherein the consumer set is treated as a document set, the consumption record of each consumer in the consumer set is treated as a document in the document set, and the different feature values of each feature in the consumption record are likewise treated as the words in a document, so that a consumer consumption preference feature vector is constructed;
step 3: modeling the cigarette brand feature vector, wherein the same features as in the consumer feature vector are extracted to construct the cigarette brand feature vector;
step 4: calculating similarity, wherein the similarity of two consumers' feature vectors is the average of the similarities of the corresponding features of the two consumers' preference vectors, and the similarity between a consumer feature vector and a cigarette brand feature vector is calculated in the same way as the similarity between two consumer feature vectors;
step 5: realizing a cigarette brand recommendation algorithm based on consumer consumption preference from the similarity between the cigarette brand feature vector and the consumer consumption preference features, wherein a consumer consumption preference model is first constructed from the consumer consumption record data, cluster analysis is performed on the consumers, the consumers are divided into several groups, and the cluster centroids are calculated; the cigarette brand feature vector is then constructed for the brand to be promoted, the similarity between that vector and each cluster centroid is calculated, and the consumer cluster with the maximum similarity is taken as the consumer group to be recommended.
2. The cigarette brand recommendation algorithm based on consumer modeling according to claim 1, wherein: the specific steps of the step 2 are as follows:
step 2.1: the consumer set is treated as analogous to a document set and the consumption records of each consumer as analogous to a document in that set; 14 features such as cigarette production place, enterprise name, trademark, aroma type and grade are extracted from the consumer's consumption records to construct the consumer consumption preference feature vector, as in formula (1):
U = {F_i | 1 ≤ i ≤ 14} (1)
step 2.2: each feature F_i in the consumer's consumption records contains several feature values; these different feature values are analogous to the words in a document and themselves form a feature value vector, as in formula (2):
F_i = {(f_ij, w_ij) | 1 ≤ j ≤ m} (2)
step 2.3: in the formula of step 2.2, f_ij denotes the j-th feature value of feature F_i and w_ij denotes the weight of that feature value, calculated by the following formula:
Figure RE-FDA0002427923050000011
where tf_ij is the number of times feature value f_ij occurs in the consumption records of consumer U (the larger this value, the better the feature value reflects the consumer's preference); max tf_ij is the maximum of tf_ij over all consumers, used in calculating the weight w_ij; N is the total number of consumers; and n_ij is the number of consumers whose records contain feature value f_ij (the larger n_ij, the more common the feature value f_ij is and the less it can express an individual consumer's preference).
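The weighting of step 2.3 can be sketched in Python as follows. The formula itself appears only as an image in this text, so the sketch assumes the usual normalized TF-IDF form, w_ij = (tf_ij / max tf_ij) × log(N / n_ij), which matches the definitions of tf_ij, max tf_ij, N and n_ij given above but is not confirmed by the source. The function name `build_preference_vectors`, the record layout and the feature name `aroma` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_preference_vectors(records, features):
    """Build {consumer: {feature: {value: weight}}} from consumption records.

    records maps a consumer id to a list of purchase records, each a
    {feature: value} dict (an assumed layout, not taken from the patent).
    """
    n_consumers = len(records)  # N
    # tf[c][feat][val]: how often the value occurs in consumer c's records
    tf = {
        c: {feat: Counter(r[feat] for r in recs if feat in r) for feat in features}
        for c, recs in records.items()
    }
    max_tf = defaultdict(int)   # largest tf of a (feature, value) over all consumers
    holders = defaultdict(int)  # n_ij: consumers whose records contain the value
    for per_feature in tf.values():
        for feat, counts in per_feature.items():
            for val, count in counts.items():
                max_tf[feat, val] = max(max_tf[feat, val], count)
                holders[feat, val] += 1
    # Assumed weight: (tf / max tf) * log(N / n_ij)
    return {
        c: {
            feat: {
                val: count / max_tf[feat, val]
                * math.log(n_consumers / holders[feat, val])
                for val, count in counts.items()
            }
            for feat, counts in per_feature.items()
        }
        for c, per_feature in tf.items()
    }


records = {
    "u1": [{"aroma": "faint"}, {"aroma": "faint"}, {"aroma": "strong"}],
    "u2": [{"aroma": "strong"}],
    "u3": [{"aroma": "mixed"}],
}
vectors = build_preference_vectors(records, ["aroma"])
```

Under this assumed formula a feature value bought by every consumer receives weight log(1) = 0, which is consistent with the statement that very common feature values say little about an individual consumer's preference.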
3. The cigarette brand recommendation algorithm based on consumer modeling according to claim 1, wherein: the specific steps of the step 3 are as follows:
step 3.1: for a cigarette brand, 14 features such as production place, enterprise name, commodity name, specification and model, aroma type and grade are selected to construct the cigarette brand feature vector B = {F_i | 1 ≤ i ≤ 14};
step 3.2: for each feature F_i of a cigarette brand, the feature value is a single determined value with no statistical character, so its weight is 1; all other feature values are absent from the brand and have weight 0. For example, Yuxi (soft) is of the faint scent type, so within the aroma type feature the weight of the faint scent value is 1 and the other aroma type values do not exist for this brand. To calculate the similarity between a cigarette brand feature vector and a consumer feature vector, the feature value vector of each brand feature is expanded to the same dimension as the corresponding consumer preference feature value vector, and the weight of every feature value that does not exist for the brand is set to 0.
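The expansion described in step 3.2 amounts to a one-hot encoding per feature. A minimal Python sketch, assuming a hypothetical `vocabulary` that lists, for each feature, the feature values observed among consumers; the function name `brand_vector` and the feature name `aroma` are illustrative only:

```python
def brand_vector(brand, vocabulary):
    """Expand a brand's single value per feature into a weight per known value.

    brand:      {feature: value}, one determined value per feature
    vocabulary: {feature: [values observed among consumers]} (assumed input)
    """
    return {
        feat: {val: 1.0 if brand.get(feat) == val else 0.0 for val in values}
        for feat, values in vocabulary.items()
    }


vocabulary = {"aroma": ["faint", "strong", "mixed"]}
yuxi_soft = brand_vector({"aroma": "faint"}, vocabulary)
# yuxi_soft == {"aroma": {"faint": 1.0, "strong": 0.0, "mixed": 0.0}}
```

In this sketch a brand value that no consumer has ever purchased yields an all-zero feature; how the patent handles that case is not stated.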
4. The cigarette brand recommendation algorithm based on consumer modeling according to claim 1, wherein: the specific steps of the step 4 are as follows:
step 4.1: let the feature vectors of two consumers U_i and U_j be:
Figure RE-FDA0002427923050000021
and
Figure RE-FDA0002427923050000022
then the similarity of U_i and U_j is the average of the similarities of the corresponding features of the two consumer consumption preference vectors, namely:
Figure RE-FDA0002427923050000023
step 4.2:
Figure RE-FDA0002427923050000024
the similarity of corresponding features is calculated as the cosine of the angle between their feature value vectors:
Figure RE-FDA0002427923050000025
where w_1i and w_2i are the weights of the corresponding feature values of features F_1 and F_2.
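The two similarity formulas of claim 4 appear only as images in this text, so the following Python sketch follows their verbal description: a cosine per feature, then an average over features. It assumes each preference vector is stored as {feature: {value: weight}}. The function names, the treatment of an all-zero feature as similarity 0, and averaging over the union of features present in either vector are assumptions.

```python
import math


def feature_similarity(f1, f2):
    """Cosine of the angle between two feature value vectors {value: weight}."""
    dot = sum(w * f2.get(val, 0.0) for val, w in f1.items())
    norm1 = math.sqrt(sum(w * w for w in f1.values()))
    norm2 = math.sqrt(sum(w * w for w in f2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumption: an empty or all-zero feature matches nothing
    return dot / (norm1 * norm2)


def vector_similarity(u1, u2):
    """Average of the per-feature cosine similarities of two vectors."""
    features = u1.keys() | u2.keys()
    if not features:
        return 0.0
    total = sum(feature_similarity(u1.get(f, {}), u2.get(f, {})) for f in features)
    return total / len(features)


u_i = {"aroma": {"faint": 1.0}, "grade": {"A": 1.0}}
u_j = {"aroma": {"faint": 2.0}, "grade": {"B": 1.0}}
similarity = vector_similarity(u_i, u_j)  # aroma matches fully, grade not at all
```

Because a brand vector has the same shape as a consumer vector, the same `vector_similarity` can be applied to a brand and a consumer, as claim 1 requires.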
5. The cigarette brand recommendation algorithm based on consumer modeling according to claim 1, wherein: the specific steps of the step 5 are as follows:
step 5.1: obtaining consumer clusters with a minimum spanning tree algorithm, as follows:
step 5.1.1: in the consumer vector space, each consumer forms a vertex of the graph and the similarity value of two consumer vectors forms the weight of the edge between them, so that all consumers make up a weighted undirected graph G(V, E); the initial state consists of n independent consumers with no edges connecting them;
step 5.1.2: calculating the similarity between every pair of consumer vectors and sorting the pairs in descending order of similarity, that is, in descending order of the weight of the edge between the two consumers;
step 5.1.3: taking the edge with the largest similarity value from the head of the sorted queue obtained in step 5.1.2; if the two consumer vertices joined by the edge lie in different connected components, the edge links the two components and merges them into a new connected component; otherwise the edge is ignored;
step 5.1.4: repeating step 5.1.3 until all consumer vertices are connected by the minimum number of edges, n-1, which yields the minimum spanning tree;
step 5.1.5: deleting the edges of the minimum spanning tree in ascending order of weight (that is, removing the connections between the least similar consumers) until M-1 edges have been deleted; the remaining vertices and edges then form M connected components, which are the M initial consumer clusters;
step 5.1.6: for each initial consumer cluster, averaging the consumer vectors of all vertices in the connected component to obtain the centroids of the M initial consumer clusters:
Figure RE-FDA0002427923050000031
where |C_i| denotes the number of consumers in cluster C_i;
step 5.1.7: recalculating the distance between each consumer's vector and the centroid vector of each cluster, and assigning the consumer to the closest consumer cluster;
step 5.1.8: after all consumer clusters have been adjusted, recalculating the centroid of each consumer cluster;
step 5.1.9: repeating steps 5.1.7 and 5.1.8 until the error function converges;
step 5.2: calculating the similarity of the cigarette brand feature vector and each consumer clustering centroid;
step 5.2.1: for each pair of corresponding features in the cigarette brand feature vector and a consumer cluster centroid vector, calculating the similarity of the two features with formula (5);
step 5.2.2: repeating step 5.2.1 to obtain the similarity values of all corresponding features, and calculating the similarity between the cigarette brand and the consumer cluster centroid as the average of these feature similarities, as in formula (4);
step 5.2.3: repeating step 5.2.2 for every consumer cluster centroid to obtain the similarity between the cigarette brand feature vector and each centroid, that is, the similarity between the cigarette brand and the consumption preference features of each consumer cluster;
step 5.3: taking the consumer cluster with the maximum similarity value as the promotion group for the new brand.
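Steps 5.1.1 to 5.1.6 and steps 5.2 to 5.3 can be sketched in Python as follows. The sketch uses Kruskal's procedure with a union-find structure on a precomputed pairwise similarity matrix; because edges are taken in descending order of similarity, the tree it builds is a maximum spanning tree with respect to similarity, which corresponds to the minimum spanning tree of the claim if distance is read as the opposite of similarity. The iterative reassignment of steps 5.1.7 to 5.1.9 is omitted. All function names, the matrix layout and the `similarity` callback are hypothetical.

```python
from itertools import combinations


def mst_clusters(sim, m):
    """Split n consumers into m initial clusters (steps 5.1.1 to 5.1.5).

    sim is a symmetric n x n similarity matrix given as a list of lists.
    """
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Steps 5.1.2 to 5.1.4: process edges in descending order of similarity
    edges = sorted(
        ((sim[i][j], i, j) for i, j in combinations(range(n), 2)), reverse=True
    )
    tree = []
    for s, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two different connected components
            parent[ri] = rj
            tree.append((s, i, j))
    # Step 5.1.5: drop the m-1 least similar tree edges, rebuild components
    tree.sort(reverse=True)
    kept = tree[: max(0, len(tree) - (m - 1))]
    parent = list(range(n))
    for _, i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


def centroid(vectors):
    """Step 5.1.6: average the {feature: {value: weight}} vectors of a cluster."""
    total = {}
    for vec in vectors:
        for feat, values in vec.items():
            slot = total.setdefault(feat, {})
            for val, w in values.items():
                slot[val] = slot.get(val, 0.0) + w
    return {
        feat: {val: w / len(vectors) for val, w in values.items()}
        for feat, values in total.items()
    }


def pick_target_cluster(brand, centroids, similarity):
    """Steps 5.2 to 5.3: index of the centroid most similar to the brand."""
    return max(range(len(centroids)), key=lambda k: similarity(brand, centroids[k]))


sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
clusters = mst_clusters(sim, 2)  # [[0, 1], [2, 3]]
```

Here `similarity` is expected to be a function of two vectors that returns the averaged per-feature cosine similarity described in claim 4; passing it in as an argument keeps the clustering code independent of how the vectors are stored.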
6. The cigarette brand recommendation algorithm based on consumer modeling of claim 1, further characterized by: a consumer consumption preference feature vector is constructed from the consumer's consumption records in combination with tobacco brand information and tobacco information, specifically a 14-dimensional vector whose dimensions are cigarette production place, enterprise name, trademark, commodity name, specification and model, aroma type, grade, draw resistance, length, hardness, circumference, tar content, smoke nicotine content and smoke carbon monoxide content.
7. The cigarette brand recommendation algorithm based on consumer modeling of claim 1, further characterized by: the two-dimensional code scanning device is a mobile phone, and the identification software is a dedicated app or a WeChat mini program.
CN202010064585.9A 2020-01-20 2020-01-20 Cigarette brand recommendation algorithm based on consumer modeling Pending CN111275459A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010064585.9A CN111275459A (en) 2020-01-20 2020-01-20 Cigarette brand recommendation algorithm based on consumer modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010064585.9A CN111275459A (en) 2020-01-20 2020-01-20 Cigarette brand recommendation algorithm based on consumer modeling

Publications (1)

Publication Number Publication Date
CN111275459A true CN111275459A (en) 2020-06-12

Family

ID=71003438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010064585.9A Pending CN111275459A (en) 2020-01-20 2020-01-20 Cigarette brand recommendation algorithm based on consumer modeling

Country Status (1)

Country Link
CN (1) CN111275459A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113742533A (en) * 2021-08-05 2021-12-03 北京思特奇信息技术股份有限公司 Prim algorithm based recommendation method, system and recommendation device
CN113781175A (en) * 2021-09-14 2021-12-10 广西中烟工业有限责任公司 New cigarette product recommendation method and system
WO2022164626A1 (en) * 2021-02-01 2022-08-04 Mastercard International Incorporated Audience recommendation using node similarity in combined contextual graph embeddings
CN115018588A (en) * 2022-06-24 2022-09-06 平安普惠企业管理有限公司 Product recommendation method and device, electronic equipment and readable storage medium
CN118365383A (en) * 2024-06-18 2024-07-19 昆明学院 Consumer flow identifying method based on CGAN and deep CNN

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528716A (en) * 2015-12-03 2016-04-27 山东烟草研究院有限公司 Tobacco brand remote intelligent recommendation method facing retailer individual need
CN107103488A (en) * 2017-03-02 2017-08-29 江苏省烟草公司常州市公司 Cigarette consumption analysis method based on collaborative filtering and clustering algorithm
CN109885688A (en) * 2019-03-05 2019-06-14 湖北亿咖通科技有限公司 File classification method, device, computer readable storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DONG HAN ET AL.: "Cigarette Brand Recommendation Based on Consumer Modeling", 《2019 INTERNATIONAL CONFERENCE ON MACHINE LEARNING, BIG DATA AND BUSINESS INTELLIGENCE (MLBDBI)》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022164626A1 (en) * 2021-02-01 2022-08-04 Mastercard International Incorporated Audience recommendation using node similarity in combined contextual graph embeddings
US11727422B2 (en) 2021-02-01 2023-08-15 Mastercard International Incorporated Audience recommendation using node similarity in combined contextual graph embeddings
CN113742533A (en) * 2021-08-05 2021-12-03 北京思特奇信息技术股份有限公司 Prim algorithm based recommendation method, system and recommendation device
CN113781175A (en) * 2021-09-14 2021-12-10 广西中烟工业有限责任公司 New cigarette product recommendation method and system
CN113781175B (en) * 2021-09-14 2023-11-28 广西中烟工业有限责任公司 Cigarette new product recommending method and recommending system
CN115018588A (en) * 2022-06-24 2022-09-06 平安普惠企业管理有限公司 Product recommendation method and device, electronic equipment and readable storage medium
CN118365383A (en) * 2024-06-18 2024-07-19 昆明学院 Consumer flow identifying method based on CGAN and deep CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200612