CN111275459A - Cigarette brand recommendation algorithm based on consumer modeling - Google Patents
Cigarette brand recommendation algorithm based on consumer modeling
- Publication number
- CN111275459A (application number CN202010064585.9A)
- Authority
- CN
- China
- Prior art keywords
- consumer
- cigarette
- similarity
- characteristic
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Databases & Information Systems (AREA)
- Development Economics (AREA)
- Economics (AREA)
- General Engineering & Computer Science (AREA)
- General Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Entrepreneurship & Innovation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A cigarette brand recommendation algorithm based on consumer modeling, comprising the following steps in sequence: consumer consumption records are associated with cigarette information and cigarette brand information, and features reflecting consumer preferences are counted and extracted; the weight of each feature value of each feature is calculated following the idea of the TF-IDF (term frequency-inverse document frequency) algorithm, and a consumer consumption preference feature model is constructed; the same features are extracted from the cigarette brand and cigarette information to build a cigarette brand feature vector; finally, a similarity measure for consumer consumption preferences is defined, and cigarette brands are recommended according to the similarity between the cigarette brand feature vector and the consumer consumption preference features.
Description
Technical Field
The invention relates to research on commodity brand recommendation, and in particular to consumer consumption preference feature modeling and to cigarette brand recommendation and promotion in the tobacco industry.
Background
Because tobacco is sold under a monopoly system, customer managers in the tobacco industry can visit only retail customers, and a tobacco sales enterprise can obtain sales information only from those retailers. Information about end consumers is hard to obtain, which makes it hard to analyze consumers and formulate more targeted sales strategies. In the present invention, a two-dimensional (QR) code is placed on the outer packaging of the cigarettes. A consumer scans the code with a mobile phone to retrieve information about the purchased cigarettes and verify their authenticity, and at the same time the tobacco sales company obtains the consumer's consumption record. Consumption records reflect consumers' preferences and interests, so analyzing these records and mining the features they contain yields a user profile of each consumer's consumption preferences, which is of significant value for cigarette brand recommendation and the market promotion of new brands.
The basic idea of the classic content-based recommendation algorithm is to complete personalized recommendation by matching the feature tags of the items to be recommended against the user's preference attributes. Specifically, the similarity between the user's preference features and the descriptive features of each candidate item is calculated, and the items with high similarity are recommended to the user.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method that obtains cigarette consumer information and builds a consumer consumption preference model in order to recommend cigarette brands. The method studies and analyzes consumer consumption records in the tobacco industry, joins those records with cigarette information, and extracts 14 features that reflect consumer consumption preferences. Drawing on the TF-IDF algorithm, it calculates the weights of the different feature values within each consumer's preference features and builds a consumer consumption preference vector model. It then builds a cigarette brand feature vector model from the same preference features, and computes similarity both between consumer vectors and between consumer vectors and cigarette brand vectors. Finally, a minimum spanning tree algorithm is used to cluster consumers for cigarette brand recommendation, improving the acceptance rate of the recommendations.
The invention provides a cigarette brand recommendation method based on consumer modeling, characterized by comprising the following steps in sequence:
Step 1: a two-dimensional (QR) code is designed for the outer packaging of the cigarettes; a consumer scans the code with a mobile phone to retrieve information about the purchased cigarettes and verify their authenticity, and at the same time the tobacco sales company obtains the consumer's consumption record;
Step 2: consumer modeling based on TF-IDF (term frequency-inverse document frequency): the consumer set is treated as analogous to a document set, the consumption record of each consumer corresponds to one document in that set, and the different feature values of each feature in the consumption record are treated as analogous to the words in a document, so as to construct the consumer consumption preference feature vector;
Step 3: cigarette brand feature vector modeling: the same features as in the consumer feature vector are extracted to construct the cigarette brand feature vector;
Step 4: the similarity of two consumer feature vectors is calculated as the average of the similarities of the corresponding features of the two consumers' preference vectors; the similarity between a consumer feature vector and a cigarette brand feature vector is calculated in the same way as the similarity between two consumer feature vectors;
Step 5: cigarette brands are recommended according to consumer consumption preference, based on the similarity between the cigarette brand feature vector and the consumer consumption preference features. The algorithm first constructs a consumer consumption preference model from consumer consumption record data, performs cluster analysis on the consumers to divide them into several groups, and calculates the cluster centroids. It then constructs a cigarette brand feature vector for the brand to be promoted, calculates the similarity between that vector and each cluster centroid, and takes the consumer cluster with the greatest similarity as the consumer group to recommend to.
Preferably, step 2 specifically comprises:
Step 2.1: the consumer set is treated as analogous to a document set, and the consumption record of each consumer corresponds to one document in that set; 14 features such as cigarette production place, enterprise name, trademark, aroma type and grade are extracted from the consumer's consumption record to construct the consumer consumption preference feature vector, as in formula (1):
$U = \{F_i \mid 1 \le i \le 14\}$ (1)
Step 2.2: each feature $F_i$ in the consumer consumption record contains multiple feature values; these feature values are analogous to the words in a document and themselves form a feature value vector, as in formula (2):
$F_i = \{(f_{ij}, w_{ij}) \mid 1 \le j \le m\}$ (2)
Step 2.3: in step 2.2, $f_{ij}$ denotes the j-th feature value of feature $F_i$, and $w_{ij}$ denotes the weight of that feature value, calculated by formula (3).
Here $tf_{ij}$ is the number of times feature value $f_{ij}$ occurs in the consumption record of consumer $U$; the larger this value, the better the feature value reflects the consumer's preference. $\max tf_{ij}$ is the maximum of $tf_{ij}$ over all consumers and is used to normalize the weight $w_{ij}$. $N$ is the total number of consumers, and $n_{ij}$ is the number of consumers whose records contain feature value $f_{ij}$; the larger $n_{ij}$, the more popular the feature value and the less it expresses an individual consumer's preference.
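Formula (3) itself is not reproduced in this text. A form consistent with the definitions above, assuming the usual TF-IDF product of a max-normalized frequency and a logarithmic inverse consumer frequency, would be:

```latex
w_{ij} = \frac{tf_{ij}}{\max tf_{ij}} \cdot \log\frac{N}{n_{ij}} \qquad (3)
```

The logarithm and the product form are assumptions taken from standard TF-IDF; the text states only the roles of $tf_{ij}$, $\max tf_{ij}$, $N$ and $n_{ij}$.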
Preferably, step 3 specifically comprises:
Step 3.1: for a cigarette brand, 14 features such as production place, enterprise name, commodity name, specification and model, aroma type and grade are selected to construct the cigarette brand feature vector $B = \{F_i \mid 1 \le i \le 14\}$;
Step 3.2: each feature $F_i$ of a cigarette brand takes a single, definite feature value with no statistical character; that value has weight 1, and all other feature values, being absent, have weight 0. For example, Yuxi (soft) is of the faint-scent aroma type, so within the aroma type feature the faint-scent feature value has weight 1, and the other aroma type values do not occur for this brand. To allow the similarity between a cigarette brand feature vector and a consumer feature vector to be calculated, the feature value vector of each brand feature is expanded to the same dimension as the corresponding feature value vector of the consumer preference feature, and the weight of every feature value that the brand does not have is set to 0.
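The expansion described in step 3.2 amounts to a one-hot encoding per feature. The following is a minimal sketch under stated assumptions: the function name `brand_feature_vector`, the `vocab` mapping and the aroma values other than the faint-scent type are hypothetical and chosen only for illustration.

```python
def brand_feature_vector(brand, vocab):
    """Build a one-hot weight vector per feature for a single brand.

    brand: dict mapping feature name -> the brand's single feature value
    vocab: dict mapping feature name -> ordered list of all feature values
           seen in consumer records (hypothetical structure)
    """
    return {
        feature: [1.0 if brand.get(feature) == value else 0.0 for value in values]
        for feature, values in vocab.items()
    }


# Hypothetical vocabulary; only "faint scent" comes from the text.
vocab = {"aroma type": ["faint scent", "strong scent", "mixed"]}
yuxi_soft = {"aroma type": "faint scent"}
vec = brand_feature_vector(yuxi_soft, vocab)
```

Because every brand vector shares the dimension of the consumer vocabulary, it can be compared directly with a consumer's feature value vector.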
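Preferably, step 4 specifically comprises:
Step 4.1: let the feature vectors of two consumers $U_i$ and $U_j$ be $U_i = \{F_k^{(i)} \mid 1 \le k \le 14\}$ and $U_j = \{F_k^{(j)} \mid 1 \le k \le 14\}$. The similarity of $U_i$ and $U_j$ is the average of the similarities of the corresponding features of the two consumer consumption preference vectors, namely:
$Sim(U_i, U_j) = \frac{1}{14} \sum_{k=1}^{14} sim(F_k^{(i)}, F_k^{(j)})$ (4)
Step 4.2: the similarity $sim(F_1, F_2)$ of a pair of corresponding features is calculated as the cosine of the angle between their feature value vectors:
$sim(F_1, F_2) = \frac{\sum_{i=1}^{m} w_{1i} w_{2i}}{\sqrt{\sum_{i=1}^{m} w_{1i}^2} \, \sqrt{\sum_{i=1}^{m} w_{2i}^2}}$ (5)
where $w_{1i}$ and $w_{2i}$ are the weights of the corresponding feature values of features $F_1$ and $F_2$.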
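The two-level similarity of step 4 (a cosine per feature, then an average over features) can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the function names and the dict-based representation are assumptions, as is the choice to return 0 when either weight vector is all zeros.

```python
import math


def feature_similarity(w1, w2):
    """Cosine of the angle between two feature value vectors.

    w1, w2: dicts mapping feature value -> weight for the same feature.
    """
    dot = sum(w * w2.get(value, 0.0) for value, w in w1.items())
    norm1 = math.sqrt(sum(w * w for w in w1.values()))
    norm2 = math.sqrt(sum(w * w for w in w2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumption: an all-zero vector counts as dissimilar
    return dot / (norm1 * norm2)


def consumer_similarity(u1, u2, features):
    """Average of the per-feature cosine similarities over `features`."""
    total = sum(feature_similarity(u1.get(f, {}), u2.get(f, {})) for f in features)
    return total / len(features)


# Two hypothetical consumers described by two features only.
a = {"aroma type": {"faint scent": 1.0}, "grade": {"first": 1.0}}
b = {"aroma type": {"faint scent": 2.0}, "grade": {"second": 1.0}}
score = consumer_similarity(a, b, ["aroma type", "grade"])
```

In this example the aroma type vectors point in the same direction and the grade vectors share no feature value, so the average lands halfway between full and zero similarity.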
Preferably, step 5 specifically comprises:
Step 5.1: consumer clusters are obtained with a minimum spanning tree algorithm, as follows:
Step 5.1.1: in the consumer vector space, each consumer forms a vertex of a graph and the similarity of two consumer vectors forms the weight of the edge between them, so that all consumers are placed in a weighted undirected graph $G$; in the initial state the $n$ consumers in $G$ are isolated, with no edges between them;
Step 5.1.2: for all consumer vectors, the similarity between every pair of vectors is calculated, and the pairs are sorted in descending order of similarity, i.e. in descending order of the weight of the edge between the two consumers;
Step 5.1.3: the edge with the greatest similarity is taken from the head of the sorted queue obtained in step 5.1.2; if the two consumer vertices it connects lie in different connected components, the edge joins those two components into a new connected component; otherwise the edge is ignored;
Step 5.1.4: step 5.1.3 is repeated until $n-1$ edges connect all consumer vertices, i.e. the minimum spanning tree has been generated;
Step 5.1.5: the edges of the minimum spanning tree are deleted in ascending order of weight (i.e. the connections between the least similar consumers are deleted first) until $k-1$ edges have been deleted; the remaining vertices and edges form $k$ connected components, and these $k$ connected components are the $k$ initial consumer clusters;
Step 5.1.6: for each initial consumer cluster, the consumer vectors of all vertices in the connected component are averaged to obtain the centroids of the $k$ initial consumer clusters;
Step 5.1.7: the distance between each consumer vector and each cluster centroid vector is recalculated, and each consumer is assigned to the nearest consumer cluster;
Step 5.1.8: after all consumer clusters have been adjusted, the centroid of each consumer cluster is recalculated;
Step 5.1.9: steps 5.1.7 and 5.1.8 are repeated until the error function converges;
Step 5.2: the similarity between the cigarette brand feature vector and each consumer cluster centroid is calculated;
Step 5.2.1: for each pair of corresponding features in the cigarette brand feature vector and a consumer cluster centroid vector, the feature similarity is calculated with formula (5);
Step 5.2.2: step 5.2.1 is repeated to obtain the similarity values of all corresponding features, and their average is taken as the similarity between the cigarette brand feature vector and that consumer cluster centroid, as in formula (4);
Step 5.2.3: step 5.2.2 is repeated to obtain the similarity between the cigarette brand feature vector and every consumer cluster centroid, i.e. the similarity between the cigarette brand and the consumption preference features of every consumer cluster group;
Step 5.3: the consumer cluster group with the greatest similarity value is taken as the group to which the new brand is promoted.
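Steps 5.1.1 to 5.1.5 follow the pattern of Kruskal's algorithm. Because edges are taken in descending order of similarity, the tree built is a maximum spanning tree on similarity, which is the same tree as a minimum spanning tree on dissimilarity. The sketch below illustrates this under stated assumptions: the function name `initial_clusters`, the integer consumer indices and the callable `sim` are hypothetical, and $1 \le k \le n$ is assumed.

```python
def initial_clusters(n, sim, k):
    """Split n consumers into k initial clusters via a spanning tree.

    n:   number of consumers, indexed 0..n-1
    sim: callable sim(i, j) -> similarity of consumers i and j
    k:   desired number of clusters (assumed 1 <= k <= n)
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Step 5.1.2: all pairwise edges, most similar first.
    edges = sorted(
        ((sim(i, j), i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )

    # Steps 5.1.3 and 5.1.4: keep an edge only if it joins two components.
    tree = []
    for s, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((s, i, j))
            if len(tree) == n - 1:
                break

    # Step 5.1.5: drop the k-1 weakest tree edges, then rebuild components.
    tree.sort(reverse=True)
    kept = tree[: len(tree) - (k - 1)]
    parent = list(range(n))
    for _, i, j in kept:
        parent[find(i)] = find(j)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


# Hypothetical one-dimensional example: two well-separated groups.
positions = [0.0, 0.1, 0.2, 5.0, 5.1]
clusters = initial_clusters(
    len(positions),
    lambda i, j: 1.0 / (1.0 + abs(positions[i] - positions[j])),
    2,
)
```

Computing all pairwise similarities costs on the order of $n^2$ evaluations, so for large consumer sets a sparse neighbor graph may be preferable; that choice is outside what the text specifies.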
Preferably, the 14 extracted consumer consumption preference features are: cigarette production place, enterprise name, trademark, commodity name, specification and model, aroma type, grade, draw resistance, length, hardness, circumference, tar content, smoke nicotine content, and smoke carbon monoxide content.
Preferably, the device used to scan the two-dimensional (QR) code is a mobile phone, and the recognition software is a dedicated App or a WeChat mini program.
Detailed Description
Reference will now be made in detail to the embodiments of the present invention, the following examples of which are intended to be illustrative only and are not to be construed as limiting the scope of the invention.
1. Modeling consumer consumption preferences
A consumption record links the consumer, the cigarette brand information and the cigarette information, and implies the consumer's preference for a certain brand, a certain model, or cigarettes with certain characteristics. Therefore, 14 features are extracted from the cigarette brand information and the cigarette information to form the consumer's cigarette preference feature vector.
Each feature in the consumer preference feature vector contains multiple feature values. For example, the commodity name may be Yuxi (soft), Hadenmen (pure incense), Taishan (white general), Taishan (red general), Taishan (hope Yue), and so on. Each feature value occurs a different number of times in different consumers' consumption records, and the number of occurrences indicates how strongly a consumer prefers cigarette brands with that feature value.
To describe accurately how important each feature value is to a consumer's preference, the invention borrows the idea of the classic TF-IDF algorithm from text similarity calculation to compute the weights of the feature values in the consumer preference feature vector. The consumer set is treated as analogous to a document set, the consumption record of each consumer corresponds to one document in that set, and the different feature values of each feature in the consumption record are treated as analogous to the words in a document, so as to construct the consumer consumption preference feature vector, as in formula (1):
$U = \{F_i \mid 1 \le i \le 14\}$ (1)
Each feature $F_i$ contains multiple feature values, which in turn form a feature value vector, as in formula (2):
$F_i = \{(f_{ij}, w_{ij}) \mid 1 \le j \le m\}$ (2)
Here $f_{ij}$ denotes the j-th feature value of feature $F_i$, and $w_{ij}$ denotes the weight of that feature value, calculated by formula (3).
In formula (3), $tf_{ij}$ is the number of times feature value $f_{ij}$ occurs in the consumption record of consumer $U$; the larger this value, the better the feature value reflects the consumer's preference. $\max tf_{ij}$ is the maximum of $tf_{ij}$ over all consumers and is used to normalize the weight $w_{ij}$. $N$ is the total number of consumers, and $n_{ij}$ is the number of consumers whose records contain feature value $f_{ij}$; the larger $n_{ij}$, the more popular the feature value and the less it expresses an individual consumer's preference.
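The weighting described above can be sketched in a few lines. This is a minimal sketch, not the patent's implementation: the weight is assumed to be the max-normalized count multiplied by the logarithm of $N / n_{ij}$, as in standard TF-IDF, since the exact formula is not reproduced in this text. The function name `preference_weights` and the record layout are hypothetical.

```python
import math
from collections import Counter


def preference_weights(records, feature):
    """Weight each value of one feature for every consumer.

    records: dict mapping consumer id -> list of purchase dicts
    feature: key of the feature to weight (for example "aroma type")
    Returns dict consumer id -> {feature value: weight}.
    """
    # tf[c][v]: times consumer c bought a cigarette whose feature equals v.
    tf = {
        c: Counter(purchase[feature] for purchase in purchases)
        for c, purchases in records.items()
    }
    n_total = len(records)
    # n_with[v]: number of consumers whose record contains value v.
    n_with = Counter(v for counts in tf.values() for v in counts)
    # max_tf[v]: largest count of v over all consumers, used to normalize.
    max_tf = {}
    for counts in tf.values():
        for v, count in counts.items():
            max_tf[v] = max(max_tf.get(v, 0), count)
    # Assumed form: (tf / max tf) * log(N / n).
    return {
        c: {
            v: (count / max_tf[v]) * math.log(n_total / n_with[v])
            for v, count in counts.items()
        }
        for c, counts in tf.items()
    }


# Hypothetical records for two consumers.
records = {
    "u1": [{"aroma type": "faint scent"}, {"aroma type": "faint scent"},
           {"aroma type": "strong scent"}],
    "u2": [{"aroma type": "faint scent"}],
}
weights = preference_weights(records, "aroma type")
```

In this example every consumer has bought a faint-scent cigarette, so that value receives weight 0 under the assumed formula, which matches the remark that a popular feature value expresses little about individual preference.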
2. Cigarette brand modeling
For a cigarette brand, 14 features such as production place, enterprise name, commodity name, specification and model, aroma type and grade are selected to construct the cigarette brand feature vector. Each feature $F_i$ of a cigarette brand takes a single, definite feature value with no statistical character; that value has weight 1, and all other feature values, being absent, have weight 0. For example, Yuxi (soft) is of the faint-scent aroma type, so within the aroma type feature the faint-scent feature value has weight 1, and the other aroma type values do not occur for this brand. To allow the similarity between a cigarette brand feature vector and a consumer feature vector to be calculated, the feature value vector of each brand feature is expanded to the same dimension as the corresponding feature value vector of the consumer preference feature, and the weight of every feature value that the brand does not have is set to 0.
3. Consumer consumption preference similarity calculation
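Let the feature vectors of two consumers $U_i$ and $U_j$ be $U_i = \{F_k^{(i)} \mid 1 \le k \le 14\}$ and $U_j = \{F_k^{(j)} \mid 1 \le k \le 14\}$. The similarity of $U_i$ and $U_j$ is the average of the similarities of the corresponding features of the two consumer consumption preference vectors, namely:
$Sim(U_i, U_j) = \frac{1}{14} \sum_{k=1}^{14} sim(F_k^{(i)}, F_k^{(j)})$ (4)
where the similarity $sim(F_1, F_2)$ of a pair of corresponding features is calculated as the cosine of the angle between their feature value vectors:
$sim(F_1, F_2) = \frac{\sum_{i=1}^{m} w_{1i} w_{2i}}{\sqrt{\sum_{i=1}^{m} w_{1i}^2} \, \sqrt{\sum_{i=1}^{m} w_{2i}^2}}$ (5)
and $w_{1i}$ and $w_{2i}$ are the weights of the corresponding feature values of features $F_1$ and $F_2$.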
4. Consumer group recommendation algorithm
The consumer group recommendation algorithm first constructs a consumer consumption preference model from consumer consumption record data, performs cluster analysis on the consumers to divide them into several groups, and calculates the cluster centroids. It then constructs a cigarette brand feature vector for the brand to be promoted, calculates the similarity between that vector and each cluster centroid, and takes the consumer cluster with the greatest similarity as the consumer group to recommend to. The specific steps of the algorithm are as follows:
Step 1: consumer clusters are obtained with a minimum spanning tree algorithm, as follows:
Step 1.1: in the consumer vector space, each consumer forms a vertex of a graph and the similarity of two consumer vectors forms the weight of the edge between them, so that all consumers are placed in a weighted undirected graph $G$; in the initial state the $n$ consumers in $G$ are isolated, with no edges between them;
Step 1.2: for all consumer vectors, the similarity between every pair of vectors is calculated, and the pairs are sorted in descending order of similarity, i.e. in descending order of the weight of the edge between the two consumers;
Step 1.3: the edge with the greatest similarity is taken from the head of the sorted queue obtained in step 1.2; if the two consumer vertices it connects lie in different connected components, the edge joins those two components into a new connected component; otherwise the edge is ignored;
Step 1.4: step 1.3 is repeated until $n-1$ edges connect all consumer vertices, i.e. the minimum spanning tree has been generated;
Step 1.5: the edges of the minimum spanning tree are deleted in ascending order of weight (i.e. the connections between the least similar consumers are deleted first) until $k-1$ edges have been deleted; the remaining vertices and edges form $k$ connected components, and these $k$ connected components are the $k$ initial consumer clusters;
Step 1.6: for each initial consumer cluster, the consumer vectors of all vertices in the connected component are averaged to obtain the centroids of the $k$ initial consumer clusters;
Step 1.7: the distance between each consumer vector and each cluster centroid vector is recalculated, and each consumer is assigned to the nearest consumer cluster;
Step 1.8: after all consumer clusters have been adjusted, the centroid of each consumer cluster is recalculated;
Step 1.9: steps 1.7 and 1.8 are repeated until the error function converges;
Step 2: the similarity between the cigarette brand feature vector and each consumer cluster centroid is calculated;
Step 2.1: for each pair of corresponding features in the cigarette brand feature vector and a consumer cluster centroid vector, the feature similarity is calculated with formula (5);
Step 2.2: step 2.1 is repeated to obtain the similarity values of all corresponding features, and their average is taken as the similarity between the cigarette brand feature vector and that consumer cluster centroid, as in formula (4);
Step 2.3: step 2.2 is repeated to obtain the similarity between the cigarette brand feature vector and every consumer cluster centroid, i.e. the similarity between the cigarette brand and the consumption preference features of every consumer cluster group.
Step 3: the consumer cluster group with the greatest similarity value is taken as the group to which the new brand is promoted.
With this algorithm, a suitable consumer group can be recommended for the promotion of a cigarette brand, providing a reference for brand promotion planning.
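The centroid refinement (steps 1.6 to 1.9) and the final selection (steps 2 and 3) can be sketched together. Several choices here are assumptions made for illustration: vectors are dicts of equal-length weight lists per feature, "nearest" is read as "highest similarity under formulas (4) and (5)", and the loop stops when assignments no longer change rather than on a separately defined error function. All function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine of two equal-length weight lists; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity(u, v):
    """Mean of the per-feature cosines between two vectors."""
    return sum(cosine(u[f], v[f]) for f in u) / len(u)


def centroid(members):
    """Component-wise mean of the member vectors, feature by feature."""
    return {
        f: [sum(col) / len(members) for col in zip(*(m[f] for m in members))]
        for f in members[0]
    }


def refine(consumers, clusters, max_iter=100):
    """Reassign consumers to the most similar centroid until stable.

    clusters: list of lists of indices into `consumers`.
    Returns (clusters, centroids).
    """
    for _ in range(max_iter):
        clusters = [c for c in clusters if c]  # drop empty clusters
        cents = [centroid([consumers[i] for i in c]) for c in clusters]
        new = [[] for _ in cents]
        for i, u in enumerate(consumers):
            best = max(range(len(cents)), key=lambda c: similarity(u, cents[c]))
            new[best].append(i)
        if new == clusters:
            break
        clusters = new
    clusters = [c for c in clusters if c]
    cents = [centroid([consumers[i] for i in c]) for c in clusters]
    return clusters, cents


def target_cluster(brand, cents):
    """Index of the cluster whose centroid is most similar to the brand."""
    return max(range(len(cents)), key=lambda c: similarity(brand, cents[c]))


# Hypothetical data: one feature with two possible values.
consumers = [
    {"aroma type": [1.0, 0.0]},
    {"aroma type": [0.9, 0.1]},
    {"aroma type": [0.0, 1.0]},
    {"aroma type": [0.1, 0.9]},
]
final_clusters, centroids = refine(consumers, [[0], [1, 2, 3]])
promotion_group = target_cluster({"aroma type": [0.0, 1.0]}, centroids)
```

Starting from a deliberately poor initial split, the loop moves consumer 1 into the first cluster, and the brand whose aroma type is the second value is matched to the second cluster.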
Although exemplary embodiments of the present invention have been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions, substitutions and the like can be made in form and detail without departing from the scope and spirit of the invention as disclosed in the accompanying claims, all of which are intended to fall within the scope of the claims, and that various steps in the various sections and methods of the claimed product can be combined together in any combination. Therefore, the description of the embodiments disclosed in the present invention is not intended to limit the scope of the present invention, but to describe the present invention. Accordingly, the scope of the present invention is not limited by the above embodiments, but is defined by the claims or their equivalents.
Claims (7)
1. A cigarette brand recommendation algorithm based on consumer modeling, characterized by comprising the following steps in sequence:
Step 1: a two-dimensional (QR) code is designed for the outer packaging of the cigarettes; a consumer scans the code with a mobile phone to retrieve information about the purchased cigarettes and verify their authenticity, and at the same time the tobacco sales company obtains the consumer's consumption record;
Step 2: consumer modeling based on TF-IDF (term frequency-inverse document frequency): the consumer set is treated as analogous to a document set, the consumption record of each consumer corresponds to one document in that set, and the different feature values of each feature in the consumption record are treated as analogous to the words in a document, so as to construct the consumer consumption preference feature vector;
Step 3: cigarette brand feature vector modeling: the same features as in the consumer feature vector are extracted to construct the cigarette brand feature vector;
Step 4: the similarity of two consumer feature vectors is calculated as the average of the similarities of the corresponding features of the two consumers' preference vectors; the similarity between a consumer feature vector and a cigarette brand feature vector is calculated in the same way as the similarity between two consumer feature vectors;
Step 5: cigarette brands are recommended according to consumer consumption preference, based on the similarity between the cigarette brand feature vector and the consumer consumption preference features: a consumer consumption preference model is first constructed from consumer consumption record data, cluster analysis is performed on the consumers to divide them into several groups, and the cluster centroids are calculated; a cigarette brand feature vector is then constructed for the brand to be promoted, the similarity between that vector and each cluster centroid is calculated, and the consumer cluster with the greatest similarity is taken as the consumer group to recommend to.
2. The cigarette brand recommendation algorithm based on consumer modeling according to claim 1, wherein the specific steps of step 2 are as follows:
Step 2.1: the consumer set is treated as analogous to a document set, and the consumption record of each consumer corresponds to one document in that set; 14 features such as cigarette production place, enterprise name, trademark, aroma type and grade are extracted from the consumer's consumption record to construct the consumer consumption preference feature vector, as in formula (1):
$U = \{F_i \mid 1 \le i \le 14\}$ (1)
Step 2.2: each feature $F_i$ in the consumer consumption record contains multiple feature values; these feature values are analogous to the words in a document and themselves form a feature value vector, as in formula (2):
$F_i = \{(f_{ij}, w_{ij}) \mid 1 \le j \le m\}$ (2)
Step 2.3: in step 2.2, $f_{ij}$ denotes the j-th feature value of feature $F_i$, and $w_{ij}$ denotes the weight of that feature value, calculated by formula (3).
Here $tf_{ij}$ is the number of times feature value $f_{ij}$ occurs in the consumption record of consumer $U$; the larger this value, the better the feature value reflects the consumer's preference. $\max tf_{ij}$ is the maximum of $tf_{ij}$ over all consumers and is used to normalize the weight $w_{ij}$. $N$ is the total number of consumers, and $n_{ij}$ is the number of consumers whose records contain feature value $f_{ij}$; the larger $n_{ij}$, the more popular the feature value and the less it expresses an individual consumer's preference.
3. The cigarette brand recommendation algorithm based on consumer modeling according to claim 1, wherein the specific steps of step 3 are as follows:
Step 3.1: for a cigarette brand, select 14 features such as production place, enterprise name, commodity name, specification model, aroma type and grade, and construct the cigarette brand feature vector $B = \{F_i \mid 1 \le i \le 14\}$;
Step 3.2: for each feature $F_i$ of a cigarette brand, the feature value is a single fixed value with no statistical variation; its weight is 1, and every other feature value is absent and has weight 0. For example, Yuxi (soft) is of the faint-scent aroma type, so within the aroma type feature the weight of the faint-scent feature value is 1, and the other aroma types do not occur for this brand. To allow the similarity between the cigarette brand feature vector and a consumer feature vector to be calculated, the feature value vector of each feature of the cigarette brand is expanded to the same dimension as the corresponding feature value vector of the consumer preference features, and the weight of every feature value that the brand does not have is set to 0.
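The expansion in step 3.2 amounts to a one-hot encoding over the feature values seen among consumers. A minimal sketch follows, assuming a hypothetical `vocab` that lists those values per feature. The helper name `brand_feature_vector` and the grade value in the toy data are illustrative assumptions; only the faint-scent aroma type of Yuxi (soft) comes from the text.

```python
def brand_feature_vector(brand, vocab):
    """Expand a brand into per-feature 0/1 weight vectors.

    brand: {feature_name: the brand's single value for that feature}
    vocab: {feature_name: ordered list of all values seen among consumers}
    Each output vector has the same length and order as vocab[feature],
    with weight 1 at the brand's own value and 0 everywhere else.
    """
    return {
        feature: [1.0 if value == brand.get(feature) else 0.0 for value in values]
        for feature, values in vocab.items()
    }

# Hypothetical vocabularies; only two of the 14 features are shown.
vocab = {"aroma": ["faint", "strong", "mixed"], "grade": ["first", "second"]}
yuxi_soft = {"aroma": "faint", "grade": "first"}  # the grade is an assumed toy value
b = brand_feature_vector(yuxi_soft, vocab)
```

Keeping the value order fixed by `vocab` is what makes a brand vector and a consumer vector comparable position by position.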
4. The cigarette brand recommendation algorithm based on consumer modeling according to claim 1, wherein the specific steps of step 4 are as follows:
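Step 4.1: let the consumption preference feature vectors of two consumers $U_i$ and $U_j$ be $U_i = \{F_{ik} \mid 1 \le k \le 14\}$ and $U_j = \{F_{jk} \mid 1 \le k \le 14\}$, where $F_{ik}$ denotes the $k$-th feature of consumer $U_i$; then the similarity of $U_i$ and $U_j$ is the average of the similarities of the corresponding features of the two consumption preference vectors, namely:
$$sim(U_i, U_j) = \frac{1}{14} \sum_{k=1}^{14} sim(F_{ik}, F_{jk}) \tag{4}$$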
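Step 4.2: calculate the similarity of a pair of corresponding features $F_1$ and $F_2$ as the cosine of the angle between their feature value vectors:
$$sim(F_1, F_2) = \frac{\sum_{i=1}^{m} w_{1i} w_{2i}}{\sqrt{\sum_{i=1}^{m} w_{1i}^2} \, \sqrt{\sum_{i=1}^{m} w_{2i}^2}} \tag{5}$$
where $w_{1i}$ and $w_{2i}$ are the weights of the corresponding feature values of $F_1$ and $F_2$.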
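Steps 4.1 and 4.2 can be sketched as a per-feature cosine followed by an average over features. The sketch assumes each consumer is a dictionary mapping a feature name to an equal-length weight vector; that representation, the function names and the toy numbers are illustrative assumptions, as is returning 0 for an all-zero vector.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length weight vectors (step 4.2)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # assumption: an all-zero vector gives 0

def similarity(u, v):
    """Average of the per-feature cosine similarities (step 4.1)."""
    return sum(cosine(u[f], v[f]) for f in u) / len(u)

# Two hypothetical consumers described by two features instead of 14.
u1 = {"aroma": [1.0, 0.0, 0.0], "grade": [0.6, 0.8]}
u2 = {"aroma": [1.0, 0.0, 0.0], "grade": [0.8, 0.6]}
s = similarity(u1, u2)  # (1.0 + 0.96) / 2
```

Averaging per feature, rather than taking one cosine over a concatenated vector, gives each of the features the same influence regardless of how many values it has.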
5. The cigarette brand recommendation algorithm based on consumer modeling according to claim 1, wherein the specific steps of step 5 are as follows:
Step 5.1: obtain the consumer clusters with a minimum spanning tree algorithm, as follows:
Step 5.1.1: in the consumer vector space, let each consumer be a vertex of a graph and let the similarity between two consumer vectors be the weight of the edge joining them, so that all consumers form a weighted undirected graph G(V, E); the initial state consists of n isolated consumers with no edges between them;
Step 5.1.2: calculate the similarity between every pair of consumer vectors and sort the pairs in descending order of similarity, that is, in descending order of the weight of the edge between the two consumers;
Step 5.1.3: dequeue the edge with the largest similarity from the sorted queue obtained in step 5.1.2; if the two consumer vertices it connects lie in different connected components, use the edge to merge the two components into one new connected component; otherwise discard the edge;
Step 5.1.4: repeat step 5.1.3 until all consumer vertices are connected by n-1 edges, which yields the minimum spanning tree;
Step 5.1.5: delete the edges of the minimum spanning tree one at a time in ascending order of weight (that is, remove the connections between the least similar consumers) until M-1 edges have been deleted; the remaining vertices and edges then form M connected components, which are the M initial consumer clusters;
Step 5.1.6: for each initial consumer cluster, average the consumer vectors of all vertices in its connected component to obtain the centroids of the M initial consumer clusters:
$$c_i = \frac{1}{|C_i|} \sum_{U \in C_i} U$$
where $c_i$ denotes the centroid of cluster $C_i$ and $|C_i|$ denotes the number of consumers in cluster $C_i$;
Step 5.1.7: recalculate the distance between each consumer's vector and the centroid vector of every cluster, and assign the consumer to the nearest consumer cluster;
Step 5.1.8: after all consumer clusters have been adjusted, recalculate the centroid of each consumer cluster;
Step 5.1.9: repeat steps 5.1.7 and 5.1.8 until the error function converges;
Step 5.2: calculate the similarity between the cigarette brand feature vector and each consumer cluster centroid;
Step 5.2.1: for a pair of corresponding features in the cigarette brand feature vector and a consumer cluster centroid vector, calculate the similarity of the two features using formula (5);
Step 5.2.2: repeat step 5.2.1 to obtain the similarity of every pair of corresponding features, and take the average of these feature similarities as the similarity between the cigarette brand and that consumer cluster centroid, as in formula (4);
Step 5.2.3: repeat step 5.2.2 to obtain the similarity between the cigarette brand feature vector and every consumer cluster centroid, that is, the similarity between the cigarette brand and the consumption preference features of every consumer cluster;
Step 5.3: take the consumer cluster with the largest similarity as the promotion group for the new brand.
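The steps of claim 5 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: consumers are dictionaries of per-feature weight vectors, the tree is built Kruskal-style with a union-find, "nearest" in step 5.1.7 is read as "most similar", and "the error function converges" is approximated by "assignments stop changing". All function names and the toy data are hypothetical. Because edges are taken in descending similarity, the tree is a maximum spanning tree over similarity, which corresponds to a minimum spanning tree over dissimilarity.

```python
import math
from itertools import combinations

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity(u, v):
    """Average of per-feature cosine similarities."""
    return sum(cosine(u[f], v[f]) for f in u) / len(u)

def centroid(members):
    """Element-wise mean of the members' feature value vectors (step 5.1.6)."""
    n = len(members)
    return {f: [sum(m[f][k] for m in members) / n for k in range(len(vec))]
            for f, vec in members[0].items()}

def spanning_tree_clusters(consumers, m):
    """Steps 5.1.1-5.1.5; assumes 1 <= m <= len(consumers)."""
    n = len(consumers)
    edges = sorted(((similarity(consumers[i], consumers[j]), i, j)
                    for i, j in combinations(range(n), 2)), reverse=True)
    parent = list(range(n))

    def find(x):  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for s, i, j in edges:  # most similar pair first
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two different components
            parent[ri] = rj
            tree.append((s, i, j))
    kept = tree[:len(tree) - (m - 1)]  # drop the m-1 weakest tree edges
    parent = list(range(n))  # rebuild components from the kept edges
    for _, i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def refine(consumers, clusters, max_iter=100):
    """Steps 5.1.6-5.1.9: reassign to the most similar centroid until stable."""
    for _ in range(max_iter):
        cents = [centroid([consumers[i] for i in c]) for c in clusters]
        new = [[] for _ in cents]
        for i, u in enumerate(consumers):
            best = max(range(len(cents)), key=lambda k: similarity(u, cents[k]))
            new[best].append(i)
        new = [c for c in new if c]  # drop clusters that became empty
        if new == clusters:
            break
        clusters = new
    return clusters, [centroid([consumers[i] for i in c]) for c in clusters]

def recommend(brand, centroids):
    """Steps 5.2-5.3: index of the centroid most similar to the brand vector."""
    return max(range(len(centroids)), key=lambda k: similarity(brand, centroids[k]))

# Hypothetical data: four consumers, one feature with three values.
consumers = [{"aroma": [0.9, 0.1, 0.0]}, {"aroma": [0.8, 0.2, 0.0]},
             {"aroma": [0.0, 0.1, 0.9]}, {"aroma": [0.1, 0.0, 0.9]}]
brand = {"aroma": [1.0, 0.0, 0.0]}
clusters, cents = refine(consumers, spanning_tree_clusters(consumers, 2))
best = recommend(brand, cents)
target_group = clusters[best]
```

In this toy run the first two consumers form one cluster and the last two the other, and the brand vector is most similar to the centroid of the first cluster, so `target_group` holds the indices of the first two consumers.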
6. The cigarette brand recommendation algorithm based on consumer modeling of claim 1, further characterized by: a consumer consumption preference feature vector is constructed from each consumer's consumption record in combination with tobacco brand information and tobacco information, specifically a 14-dimensional vector whose dimensions are cigarette production place, enterprise name, trademark, commodity name, specification model, aroma type, grade, draw resistance, length, hardness, circumference, tar content, smoke nicotine content and smoke carbon monoxide content.
7. The cigarette brand recommendation algorithm based on consumer modeling of claim 1, further characterized by: the two-dimensional code scanning device is a mobile phone, and the recognition software is a dedicated app or a WeChat mini program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010064585.9A CN111275459A (en) | 2020-01-20 | 2020-01-20 | Cigarette brand recommendation algorithm based on consumer modeling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111275459A true CN111275459A (en) | 2020-06-12 |
Family
ID=71003438
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010064585.9A Pending CN111275459A (en) | 2020-01-20 | 2020-01-20 | Cigarette brand recommendation algorithm based on consumer modeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111275459A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113742533A (en) * | 2021-08-05 | 2021-12-03 | 北京思特奇信息技术股份有限公司 | Prim algorithm based recommendation method, system and recommendation device |
CN113781175A (en) * | 2021-09-14 | 2021-12-10 | 广西中烟工业有限责任公司 | New cigarette product recommendation method and system |
WO2022164626A1 (en) * | 2021-02-01 | 2022-08-04 | Mastercard International Incorporated | Audience recommendation using node similarity in combined contextual graph embeddings |
CN115018588A (en) * | 2022-06-24 | 2022-09-06 | 平安普惠企业管理有限公司 | Product recommendation method and device, electronic equipment and readable storage medium |
CN118365383A (en) * | 2024-06-18 | 2024-07-19 | 昆明学院 | Consumer flow identifying method based on CGAN and deep CNN |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105528716A (en) * | 2015-12-03 | 2016-04-27 | 山东烟草研究院有限公司 | Tobacco brand remote intelligent recommendation method facing retailer individual need |
CN107103488A (en) * | 2017-03-02 | 2017-08-29 | 江苏省烟草公司常州市公司 | Cigarette consumption analysis method based on collaborative filtering and clustering algorithm |
CN109885688A (en) * | 2019-03-05 | 2019-06-14 | 湖北亿咖通科技有限公司 | File classification method, device, computer readable storage medium and electronic equipment |
Non-Patent Citations (1)
Title |
---|
DONG HAN ET AL.: "Cigarette Brand Recommendation Based on Consumer Modeling", 《2019 INTERNATIONAL CONFERENCE ON MACHINE LEARNING, BIG DATA AND BUSINESS INTELLIGENCE (MLBDBI)》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022164626A1 (en) * | 2021-02-01 | 2022-08-04 | Mastercard International Incorporated | Audience recommendation using node similarity in combined contextual graph embeddings |
US11727422B2 (en) | 2021-02-01 | 2023-08-15 | Mastercard International Incorporated | Audience recommendation using node similarity in combined contextual graph embeddings |
CN113742533A (en) * | 2021-08-05 | 2021-12-03 | 北京思特奇信息技术股份有限公司 | Prim algorithm based recommendation method, system and recommendation device |
CN113781175A (en) * | 2021-09-14 | 2021-12-10 | 广西中烟工业有限责任公司 | New cigarette product recommendation method and system |
CN113781175B (en) * | 2021-09-14 | 2023-11-28 | 广西中烟工业有限责任公司 | Cigarette new product recommending method and recommending system |
CN115018588A (en) * | 2022-06-24 | 2022-09-06 | 平安普惠企业管理有限公司 | Product recommendation method and device, electronic equipment and readable storage medium |
CN118365383A (en) * | 2024-06-18 | 2024-07-19 | 昆明学院 | Consumer flow identifying method based on CGAN and deep CNN |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111275459A (en) | Cigarette brand recommendation algorithm based on consumer modeling | |
US11734725B2 (en) | Information sending method, apparatus and system, and computer-readable storage medium | |
CN105701191B (en) | Pushed information click rate estimation method and device | |
WO2018010591A1 (en) | Information push method and apparatus, server, and storage medium | |
CN107332910B (en) | Information pushing method and device | |
CN103617230B (en) | Method and system for advertisement recommendation based microblog | |
CN105574067A (en) | Item recommendation device and item recommendation method | |
JP6635587B2 (en) | Advertising sentence selection device and program | |
CN106611344A (en) | Method and device for mining potential customers | |
CN113268656A (en) | User recommendation method and device, electronic equipment and computer storage medium | |
US20140288999A1 (en) | Social character recognition (scr) system | |
CN112102029B (en) | Knowledge graph-based long-tail recommendation calculation method | |
CN110348906B (en) | Improved commodity recommendation method based on multi-type implicit feedback | |
Sudrajat et al. | Analysis of data mining classification by comparison of C4. 5 and ID algorithms | |
CN107193832A (en) | Similarity method for digging and device | |
CN111429161B (en) | Feature extraction method, feature extraction device, storage medium and electronic equipment | |
KR102351879B1 (en) | Method and device for classifying unstructured item data automatically for goods or services | |
CN103577472B (en) | Personal information acquisition, presumption, the classification of commodity, search method and system | |
Sasank et al. | Credit card fraud detection using various classification and sampling techniques: a comparative study | |
CN112437053A (en) | Intrusion detection method and device | |
CN111651678A (en) | Knowledge graph-based personalized recommendation method | |
CN118013120B (en) | Method, medium and equipment for optimizing products recommended to users based on cluster labels | |
CN116611897B (en) | Message reminding method and system based on artificial intelligence | |
CN113326432A (en) | Model optimization method based on decision tree and recommendation method | |
CN114282119B (en) | Scientific and technological information resource retrieval method and system based on heterogeneous information network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20200612 |