CN112862569B - Product appearance style evaluation method and system based on image and text multi-modal data - Google Patents


Info

Publication number
CN112862569B
CN112862569B (application CN202110241232.6A; also published as CN112862569A)
Authority
CN
China
Prior art keywords
style
image
product
aesthetic
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110241232.6A
Other languages
Chinese (zh)
Other versions
CN112862569A (en)
Inventor
朱思羽
戚进
胡洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202110241232.6A
Publication of CN112862569A
Application granted
Publication of CN112862569B
Legal status: Active

Classifications

    • G06Q30/0629: Commerce; buying, selling or leasing transactions; electronic shopping; item investigation; directed, with specific intent or strategy for generating comparisons
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Pattern recognition; classification techniques
    • G06F40/194: Handling natural language data; text processing; calculation of difference between files
    • G06F40/247: Handling natural language data; lexical tools; thesauruses; synonyms
    • G06F40/30: Handling natural language data; semantic analysis
    • G06N3/045: Neural networks; combinations of networks
    • G06Q30/0201: Marketing; market modelling; market analysis; collecting market data
    • Y02P90/30: Climate change mitigation technologies in the production or processing of goods; computing systems specially adapted for manufacturing

Abstract

The invention provides a product appearance style evaluation method and system based on image and text multi-modal data, comprising: an image aesthetic style model, a multilayer convolutional neural network that takes a color image as input and outputs a multi-dimensional image style classification; an image aesthetic style prediction algorithm, which uses pre-training and transfer learning to predict the style type of a product image; a semantic emotion analysis module, which processes online user comments using the style labels of the image aesthetic style prediction algorithm and computes the product style tendency fed back by users; and a multi-modal fusion evaluation module, which fuses the product style prediction output by the image aesthetic style prediction algorithm with the product style feedback output by the semantic emotion analysis and provides a product evaluation result for appearance style. The method integrates product image information with user-feedback text information, realizes appearance-style product evaluation based on data modeling and analysis, and is more objective, scientific, and accurate than the traditional expert evaluation method.

Description

Product appearance style evaluation method and system based on image and text multi-modal data
Technical Field
The invention relates to the technical field of multi-modal data, in particular to a product appearance style evaluation method and system based on image and text multi-modal data.
Background
With consumers' requirements becoming more comprehensive and the variety of commodities growing in recent years, the influence of product appearance on consumers' purchasing decisions is also increasing. For many everyday consumer products such as radios and hair dryers, appearance is becoming a decisive factor in product success. The aesthetic style of a product's appearance is central to its overall look and is closely related to the type of user it attracts. An aesthetic style is generally an abstract aesthetic concept described by specific vocabulary; it carries a degree of subjectivity and fuzziness, and may differ from the aesthetic association that vocabulary conveys to a user. The aesthetic style a product designer intends to convey is generally embodied in the product image, while the style users actually experience often appears in their feedback comments; the difference between the two reflects how successfully the product's style is presented: the more successful the appearance design, the closer the intended aesthetic style is to the style users actually feed back.
Image aesthetic style analysis builds on image processing and analysis: by modeling the mapping between images and aesthetic style labels, it discovers the regularities of the aesthetic styles images present, and can therefore be used to predict the aesthetic style of a product image. Aesthetic styles are broadly universal; for example, styles suited to images of landscapes or people can also describe product appearance, so an image-to-aesthetic-style mapping learned on an existing large-scale image aesthetic style classification dataset can be adapted to product images with relatively small adjustments. AVA (A Large-Scale Database for Aesthetic Visual Analysis) is an image aesthetics dataset containing over 250,000 labeled images with 14 aesthetic style labels in total. A smaller labeled product image dataset can be created for a specific product field: only some product images need to be collected and labeled for style, and after data enhancement the dataset is completed at low cost.
Semantic emotion analysis is a rapidly developing semantic processing technology for analyzing emotional tendency in text; by processing and analyzing a text, it obtains the emotional tendencies toward the features the text reflects. These features may be concrete things, such as products, or abstract concepts, such as particular aesthetic styles. Emotional tendency is generally bipolar (positive or negative): the more positive the tendency, the more strongly the corresponding feature is embodied.
The traditional appearance style evaluation method is mainly expert scoring, whose drawback is strong subjectivity; for abstract and fuzzy tasks such as appearance style evaluation, this drawback is even more pronounced.
Patent document CN106600385A (application number: CN201611251457.5) discloses an online product analysis system based on user tracking, which includes a user comment data module, a text data module, an image data module, a text data analysis module, an image data analysis module, a comprehensive evaluation analysis module, and a user interaction module. The user comment data module extracts comment data from commodity users and is connected to the text data module and the image data module respectively; the text data module is connected to the text data analysis module, which is connected to the comprehensive evaluation analysis module; the image data module is connected to the image data analysis module, which is connected to the comprehensive evaluation analysis module; and the comprehensive evaluation analysis module is connected to the user interaction module. Because that method is trained on a model and an algorithm, its results are more realistic and accurate.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a product appearance style evaluation method and system based on image and text multi-modal data.
The product appearance style evaluation method based on image and text multi-modal data comprises: constructing an image aesthetic style model, and performing semantic emotion analysis and multi-modal fusion evaluation using an image aesthetic style prediction algorithm;
the image aesthetic style model is a multilayer convolutional neural network model that takes a color image as input and outputs a multi-dimensional image style classification;
the image aesthetic style prediction algorithm uses pre-training and transfer learning to predict the style type of the product image;
the semantic emotion analysis comprises: processing online user comments using the style labels of the image aesthetic style prediction algorithm, and computing the product style tendency fed back by users;
the multi-modal fusion evaluation comprises: fusing the product style prediction output by the image aesthetic style prediction algorithm with the product style feedback output by the semantic emotion analysis, and providing a product evaluation result for appearance style.
Preferably, the image aesthetic style model comprises, connected in sequence:
- an input layer: the input is a color image scaled to 224 × 224; the input dimension is b × 224 × 224 × 3, where b is the batch size;
- 4 convolutional layers, kernel size 9 × 9, stride 1, 64 kernels, ReLU activation;
- a batch normalization layer;
- 1 pooling layer, using maximum pooling, pooling size 2 × 2;
- 3 convolutional layers, kernel size 7 × 7, stride 1, 64 kernels, ReLU activation;
- 1 pooling layer, using maximum pooling, pooling size 2 × 2;
- 3 convolutional layers, kernel size 5 × 5, stride 1, 128 kernels, ReLU activation;
- a Dropout layer, dropout probability 0.1;
- a batch normalization layer;
- 1 pooling layer, using maximum pooling, pooling size 2 × 2;
- 3 convolutional layers, kernel size 3 × 3, stride 1, 128 kernels, ReLU activation;
- 1 pooling layer, using maximum pooling, pooling size 2 × 2;
- a Flatten layer, unrolling the b × 14 × 14 × 128 feature map into vectors of length 14 × 14 × 128 (one per sample);
- a fully connected layer outputting the style classification result, with 14 output nodes corresponding to the 14 style labels; the activation function is Softmax.
Preferably, the loss function of the image aesthetic style model is a minimum cross entropy loss function, an Adam optimizer is used for weight updating, and the learning rate is set to be 0.0001.
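For illustration, the architecture and training configuration above can be assembled in Keras roughly as follows. This is a minimal sketch, not the patent's reference implementation; in particular, "same" padding is an assumption made so that four 2 × 2 poolings reduce the 224 × 224 input to the 14 × 14 × 128 feature map described above.

```python
# Sketch of the image aesthetic style model (illustrative; assumes TensorFlow 2.x).
# padding="same" is an assumption consistent with the stated 14 x 14 x 128 feature map.
from tensorflow.keras import layers, models, optimizers

def build_style_model(num_styles=14):
    m = models.Sequential()
    m.add(layers.Conv2D(64, 9, strides=1, padding="same", activation="relu",
                        input_shape=(224, 224, 3)))            # b x 224 x 224 x 3 input
    for _ in range(3):                                         # remaining 9x9 conv layers
        m.add(layers.Conv2D(64, 9, strides=1, padding="same", activation="relu"))
    m.add(layers.BatchNormalization())
    m.add(layers.MaxPooling2D(2))                              # 224 -> 112
    for _ in range(3):                                         # 3 conv layers, 7x7, 64 kernels
        m.add(layers.Conv2D(64, 7, strides=1, padding="same", activation="relu"))
    m.add(layers.MaxPooling2D(2))                              # 112 -> 56
    for _ in range(3):                                         # 3 conv layers, 5x5, 128 kernels
        m.add(layers.Conv2D(128, 5, strides=1, padding="same", activation="relu"))
    m.add(layers.Dropout(0.1))
    m.add(layers.BatchNormalization())
    m.add(layers.MaxPooling2D(2))                              # 56 -> 28
    for _ in range(3):                                         # 3 conv layers, 3x3, 128 kernels
        m.add(layers.Conv2D(128, 3, strides=1, padding="same", activation="relu"))
    m.add(layers.MaxPooling2D(2))                              # 28 -> 14: b x 14 x 14 x 128
    m.add(layers.Flatten())                                    # vector of length 14*14*128
    m.add(layers.Dense(num_styles, activation="softmax"))      # 14 style probabilities
    m.compile(optimizer=optimizers.Adam(learning_rate=1e-4),   # Adam, lr = 0.0001
              loss="categorical_crossentropy", metrics=["accuracy"])
    return m
```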
Preferably, the image aesthetic style prediction algorithm adopts a transfer learning strategy: it first pre-trains on the large-scale image aesthetic style classification dataset AVA using the dataset's 14 style labels, then fine-tunes on a small product image style dataset of the specific product field labeled with the same 14 style labels, and tests on an unlabeled test set;
the prediction output of the image aesthetic style model for a test image is the style prediction result of that image, a 14-dimensional vector P = (P_1, P_2, …, P_14) satisfying:
Σ_i P_i = 1
where P_i denotes the probability that the image belongs to the i-th style.
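A hedged sketch of the pre-train / fine-tune / predict sequence follows; load_ava() and load_product_set() are hypothetical loaders standing in for the AVA dataset and the small product image style dataset, and the batch sizes and epoch counts are assumptions:

```python
# Illustrative transfer-learning loop: pre-train on AVA, fine-tune on the small
# product-image style set, then predict the 14-dim style vector P for test images.
# load_ava() / load_product_set() / x_test are hypothetical placeholders.
model = build_style_model(num_styles=14)            # sketch defined above

x_ava, y_ava = load_ava()                           # AVA images + one-hot 14-way labels
model.fit(x_ava, y_ava, batch_size=64, epochs=20)   # pre-training

x_prod, y_prod = load_product_set()                 # small labeled product-image set
model.optimizer.learning_rate.assign(5e-5)          # lower fine-tuning rate (see Example 2)
model.fit(x_prod, y_prod, batch_size=32, epochs=5)  # fine-tuning

P = model.predict(x_test)                           # each row sums to 1 via softmax
```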
Preferably, the semantic emotion analysis module processes online user comments using the 14 style labels of the image aesthetic style prediction algorithm: it finds synonyms of the 14 style labels with the synonym lookup method lemma_names of the WordNet semantic dictionary and expands each style label into a style word set, as follows:
Step 1: for the i-th style label word, look up the semantic set Synsets_i of that word in WordNet;
Step 2: for the j-th sense synset_ij in Synsets_i, find its synonym set lem_ij using the lemma_names method;
Step 3: all synonym sets lem_ij of the i-th style label word form the i-th style word set Set_i:
Set_i = ∪_j lem_ij
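As a concrete sketch of these three steps using nltk's WordNet interface (the 14 label words themselves come from the AVA label set and are not listed here; style_labels below is a placeholder):

```python
# Expand one style label word into its style word set Set_i via WordNet.
# Requires: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def style_word_set(label_word):
    """Set_i = union over senses j of lemma_names(synset_ij)."""
    words = set()
    for synset in wn.synsets(label_word):   # Synsets_i: all senses of the label word
        words.update(synset.lemma_names())  # lem_ij: synonym lemmas of the j-th sense
    return words

style_sets = [style_word_set(label) for label in style_labels]  # style_labels: 14 label words
```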
Preferably, the semantic emotion analysis comprises: after the style labels are expanded into style word sets, collecting online user comments for a given product from an online e-commerce platform, and cleaning and preprocessing the comment text, as follows:
Step 1: text collection, using the Python urllib library to automatically collect online user comments;
Step 2: text cleaning, including screening out repeated sentences, sentences not in the target language, and sentences containing only non-text content, and removing misspelled words;
Step 3: text preprocessing, including converting all characters to lower case, removing nonstandard punctuation, removing stop words, and converting all verbs to the present tense.
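A minimal sketch of the preprocessing step with nltk; the tokenizer, the English stop-word list, and lemmatization with pos="v" as the stand-in for tense normalization are assumptions, not the patent's exact pipeline:

```python
# Requires: nltk.download("punkt"), nltk.download("stopwords"), nltk.download("wordnet")
import string
from nltk import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(sentence):
    tokens = word_tokenize(sentence.lower())                     # all characters to lower case
    tokens = [t for t in tokens if t not in string.punctuation]  # drop stray punctuation
    tokens = [t for t in tokens if t not in stop_words]          # remove stop words
    return [lemmatizer.lemmatize(t, pos="v") for t in tokens]    # verbs to base/present form
```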
Preferably, the semantic emotion analysis module comprises: after the comment text is cleaned and preprocessed, computing the similarity between each word in the text and each of the 14 style word sets using the semantic similarity measure lin_similarity provided by WordNet; after all texts are processed, the similarity results of all words are aggregated into the user-feedback style tendency. The similarity Sim_{k,i} between the k-th word w_k and the i-th style word set Set_i is:
Sim_{k,i} = max_t Sim_{k,i,t}
where Sim_{k,i,t} is the similarity between the k-th word w_k and the t-th word of the i-th style word set Set_i:
Sim_{k,i,t} = max_{m,n} lin_similarity(synset_km, synset_itn)
where synset_km is the m-th sense in the semantic set Synsets_k of the k-th word w_k, synset_itn is the n-th sense in the semantic set Synsets_it of the t-th word of the i-th style word set Set_i, and lin_similarity is the semantic similarity measure provided by WordNet;
aggregating the similarity results of all words, the normalized tendency value O_i of the i-th style fed back by users is:
O'_i = Σ_k Sim_{k,i}
O_i = O'_i / Σ_i O'_i
Finally, the user-feedback style tendency of the product, O = (O_1, O_2, …, O_14), is obtained as the output of the semantic emotion analysis.
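The computation can be sketched with nltk as below. lin_similarity requires an information-content corpus; the Brown corpus file used here is an assumption (the patent does not specify one), and the measure is undefined across parts of speech, hence the try/except:

```python
# Requires: nltk.download("wordnet"), nltk.download("wordnet_ic")
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")   # assumed information-content corpus

def word_pair_sim(w_k, w_t):
    """Sim_{k,i,t}: max lin_similarity over all sense pairs (m, n) of the two words."""
    best = 0.0
    for s1 in wn.synsets(w_k):
        for s2 in wn.synsets(w_t):
            try:
                best = max(best, s1.lin_similarity(s2, brown_ic))
            except Exception:              # undefined for cross-POS synset pairs
                pass
    return best

def style_tendency(comment_words, style_sets):
    """O: per-style sums O'_i of Sim_{k,i} = max_t Sim_{k,i,t}, normalized to sum to 1."""
    raw = [sum(max((word_pair_sim(w, t) for t in s), default=0.0)
               for w in comment_words)
           for s in style_sets]
    total = sum(raw)
    return [o / total for o in raw] if total else raw
```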
Preferably, the multi-modal fusion evaluation comprises: comparing the style prediction value P output by the image aesthetic style prediction algorithm with the style tendency feedback value O output by the semantic emotion analysis module;
the absolute difference |P_i - O_i| between corresponding elements of P and O represents, from the viewpoint of the i-th aesthetic style, the size of the difference between the information conveyed by the product image and the user feedback; the sum over all style labels, Σ_i |P_i - O_i|, represents the difference between the overall aesthetic style of the product image and the style fed back by users; the indices |P_i - O_i| and Σ_i |P_i - O_i| assist in evaluating how successfully the product presents its style: the larger the index value, the larger the difference between the product image and the user feedback, and the less successful the style presentation.
Preferably, the multi-modal fusion evaluation comprises: comparing and fusing the style prediction value P output by the image aesthetic style prediction algorithm with the style tendency feedback value O output by the semantic emotion analysis to obtain a comprehensive product appearance style evaluation F = (F_1, F_2, …, F_14), where the comprehensive evaluation F_i of the i-th aesthetic style is the fusion of P_i and O_i:
F'_i = (P_i + O_i) / (2 * |P_i - O_i|)
F_i = F'_i / Σ_i F'_i
where P_i denotes the probability that the image belongs to the i-th style and O_i denotes the normalized tendency value of the i-th style from user feedback.
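A small numeric sketch of the comparison and fusion; the epsilon in the denominator is an added assumption to keep F'_i finite when P_i and O_i coincide exactly, a case the patent's formula leaves undefined:

```python
import numpy as np

def fuse(P, O, eps=1e-8):
    """P, O: 14-dim arrays that each sum to 1."""
    diff = np.abs(P - O)                 # per-style gap |P_i - O_i|
    gap = diff.sum()                     # overall presentation index, sum_i |P_i - O_i|
    F_raw = (P + O) / (2 * diff + eps)   # F'_i = (P_i + O_i) / (2 * |P_i - O_i|)
    return F_raw / F_raw.sum(), gap      # F_i normalized so the 14 values sum to 1
```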
The product appearance style evaluation system based on image and text multi-modal data comprises an image aesthetic style model, an image aesthetic style prediction algorithm, a semantic emotion analysis module, and a multi-modal fusion evaluation module;
the image aesthetic style model is a multilayer convolutional neural network model that takes a color image as input and outputs a multi-dimensional image style classification;
the image aesthetic style prediction algorithm uses pre-training and transfer learning to predict the style type of the product image;
the semantic emotion analysis module processes online user comments using the style labels of the image aesthetic style prediction algorithm and computes the product style tendency fed back by users;
the multi-modal fusion evaluation module fuses the product style prediction output by the image aesthetic style prediction algorithm with the product style feedback output by the semantic emotion analysis, and provides a product evaluation result for appearance style.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method integrates product image information with user-feedback text information, enabling appearance-style product evaluation based on data modeling and analysis; it is more objective, scientific, and accurate than the traditional expert evaluation method;
(2) Through semantic emotion analysis, the invention can rapidly analyze large volumes of text, which is important in the context of Internet big data;
(3) Through multi-modal data, the invention fuses data of different modalities such as images, text, and speech; multiple information sources complement one another and reflect the true information more accurately than single-modal data.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of a method for product evaluation based on multimodal data in accordance with the present invention;
FIG. 2 is a schematic structural diagram of an image aesthetic style model according to the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but do not limit the invention in any way. It should be noted that various changes and modifications, obvious to those skilled in the art, can be made without departing from the spirit of the invention, and all of them fall within the scope of the present invention.
Example 1:
The product appearance style evaluation method based on image and text multi-modal data comprises: constructing an image aesthetic style model, and performing semantic emotion analysis and multi-modal fusion evaluation using an image aesthetic style prediction algorithm;
the image aesthetic style model is a multilayer convolutional neural network model that takes a color image as input and outputs a multi-dimensional image style classification;
the image aesthetic style prediction algorithm uses pre-training and transfer learning to predict the style type of the product image;
the semantic emotion analysis comprises: processing online user comments using the style labels of the image aesthetic style prediction algorithm, and computing the product style tendency fed back by users;
the multi-modal fusion evaluation comprises: fusing the product style prediction output by the image aesthetic style prediction algorithm with the product style feedback output by the semantic emotion analysis, and providing a product evaluation result for appearance style.
As shown in fig. 2, the image aesthetic style model comprises sequentially connected:
- an input layer: the input is a color image scaled to 224 × 224; the input dimension is b × 224 × 224 × 3, where b is the batch size batch_size;
- 4 convolutional layers, kernel size 9 × 9, stride 1, 64 kernels, ReLU activation;
- a batch normalization layer;
- 1 pooling layer, using maximum pooling, pooling size 2 × 2;
- 3 convolutional layers, kernel size 7 × 7, stride 1, 64 kernels, ReLU activation;
- 1 pooling layer, using maximum pooling, pooling size 2 × 2;
- 3 convolutional layers, kernel size 5 × 5, stride 1, 128 kernels, ReLU activation;
- a Dropout layer, dropout probability 0.1;
- a batch normalization layer;
- 1 pooling layer, using maximum pooling, pooling size 2 × 2;
- 3 convolutional layers, kernel size 3 × 3, stride 1, 128 kernels, ReLU activation;
- 1 pooling layer, using maximum pooling, pooling size 2 × 2;
- a Flatten layer, unrolling the b × 14 × 14 × 128 feature map into vectors of length 14 × 14 × 128 (one per sample);
- a fully connected layer outputting the style classification result, with 14 output nodes corresponding to the 14 style labels; the activation function is Softmax.
The loss function of the image aesthetic style model is the minimized cross-entropy loss; the Adam optimizer is used for weight updates, with the learning rate set to 0.0001. The image aesthetic style prediction algorithm adopts a transfer learning strategy: it first pre-trains on the large-scale image aesthetic style classification dataset AVA using the dataset's 14 style labels, then fine-tunes on a small product image style dataset of the specific product field labeled with the same 14 style labels, and tests on an unlabeled test set. The prediction output of the image aesthetic style model for a test image is the style prediction result of that image, a 14-dimensional vector P = (P_1, P_2, …, P_14) satisfying:
Σ_i P_i = 1
where P_i denotes the probability that the image belongs to the i-th style.
The semantic emotion analysis module processes online user comments using the 14 style labels of the image aesthetic style prediction algorithm: it finds synonyms of the 14 style labels with the synonym lookup method lemma_names of the WordNet semantic dictionary and expands each style label into a style word set, as follows:
Step 1: for the i-th style label word, look up the semantic set Synsets_i of that word in WordNet;
Step 2: for the j-th sense synset_ij in Synsets_i, find its synonym set lem_ij using the lemma_names method;
Step 3: all synonym sets lem_ij of the i-th style label word form the i-th style word set Set_i:
Set_i = ∪_j lem_ij
The semantic emotion analysis comprises: after the style labels are expanded into style word sets, collecting online user comments for a given product from an online e-commerce platform, and cleaning and preprocessing the comment text, as follows:
Step 1: text collection, using the Python urllib library to automatically collect online user comments;
Step 2: text cleaning, including screening out repeated sentences, sentences not in the target language, and sentences containing only non-text content, and removing misspelled words;
Step 3: text preprocessing, including converting all characters to lower case, removing nonstandard punctuation, removing stop words, and converting all verbs to the present tense.
The semantic emotion analysis module comprises: after the comment text is cleaned and preprocessed, computing the similarity between each word in the text and each of the 14 style word sets using the semantic similarity measure lin_similarity provided by WordNet; after all texts are processed, the similarity results of all words are aggregated into the user-feedback style tendency. The similarity Sim_{k,i} between the k-th word w_k and the i-th style word set Set_i is:
Sim_{k,i} = max_t Sim_{k,i,t}
where Sim_{k,i,t} is the similarity between the k-th word w_k and the t-th word of the i-th style word set Set_i:
Sim_{k,i,t} = max_{m,n} lin_similarity(synset_km, synset_itn)
where synset_km is the m-th sense in the semantic set Synsets_k of the k-th word w_k, synset_itn is the n-th sense in the semantic set Synsets_it of the t-th word of the i-th style word set Set_i, and lin_similarity is the semantic similarity measure provided by WordNet;
aggregating the similarity results of all words, the normalized tendency value O_i of the i-th style fed back by users is:
O'_i = Σ_k Sim_{k,i}
O_i = O'_i / Σ_i O'_i
Finally, the user-feedback style tendency of the product, O = (O_1, O_2, …, O_14), is obtained as the output of the semantic emotion analysis.
The multi-modal fusion evaluation comprises: comparing the style prediction value P output by the image aesthetic style prediction algorithm with the style tendency feedback value O output by the semantic emotion analysis module;
the absolute difference |P_i - O_i| between corresponding elements of P and O represents, from the viewpoint of the i-th aesthetic style, the size of the difference between the information conveyed by the product image and the user feedback; the sum over all style labels, Σ_i |P_i - O_i|, represents the difference between the overall aesthetic style of the product image and the style fed back by users; the indices |P_i - O_i| and Σ_i |P_i - O_i| assist in evaluating how successfully the product presents its style: the larger the index value, the larger the difference between the product image and the user feedback, and the less successful the style presentation.
The multi-modal fusion evaluation further comprises: comparing and fusing the style prediction value P with the style tendency feedback value O to obtain a comprehensive product appearance style evaluation F = (F_1, F_2, …, F_14), where the comprehensive evaluation F_i of the i-th aesthetic style is the fusion of P_i and O_i:
F'_i = (P_i + O_i) / (2 * |P_i - O_i|)
F_i = F'_i / Σ_i F'_i
where P_i denotes the probability that the image belongs to the i-th style and O_i denotes the normalized tendency value of the i-th style from user feedback.
The product appearance style evaluation system based on image and text multi-modal data comprises an image aesthetic style model, an image aesthetic style prediction algorithm, a semantic emotion analysis module, and a multi-modal fusion evaluation module, as shown in FIG. 1;
the image aesthetic style model is a multilayer convolutional neural network model that takes a color image as input and outputs a multi-dimensional image style classification;
the image aesthetic style prediction algorithm uses pre-training and transfer learning to predict the style type of the product image;
the semantic emotion analysis module processes online user comments using the style labels of the image aesthetic style prediction algorithm and computes the product style tendency fed back by users;
the multi-modal fusion evaluation module fuses the product style prediction output by the image aesthetic style prediction algorithm with the product style feedback output by the semantic emotion analysis, and provides a product evaluation result for appearance style.
The present invention will be described in more detail below by way of preferred examples.
Example 2:
The method is based on image aesthetic analysis and semantic emotion analysis technology. Relying on today's mature online e-commerce platforms, it performs data modeling and analysis on large volumes of quickly obtained product images and text comment data, automatically predicts the product image style, automatically analyzes the style feedback in comment texts, and, through comparison and fusion of the multi-modal data, provides intelligent support for product evaluation in terms of appearance style.
The invention provides a product evaluation method based on multi-modal data, which comprises the following steps:
Step 1: construct a product image style classification dataset for a specific product field. Collect product images of the specific product field on the Internet, manually label their aesthetic styles according to the AVA dataset standard, and then apply a data enhancement step to form a small product image style classification dataset;
Step 2: construct the image aesthetic style model and pre-train it with the AVA dataset, using the Adam optimizer with a learning rate of 0.0001;
Step 3: fine-tune the model with the constructed product image style classification dataset, using the Adam optimizer with a learning rate of 0.00005;
Step 4: test the model on an unlabeled product image dataset; the aesthetic style prediction output for a product image is its style prediction value P;
Step 5: according to the 14 style labels of the image aesthetic style prediction algorithm, expand each style label into a style word set Set_i using the lemma_names method of the semantic dictionary WordNet;
Step 6: collect user comments from the Amazon.com e-commerce platform, and clean and preprocess the text;
Step 7: compute and aggregate the similarity of all words to each of the 14 style word sets to obtain the user-feedback style tendency O of a given product;
Step 8: compare the style prediction value P with the style tendency feedback value O to draw conclusions about how successfully the product's appearance style is presented, and fuse P and O to obtain the comprehensive product appearance style evaluation F.
In Step 1, a product image style classification dataset is constructed. Sources of product images include commodity descriptions on e-commerce platforms, blogs, and forums of related products. After collection, the product images are uniformly scaled to 224 × 224; the aesthetic styles follow the 14 labels of the referenced AVA dataset. The data enhancement step enlarges the dataset and increases data diversity without affecting quality, and includes random rotation and cropping operations.
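For illustration, the enhancement step could be written with Keras preprocessing layers as below; the rotation range and the intermediate resize used for random cropping are assumptions:

```python
# Illustrative random rotation + random crop augmentation (TensorFlow 2.6+ layer names).
import tensorflow as tf
from tensorflow.keras import layers

augmenter = tf.keras.Sequential([
    layers.Resizing(240, 240),        # slight upsize so a crop is possible (assumed margin)
    layers.RandomRotation(0.05),      # small random rotation (range is an assumption)
    layers.RandomCrop(224, 224),      # random crop back to the model input size
])

augmented = augmenter(images, training=True)  # images: a batch of product images
```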
The image aesthetic style model of Step 2 consists, in order, of an input layer, sequentially stacked convolutional and pooling layers, a fully connected layer, and an output layer; the loss function is the minimized cross-entropy loss. The model is built with the TensorFlow and Keras deep learning frameworks.
In Step 3, the model is fine-tuned with the product image style classification dataset on the basis of Step 2. Because the pre-training of Step 2 already enables the model to extract low-level image features effectively, the dataset required for fine-tuning is smaller, and acceptable test-set accuracy can be reached with fewer training epochs.
The unlabeled product image dataset of Step 4 contains the product images corresponding to the user comments processed subsequently. Given an input product image, the model from Step 3 outputs the 14 style prediction values P, and the softmax function of the output layer ensures that the 14 values P_i sum to 1.
In Step 5, a style label is expanded into a style word set Set_i as follows: first, all senses of the i-th style label word are obtained with WordNet's semantic query method; then all senses are traversed, and the synonyms of each sense are obtained with WordNet's lemma_names method; finally, all synonyms of the i-th style label word are gathered into its style word set Set_i. Within a style word set, multi-word synonym phrases are joined with the '_' symbol; when the style word set is used in Step 6, words containing the '_' symbol are still handled in word form, since WordNet's built-in mechanisms process word senses and phrase senses compatibly. The WordNet semantic dictionary runs on the nltk (Natural Language Toolkit) library provided by the Python language.
In Step 6, user comments on the Amazon.com e-commerce platform are collected, cleaned, and preprocessed using the urllib, nltk, and Beautiful Soup libraries; the user comments correspond to the product pictures whose image aesthetic styles are predicted. First, the urllib library fetches the source code of the e-commerce platform's user comment pages, the Beautiful Soup library parses it, and the comment text of all verified users is extracted; repeated sentences are screened out by comparing text content. All words are then looked up in the WordNet semantic dictionary; words that are not English words, misspelled words, emoticons, and the like cannot be found in WordNet, so the corresponding comment sentences are screened out and misspelled words are removed with manual screening. The cleaned user comments are preprocessed with the nltk library: all characters are converted to lower case; nonstandard punctuation such as single or multiple consecutive '!', multiple consecutive ',' or ';', and single or multiple consecutive '?' is removed; words with no specific meaning, redundant words, and words prone to misunderstanding are compiled into a stop word list, and stop words in the user comments are removed; finally, all verbs are converted to the present tense.
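A hedged sketch of the collection and deduplication step with urllib and Beautiful Soup follows; the URL and the CSS selector are hypothetical placeholders, since real e-commerce review markup differs and scraping is subject to the site's terms of use:

```python
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

def fetch_review_texts(url):
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})  # basic request header
    soup = BeautifulSoup(urlopen(req).read(), "html.parser")
    texts = [node.get_text(strip=True)
             for node in soup.select(".review-text")]          # hypothetical selector
    seen, unique = set(), []
    for t in texts:                                            # screen out repeated sentences
        if t not in seen:
            seen.add(t)
            unique.append(t)
    return unique
```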
In Step 7, the similarity of each word to each of the 14 style word sets is computed. The similarity between a word and a style word set is the maximum similarity between that word and any word in the set, and the similarity between two words is the maximum lin_similarity over the sense pairs of the two words. The similarities Sim_{k,i} of all words to the 14 style word sets are aggregated into the user feedback tendency result: the style tendency value O'_i fed back by users is the sum of the similarities Sim_{k,i} of all words to that style's word set, and the 14 user-feedback style tendency values are normalized across styles so that the values O_i sum to 1.
In Step 8, the style prediction value P is compared with the style tendency feedback value O: the closer the predicted and feedback values of each style, the more successful the style presentation, and the overall success of the appearance style presentation is measured by the L1 distance between the two vectors P and O. The final appearance style evaluation F considers both the average and the distance of the style prediction value P and the style tendency feedback value O: its element F_i is the quotient of the average of O_i and P_i by their distance, normalized across all styles so that the 14 fused style evaluation values sum to 1.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (6)

1. A product appearance style evaluation method based on image and text multi-modal data, characterized by comprising: constructing an image aesthetic style model, and performing semantic emotion analysis and multi-modal fusion evaluation using an image aesthetic style prediction algorithm;
the image aesthetic style model is a multilayer convolutional neural network model that takes a color image as input and outputs a multi-dimensional image style classification;
the image aesthetic style prediction algorithm uses pre-training and transfer learning to predict the style type of the product image;
the semantic emotion analysis comprises: processing online user comments using the style labels of the image aesthetic style prediction algorithm, and computing the product style tendency fed back by users;
the multi-modal fusion evaluation comprises: fusing the product style prediction output by the image aesthetic style prediction algorithm with the product style feedback output by the semantic emotion analysis, and providing a product evaluation result for appearance style;
the image aesthetic style prediction algorithm adopts a transfer learning strategy: it first pre-trains on the large-scale image aesthetic style classification dataset AVA using the dataset's 14 style labels, then fine-tunes on a small product image style dataset of the specific product field labeled with the same 14 style labels, and tests on an unlabeled test set;
the prediction output of the image aesthetic style model for a test image is the style prediction result of that image, a 14-dimensional vector P = (P_1, P_2, …, P_14) satisfying:
Σ_i P_i = 1
where P_i denotes the probability that the image belongs to the i-th style;
the semantic emotion analysis module processes online user comments using the 14 style labels of the image aesthetic style prediction algorithm: it finds synonyms of the 14 style labels with the synonym lookup method lemma_names of the WordNet semantic dictionary and expands each style label into a style word set, as follows:
Step 1: for the i-th style label word, look up the semantic set Synsets_i of that word in WordNet;
Step 2: for the j-th sense synset_ij in Synsets_i, find its synonym set lem_ij using the lemma_names method;
Step 3: all synonym sets lem_ij of the i-th style label word form the i-th style word set Set_i:
Set_i = ∪_j lem_ij
The semantic emotion analysis comprises the following steps: after the style tag is expanded into a style word set, online user comments of a preset product are collected from an online e-commerce platform, and a comment text is cleaned and preprocessed, wherein the method comprises the following steps:
step 1: text collection, which is to use python kurilib to automatically collect online user comments;
step 2: text cleaning, including screening out repeated sentences, screening out sentences which do not belong to a preset language, screening out sentences which only contain non-text contents, and removing words with misspelling;
and 3, step 3: text preprocessing, including converting all characters into lower case letters, eliminating punctuation marks which do not meet the standard, eliminating stop words and converting all verbs into current tenses;
the semantic emotion analysis module comprises: after the comment text is cleaned and preprocessed, computing the similarity between each word in the text and each of the 14 style word sets using the semantic similarity measure lin_similarity provided by WordNet; after all texts are processed, the similarity results of all words are aggregated into the user-feedback style tendency, where the similarity Sim_{k,i} between the k-th word w_k and the i-th style word set Set_i is:
Sim_{k,i} = max_t Sim_{k,i,t}
where Sim_{k,i,t} is the similarity between the k-th word w_k and the t-th word of the i-th style word set Set_i:
Sim_{k,i,t} = max_{m,n} lin_similarity(synset_km, synset_itn)
where synset_km is the m-th sense in the semantic set Synsets_k of the k-th word w_k, synset_itn is the n-th sense in the semantic set Synsets_it of the t-th word of the i-th style word set Set_i, and lin_similarity is the semantic similarity measure provided by WordNet;
aggregating the similarity results of all words, the normalized tendency value O_i of the i-th style fed back by users is:
O'_i = Σ_k Sim_{k,i}
O_i = O'_i / Σ_i O'_i
finally, the user-feedback style tendency of the product, O = (O_1, O_2, …, O_14), is obtained as the output of the semantic emotion analysis.
2. The method according to claim 1, wherein the image aesthetic style model comprises, connected in sequence:
- an input layer: the input is a color image scaled to 224 × 224; the input dimension is b × 224 × 224 × 3, where b is the batch size batch_size;
- 4 convolutional layers, kernel size 9 × 9, stride 1, 64 kernels, ReLU activation;
- a batch normalization layer;
- 1 pooling layer, using maximum pooling, pooling size 2 × 2;
- 3 convolutional layers, kernel size 7 × 7, stride 1, 64 kernels, ReLU activation;
- 1 pooling layer, using maximum pooling, pooling size 2 × 2;
- 3 convolutional layers, kernel size 5 × 5, stride 1, 128 kernels, ReLU activation;
- a Dropout layer, dropout probability 0.1;
- a batch normalization layer;
- 1 pooling layer, using maximum pooling, pooling size 2 × 2;
- 3 convolutional layers, kernel size 3 × 3, stride 1, 128 kernels, ReLU activation;
- 1 pooling layer, using maximum pooling, pooling size 2 × 2;
- a Flatten layer, unrolling the b × 14 × 14 × 128 feature map into vectors of length 14 × 14 × 128 (one per sample);
- a fully connected layer outputting the style classification result, with 14 output nodes corresponding to the 14 style labels; the activation function is Softmax.
3. The method for evaluating the appearance style of a product based on image and text multi-modal data according to claim 1, wherein the loss function of the image aesthetic style model is a minimized cross entropy loss function, the Adam optimizer is used for weight update, and the learning rate is set to 0.0001.
4. The product appearance style evaluation method based on image and text multi-modal data of claim 1, wherein the multi-modal fusion evaluation comprises: comparing the style prediction value P output by the image aesthetic style prediction algorithm with the style tendency feedback value O output by the semantic emotion analysis module;
the absolute difference |P_i - O_i| between corresponding elements of P and O represents, from the viewpoint of the i-th aesthetic style, the size of the difference between the information conveyed by the product image and the user feedback; the sum over all style labels, Σ_i |P_i - O_i|, represents the difference between the overall aesthetic style of the product image and the style fed back by users; the indices |P_i - O_i| and Σ_i |P_i - O_i| assist in evaluating how successfully the product presents its style: the larger the index value, the larger the difference between the product image and the user feedback, and the less successful the style presentation.
5. The product appearance style evaluation method based on image and text multi-modal data of claim 1, wherein the multi-modal fusion evaluation comprises: comparing and fusing the style prediction value P output by the image aesthetic style prediction algorithm with the style tendency feedback value O output by the semantic emotion analysis to obtain a comprehensive product appearance style evaluation F = (F_1, F_2, …, F_14), where the comprehensive evaluation F_i of the i-th aesthetic style is the fusion of P_i and O_i:
F'_i = (P_i + O_i) / (2 * |P_i - O_i|)
F_i = F'_i / Σ_i F'_i
where P_i denotes the probability that the image belongs to the i-th style and O_i denotes the normalized tendency value of the i-th style from user feedback.
6. A product appearance style evaluation system based on image and text multi-modal data, characterized in that it adopts the product appearance style evaluation method based on image and text multi-modal data of any one of claims 1 to 5, and comprises an image aesthetic style model, an image aesthetic style prediction algorithm, a semantic emotion analysis module, and a multi-modal fusion evaluation module;
the image aesthetic style model is a multilayer convolutional neural network model that takes a color image as input and outputs a multi-dimensional image style classification;
the image aesthetic style prediction algorithm uses pre-training and transfer learning to predict the style type of the product image;
the semantic emotion analysis module processes online user comments using the style labels of the image aesthetic style prediction algorithm and computes the product style tendency fed back by users;
the multi-modal fusion evaluation module fuses the product style prediction output by the image aesthetic style prediction algorithm with the product style feedback output by the semantic emotion analysis, and provides a product evaluation result for appearance style.
CN202110241232.6A 2021-03-04 2021-03-04 Product appearance style evaluation method and system based on image and text multi-modal data Active CN112862569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110241232.6A CN112862569B (en) 2021-03-04 2021-03-04 Product appearance style evaluation method and system based on image and text multi-modal data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110241232.6A CN112862569B (en) 2021-03-04 2021-03-04 Product appearance style evaluation method and system based on image and text multi-modal data

Publications (2)

Publication Number Publication Date
CN112862569A CN112862569A (en) 2021-05-28
CN112862569B true CN112862569B (en) 2023-04-07

Family

ID=75991746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110241232.6A Active CN112862569B (en) 2021-03-04 2021-03-04 Product appearance style evaluation method and system based on image and text multi-modal data

Country Status (1)

Country Link
CN (1) CN112862569B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204624B (en) * 2021-06-07 2022-06-14 吉林大学 Multi-feature fusion text emotion analysis model and device
CN116611131B (en) * 2023-07-05 2023-12-26 大家智合(北京)网络科技股份有限公司 Automatic generation method, device, medium and equipment for packaging graphics

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106971200A (en) * 2017-03-13 2017-07-21 天津大学 A kind of iconic memory degree Forecasting Methodology learnt based on adaptive-migration
CN109902912A (en) * 2019-01-04 2019-06-18 中国矿业大学 A kind of personalized image aesthetic evaluation method based on character trait

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10685434B2 (en) * 2016-03-30 2020-06-16 Institute Of Automation, Chinese Academy Of Sciences Method for assessing aesthetic quality of natural image based on multi-task deep learning
CN108596051A (en) * 2018-04-04 2018-09-28 浙江大学城市学院 A kind of intelligent identification Method towards product style image
CN111950655B (en) * 2020-08-25 2022-06-14 福州大学 Image aesthetic quality evaluation method based on multi-domain knowledge driving

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106971200A (en) * 2017-03-13 2017-07-21 天津大学 A kind of iconic memory degree Forecasting Methodology learnt based on adaptive-migration
CN109902912A (en) * 2019-01-04 2019-06-18 中国矿业大学 A kind of personalized image aesthetic evaluation method based on character trait

Also Published As

Publication number Publication date
CN112862569A (en) 2021-05-28

Similar Documents

Publication Publication Date Title
CN112184525B (en) System and method for realizing intelligent matching recommendation through natural semantic analysis
CN107633007B (en) Commodity comment data tagging system and method based on hierarchical AP clustering
CN112001187B (en) Emotion classification system based on Chinese syntax and graph convolution neural network
CN110458627B (en) Commodity sequence personalized recommendation method for dynamic preference of user
CN109923557A (en) Use continuous regularization training joint multitask neural network model
CN112001185A (en) Emotion classification method combining Chinese syntax and graph convolution neural network
CN112100344A (en) Financial field knowledge question-answering method based on knowledge graph
CN112001186A (en) Emotion classification method using graph convolution neural network and Chinese syntax
CN110532386A (en) Text sentiment classification method, device, electronic equipment and storage medium
CN112905739B (en) False comment detection model training method, detection method and electronic equipment
CN112069312B (en) Text classification method based on entity recognition and electronic device
CN112862569B (en) Product appearance style evaluation method and system based on image and text multi-modal data
CN111259153A (en) Attribute-level emotion analysis method of complete attention mechanism
CN111353044A (en) Comment-based emotion analysis method and system
Rohman et al. Natural Language Processing on Marketplace Product Review Sentiment Analysis
CN114218392B (en) Futures question-answer oriented user intention identification method and system
CN117056451A (en) New energy automobile complaint text aspect-viewpoint pair extraction method based on context enhancement
Liu et al. A deep learning-based sentiment analysis approach for online product ranking with probabilistic linguistic term sets
CN111400449A (en) Regular expression extraction method and device
CN111563361A (en) Text label extraction method and device and storage medium
CN117235253A (en) Truck user implicit demand mining method based on natural language processing technology
CN115934936A (en) Intelligent traffic text analysis method based on natural language processing
CN111859910B (en) Word feature representation method for semantic role recognition and fusing position information
CN111061939B (en) Scientific research academic news keyword matching recommendation method based on deep learning
CN114238577B (en) Multi-task learning emotion classification method integrating multi-head attention mechanism

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant