CN113298365A

CN113298365A - LSTM-based cultural added value assessment method

Info

Publication number: CN113298365A
Application number: CN202110515653.3A
Authority: CN
Inventors: 倪渊; 张腾; 韩鹏飞; 徐磊; 齐林; 王佳
Original assignee: Beijing Information Science and Technology University
Current assignee: Beijing Information Science and Technology University
Priority date: 2021-05-12
Filing date: 2021-05-12
Publication date: 2021-08-24
Anticipated expiration: 2041-05-12
Also published as: CN113298365B

Abstract

The invention belongs to the technical field of cultural added value assessment and relates to an LSTM-based cultural added value assessment method, which comprises the following steps: step 1: constructing a three-dimensional index system based on person-enterprise-society; step 2: establishing a feature word list representing the comment corpus of the cultural product to be evaluated; step 3: extracting feature sentences to obtain feature sentence data; step 4: training an LSTM network model; step 5: carrying out accuracy testing and prediction with the LSTM network model to obtain emotion values; step 6: weighting the indexes of the three-dimensional index system of step 1; and step 7: establishing a cultural added value calculation equation model to obtain a cultural added evaluation value. The disclosed method remedies the shortcomings of traditional evaluation models, whose evaluation indexes are overly subjective and difficult to quantify, and is suited to the large volumes of comment data found on online platforms.

Description

LSTM-based cultural added value assessment method

Technical Field

The invention belongs to the technical field of cultural added value evaluation, and in particular relates to a cultural added value evaluation method based on an LSTM (long short-term memory) artificial neural network.

Background

The rapid development of internet technology has driven a trend toward the digital economy. Against the background of this new era, the cultural and creative industry is gradually becoming digital and intelligent, bringing people new cultural experiences. Through the organic fusion of digital technology, culture and platforms, a series of new forms of cultural innovation has emerged, so that cultural and creative products no longer simply reproduce traditional culture; instead, digital technology allows products to combine and coexist with different cultures, brings hollow and rigid cultural symbols to life, and gives products more cultural added value. For example, the network cultural relics museum creates countless 'internet-famous products': countless inhaling powder, harmony and imperial application of adhesive tape, etc. High cultural added value enables cultural products to meet the spiritual and cultural needs of consumers, becomes an important means for merchants to win the favor of consumers, and creates a unique cultural brand image, so that the excellent culture embodied in cultural products can enter the lives of ordinary people and the products become carriers and propagators of culture.

Therefore, enhancing cultural added value has become the main trend in the development of the cultural and creative industry, and this trend has prompted a new round of reflection among cultural and creative enterprises and academia: "How much does cultural added value improve the original product? How much cultural added value results from fusing different cultural elements with the product? How can the rules behind these added values be used to guide the design and brand building of cultural products?" Resolving these key questions first requires answering two basic ones: "what cultural added value consists of" and "how to measure cultural added value". However, research on these two basic questions is mainly based on qualitative analysis, and methods for quantifying cultural added value remain unexplored. In view of this, the invention analyzes the connotation and structure of cultural added value from the perspective of emotion and, supported by product comment data from online platforms, provides a cultural added value evaluation method based on LSTM fine-grained sentiment analysis, offering a reference for subsequent related research.

Disclosure of Invention

The object of the invention is to provide a cultural added value evaluation method based on an LSTM neural network, in which an index system of cultural added value and an LSTM sentiment analysis evaluation model are constructed, so as to solve the problems raised in the background section.

The invention is realized by the following technical scheme:

An LSTM-based cultural added value assessment method comprises the following steps:

step 1: constructing a three-dimensional index system based on person-enterprise-society from the hierarchical functional perspective of cultural added value;

step 2: preparing a comment corpus of the cultural product to be evaluated, performing word segmentation on the corpus, and establishing, based on a TF-IDF algorithm, a feature word list representing the comment corpus of the cultural product to be evaluated;

step 3: extracting feature sentences to obtain feature sentence data;

step 4: training an LSTM network model with the feature sentence data extracted in step 3, selecting cross entropy as the loss function, and waiting for the loss function to converge to obtain a learning curve;

step 5: carrying out accuracy testing and prediction with the LSTM network model to obtain emotion values;

step 6: weighting the indexes of the three-dimensional index system of step 1;

step 7: establishing a cultural added value calculation equation model to obtain a cultural added evaluation value.

On the basis of the above technical scheme, step 1 specifically comprises: establishing a three-dimensional index system based on person-enterprise-society with reference to the existing literature on cultural added value evaluation and from the hierarchical functional perspective;

the three-dimensional index system based on person-enterprise-society comprises 3 primary indexes;

the 3 primary indexes are: cultural spiritual enjoyment, cultural brand shaping and cultural essence inheritance;

cultural spiritual enjoyment includes the following secondary indexes: the ornamental value of the cultural product and the artistry of the cultural product;

cultural brand shaping includes the following secondary indexes: the popularity of the cultural brand and the loyalty of the cultural brand;

cultural essence inheritance includes the following secondary indexes: the inheritance of the culture and the dissemination of the culture.

On the basis of the above technical scheme, the basic unit of the comment corpus is a single comment;

the specific steps of step 2 are as follows:

step 2.1: segmenting the comments of the comment corpus by calling the word segmentation module of the jieba tool to obtain a corpus segmentation result;

step 2.2: applying the TF-IDF algorithm of the jieba tool, with parameters such as the word frequency retention threshold set as required, to obtain the feature word list needed to represent the whole comment corpus.

On the basis of the technical scheme, the specific steps of the step 2.2 are as follows:

step 2.2.1: extracting keywords with the TF-IDF (term frequency-inverse document frequency) algorithm, specifically by calculating according to formulas (1), (2) and (3),

wherein TF_ω is the term frequency of the entry ω;

wherein IDF is the inverse document frequency; the fewer the effective comments that contain a given entry, the larger the IDF of that entry, and the better the entry distinguishes between categories;

TFIDF = TF_ω × IDF (3)

wherein TFIDF is the term frequency-inverse document frequency;
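Formulas (1) and (2) are not reproduced in this text. A commonly used form of the TF-IDF definitions that is consistent with the descriptions above is sketched below. The symbols n_ω (number of occurrences of the entry ω), D (the set of effective comments), the natural logarithm and the +1 smoothing term are assumptions made for illustration, not the patent's own notation.

```latex
\begin{align}
\mathrm{TF}_{\omega} &= \frac{n_{\omega}}{\sum_{k} n_{k}} \tag{1}\\
\mathrm{IDF} &= \log\frac{|D|}{1 + |\{\, d \in D : \omega \in d \,\}|} \tag{2}\\
\mathrm{TFIDF} &= \mathrm{TF}_{\omega} \cdot \mathrm{IDF} \tag{3}
\end{align}
```

Under this form, an entry that appears in few comments has a small denominator in (2) and therefore a large IDF, which matches the statement that such an entry distinguishes categories well.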

step 2.2.2: determining a word frequency retention threshold, and selecting the entries whose TFIDF value is higher than the word frequency retention threshold as keywords (for example, setting the word frequency retention threshold to 20); this screening tends to filter out common words and retain the relatively important ones;

then, carrying out word frequency statistics on the keywords with the Counter class to obtain candidate feature words;

Counter is a class in Python's collections module and a subclass of the dictionary type; elements are stored as dictionary keys, and the number of times each element occurs is stored as the corresponding value;

finally, according to the person-enterprise-society three-dimensional index system, and after manual screening and identification, classifying the candidate feature words by index level to obtain the feature word list needed for the whole comment corpus.
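By way of illustration only, the keyword screening and word frequency statistics of step 2.2 may be sketched in plain Python as follows. The sketch assumes the comments have already been segmented (for example by jieba) and computes TF-IDF directly rather than through jieba's own implementation. The function names, the corpus-level TF and the +1 smoothing are assumptions, and the appropriate threshold depends on how the scores are scaled.

```python
import math
from collections import Counter

def tfidf_keywords(tokenized_comments, threshold):
    """Return {entry: TF-IDF score} for entries scoring above `threshold`.

    Assumption: TF is taken over the whole corpus and IDF uses +1 smoothing.
    """
    all_terms = [t for comment in tokenized_comments for t in comment]
    term_counts = Counter(all_terms)
    total_terms = len(all_terms)
    n_docs = len(tokenized_comments)
    # Document frequency: number of comments that contain each entry.
    doc_freq = Counter(t for comment in tokenized_comments for t in set(comment))
    scores = {}
    for term, count in term_counts.items():
        tf = count / total_terms
        idf = math.log(n_docs / (1 + doc_freq[term]))
        scores[term] = tf * idf
    return {t: s for t, s in scores.items() if s > threshold}

def candidate_feature_words(tokenized_comments, keywords):
    """Count how often each retained keyword occurs, most frequent first."""
    counts = Counter(t for c in tokenized_comments for t in c if t in keywords)
    return counts.most_common()

comments = [
    ["design", "beautiful"],
    ["design", "exquisite"],
    ["tape", "beautiful", "design"],
    ["history", "culture"],
]
keywords = tfidf_keywords(comments, threshold=0.0)
candidates = candidate_feature_words(comments, keywords)
```

In this toy corpus the entry "design" appears in most comments, so its IDF is zero and it is filtered out as a common word, while rarer entries are retained as candidates for manual screening.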

On the basis of the above technical scheme, the feature sentences comprise explicit feature sentences and implicit feature sentences;

the specific steps of step 3 are as follows:

first, extracting explicit feature sentences;

traversing the word segmentation results of the whole corpus word by word, comparing them with the feature word list of step 2, and taking each matched feature word as a feature attribute of the comment in which the entry occurs;

extracting the comments that have feature attributes and marking them as explicit feature sentences;

then, using the Stanford NLP platform to perform dependency parsing on the extracted explicit feature sentences and extracting the modifiers of the explicit feature sentences;

the modifiers of the explicit feature sentences are extracted as follows: traversing the entries of each explicit feature sentence word by word, comparing them with the modifiers of the HowNet emotion dictionary, and taking each matched modifier as a modifier of the explicit feature sentence in which the entry occurs;

the HowNet emotion dictionary comprises adjectives, nouns, verbs, adverbs, and combinations thereof;

the explicit feature sentences for which modifiers were matched are processed as follows:

taking the feature word of the explicit feature sentence as the dominant word and the modifier of the explicit feature sentence as the emotion word, constructing attribute feature-emotion word pairs, and thereby obtaining, for each attribute feature and emotion word, the attribute emotion word pair weight;

the attribute feature is the dominant word;

the attribute emotion word pair weight is denoted SQ and is calculated according to formula (4),

second, extracting implicit feature sentences;

for the sentences that did not match any feature word, traversing their entries word by word and comparing them with the modifiers of the HowNet emotion dictionary;

when a sentence that matched no feature word also matches no modifier, the sentence is deleted;

when a sentence that matched no feature word does match modifiers, the matched modifiers are taken as the modifiers of that sentence and are used as its emotion words;

then, based on the attribute emotion word pair weights obtained above and on the emotion words in the sentence, the attribute feature with the largest attribute emotion word pair weight is selected as the feature word of the sentence;

a sentence that matched no feature word but has been assigned a feature word in this way is taken as an implicit feature sentence;

the Stanford NLP platform is a natural language processing toolkit that integrates many practical functions, including word segmentation, part-of-speech tagging and syntactic analysis; the Stanford NLP platform is not a deep learning framework but a set of trained models, and can be regarded as ready-made software; it is written in Java and has a Python interface;

that is, the remaining comments that match no feature word have features that are not clear enough, so the corpus word segmentation result needs to be imported into the Stanford NLP platform for dependency parsing, and the unclear features are mined through this step.
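The explicit and implicit extraction described above may be sketched as follows. This is a simplified illustration, not the patented procedure: dependency parsing is omitted, modifiers are found by plain dictionary lookup, and because formula (4) is not reproduced in this text, the attribute emotion word pair weight SQ is replaced by a simple co-occurrence count. All names are hypothetical.

```python
from collections import defaultdict

def extract_feature_sentences(tokenized_comments, feature_words, sentiment_words):
    """Split comments into explicit and implicit feature sentences.

    Assumption: the pair weight is the co-occurrence count of a feature word
    and an emotion word, standing in for the SQ weight of formula (4).
    """
    pair_weight = defaultdict(int)  # (feature word, emotion word) -> count
    explicit, unmatched = [], []
    for tokens in tokenized_comments:
        feats = [t for t in tokens if t in feature_words]
        emos = [t for t in tokens if t in sentiment_words]
        if feats:
            # Explicit: the comment contains a feature word (first match kept).
            explicit.append((tokens, feats[0]))
            for f in feats:
                for e in emos:
                    pair_weight[(f, e)] += 1
        elif emos:
            unmatched.append((tokens, emos))
        # Comments with neither feature words nor emotion words are deleted.
    implicit = []
    for tokens, emos in unmatched:
        # Candidate attribute features are those paired with this comment's emotion words.
        candidates = [(w, f) for (f, e), w in pair_weight.items() if e in emos]
        if candidates:
            implicit.append((tokens, max(candidates)[1]))
    return explicit, implicit

comments = [
    ["design", "beautiful"],
    ["brand", "trustworthy"],
    ["design", "is", "beautiful"],
    ["really", "beautiful"],
    ["arrived", "today"],
]
explicit, implicit = extract_feature_sentences(
    comments, {"design", "brand"}, {"beautiful", "trustworthy"}
)
```

Here the comment "really beautiful" names no feature word, but its emotion word "beautiful" is most strongly paired with "design", so the comment is kept as an implicit feature sentence about "design"; the last comment matches nothing and is dropped.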

On the basis of the technical scheme, the specific steps of the step 4 are as follows:

step 4.1: manually labeling each feature sentence extracted in the previous step;

a sentence expressing positive emotion is labeled +1, a sentence expressing negative emotion is labeled -1, and a sentence expressing neutral emotion is labeled 0;

step 4.2: converting the feature sentences into word vectors with word2vec;

classifying the feature sentences according to the secondary and primary indexes of the feature words matched with them;

the word vectors, the feature words corresponding to the feature sentences, the classification results of the feature sentences and the labels corresponding to the feature sentences together constitute the feature sentence data;

step 4.3: dividing the feature sentence data into training set data and test set data;

step 4.4: setting the ratio of the amount of training set data to the amount of test set data to 4:1.
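The 4:1 division of steps 4.3 and 4.4 may be sketched as follows. The shuffling, the fixed seed and the function name are assumptions made for reproducibility of the example; the patent does not state how the samples are assigned to the two sets.

```python
import random

def split_train_test(samples, train_ratio=0.8, seed=0):
    """Shuffle the labeled samples and split them 4:1 into train and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical feature sentence data: (sentence id, label in {-1, 0, +1}).
samples = [("s%d" % i, label) for i, label in enumerate([1, 1, 0, -1, 1, 0, -1, 1, 1, 0])]
train, test = split_train_test(samples)
```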

On the basis of the above technical scheme, step 4 further comprises: training the LSTM network model with the training set data, and testing the LSTM network model with the test set data.

On the basis of the above technical scheme, the activation function of the LSTM network is the tanh function, the word vector dimension is set to 100, and the batch size is 32, that is, 32 samples are selected as input each time.

In addition, to prevent overfitting during training of the deep learning network, neurons are temporarily dropped from the network with a certain probability, which weakens the co-adaptation among neuron nodes and thereby enhances generalization; cross validation shows that when the neuron drop rate (the dropout value) is set to 0.5, the number of randomly generated network structures is the largest; cross entropy is selected as the main quantity plotted in the learning curve of the LSTM network model, and the curve is plotted once it has converged.
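For readers unfamiliar with the loss that is monitored during training, the cross entropy over the three emotion labels may be sketched as follows. The patent does not state how the network outputs are turned into class probabilities; a softmax over three output scores is assumed here, and the function names are hypothetical.

```python
import math

LABELS = (-1, 0, 1)  # negative, neutral, positive

def softmax(logits):
    """Convert raw output scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true label."""
    probs = softmax(logits)
    return -math.log(probs[LABELS.index(label)])

def mean_cross_entropy(batch):
    """Average loss over a batch of (logits, label) pairs, e.g. 32 samples."""
    return sum(cross_entropy(logits, y) for logits, y in batch) / len(batch)

uniform_loss = cross_entropy([0.0, 0.0, 0.0], 1)    # model has no preference
confident_loss = cross_entropy([0.0, 0.0, 5.0], 1)  # model favours the true label
```

The loss falls as the model assigns more probability to the labeled emotion, so a flattening curve of this quantity over training batches is what indicates convergence.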

On the basis of the technical scheme, the specific steps of the step 5 are as follows: checking the accuracy, the recall rate and the F1 value of the LSTM network model trained in the step 4; and obtaining the emotion values of all secondary indexes by using the test set.
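The checks named in step 5 may be sketched as follows. The sketch computes overall accuracy together with precision, recall and F1 for one chosen label; whether the patent intends per-class or averaged figures is not stated, so this choice and the function names are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def precision_recall_f1(y_true, y_pred, target):
    """Precision, recall and F1 for a single label `target`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical test-set labels and predictions in {-1, 0, +1}.
y_true = [1, 1, 0, -1, 1]
y_pred = [1, 0, 0, -1, 1]
acc = accuracy(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred, target=1)
```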

On the basis of the above technical scheme, the weights of the indexes of the three-dimensional index system comprise: primary index weights (also called primary index frequencies) and secondary index weights (also called secondary index frequencies);

the feature sentences with positive emotion are extracted;

the primary index weight is calculated according to formula (5),

wherein YJ1 is the frequency with which the matched feature words of the primary index occur in the feature sentences with positive emotion, and ZS is the frequency with which all matched feature words occur in the feature sentences with positive emotion;

the secondary index weight is calculated according to formula (6),

wherein EJ2 is the frequency with which the matched feature words of the secondary index occur in the feature sentences with positive emotion, and ZS2 is the frequency with which the feature words of the primary index to which that secondary index belongs occur in the feature sentences with positive emotion.
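Formulas (5) and (6) are not reproduced in this text. Given the definitions of YJ1, ZS, EJ2 and ZS2, and the statement that the weights are frequencies, they presumably take the ratio form sketched below; the symbols W_1 and W_2 for the primary and secondary index weights are introduced here for illustration only.

```latex
\begin{align}
W_{1} &= \frac{YJ1}{ZS} \tag{5}\\
W_{2} &= \frac{EJ2}{ZS2} \tag{6}
\end{align}
```

Under this reading the primary index weights sum to 1 across the three primary indexes, and the secondary index weights sum to 1 within each primary index, which agrees with the example weights given in the detailed description.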

On the basis of the above technical scheme, the cultural added value calculation equation model of step 7 is shown in formula (7),

cultural added evaluation value = primary index weight of "cultural spiritual enjoyment" × (secondary index weight of "ornamental value of the cultural product" × index emotion value of "ornamental value of the cultural product" + secondary index weight of "artistry of the cultural product" × index emotion value of "artistry of the cultural product") + primary index weight of "cultural brand shaping" × (secondary index weight of "popularity of the cultural brand" × index emotion value of "popularity of the cultural brand" + secondary index weight of "loyalty of the cultural brand" × index emotion value of "loyalty of the cultural brand") + primary index weight of "cultural essence inheritance" × (secondary index weight of "inheritance of the culture" × index emotion value of "inheritance of the culture" + secondary index weight of "dissemination of the culture" × index emotion value of "dissemination of the culture") (7).

The invention has the following beneficial technical effects:

1. the method constructs, from the hierarchical functional perspective of cultural added value, a three-dimensional index system based on person-enterprise-society that comprises 3 primary indexes and 6 secondary indexes. The index system is systematic and hierarchical, and reflects the significance of perceived value research for the development of the cultural industry;

2. for cultural added value, a perceived value evaluation model based on LSTM fine-grained sentiment analysis is adopted. The method remedies the shortcomings of traditional evaluation models, whose evaluation indexes are overly subjective and difficult to quantify, and is suited to the large volumes of comment data found on online platforms.

Drawings

The invention has the following drawings:

FIG. 1 is a schematic diagram of a three-dimensional index architecture based on person-business-society according to the present application.

FIG. 2 is a schematic flow chart of a LSTM-based cultural added value assessment method according to the present application.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

As shown in FIGS. 1-2, the object of the present invention is to provide a cultural added value evaluation method based on an LSTM neural network, in which an index system of cultural added value and an LSTM sentiment analysis evaluation model are constructed, so as to solve the problems raised in the background section.

The invention is realized by the following technical scheme:

A cultural added value assessment method based on an LSTM neural network comprises the following steps:

step 1: constructing a three-dimensional index system based on person-enterprise-society from the hierarchical functional perspective of cultural added value;

step 2: preparing a comment corpus of the cultural product to be evaluated, performing word segmentation on the corpus, and then establishing, based on a TF-IDF algorithm, a feature word list from the comment corpus of the cultural product to be evaluated;

step 3: extracting feature sentences to obtain feature sentence data;

step 4: training an LSTM network model with the feature sentence data extracted in step 3, selecting cross entropy as the loss function, and waiting for the loss function to converge to obtain a learning curve;

step 5: carrying out accuracy testing of the LSTM network model and prediction on the test set to obtain emotion values;

step 6: weighting the indexes of the three-dimensional index system of step 1;

step 7: establishing a cultural added value calculation equation model to obtain a cultural added evaluation value.

Further, step 1 specifically comprises: establishing a three-dimensional index system based on person-enterprise-society with reference to the existing literature on cultural added value evaluation and from the hierarchical functional perspective; cultural added value is considered to be represented by the sum of the three primary indexes, namely cultural spiritual enjoyment, cultural brand shaping and cultural essence inheritance, and their mutual relations. On the basis of comprehensively and uniformly covering the three traditional characteristic factors of cultural products (the individual, the enterprise and society), and in combination with the essential connotation of cultural elements, 6 secondary indexes are derived: the ornamental value of the cultural product, the artistry of the cultural product, the popularity of the cultural brand, the loyalty of the cultural brand, the inheritance of the culture and the dissemination of the culture, finally forming a cultural added value index system consisting of 3 primary indexes and 6 secondary indexes.

Further, step 2 specifically comprises: preparing a comment corpus of the cultural product to be evaluated, the basic unit of which is a single comment; performing word segmentation on the corpus by calling the jieba module to obtain the word segmentation result of the corpus; and then applying the TF-IDF algorithm of jieba, with parameters such as the word frequency retention threshold set as required, to obtain a feature word list representing the whole comment corpus.

Further, step 3 specifically consists of two steps: extracting explicit feature sentences and extracting implicit feature sentences. The word segmentation result of the corpus is traversed and compared with the feature word list of step 2, and each matched feature word is taken as a feature attribute of the comment in which the entry occurs;

for the remaining implicit feature sentences, whose feature attributes are less clear, the word segmentation result of the corpus needs to be imported into the Stanford NLP platform for dependency parsing, and the unclear feature attributes are mined through this step.

Step 4 specifically comprises: gathering the feature sentences whose feature attributes fall under the same index, extracting them from the word segmentation results of the comment corpus, and analyzing and classifying them together. The feature sentences are manually labeled according to the word segmentation result of the comment corpus of each category: a sentence expressing positive emotion is labeled +1, a sentence expressing negative emotion is labeled -1, and a sentence expressing neutral emotion is labeled 0;

converting the feature sentences into word vectors with word2vec;

dividing the feature sentence data into training set data and test set data;

setting the ratio of the amount of training set data to the amount of test set data to 4:1.

Step 4 further comprises training an LSTM network model on the labeled word segmentation results of the comment corpus, wherein the activation function of the model is the tanh function, the word vector dimension is set to 100, and the batch size is 32, that is, 32 samples are selected as input each time. In addition, to prevent overfitting during training of the deep learning network, neurons are temporarily dropped from the network with a certain probability, which weakens the co-adaptation among neuron nodes and thereby enhances generalization; cross validation shows that when the dropout value is set to 0.5, the number of randomly generated network structures is the largest. Cross entropy is selected as the main quantity plotted in the model learning curve, and the curve is plotted once it has converged;

step 5 specifically comprises: calling the LSTM model trained in step 4 to perform sentiment analysis on the corpus, checking its accuracy, recall and F1 value to judge the performance of the model, and, once the performance is confirmed, calculating the emotion values of all secondary indexes.

Step 6 specifically comprises index weighting: based on the classification result of step 4, the feature sentences with positive emotion polarity are selected; by comparison with the feature word list, the frequency with which feature words belonging to each secondary or primary index occur is determined; the primary index frequency and the secondary index frequency of the feature words are calculated respectively; and these frequencies are set as the weights of the corresponding indexes.
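The frequency-based weighting of step 6 may be sketched as follows. The sketch assumes that each weight is a simple ratio of occurrence counts (primary index count over all matched feature words, secondary index count over the count of its primary index); the mapping structure, the index names and the function name are hypothetical.

```python
from collections import Counter

def index_weights(positive_feature_words, word_to_index):
    """Compute primary and secondary index weights from feature word occurrences.

    `word_to_index` maps a feature word to (primary index, secondary index).
    Assumption: weights are ratios of occurrence counts.
    """
    primary_counts = Counter()
    secondary_counts = Counter()
    for word in positive_feature_words:
        if word in word_to_index:
            primary, secondary = word_to_index[word]
            primary_counts[primary] += 1
            secondary_counts[(primary, secondary)] += 1
    total = sum(primary_counts.values())
    primary_w = {p: c / total for p, c in primary_counts.items()}
    secondary_w = {
        (p, s): c / primary_counts[p] for (p, s), c in secondary_counts.items()
    }
    return primary_w, secondary_w

# Hypothetical feature word list and matches found in positive feature sentences.
word_to_index = {
    "exquisite": ("enjoyment", "ornamental"),
    "artistic": ("enjoyment", "artistry"),
    "famous": ("brand", "popularity"),
}
matched = ["exquisite", "exquisite", "artistic", "famous"]
primary_w, secondary_w = index_weights(matched, word_to_index)
```

With these four matches the "enjoyment" index receives weight 3/4 and, within it, the "ornamental" secondary index receives weight 2/3.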

Step 7 specifically comprises: establishing the cultural added value calculation equation model with reference to the weights of the indexes at each level obtained in step 6.

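For example: the cultural added evaluation value (weighted total score) = 0.399 × (0.638 × index emotion value of "ornamental value of the cultural product" + 0.362 × index emotion value of "artistry of the cultural product") + 0.296 × (0.569 × index emotion value of "popularity of the cultural brand" + 0.431 × index emotion value of "loyalty of the cultural brand") + 0.305 × (0.382 × index emotion value of "inheritance of the culture" + 0.618 × index emotion value of "dissemination of the culture")

The decimals are the corresponding weights.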
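The weighted total score may be computed as in the following sketch, which uses the example weights given above. The pairing of 0.638 and 0.362 with the ornamental value and artistry indexes follows the order of formula (7) and is an assumption, and the emotion values passed in are purely hypothetical placeholders, not results from the patent.

```python
# Example weights: primary index -> (primary weight, {secondary index: weight}).
WEIGHTS = {
    "cultural spiritual enjoyment": (0.399, {"ornamental value": 0.638, "artistry": 0.362}),
    "cultural brand shaping": (0.296, {"popularity": 0.569, "loyalty": 0.431}),
    "cultural essence inheritance": (0.305, {"inheritance": 0.382, "dissemination": 0.618}),
}

def added_value(weights, emotion_values):
    """Weighted sum of secondary index emotion values, as in formula (7)."""
    total = 0.0
    for primary_weight, secondary in weights.values():
        total += primary_weight * sum(
            w * emotion_values[name] for name, w in secondary.items()
        )
    return total

# Hypothetical emotion values: every secondary index scored 1.0.
all_ones = {name: 1.0 for _, secondary in WEIGHTS.values() for name in secondary}
score = added_value(WEIGHTS, all_ones)
```

Because the primary weights sum to 1 and the secondary weights sum to 1 within each primary index, the score equals the emotion value whenever all six emotion values are identical, which is a quick sanity check on any set of weights.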

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that are within the form and principle of the present invention are intended to be included within the scope of the present invention.

Those not described in detail in this specification are within the knowledge of those skilled in the art.

Claims

1. A culture added value evaluation method based on LSTM is characterized by comprising the following steps:

step 6: weighting indexes of the three-dimensional index system in the step 1;

2. The LSTM-based cultural added-value assessment method of claim 1, wherein: the three-dimensional index system based on person-enterprise-society comprises 3 primary indexes;

3. The LSTM-based cultural added-value assessment method of claim 2, wherein: the basic unit of the comment corpus is a single comment;

the specific steps of the step 2 are as follows:

step 2.2: applying the TF-IDF algorithm of the jieba tool, with a word frequency retention threshold parameter set, to obtain the feature word list needed to represent the whole comment corpus.

4. The LSTM-based cultural added-value assessment method of claim 3, wherein: the specific steps of step 2.2 are:

step 2.2.1: extracting keywords by using a TF-IDF algorithm, specifically: the calculation is carried out by using the formulas (1), (2) and (3),

wherein TF_ω is the term frequency of the entry ω;

wherein IDF is the inverse document frequency;

TFIDF＝TF_ω*IDF (3)

wherein, TFIDF is: word frequency-inverse document frequency;

step 2.2.2: determining a word frequency retention threshold, and screening entries with the numerical value of TFIDF higher than the word frequency retention threshold as keywords;

5. The LSTM-based cultural added-value assessment method of claim 4, wherein: the feature sentences comprise explicit feature sentences and implicit feature sentences;

the specific steps of step 3 are as follows:

first, extracting explicit feature sentences;

the attribute feature is the dominant word;

second, extracting implicit feature sentences;

a sentence that matched no feature word but has been assigned a feature word is taken as an implicit feature sentence.

6. The LSTM-based cultural added-value assessment method of claim 5, wherein: the specific steps of the step 4 are as follows:

7. The LSTM-based cultural added-value assessment method of claim 6, wherein: the specific steps of the step 4 are as follows: training an LSTM network model by using training set data; testing the LSTM network model by using the test set data;

the activation function of the LSTM network is the tanh function, the word vector dimension is set to 100, the batch size is 32, and the neuron drop rate is set to 0.5; and cross entropy is selected as the quantity plotted in the learning curve of the LSTM network model.

8. The LSTM-based cultural added-value assessment method of claim 7, wherein: the specific steps of the step 5 are as follows: checking the accuracy, the recall rate and the F1 value of the LSTM network model trained in the step 4; and obtaining the emotion values of all secondary indexes by using the test set.

9. The LSTM-based cultural added-value assessment method of claim 8, wherein: the weight of the index of the three-dimensional index system comprises: a primary index weight and a secondary index weight;

extracting characteristic sentences with positive emotions;

the primary index weight is calculated according to formula (5),

wherein YJ1 is the frequency with which the matched feature words of the primary index occur in the feature sentences with positive emotion, and ZS is the frequency with which all matched feature words occur in the feature sentences with positive emotion;

the secondary index weight is calculated according to equation (6),

10. The LSTM-based cultural added-value assessment method of claim 9, wherein: the cultural added value calculation equation model of step 7 is shown in formula (7): cultural added evaluation value = primary index weight of "cultural spiritual enjoyment" × (secondary index weight of "ornamental value of the cultural product" × index emotion value of "ornamental value of the cultural product" + secondary index weight of "artistry of the cultural product" × index emotion value of "artistry of the cultural product") + primary index weight of "cultural brand shaping" × (secondary index weight of "popularity of the cultural brand" × index emotion value of "popularity of the cultural brand" + secondary index weight of "loyalty of the cultural brand" × index emotion value of "loyalty of the cultural brand") + primary index weight of "cultural essence inheritance" × (secondary index weight of "inheritance of the culture" × index emotion value of "inheritance of the culture" + secondary index weight of "dissemination of the culture" × index emotion value of "dissemination of the culture") (7).