CN111858886B - Object and viewpoint extraction system for airport comments - Google Patents

Object and viewpoint extraction system for airport comments

Info

Publication number
CN111858886B
Authority
CN
China
Prior art keywords
comment
data
module
dictionary
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010666697.1A
Other languages
Chinese (zh)
Other versions
CN111858886A (en)
Inventor
张日崇
李肖杨
孙凯
胡志元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202010666697.1A
Publication of CN111858886A
Application granted
Publication of CN111858886B
Active legal status
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an object and viewpoint extraction system for airport comments and belongs to the field of natural language processing. The system logic architecture comprises a data input module, a data preprocessing and data dividing module, a data enhancement module, a comment object extraction module, a comment content extraction module, an object and content matching module and a comment result output module. An improved BiLSTM-CRF-based model is used to extract comment objects and comment contents from Chinese text. This reduces the labor cost of annotating data for emotion classification, expands the label system so that new comment objects receive attention, presents the emotion tendency toward a specific comment object in a standardized form, and finally outputs the standardized comment matching result.

Description

Object and viewpoint extraction system for airport comments
Technical Field
The invention relates to the field of natural language processing, in particular to an object and viewpoint extraction system for airport comments.
Background
Comment object extraction is a basic task in the field of emotion analysis and opinion mining, and is a key problem for performing fine-grained emotion analysis. The goal is to identify and extract the objects evaluated in the text, where the objects evaluated are usually nouns or noun phrases. The comment object extraction can be divided into explicit extraction and implicit extraction, wherein the explicit extraction means that the comment object directly appears in the comment, and the implicitly extracted object does not obviously appear in the comment. There are generally three methods to solve the problem of comment object extraction, which are: rule-based methods, linear statistics-based methods, and deep learning-based methods. In recent years, deep learning approaches tend to perform better in many emotion analysis tasks. The existing deep learning method generally takes the extraction of comment objects as a sequence labeling problem, and obtains an evaluation object in a text sequence by labeling the text sequence.
Most current extraction models target English, and the objects they extract usually come from general-domain data; no comment object extraction model exists for a specific domain, in particular the aviation domain. Unlike everyday comment objects, comment objects in the aviation domain are terms and phrases with professional characteristics. Meanwhile, the remaining words in the comments are usually colloquial and lack a complete syntactic structure. These factors make comment object extraction in the aviation domain challenging.
Disclosure of Invention
The technical scheme of the invention aims to realize an airport comment object and viewpoint extraction system, which comprises a data input module, a data preprocessing and data dividing module, a data enhancement module, a comment object extraction module, a comment content extraction module, an object and content matching module and a comment result output module on a system logic architecture;
the data input module collects external comments on flights and airports and passes the comment data to the data preprocessing and data dividing module, which performs two steps, data preprocessing and data dividing; the data preprocessing step matches the unlabeled flight comment data against the keywords of an existing label system and uses the matched keywords as comment object extraction labels, segments the airport comment data into words, screens out the nouns and noun phrases, obtains the corresponding extraction labels through manual filtering and modification, and manually deletes abnormal data from the comment data; the data dividing step tries a plurality of divisions and selects the data combination with the lowest label repetition rate;
the data enhancement module performs synonym replacement processing and duplicate removal processing on data and labels, and performs data enhancement by using an EDA algorithm to obtain a new label set;
the comment object extraction module extracts the comment object by using an improved extraction method based on a BiLSTM-CRF model and sends the obtained comment object to the object and content matching module; the comment content extraction module matches the comment text against an emotion dictionary formed by emotion words to obtain comment contents with emotion tendencies and sends the comment contents to the object and content matching module; the object and content matching module first screens the extracted comment objects by part of speech, retaining comment objects that are nouns or verbal nouns, then, in each short sentence, splices the extracted sentiment words with the comment object, and finally checks whether the spliced result appears in the comment: if so, the spliced result is sent to the comment result output module as the final extraction result, and if not, the comment object alone is sent to the comment result output module as the result;
and the comment result output module is used for outputting the spliced comment result.
In the data dividing step, the division with the lowest label repetition rate is selected as follows: a target repetition rate of 30% and a target training set size are preset; when the current repetition rate is lower than the preset repetition rate, comment data corresponding to one randomly selected word are added to the test set and the repetition rate is recalculated; when the current repetition rate is higher than the preset repetition rate, the sentences corresponding to lower-frequency words are moved to the test set so that those words do not appear in the training set; this process is repeated until the preset test set size is reached; the whole procedure is run 10 times and the run with the lowest final repetition rate is taken as the division result; the preset repetition rate is set to 50%, 40%, 30% or 20%.
The data enhancement step of the data enhancement module is realized by adopting an EDA algorithm, and the EDA algorithm adopts 4 random strategies to carry out data enhancement: synonym replacement, random insertion, random exchange, random deletion.
The improved comment object extraction module modifies the feature input part and the auxiliary dictionary of the BiLSTM-CRF model. In the feature input part, the first feature is the character vector of each Chinese character, embedded with a BERT pre-trained model. The second feature, the position and part-of-speech feature, combines two features: the position of the character within its word, labeled with {B, M, E, S} tags by an NLP tool, and the part of speech of the word to which the character belongs, taken as the part-of-speech feature of the character; the position and part-of-speech feature passes through a bidirectional LSTM. The third feature, the dictionary feature, is a dictionary matching feature based on 4-grams: the existing corpus is segmented into words, the nouns obtained are combined into n-grams, and the resulting nouns and noun phrases are added to a dictionary; for each character, the dictionary feature records whether the 4-gram combinations before and after the character appear in the dictionary, giving an 8-dimensional vector. The three features are concatenated, fed into a bidirectional LSTM layer, and then passed through a CRF layer to obtain the final result.
The emotion dictionary of the comment content extraction module is divided into a positive emotion dictionary, a negative emotion dictionary and an adverb dictionary, and the comment text is matched as follows: first, the whole sentence is divided into several short sentences at punctuation marks; then, for each short sentence, the corresponding emotion words are matched from the positive and negative emotion dictionaries; finally, the adverbs immediately before and after each emotion word are looked up in the adverb dictionary and combined with it to form the comment content.
The technical effects are as follows:
The invention extracts the comment objects of user comments in the application scenario of flights and airports. The extracted comment objects serve two main purposes: first, they reduce the labor cost of annotating data for emotion classification; second, new labels that differ from the existing label system can be found among the extracted comment objects, so that the label system is expanded and new comment objects receive attention. In actual business, airlines and airports pay close attention to customers' emotional tendency toward a particular comment object. Therefore, after the comment objects are extracted, the related viewpoints are extracted and matched to obtain the user's complete comment.
As a system, the invention achieves the following three technical effects:
First, there is no system for extracting review objects and review contents in the field of aviation, and it is an urgent need of airlines to develop such a system. By extracting the objects of the comments of the flights and the airports, the method can help the airlines to know the attention points and the requirements of the users, and further analyze the main opinions of the passengers. Secondly, the system is developed, so that the annotation of the related data set of emotion analysis can be assisted, and the labor cost is saved. In the comment emotion analysis task, comment objects and emotion polarities need to be labeled for a large number of comment texts. The comment object extraction system can automatically extract the comment object, saves time for marking emotion analysis, and is not limited by a set label system. Finally, the tag extraction system can discover new comment objects, not limited to extracting comment objects that appear in the training data. By extracting the comment objects from the new comment texts, new labels are often obtained, and the new labels can reflect new problems and can also be used for enriching the existing label system. Such a system has a positive guiding effect on airlines to improve the quality of service in a timely manner.
Drawings
FIG. 1: overall structure
FIG. 2: extraction model structure
Detailed Description
To achieve the above purpose, the system logic architecture comprises a data input module, a data preprocessing and data dividing module, a data enhancement module, a comment object extraction module, a comment content extraction module, an object and content matching module and a comment result output module. On the data side, the lack of labels is addressed first; then, to address the small number of label types and to avoid overfitting, data enhancement such as synonym replacement, duplicate removal and EDA (Easy Data Augmentation) is applied to the data. The overall structure of the system is shown in FIG. 1.
Data preprocessing step
The comment data of the invention are derived from flight comments and airport comments, comprising 30000 flight comments and 2000 airport comments; the initial data are unlabeled. Labels for the flight comments are obtained by keyword matching: the 187 labels of the existing label system are matched against the 30000 flight comments, and the matched labels serve as the comment object extraction labels. Labels for the airport comments are obtained by segmenting the comments into words, screening out the nouns and noun phrases, and then manually filtering and modifying them to obtain the corresponding extraction labels. Keyword matching and word segmentation plus manual filtering are both used because fully manual labeling is costly.
For the 32000 comments, abnormal data are cleaned as follows: (1) comments containing no Chinese text are deleted; (2) comments that match no label are deleted; (3) emoticons are removed; (4) garbled symbols are removed. After data cleaning, 19926 pieces of data remained, with 163 label types in total.
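The four cleaning rules can be sketched roughly as follows. This is only an illustration under stated assumptions: the character whitelist, the function name `clean_comment`, and the sample labels are hypothetical and do not come from the patent, which does not say how emoticons or garbled symbols are detected.

```python
import re
from typing import List, Optional, Tuple

# Assumption: "Chinese text" means at least one CJK unified ideograph.
CJK = re.compile(r"[\u4e00-\u9fff]")
# Assumption: anything outside this whitelist (ideographs, ASCII letters and
# digits, whitespace, common punctuation) is an emoticon or a garbled symbol.
NOT_ALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s，。！？、；：,.!?;:]")


def clean_comment(text: str, label_system: List[str]) -> Optional[Tuple[str, List[str]]]:
    """Return (cleaned text, matched labels), or None if the comment is dropped."""
    text = NOT_ALLOWED.sub("", text).strip()   # rules (3) and (4)
    if not CJK.search(text):                   # rule (1): no Chinese text
        return None
    labels = [label for label in label_system if label in text]  # keyword matching
    if not labels:                             # rule (2): no matching label
        return None
    return text, labels


print(clean_comment("餐食不错😀！", ["餐食", "行李"]))
```

Comments that survive keep their matched labels, so the same pass doubles as the keyword-matching labeling step for flight comments.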
Data partitioning step
One of the objectives of label extraction in the invention is to find new labels, so the extraction labels of the training set and the test set should overlap as little as possible in order to measure how the model performs when extracting new labels. The data set is therefore divided using the data combination with the lowest label repetition rate.
When the data are divided, several divisions are tried and the one with the lowest label repetition rate is selected. Specifically, a target repetition rate of 30% and a target training set size are preset first. If the current repetition rate is lower than the preset repetition rate, comment data corresponding to one randomly chosen word are added to the test set and the repetition rate is recalculated; if the current repetition rate is higher than the preset repetition rate, the sentences corresponding to lower-frequency words are moved to the test set so that those words no longer appear in the training set. This is repeated until the target test set size is reached. The whole process is run 10 times, and the run with the lowest final repetition rate is taken as the division result. Preset repetition rates from 50% down to 20% in steps of 10% were tried, and experiments show that a preset of 30% gives the best division for the current comment data. In the end, the flight comment data are divided into a training set of 15896 items and a test set of 4030 items, and the label repetition rate is reduced to 33%. For the 2000 airport comments, word segmentation and manual screening are used to remove comments without labels, which yields 1418 comments and 708 new labels that are added to the training set.
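A minimal sketch of this dividing procedure is given below. The patent does not define the repetition rate formally; the sketch assumes it is the share of distinct test-set labels that also occur in the training set. The names `repetition_rate`, `split_once` and `best_split`, and the choice to move exactly one comment in the "below target" branch, are illustrative assumptions.

```python
import random
from collections import Counter
from typing import List, Set, Tuple

Sample = Tuple[str, Set[str]]  # (comment text, labels found in it)


def label_set(samples: List[Sample]) -> Set[str]:
    return set().union(*(labels for _, labels in samples))


def repetition_rate(train: List[Sample], test: List[Sample]) -> float:
    """Assumed definition: fraction of test labels that also appear in training."""
    test_labels = label_set(test)
    if not test_labels:
        return 0.0
    return len(test_labels & label_set(train)) / len(test_labels)


def split_once(data: List[Sample], test_size: int, target: float,
               rng: random.Random) -> Tuple[List[Sample], List[Sample]]:
    train, test = list(data), []
    while len(test) < test_size and train:
        freq = Counter(label for _, labels in train for label in labels)
        if not freq:
            break
        if repetition_rate(train, test) < target:
            # Below target: move one random comment of a randomly chosen label.
            label = rng.choice(sorted(freq))
            moved = [rng.choice([s for s in train if label in s[1]])]
        else:
            # At or above target: move every comment of the rarest label, so the
            # label no longer appears in the training set.
            label = min(sorted(freq), key=freq.__getitem__)
            moved = [s for s in train if label in s[1]]
        for sample in moved:
            train.remove(sample)
        test.extend(moved)
    return train, test


def best_split(data: List[Sample], test_size: int, target: float = 0.3,
               trials: int = 10, seed: int = 0):
    """Run the procedure `trials` times and keep the lowest repetition rate."""
    rng = random.Random(seed)
    candidates = [split_once(data, test_size, target, rng) for _ in range(trials)]
    return min(candidates, key=lambda pair: repetition_rate(*pair))


data = [("c%d" % i, {label}) for i, label in enumerate("aaaabbbccd")]
train, test = best_split(data, test_size=3)
print(len(train), len(test), repetition_rate(train, test))
```

Because the rarest-label branch moves whole label groups, the test set can end up slightly larger than the requested size.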
Data enhancement module
The data preprocessing and data dividing steps handle the labels, so each piece of data has a corresponding extraction label. However, because of the limitations of the label matching method above, the roughly 20000 flight comments correspond to only 163 labels. This number is too small relative to the training data, especially in the flight comment portion, which tends to cause overfitting and makes it difficult to extract new labels.
First, to address the insufficient number of labels, synonym replacement is applied to the data and labels. Specifically, using a Chinese synonym dictionary, the labels and their occurrences in the comment text are replaced with synonyms according to the proportions in the synonym dictionary, which enriches the label types. The 163 labels are thereby expanded to 395 labels.
Secondly, because the number of the comments is too large, data is subjected to deduplication processing, namely, the number of data corresponding to each tag is controlled. For each label, about 4 training data are kept.
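The label-level synonym replacement and the per-label cap could look roughly like this. The helper names and the sample strings are hypothetical; the patent only states that labels are replaced in both the label and the comment text and that about 4 training items are kept per label.

```python
from collections import defaultdict
from typing import List, Tuple


def replace_label(comment: str, label: str, synonym: str) -> Tuple[str, str]:
    """Swap a label for its synonym in both the comment text and the label."""
    return comment.replace(label, synonym), synonym


def cap_per_label(samples: List[Tuple[str, str]], limit: int = 4) -> List[Tuple[str, str]]:
    """Keep at most `limit` (comment, label) pairs for each label, in input order."""
    counts = defaultdict(int)
    kept = []
    for comment, label in samples:
        if counts[label] < limit:
            counts[label] += 1
            kept.append((comment, label))
    return kept


print(replace_label("餐食不错", "餐食", "饭菜"))
samples = [("c%d" % i, "餐食") for i in range(6)] + [("d", "行李")]
print(len(cap_per_label(samples)))
```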
Finally, because the texts in the data set are all short and simple in composition, the EDA (Easy Data Augmentation) algorithm proposed in 2019 is used for data enhancement. The algorithm has been reported to significantly improve the performance of natural language processing models on small data sets and to reduce overfitting. The purpose of the EDA algorithm is to generate new text with semantics similar to the existing text, and it uses 4 random strategies for data enhancement: (1) synonym replacement: several non-stop words are randomly selected from the text and each is replaced by one of its synonyms; (2) random insertion: a non-stop word is randomly chosen from the text, one of its synonyms is inserted at a random position in the sentence, and this is repeated several times; (3) random exchange: two words are randomly selected from the text and their positions are swapped, repeated several times; (4) random deletion: each word in the sentence is removed with a fixed probability.
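The four EDA strategies can be sketched on a tokenized sentence as below. This is a simplified illustration, not the reference EDA implementation: the synonym table, the stop-word set and all function names are assumptions made for the example.

```python
import random
from typing import Dict, List, Set


def synonym_replacement(words: List[str], synonyms: Dict[str, List[str]],
                        stop_words: Set[str], n: int, rng: random.Random) -> List[str]:
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w not in stop_words and w in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:               # replace up to n non-stop words
        out[i] = rng.choice(synonyms[out[i]])
    return out


def random_insertion(words: List[str], synonyms: Dict[str, List[str]],
                     stop_words: Set[str], n: int, rng: random.Random) -> List[str]:
    out = list(words)
    for _ in range(n):
        candidates = [w for w in out if w not in stop_words and w in synonyms]
        if not candidates:
            break
        synonym = rng.choice(synonyms[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), synonym)   # any position, including the end
    return out


def random_swap(words: List[str], n: int, rng: random.Random) -> List[str]:
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]     # never return an empty sentence


rng = random.Random(0)
tokens = ["餐食", "很", "好"]
print(synonym_replacement(tokens, {"餐食": ["饭菜"]}, {"很"}, 1, rng))
```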
After the above processing, the final training set contains 2396 pieces of data and the test set contains 1440 pieces of data, with 1016 labels in total; the label repetition rate between the training set and the test set is 33%.
Comment object extraction module
Data enhancement yields training data suitable for comment object extraction. For comment object extraction, the invention starts from the BiLSTM-CRF-based extraction method proposed by Yanzeng Li et al. in 2018 and modifies its feature input part and auxiliary dictionary. The overall structure of the model is shown in FIG. 2.
First, most existing extraction models target English and are word-based. In Chinese, the character is the basic unit representing semantics, so this model is character-based. In the feature input part, the first feature is the character vector of each Chinese character. To improve the model further, the invention embeds the character vectors with a BERT pre-trained model.
The position and part-of-speech feature combines two features. The first is the position of the character within its word, labeled with {B, M, E, S} tags by an NLP tool. The second is the part-of-speech feature: part of speech is originally a property of words, so the part of speech of the word to which each character belongs is taken as the part-of-speech feature of that character. The position and part-of-speech feature passes through a bidirectional LSTM.
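As an illustration of these two character-level features, the sketch below derives {B, M, E, S} position tags and inherited part-of-speech tags from an already segmented and tagged sentence. The input format (a list of word and tag pairs) and the function name are assumptions; the patent does not name the NLP tool that produces the segmentation.

```python
from typing import List, Tuple


def char_features(segmented: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Map (word, pos) pairs to (character, BMES tag, pos) triples."""
    features = []
    for word, pos in segmented:
        if len(word) == 1:
            tags = ["S"]                                   # single-character word
        else:
            tags = ["B"] + ["M"] * (len(word) - 2) + ["E"]  # begin, middle..., end
        # Every character inherits the part of speech of its word.
        features.extend((ch, tag, pos) for ch, tag in zip(word, tags))
    return features


print(char_features([("行李箱", "n"), ("好", "a")]))
```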
The dictionary feature is a dictionary matching feature based on 4-grams and depends on a predefined extraction dictionary. In the invention, the existing corpus is segmented into words, the nouns obtained are combined into n-grams, and the resulting nouns and noun phrases are added to the dictionary. For each character, the dictionary feature records whether the 4-gram combinations before and after the character appear in the dictionary. The resulting 8-dimensional vector is the dictionary feature.
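The text does not spell out how the 8 dimensions are laid out. One plausible reading, assumed here, is 4 flags for the n-grams of length 1 to 4 that end at the character plus 4 flags for the n-grams of length 1 to 4 that start at it. The function name and sample dictionary are illustrative only.

```python
from typing import List, Set


def dictionary_features(text: str, dictionary: Set[str], max_n: int = 4) -> List[List[int]]:
    """Return one 2 * max_n binary vector per character of `text`."""
    features = []
    for i in range(len(text)):
        vector = []
        # n-grams ending at position i (context before the character)
        for n in range(1, max_n + 1):
            start = i - n + 1
            vector.append(1 if start >= 0 and text[start:i + 1] in dictionary else 0)
        # n-grams starting at position i (context after the character)
        for n in range(1, max_n + 1):
            end = i + n
            vector.append(1 if end <= len(text) and text[i:end] in dictionary else 0)
        features.append(vector)
    return features


vectors = dictionary_features("行李托运慢", {"行李", "托运", "行李托运"})
print(vectors[0])  # → [0, 0, 0, 0, 0, 1, 0, 1]
```

For the first character, the two set flags correspond to the dictionary entries "行李" and "行李托运", both of which start at that character.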
The three features are concatenated, fed into a bidirectional LSTM layer, and then passed through a CRF layer to obtain the final result.
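The patent does not describe how the CRF layer produces its output. In a linear-chain CRF, the best tag sequence is commonly found with Viterbi decoding over per-character emission scores (here standing in for the BiLSTM output) and tag-to-tag transition scores. The sketch below shows that decoding step only; the scores are made up for illustration and nothing here is taken from the patented model.

```python
from typing import List


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emission in emissions[1:]:
        previous, score, pointers = score, [], []
        for j in range(n_tags):
            best = max(range(n_tags), key=lambda i: previous[i] + transitions[i][j])
            pointers.append(best)
            score.append(previous[best] + transitions[best][j] + emission[j])
        backpointers.append(pointers)
    # Trace back from the best final tag.
    tag = max(range(n_tags), key=score.__getitem__)
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


emissions = [[2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
print(viterbi(emissions, [[0.0, 0.0], [0.0, 0.0]]))      # → [0, 1, 1]
print(viterbi(emissions, [[0.0, -10.0], [-10.0, 0.0]]))  # → [1, 1, 1]
```

The second call shows why the transition scores matter: a strong penalty on switching tags overrides the locally best first tag.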
Comment content extraction module
Besides the comment object, in practice, the airline company often pays attention to the comment content corresponding to the comment object. Common emotional words are usually fixed, such as "good", "bad", and the like. By using the emotion dictionary formed by these emotion words, the comment text is matched, and the comment content having an emotion tendency can be obtained.
Specifically, the emotion dictionary is divided into a positive emotion dictionary, a negative emotion dictionary, and an adverb dictionary. First, the whole sentence is divided into several short sentences at punctuation marks. Then, for each short sentence, the corresponding emotion words are matched from the positive and negative emotion dictionaries. Finally, the adverbs immediately before and after each emotion word are looked up in the adverb dictionary and combined with it to form the comment content.
Comment | Comment content
The meal is okay, the flight attendants' looks are not bad | okay; not bad
The captain's piloting skill is especially great | especially great
Table 1: Comment content extraction examples
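The dictionary-based comment content extraction can be sketched as follows. The dictionaries, the punctuation set used for splitting, the function name and the Chinese example sentences are all assumptions for illustration; overlapping dictionary entries (for example a word that contains another) are not handled.

```python
import re
from typing import List, Set, Tuple

CLAUSE_SPLIT = re.compile(r"[，。！？；、,.!?;]+")


def extract_comment_content(text: str, emotion_words: Set[str],
                            adverbs: Set[str]) -> List[Tuple[str, List[str]]]:
    """Return (short sentence, comment contents) pairs for one comment."""
    ordered_adverbs = sorted(adverbs, key=lambda a: (-len(a), a))
    results = []
    for clause in filter(None, CLAUSE_SPLIT.split(text)):
        contents = []
        for word in sorted(emotion_words, key=lambda w: (-len(w), w)):
            pos = clause.find(word)
            if pos == -1:
                continue
            left, right = clause[:pos], clause[pos + len(word):]
            # Attach an adverb only if it directly touches the emotion word.
            before = next((a for a in ordered_adverbs if left.endswith(a)), "")
            after = next((a for a in ordered_adverbs if right.startswith(a)), "")
            contents.append(before + word + after)
        results.append((clause, contents))
    return results


positive, negative = {"棒", "不错", "可以"}, {"差"}
print(extract_comment_content("机长驾驶技术特别棒", positive | negative, {"特别", "非常"}))
```

Merging the positive and negative dictionaries before matching keeps the sketch short; a real system would likely keep the polarity of each match.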
Comment object and comment content matching module
Through observation of the comment text, it can be found that if a comment object has relatively definite comment content, the comment content tends to appear near the corresponding comment object. In consideration of the characteristics, after the comment content and the comment object are respectively extracted, the comment content and the comment object are matched to obtain a complete comment.
Specifically, the extracted comment objects are first filtered by part of speech, and comment objects that are nouns or verbal nouns are retained. Then, in each short sentence, the extracted sentiment words are spliced with the comment objects in that short sentence. Finally, the spliced result is checked against the comment. If it appears in the comment, it is taken as the final extraction result; if it does not, the comment object may have no corresponding sentiment word, so the comment object alone is output as the result.
Comment | Comment content | Comment object | Matching result
The meal is okay, the flight attendants' looks are not bad | okay; not bad | meal; flight attendants' looks | meal okay; flight attendants' looks not bad
The captain's piloting skill is especially great | especially great | piloting skill | piloting skill especially great
Table 2: Comment content and comment object matching examples
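The splice-and-check rule of the matching module can be sketched as below. Part-of-speech filtering is omitted because it needs a tagger, and the function is written for a single short sentence's objects and contents; the function name and the Chinese example strings are hypothetical.

```python
from typing import List


def match_objects(comment: str, objects: List[str], contents: List[str]) -> List[str]:
    """Splice each comment object with a comment content and keep the splice
    only if it literally appears in the comment; otherwise keep the object."""
    results = []
    for obj in objects:
        spliced = next((obj + c for c in contents if obj + c in comment), None)
        results.append(spliced if spliced is not None else obj)
    return results


print(match_objects("机长驾驶技术特别棒", ["驾驶技术"], ["特别棒"]))
print(match_objects("行李托运太慢了，服务不错", ["行李托运"], ["不错"]))
```

The second call falls back to the bare object because "行李托运不错" never occurs in the comment, which mirrors the case where a comment object has no corresponding sentiment word.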

Claims (4)

1. An object and viewpoint extraction system for airport reviews, characterized in that: the system logic architecture comprises a data input module, a data preprocessing and data dividing module, a data enhancement module, a comment object extraction module, a comment content extraction module, an object and content matching module and a comment result output module;
the data input module is used for collecting and inputting external comments on flights and comments of an airport, and inputting data of the comments and the comments to the data preprocessing and data dividing module, wherein the data preprocessing and data dividing module comprises two steps of data preprocessing and data dividing, the data preprocessing step is used for matching keywords of input comment data of unmarked flights by using an existing label system, taking the comment data as a comment object extraction label, performing word division on the comment data of the airport, screening nouns and noun phrases in the comment data, then obtaining a corresponding extraction label through manual filtration and modification, and deleting abnormal data in the comment data through a manual mode; the data dividing step selects a data combination with the lowest label repetition rate, and selects a label combination with the lowest label repetition rate in a plurality of modes;
The data enhancement module performs synonym replacement processing and duplicate removal processing on data and labels, and performs data enhancement by using an EDA algorithm to obtain a new label set;
the comment object extraction module extracts a comment object by using an improved extraction method based on a BiLSTM-CRF model and sends the obtained comment object to the object and content matching module; the comment content extraction module is used for matching comment texts by utilizing an emotion dictionary formed by emotion words to obtain comment contents with emotion tendencies and sending the comment contents to the object and content matching module; the object and content matching module firstly performs part-of-speech screening on the result extracted by the comment object, reserves the comment object of the part-of-speech of nouns and action nouns, then splices the sentiment words extracted by the comment object extraction module with the comment object in each short sentence, and finally checks whether the spliced result appears in the comment, if so, the spliced result is sent to the comment result output module as the final result of extraction, and if not, the comment object is directly sent to the comment result output module as the result;
the comment result output module is used for outputting the spliced comment result;
The improved review object extraction module modifies the feature input part and the auxiliary dictionary based on a BiLSTM-CRF model: in the feature input part, word vectors of Chinese characters are used, and a BERT pre-training model is used for embedding the word vectors; the position and part-of-speech characteristics simultaneously comprise two characteristics: firstly, the position of a character in a word is marked with characteristics by using { B, M, E, S } labels and an NLP tool, secondly, the part-of-speech characteristics, the part-of-speech of a word to which each character belongs is taken as the part-of-speech characteristics of the character, and the position and the part-of-speech characteristics pass through a bidirectional LSTM; the dictionary features are based on 4-gram dictionary matching features, the existing linguistic data are subjected to word segmentation, n-gram combinations are carried out on nouns obtained by word segmentation, the obtained nouns and noun phrases are added into a dictionary, for each character, the dictionary features judge whether the combination of the 4-gram before and after the character appears in the dictionary, and the obtained 8-dimensional vector is the dictionary feature; and splicing the position, the part-of-speech characteristic and the dictionary characteristic of the character in the word, inputting the characteristic into a bidirectional LSTM layer, and then passing through a CRF layer to obtain a final result.
2. The system for extracting objects and viewpoints of airport reviews as claimed in claim 1, wherein: the method for selecting the label combination with the minimum label repetition rate in the multiple modes in the data dividing step is specifically that a target repetition rate of 30% and the data volume of a target training set are preset, comment data corresponding to one word are randomly selected to be added into a test set when the current repetition rate is smaller than the preset repetition rate, and the repetition rate is recalculated; if the current repetition rate is greater than the preset repetition rate, sentences corresponding to the words with less frequency are taken out and added into the test set, the words are ensured not to appear in the training set, and the process is continuously repeated until the number of the preset test set is reached; the whole process is repeated for 10 times, one time with the minimum final repetition rate is selected as a dividing result, and the preset repetition rate is set to be 50% or 40% or 30% or 20%.
3. The system for extracting objects and viewpoints of airport reviews as claimed in claim 2, wherein: the data enhancement step of the data enhancement module is realized by adopting an EDA algorithm, and the EDA algorithm adopts 4 random strategies to enhance data: synonym replacement, random insertion, random exchange, random deletion.
4. The system for extracting objects and viewpoints of airport reviews as claimed in claim 3, wherein:
the emotion dictionary of the comment content extraction module is divided into a positive emotion dictionary, a negative emotion dictionary and an adverb dictionary, and the comment text matching process comprises the following steps: firstly, dividing a whole sentence text into a plurality of short sentences according to punctuations, then matching corresponding emotional words from a positive emotional dictionary and a negative emotional dictionary for each short sentence, and finally finding corresponding adverbs before and after each emotional word according to an adverb dictionary pair to form comment content.
CN202010666697.1A 2020-07-13 2020-07-13 Object and viewpoint extraction system for airport comments Active CN111858886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010666697.1A CN111858886B (en) 2020-07-13 2020-07-13 Object and viewpoint extraction system for airport comments

Publications (2)

Publication Number Publication Date
CN111858886A (en) 2020-10-30
CN111858886B (en) 2022-05-31

Family

ID=72984262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010666697.1A Active CN111858886B (en) 2020-07-13 2020-07-13 Object and viewpoint extraction system for airport comments

Country Status (1)

Country Link
CN (1) CN111858886B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836052B (en) * 2021-02-19 2023-04-07 中国第一汽车股份有限公司 Automobile comment text viewpoint mining method, equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105117428A (en) * 2015-08-04 2015-12-02 电子科技大学 Web comment sentiment analysis method based on word alignment model
CN105868185A (en) * 2016-05-16 2016-08-17 南京邮电大学 Part-of-speech-tagging-based dictionary construction method applied in shopping comment emotion analysis

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220352B (en) * 2017-05-31 2020-12-08 北京百度网讯科技有限公司 Method and device for constructing comment map based on artificial intelligence

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Cooperation Based Metric for Mobile Applications Recommendation; Richong Zhang et al.; 2013 IEEE/WIC/ACM International Conferences on Web Intelligence (WI) and Intelligent Agent Technology (IAT); 2013-11-17; Vol. 3; pp. 13-16 *
Research on Factors Influencing Consumers' Purchase Intention in Airport Stores; Xu Dongchang (徐东昌); China Master's Theses Full-text Database; 2020-02-15 (No. 2); pp. 1-40 *

Also Published As

Publication number Publication date
CN111858886A (en) 2020-10-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant