WO2018182501A1 - Method and system for intelligent sentiment and emotion sensing with adaptive learning - Google Patents

Method and system for intelligent sentiment and emotion sensing with adaptive learning

Info

Publication number
WO2018182501A1
WO2018182501A1 PCT/SG2017/050172 SG2017050172W WO2018182501A1 WO 2018182501 A1 WO2018182501 A1 WO 2018182501A1 SG 2017050172 W SG2017050172 W SG 2017050172W WO 2018182501 A1 WO2018182501 A1 WO 2018182501A1
Authority
WO
WIPO (PCT)
Prior art keywords
learning based
module
text
accordance
text messages
Prior art date
Application number
PCT/SG2017/050172
Other languages
English (en)
Inventor
Zhaoxia WANG
Joo Chuan Victor Tong
Seng-Beng Ho
Original Assignee
Agency For Science, Technology And Research
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agency For Science, Technology And Research
Priority to PCT/SG2017/050172
Publication of WO2018182501A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the present invention generally relates to text data analytics, such as social media analytics, and more particularly relates to a method and system for sentiment classification of text (e.g., social media text).
  • a system for classifying text messages in accordance with sentiment and/or emotion expressed by the text messages includes a text decomposing module, a first learning based module, a non-learning based module, an aggregation module and a second learning based module.
  • the text decomposing module decomposes each of the text messages into one or more portions.
  • the first learning based module scores each portion of each of the text messages and the non-learning based module scores each portion of each of the text messages, wherein the learning based module and the non-learning based module score each portion of each of the text messages concurrently as each portion of each of the text messages is processed in a single pass.
  • the aggregation module is coupled to the learning based module and the non-learning based module for classifying each of the text messages in accordance with the sentiment and/or emotion expressed by the text message in response to an aggregation of a combination of the learning based scoring and the non-learning based scoring of each of the one or more portions of the text message.
  • the second learning based module is coupled to the aggregation module to learn and update knowledge for the non-learning based module and the first learning based module in response to the aggregation of the combination of the learning based scoring and the non-learning based scoring of each of the one or more portions of the text message.
  • a method for classifying text messages in accordance with sentiment and/or emotion expressed by the text messages includes decomposing each of the text messages into one or more portions and performing learning based scoring of each portion of each of the text messages while concurrently performing non-learning based scoring of each portion of each of the text messages to process each portion of each of the text messages in a single pass.
  • the method further includes classifying each of the text messages in accordance with the sentiment and/or emotion expressed by the text message in response to an aggregation of a combination of the learning based scoring and the non-learning based scoring of each of the one or more portions of the text message.
  • a system for handling ambivalence or hidden sarcasm in text messages includes a text decomposing module, a module for detecting ambivalence and/or hidden sarcasm in a portion of a text message, and an output module for combining the one or more portions of the text message.
  • the text decomposing module decomposes each of the text messages into one or more portions and the output module handles a portion of the text message in response to the module for detecting ambivalence and/or hidden sarcasm detecting hidden sarcasm in the portion of the text message.
  • FIG. 1 depicts a block diagram of a data processing and analysis system in accordance with a present embodiment.
  • FIG. 2 depicts a block diagram of an intelligent sensing system of the data processing and analysis system of FIG. 1 in accordance with the present embodiment.
  • FIG. 3 depicts a block diagram of an obvious sarcasm detection module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.
  • FIG. 4 depicts a block diagram of an ambivalence handler with hidden sarcasm detection sub-module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.
  • FIG. 5 depicts a block diagram of a negation identification and question portion handling module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.
  • FIG. 6 depicts a block diagram of a first artificial intelligence (AI) learning-based module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.
  • FIG. 7 depicts a block diagram of a portion sensing further analysis sub-module of an aggregation module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.
  • FIG. 8 depicts a block diagram of an aggregation assembling sub-module of the aggregation module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.
  • FIG. 9 depicts a block diagram of a second AI learning-based module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.
  • a unified classifier that combines non-learning scoring, such as knowledge-based and feature-based scoring, and learning-based scoring in one pass through the data;
  • the capability for adaptive learning is introduced in accordance with the present embodiment wherein the artificial intelligence (AI) learning-based modules are used not only as part of an adaptive classification mechanism, but are also used to adaptively and continuously enhance the lexicon database and knowledge base of the system of the present embodiment.
  • the adaptive learning not only can learn from a labelled data set as other learning-based systems do, but can also perform adaptive learning in accordance with the present embodiment from a sensing process without a ground truth training data set.
  • the ambivalence handling and sarcasm detection in accordance with the present embodiments can detect the sarcasm in the text data.
  • the unified classifier that combines non-learning scoring, such as knowledge-based and feature-based scoring, and learning-based scoring in one pass of the data.
  • the non-learning process in accordance with the present embodiment performs linguistic pattern recognition through linguistic analysis with feature relationship analysis. Therefore, the system and methods in accordance with the present embodiments not only have all the advantages of learning-based methods, but also have all the advantages of non-learning methods, including novel, robust non-learning methods having advantages over conventional non-learning methods.
  • a block diagram 100 depicts a data processing and analysis system in accordance with the present embodiment.
  • the end-to-end text analysis system 102 advantageously demonstrates a real-world implementation of the intelligent sensing system 104 in accordance with the present embodiment.
  • the system 102 receives social media or other textual data 106 as input by a user or listener 108 advantageously providing useful information for marketing research personnel, product suppliers, service providers and system integrators as the listener 108.
  • the listener 108 input could be from various social media sources or other text data sources 106, including but not limited to sources from the Internet, such as Internet forums (e.g., HardwareZone and reddit), social networking websites (e.g., Twitter and Facebook), and weblogs (e.g., Blogger, Tumblr and WordPress).
  • the text is filtered 109 by a filter 110 to identify the text (i.e., filtered data 112) to be reviewed by the system 102.
  • the filter 110 removes irrelevant text received from the listener 108 to provide the filtered data 112 for processing. Examples of irrelevant text could include advertisements, content which does not include any comments on a product or a service, and other irrelevant content-specific textual data.
  • one or more smart filters 114 could be designed according to requirements of the listener 108 and optionally provided to further filter the filtered data 112.
  • the filtered textual data is then provided to the intelligent system 104 for processing in accordance with the present embodiment.
  • a profiler 116 organizes the data for reviewing by a result viewer module 118.
  • the result viewer module 118 could be based on a monitoring model to provide a consumer preference analysis, based on an alert model for detection of an anomaly or identification of a needed action, or based on a predictive model for time-series analysis for sales or other forecasting.
  • Those skilled in the art will realize that the result viewer could be based on other models adapted to a customer's needs or requirements.
  • the intelligent sensing system 104 overcomes the limitations of the learning-based sensing methods as well as the limitations of non-learning based method by utilizing a novel and robust strong/tight hybrid method in which components or processes of different methods are integrated into components or processes of one another resulting in a strongly/tightly coupled system.
  • a block diagram 200 depicts the intelligent sensing system 104 in accordance with the present embodiment.
  • the intelligent sensing system 104 includes an intelligent sentiment and emotion sensing section 202 and an automatic knowledge learning and updating section 204.
  • the intelligent sentiment and emotion sensing module 202 includes a text clean-up module 206 which receives the filtered data 112 input and provides the cleaned-up text to an obvious sarcasm detection module 208.
  • the obvious sarcasm detection module 208 is shown in more detail in a block diagram 300 depicted in FIG. 3.
  • the obvious sarcasm detection module 208 includes an expert knowledge-based system 302 coupled to a human expert input module 304 which allows users to input new rules or knowledge base to improve the performance of an obvious sarcasm detector 306.
  • when the obvious sarcasm detector 306 detects no obvious sarcasm, the text is output to a text decompose and portion cleaning module 210 (FIG. 2).
  • when the obvious sarcasm detector 306 detects obvious sarcasm, the text is output to a direct sensing analysis processing module 212.
  • the obvious sarcasm detection module 208 is different from a hidden sarcasm detection sub-module in an ambivalence handler with hidden sarcasm detection module 214 as discussed later.
  • the obvious sarcasm detection module 208 is an optional module designed to detect obvious sarcasm such as "I love that Arequipa just shuts the water off for a day #sarcasm", where there is an obvious sarcasm word indicator (i.e., #sarcasm).
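The indicator-based check described above can be illustrated with a minimal sketch. It assumes that an "obvious" indicator is a standalone token in the post; the function name `has_obvious_sarcasm` and the default indicator set are hypothetical, and only "#sarcasm" and "#not" are examples given in the text. The rules actually held by the expert knowledge-based system 302 are not specified here.

```python
# Hypothetical default indicator set. "#sarcasm" and "#not" are the only
# indicators named in the text; a real deployment would load these from
# the expert knowledge base.
DEFAULT_INDICATORS = frozenset({"#sarcasm", "#not"})


def has_obvious_sarcasm(text, indicators=DEFAULT_INDICATORS):
    """Return True if any token of `text` is a known sarcasm indicator.

    Assumption: indicators appear as whitespace-separated tokens, possibly
    followed by punctuation, and matching is case-insensitive.
    """
    tokens = {tok.strip(".,!?;:\"'").lower() for tok in text.split()}
    return any(ind.lower() in tokens for ind in indicators)


if __name__ == "__main__":
    post = "I love that Arequipa just shuts the water off for a day #sarcasm"
    print(has_obvious_sarcasm(post))
```

Passing a different `indicators` set corresponds to a user supplying new indicators for a particular data source.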
  • the obvious sarcasm detection module 208 is designed to handle such obviously sarcastic text to avoid misclassifying the opinion in such a text post. This module is optional, and the user can enable or disable it by using the human expert input module 304 through the expert knowledge base unit 308. In addition, the user can input new sarcasm indicators according to different data sources (e.g., "#not" for tweets).
  • Online users may write a post which is sarcastic but without the label #sarcasm, such as "Her phone is broken down again, she is so lucky!", where the sarcasm is hidden in the tone and the ambivalence of the text.
  • the ambivalence handler with sarcasm detection module 214 identifies hidden sarcasm by determining whether the text owner meant the opposite of what he/she inputted.
  • a linguistic processing module 216 in the direct sensing analysis processor 212 performs linguistic analysis, feature selection and relationship analysis of the parsed separate text portions of each text message according to language structure and grammar, as well as a knowledge base, to derive the sentiment and emotion categories.
  • the sentiment output of each separate portion of the message can be one of four categories, i.e., positive, negative, neutral and ambivalent (mixed) as shown in Table 1 below.
  • the ambivalence handler with hidden sarcasm detection module 214 is designed to handle the "ambivalence" category in each text portion and sarcasm often appears as a type of ambivalent text. Even though each text portion can be classified into one of the four categories of sentiments (positive, negative, neutral and ambivalent), often the intended sentiment of text classified in the category of ambivalence can still be positive or negative as shown in Table 1. Often what seems to be a mixed sentiment of positivity and negativity in a sentence is actually primarily a positive or negative sentiment (such as in the example in Table 1: The sentence "The design of this brand is ok, but I dislike the colour and the price of it" is seemingly ambivalent, but it is actually negative).
  • the ambivalent sentence is converted to either a positive or a negative sentiment in response to the intended sentiment of the owner of the post.
  • the four categories are collapsed back into three categories (positive, negative and neutral).
  • a block diagram 400 depicts the ambivalence handler with hidden sarcasm detection sub-module 214 in accordance with the present embodiment.
  • the hidden sarcasm detection module 406 analyses all the input information obtained from the linguistic analysis, feature selection and relationship analysis modules and other linguistic analysis modules 216 as shown in FIG. 2 to classify the sarcasm text.
  • the human expert input module 404 allows the user to select the mode of output to include either three categories (i.e., positive, negative, and neutral) or four categories (i.e., positive, negative, neutral and ambivalent) for each of the text portions.
  • the hidden sarcasm detection module 406 includes three submodules: a hidden sarcasm detector 410 (called “a hidden sarcasm detector” in order to differentiate it from the obvious sarcasm detector 306), a hidden sarcasm handler 412 and a non-sarcastic ambivalence handler 414.
  • the hidden sarcasm detector 410, the hidden sarcasm handler 412 and the non-sarcastic ambivalence handler 414 are seamlessly connected with the expert knowledge base unit 402 which provides the sarcasm detection rules and ambivalence handler rules for use by the hidden sarcasm detector 410 and the non-sarcastic ambivalence handler 414, respectively. If sarcasm is detected by the hidden sarcasm detector 410, sarcasm handling is performed by the hidden sarcasm handler 412 in accordance with the sarcasm detection rules of the expert knowledge base unit 402.
  • the non-sarcastic ambivalence handler 414 will perform ambivalence handling in accordance with the ambivalence handler rules of the expert knowledge base unit 402.
  • a block diagram 500 depicts the negation identification and question portion handling module 218 in accordance with the present embodiment.
  • the negation identification and question portion handling module 218 includes five submodules: a question portion detection sub-module 502, a negation detection sub-module 504, a question portion recording sub-module 506, a negation location recording sub-module 508 and an output decision sub-module 510.
  • the negation detection sub-module 504 finds all the negation words or items (such as "not", "don't", "shouldn't") through matching with a knowledge base 512.
  • the negation location recording sub-module 508 records the location of all the negation words or items and a data structure is created to hold the location indices of all found negation words or items for later processing.
  • the locations of the negation words are different in the two sentences "I do not like it" and "I like it not only for the design.", and the sentiments associated with these sentences are different.
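A minimal sketch of negation detection with location recording follows. It assumes simple word-level tokenization and a small in-memory set standing in for the knowledge base 512; the names `NEGATION_ITEMS` and `negation_locations` are hypothetical, and only "not", "don't" and "shouldn't" come from the text.

```python
import re

# Hypothetical stand-in for knowledge base 512. Only these three items are
# named in the text; a real knowledge base would hold many more.
NEGATION_ITEMS = frozenset({"not", "don't", "shouldn't"})


def negation_locations(portion, negation_items=NEGATION_ITEMS):
    """Return the token indices of every negation item found in `portion`.

    Assumption: tokens are runs of letters and straight apostrophes,
    compared case-insensitively.
    """
    tokens = re.findall(r"[a-z']+", portion.lower())
    return [i for i, tok in enumerate(tokens) if tok in negation_items]


if __name__ == "__main__":
    print(negation_locations("I do not like it"))
    print(negation_locations("I like it not only for the design."))
```

The two example sentences yield different indices for "not", which is the positional information that later processing can use to tell them apart.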
  • the question portions are detected by the question portion detection sub-module 502 and recorded by the question portion recording sub-module 506.
  • the output decision sub-module 510 decides whether the portions from both the question portion recording sub-module 506 and the negation location recording sub-module 508 should be channelled to the direct sensing analysis processor 212 or the learning-based AI classification processor 220 in response to whether the text includes negation portion(s) and/or question portion(s). All text portions including text portions with questions and text portions with negation are channelled to the direct sensing analysis processor 212. All non-negation, non-question text portions are channelled to the learning-based AI classification processor 220.
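The routing rule stated above can be sketched as a single decision function. This is an illustration under assumptions: a question portion is detected here simply by the presence of a question mark, which the text does not confirm as the method used by sub-module 502, and the returned route labels and the name `route_portion` are hypothetical.

```python
import re

# Hypothetical negation list; only these three items are named in the text.
NEGATION_ITEMS = frozenset({"not", "don't", "shouldn't"})

DIRECT_SENSING = "direct_sensing_212"   # non-learning path
AI_CLASSIFIER = "ai_classifier_220"     # learning-based path


def route_portion(portion, negation_items=NEGATION_ITEMS):
    """Pick the processing path for one text portion.

    Portions containing a negation item or a question go to the direct
    sensing path; all other portions go to the learning-based classifier.
    """
    tokens = re.findall(r"[a-z']+", portion.lower())
    has_negation = any(tok in negation_items for tok in tokens)
    has_question = "?" in portion  # assumption: "?" marks a question portion
    if has_negation or has_question:
        return DIRECT_SENSING
    return AI_CLASSIFIER


if __name__ == "__main__":
    for p in ("I do not like it", "Is this the best one?", "The design is great"):
        print(p, "->", route_portion(p))
```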
  • the AI learning based module 220 is one of two AI learning based modules and performs functions of an AI classifier module.
  • a second AI learning based module 222 works with the AI learning based module 220 to enable the intelligent sensing ability and adaptive learning ability.
  • the novel and innovative intelligent sensing method in accordance with the present embodiment combines learning-based scoring by the two AI learning based modules 220, 222 with non-learning scoring, such as knowledge-based and feature-based scoring, in one pass of the data.
  • the combined processing with both a learning process and a non-learning process starts at the negation identification and question portion handling module 218 (FIG. 2).
  • the non-learning process is provided by the direct sensing analysis processing module 212 where, as indicated as a normal arrow emanating from the module 218, the textual data is provided for processing by the linguistic analysis and feature relationship analysis module 216 and then the ambivalence handler with hidden sarcasm detection sub-module 214.
  • the AI learning based process proceeds simultaneously along another path emanating as a double arrow from the module 218 which goes through the AI learning-based module 220 to a portion sensing further analysis sub-module 224 of an aggregation module 223. Therefore, the system and methods in accordance with the present embodiments not only have all the advantages of learning based methods, but also have all the advantages of non-learning methods, including novel, robust non-learning methods having advantages over conventional non-learning methods.
  • the AI learning-based module 220 works together with the non-learning based sensing process to produce new ground truth data and feeds the obtained ground truth data to the automatic knowledge learning and updating section 204 through the portion sensing further analysis sub-module 224.
  • the AI learning-based module 220 supports the direct sensing analysis processing module 212 (i.e., the non-learning process) to produce the final valence of each portion of the text data by aggregating in the aggregation module 223 the learning based processed portion of the text data 232 and the non-learning based processed portion of the corresponding text data 230 to produce an output 234 of the intelligent sentiment and emotion sensing unit 202.
  • the learning based processed portion of the text data 232 and the non-learning based processed portion of the corresponding text data 230 are processed through the portion sensing further analysis sub-module 224 in the aggregation module 223, and the non-learning based processed portion of the corresponding text data from the direct sensing analysis processing module 212 is further processed by a portion relationship analysis with topic and object identification module 228.
  • An aggregation assembling sub-module 226 in the aggregation module 223 receives the inputs from both the portion relationship analysis with topic and object identification module 228 and the portion sensing further analysis sub-module 224 of the aggregation module 223, and performs aggregation assembling analysis.
  • AI learning-based module 220 includes two submodules or process steps: a training classifiers sub-module 602 and a sensing analysis sub-module 604.
  • the training classifiers sub-module 602 trains different learning based classifiers using ground truth data 606 and an existing knowledge base 608.
  • Third party or open source training data sets 610 can also be used, if available.
  • the sensing analysis sub-module 604 performs sensing analysis using different (trained) learning based classifiers (the learning-based classifiers can be Naive Bayes classifiers, Maximum Entropy, Support vector machine classifiers, or similar conventional or proprietary classifiers).
  • the output of the AI learning-based module 220 consists of the classified text portions from the various classifiers (e.g., suppose there are five classifiers used and each may classify the text portion differently, then the output will consist of the five classified sentiments for that text portion such as positive, positive, negative, negative, and positive).
  • the portion sensing further analysis sub-module 224 in the aggregation module 223 receives a first input 230 from the non-learning sensing analysis process (one classified sentiment) and a second input 232 from the learning based AI classification process (as many sentiments as there are AI classifiers used; for example, if M classifiers are used in the AI learning based module 220, then there are M classified outputs for each text portion).
  • a block diagram 700 depicts the portion sensing further analysis sub-module 224 in accordance with the present embodiment.
  • the portion sensing further analysis sub-module 224 includes a consensus analysis for learning sub-module 702 and a consensus analysis for sensing sub-module 704.
  • the consensus analysis for learning sub-module 702 aggregates the results from the non-learning analysis (1 classified sentiment) by the direct sensing analysis processing module 212 and the resulting classifications from the AI learning based module 220 (M classified sentiments) and provides them to the AI learning based module 222 in the automatic knowledge learning and updating unit 204.
  • the consensus analysis for sensing sub-module 704 aggregates only the resulting classifications from the AI learning based module 220 and outputs the results to the aggregation assembling sub-module 226.
  • the aggregation process of the results from the learning based AI classifiers and the results from both the non-learning and learning based classifiers proceeds by considering a majority of agreement of the classified sentiments.
  • a ratio of agreement is defined as the number of majority agreements divided by the total number of classified sentiments. For example, if there are five learning based AI classifiers used, the consensus analysis for learning sub-module 702 will receive the results 232 of the five learning based AI classifiers as well as the one result 230 from the non-learning based analysis. Of these six results, the number of majority agreements divided by the total number of classified sentiments determines the ratio of agreement. For example, if the pattern is positive, negative, positive, positive, negative, positive, then the majority agreement is four positive and the ratio of agreement for positive is four divided by six.
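The ratio of agreement defined above is a small computation, sketched here on the six-result example from the text. The function name `ratio_of_agreement` is hypothetical, and the tie-breaking behaviour (the label seen first wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def ratio_of_agreement(sentiments):
    """Return (majority_label, majority_count / total_count).

    Assumption: on a tie, the label that appears first in `sentiments`
    is reported; the source does not specify tie handling.
    """
    if not sentiments:
        raise ValueError("at least one classified sentiment is required")
    label, majority = Counter(sentiments).most_common(1)[0]
    return label, majority / len(sentiments)


if __name__ == "__main__":
    # One non-learning result plus five learning-based results.
    pattern = ["positive", "negative", "positive",
               "positive", "negative", "positive"]
    print(ratio_of_agreement(pattern))
```

For the pattern in the text, the majority label is "positive" and the ratio is four divided by six.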
  • the agreement ratio from the consensus analysis for learning sub-module 702 is compared with a threshold (where the default value is 1 or 100%) before the result is sent to the AI learning based module 222 for processing by a negation and question mark handler thereof.
  • the threshold above which the majority agreed sentiment is sent to the AI learning-based module 222 as the new obtained ground truth for learning is user selectable. By default, the threshold is 1 (i.e., all the results from the preceding modules must agree).
  • This threshold can be tuned/selected by the user and the number of the learning based AI classifiers, M, can also be user adjusted depending on the number of suitable existing classifiers and available computational resources.
  • the ratio computed in the consensus analysis for sensing module 704 is not compared with the threshold and, instead, is sent directly to the aggregation assembling sub-module 226 in the aggregation module 223 for further processing.
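The two consensus paths described above differ only in their inputs and in whether the threshold is applied, as the following sketch shows. The names `consensus` and `portion_sensing_further_analysis` are hypothetical. Because the default threshold of 1 means that all results must agree, the comparison is written as "greater than or equal to"; that reading is an assumption.

```python
from collections import Counter


def consensus(sentiments):
    """Return (majority_label, ratio_of_agreement) for a non-empty list."""
    if not sentiments:
        raise ValueError("at least one classified sentiment is required")
    label, majority = Counter(sentiments).most_common(1)[0]
    return label, majority / len(sentiments)


def portion_sensing_further_analysis(non_learning, learning, threshold=1.0):
    """Sketch of sub-modules 702 and 704 for one text portion.

    non_learning: the single sentiment from the non-learning analysis.
    learning:     the M sentiments from the learning-based classifiers.
    Returns (new_ground_truth_or_None, (sensing_label, sensing_ratio)).
    """
    # Consensus for learning: 1 non-learning result plus M learning results,
    # gated by the user-selectable threshold (default: full agreement).
    label, ratio = consensus([non_learning, *learning])
    new_ground_truth = label if ratio >= threshold else None

    # Consensus for sensing: the M learning results only, with no threshold.
    sensing = consensus(learning)
    return new_ground_truth, sensing


if __name__ == "__main__":
    print(portion_sensing_further_analysis("negative", ["positive"] * 5))
    print(portion_sensing_further_analysis("negative", ["positive"] * 5,
                                           threshold=0.8))
```

With the default threshold, a single disagreeing result is enough to withhold the portion from the learning path, while the sensing path still forwards its own ratio.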
  • a block diagram 800 depicts the aggregation assembling sub-module 226 in the aggregation module 223 in accordance with the present embodiment.
  • the aggregation assembling sub-module 226 receives input from both the portion relationship analysis with topic and object identification module 228 as well as the portion sensing further analysis sub-module 224.
  • the aggregation assembling sub-module 226 includes a portion valence computing sub-module 802, an aggregation sum operation sub-module 804, a multi-portion ambivalence handler 806 and a sentiment and emotion further analysis sub-module 808, all of which are seamlessly connected with a human expert input module 810 through an expert knowledge base unit 812.
  • the human expert input module 810 allows users to select desired output modes such as outputting two main sentiment categories (positive and negative only), outputting three main sentiment categories (positive, negative and neutral), outputting four main sentiment categories (positive, negative, neutral and mixed) or outputting six main sentiment categories (positive, negative, neutral, mixed positive, mixed negative and mixed neutral) via the output 234 (FIG. 2) of the intelligent sentiment and emotion sensing unit 202.
  • the aggregation assembling sub-module 226 also allows a user to select the weightage of non-learning based processing (y%) and learning based AI processing ((100-y)%) for aggregating the classified sentiments from both these sources to compute the valence in the portion valence computing module 802.
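The text gives the weightage split (y% for non-learning, (100-y)% for learning) but not the valence formula itself, so the following is only one plausible reading. It assumes each sentiment label maps to a polarity of +1, -1 or 0 and that the learning-based contribution is the mean polarity over the M classifiers; the mapping `POLARITY`, the function name `portion_valence` and the default y of 50 are all hypothetical.

```python
# Assumed label-to-polarity mapping; not given in the source.
POLARITY = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}


def portion_valence(non_learning, learning, y=50.0):
    """Weighted valence of one portion.

    non_learning: the single sentiment from the non-learning process.
    learning:     the M sentiments from the learning-based classifiers.
    y:            weightage (in percent) of the non-learning result;
                  the learning-based result receives (100 - y) percent.
    """
    if not 0.0 <= y <= 100.0:
        raise ValueError("y must be a percentage between 0 and 100")
    if not learning:
        raise ValueError("at least one learning-based sentiment is required")
    nl_value = POLARITY[non_learning]
    ai_value = sum(POLARITY[s] for s in learning) / len(learning)
    return (y / 100.0) * nl_value + ((100.0 - y) / 100.0) * ai_value


if __name__ == "__main__":
    votes = ["positive", "negative", "positive", "positive", "negative"]
    print(portion_valence("positive", votes, y=50.0))
```

Setting y to 100 reduces the result to the non-learning polarity alone, and setting it to 0 reduces it to the mean of the learning-based polarities.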
  • the aggregation sum operation sub-module 804 performs a first summing operation to calculate the positive sentiment as shown in Equation (1):
  • $V^{+} = \sum_{i=1}^{N^{+}} v_i^{+}$ (1)
  • and a second summing operation to calculate the negative sentiment as shown in Equation (2):
  • $V^{-} = \sum_{j=1}^{N^{-}} v_j^{-}$ (2)
  • where $V^{+}$ is the total positive value obtained from all the positive portions; $V^{-}$ is the total negative value obtained from all the negative portions; $v_i^{+}$ is the positive value of portion $i$; $v_j^{-}$ is the negative value of portion $j$;
  • $N^{+}$ is the total number of the positive portions of the text data;
  • $N^{-}$ is the total number of the negative portions of the text data;
  • $N^{+} + N^{-} + N^{0} = N$, where $N^{0}$ is the number of portions for which the valence values are neither positive nor negative.
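The two summing operations of Equations (1) and (2) and the portion counts can be sketched as follows. The function name `aggregate_valences` and the dictionary keys are hypothetical. The sketch assumes negative portion values keep their sign, so the negative total is less than or equal to zero; the text does not say whether magnitudes are used instead.

```python
def aggregate_valences(valences):
    """Sum positive and negative portion valences separately.

    valences: one signed valence per text portion.
    Returns the totals and counts named in Equations (1) and (2).
    """
    positives = [v for v in valences if v > 0]
    negatives = [v for v in valences if v < 0]
    return {
        "V_plus": sum(positives),    # Equation (1)
        "V_minus": sum(negatives),   # Equation (2), sign kept (assumption)
        "N_plus": len(positives),
        "N_minus": len(negatives),
        # Portions that are neither positive nor negative.
        "N_zero": len(valences) - len(positives) - len(negatives),
    }


if __name__ == "__main__":
    print(aggregate_valences([0.5, 0.25, -0.5, 0.0]))
```

The three counts always add up to the total number of portions.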
  • the multi-portion ambivalence handler 806 is different from the non-sarcastic ambivalence handler sub-module 414 in the ambivalence handler with hidden sarcasm detection module 214.
  • the ambivalence handler with hidden sarcasm detection module 214 handles the ambivalence sentiment in one portion, whereas the multi-portion ambivalence handler 806 is designed to handle the ambivalence sentiment of a whole text data which can contain multiple text portions.
  • the final output depends on the values of $V^{+}$ and $V^{-}$ as set forth in Equations (3) to (8):
  • the aggregation assembling sub-module 226 will output via the output 234 the final sentiment analysis result according to users' requirements as shown in the Table 2 below:
  • sentiment and emotion further analysis sub-module 808 is designed to perform further analysis on Sentiment and on Emotion before outputting final results via the output 234.
  • the automatic knowledge learning and updating unit 204 automatically extracts useful knowledge 238 from a set of user inputted ground truth labelled data 236 or other existing data, such as domain specific labelled data, and updates that knowledge in a knowledge base 240.
  • the user inputted ground truth labelled data 236 is used to update a ground truth test database 242.
  • the generated knowledge is updated online to support the main sentiment analysis engine, the intelligent sentiment and emotion sensing unit 202.
  • a block diagram 900 depicts the AI learning module 222 in accordance with the present embodiment.
  • the AI learning based module 222 includes a text pre-processing and different parts of speech (POS) analysis sub-module 902, a special handler for negation and question mark handling 904, a knowledge extraction through learning sub-module 906 and a knowledge selection and updating sub-module 908.
  • the text pre-processing and different parts of speech (POS) analysis sub-module 902 aims to filter out noise from the input text. For example, in the sub-module 902, tags, URLs and other unrecognized entities are removed from the source to minimize the impact on later analysis.
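A minimal sketch of the noise filtering step follows. It assumes that "tags" means HTML-style markup tags and that URLs start with "http://", "https://" or "www."; neither assumption is confirmed by the text, and handling of "other unrecognized entities" is left out because the text does not define them. The name `clean_text` is hypothetical.

```python
import re

# Assumed patterns: the source does not define what counts as a URL or tag.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")


def clean_text(text):
    """Remove URLs and markup tags, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = HTML_TAG_RE.sub(" ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_text("Great phone <b>really</b> see http://example.com/x now"))
```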
  • different parts of speech are assigned different sentiment weights.
  • adjectives convey stronger sentiment information than verbs or nouns so adjectives are assigned larger sentiment weights.
  • Verbs and nouns may also convey sentiment information from time to time. For example, the verb "love" and the noun "congratulations" are often associated with positive sentiment.
  • adjectives play a significantly more dominant role than verbs and nouns. In light of the above, smaller sentiment weights are assigned to verbs and nouns.
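The part-of-speech weighting described above can be illustrated with a small lookup. The numeric weights, the tag names and the function name `weighted_items` are hypothetical; the text only states that adjectives receive larger weights than verbs and nouns. The input is assumed to be already tagged, so no particular tagger is implied.

```python
# Hypothetical weights: the source only says adjectives get larger weights
# than verbs and nouns, without giving values.
POS_WEIGHTS = {"ADJ": 1.0, "VERB": 0.5, "NOUN": 0.5}


def weighted_items(tagged_tokens, pos_weights=POS_WEIGHTS):
    """Attach a sentiment weight to each token whose POS carries sentiment.

    tagged_tokens: iterable of (word, pos_tag) pairs from any POS tagger.
    Tokens with other tags are dropped.
    """
    return [(word.lower(), pos_weights[pos])
            for word, pos in tagged_tokens
            if pos in pos_weights]


if __name__ == "__main__":
    tagged = [("I", "PRON"), ("love", "VERB"), ("this", "DET"),
              ("beautiful", "ADJ"), ("phone", "NOUN")]
    print(weighted_items(tagged))
```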
  • inputs 244, 246 are received from both the intelligent sentiment and emotion sensing unit 202 (the input 244 from the output of the portion sensing further analysis sub-module 224) and the users' input ground truth labelled data 236 (the input 246).
  • the special handler for negation and question mark handling 904 is designed to handle text data containing negations or question marks, received on the input 246 from the users' input ground truth labelled data 236, before feeding it into the learning process.
  • This special handler 904 addresses the issue that when a negation appears in a text, it works together with positive or negative words or phrases so that the final sensing output has the opposite polarity to the original sentiment.
  • texts with question marks pose problems for conventional knowledge learning. For example, in texts such as "Is this the best one?" and "Which one is a good brand?", people are asking questions rather than giving opinions. Therefore, the special handler for negation and question mark handling 904 filters away the data with question marks.
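The behaviour of the special handler 904 might be sketched as follows. Filtering out texts with question marks follows the description directly. The negation step uses a common convention (prefixing tokens after a negation with `NOT_` until the next punctuation mark) that is swapped in here for illustration; the source does not say how negations are actually processed, and the negation word list and function names are assumptions.

```python
import re

# Hypothetical negation cues; contractions ending in "n't" are also caught.
NEGATIONS = {"not", "no", "never", "cannot"}
CLAUSE_END = re.compile(r"[.,;:!]")
TOKEN_RE = re.compile(r"[\w']+|[.,;:!]")


def drop_questions(labelled):
    """Keep only (text, label) pairs whose text has no question mark."""
    return [(text, label) for text, label in labelled if "?" not in text]


def mark_negation(text):
    """Prefix tokens inside a negation scope with NOT_.

    The scope opens at a negation cue and closes at the next punctuation
    mark, so "good" and "NOT_good" become distinct items for learning.
    """
    tokens, negated = [], False
    for token in TOKEN_RE.findall(text.lower()):
        if CLAUSE_END.fullmatch(token):
            negated = False
            tokens.append(token)
        elif token in NEGATIONS or token.endswith("n't"):
            negated = True
            tokens.append(token)
        else:
            tokens.append("NOT_" + token if negated else token)
    return tokens


if __name__ == "__main__":
    data = [("Is this the best one?", "positive"), ("This is not good, but cheap.", "negative")]
    for text, label in drop_questions(data):
        print(label, mark_negation(text))
```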
  • a learning-based method is selected to compute a polarity score for each word or phrase item occurring in the text.
  • the users' input ground truth labelled data 236 is tested by the system. If the testing accuracy is higher than a threshold accuracy, the data 236 is used as ground truth data for knowledge extraction by the knowledge extraction through learning sub-module 906. The user can then select the knowledge extraction mode and view the extracted knowledge before knowledge updating is performed by the knowledge selection and updating sub-module 908, which receives input from a human expert input module 910. If the testing results are not good but the user would still like to use the data to extract the knowledge 238, the extracted knowledge is saved in a new domain to form a new user domain knowledge base 240.
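The accuracy gate described above can be sketched as a small routing function. The classifier interface, the return labels and the handling of the case where the user declines extraction are assumptions; the source specifies only the two outcomes involving the existing knowledge and the new user domain knowledge base 240.

```python
def measure_accuracy(classify, labelled):
    """Fraction of (text, label) pairs that classify() labels correctly."""
    labelled = list(labelled)
    if not labelled:
        return 0.0
    correct = sum(1 for text, label in labelled if classify(text) == label)
    return correct / len(labelled)


def route_extracted_knowledge(accuracy, threshold, user_wants_extraction=True):
    """Decide where knowledge extracted from user-labelled data should go.

    "existing_domain": accuracy exceeds the threshold, so the data serves
        as ground truth and may update the current knowledge after review.
    "new_user_domain": accuracy is too low, but the user still wants the
        knowledge, so it is kept in a separate user domain knowledge base.
    "skip": hypothetical third outcome not described in the source.
    """
    if accuracy > threshold:
        return "existing_domain"
    if user_wants_extraction:
        return "new_user_domain"
    return "skip"


if __name__ == "__main__":
    toy_classifier = lambda text: "pos" if "good" in text else "neg"
    data = [("good phone", "pos"), ("bad phone", "neg"), ("good grief", "neg"), ("meh", "neg")]
    acc = measure_accuracy(toy_classifier, data)
    print(acc, route_extracted_knowledge(acc, threshold=0.7))
```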
  • a polarity value for each word and phrase item with different parts of speech that occurs in the text is computed.
  • the items are then sorted in ascending order of polarity value.
  • the first few words at the top of the list are the most negative words for this domain and the last few words are the most positive ones.
  • Users may handcraft their own strategies to select a word or phrase based on their own needs. For example, a user may select the top and bottom k percent for their sentiment lexicon. Alternatively, the user may set a threshold where any word having a polarity value greater than the threshold is included in the sentiment dictionary for that particular domain.
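The scoring, sorting and selection steps above can be sketched end to end. The source does not specify how the polarity value is computed, so a smoothed log-odds ratio of word counts in positive versus negative texts is assumed here purely for illustration; the label names and all function names are likewise hypothetical.

```python
import math
from collections import Counter


def polarity_scores(labelled_texts):
    """Assumed scoring: add-one smoothed log-odds of each word appearing
    in positive rather than negative ground truth texts."""
    pos, neg = Counter(), Counter()
    for text, label in labelled_texts:
        (pos if label == "positive" else neg).update(text.lower().split())
    vocab = set(pos) | set(neg)
    n_pos, n_neg, v = sum(pos.values()), sum(neg.values()), len(vocab)
    return {
        w: math.log((pos[w] + 1) / (n_pos + v)) - math.log((neg[w] + 1) / (n_neg + v))
        for w in vocab
    }


def rank_ascending(scores):
    """Most negative items first, most positive last (ties broken by word)."""
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


def select_extremes(ranked, k_percent):
    """Return the k percent most negative and most positive items."""
    k = max(1, round(len(ranked) * k_percent / 100))
    return ranked[:k], ranked[-k:]


def select_by_threshold(scores, threshold):
    """Keep items whose polarity value is greater than the threshold."""
    return {w: s for w, s in scores.items() if s > threshold}


if __name__ == "__main__":
    data = [("great phone", "positive"), ("great battery", "positive"),
            ("awful phone", "negative"), ("awful screen", "negative")]
    ranked = rank_ascending(polarity_scores(data))
    print([word for word, _ in ranked])
    print(select_extremes(ranked, 20))
```

On this toy data a word used equally in both classes ("phone") lands in the middle of the ranking, which is why only the two ends of the list are candidates for a domain lexicon.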
  • the proposed method can utilize a learning-based method through the AI learning based module 222 (AI learning) to extract new lexicons and n-grams from existing ground truth data.
  • the AI learning based module 222 is also capable of continually identifying and collecting new ground truth data every time the system processes new test data through the AI learning based module 220 (AI classifiers).
  • the social sentiment platform in accordance with the present embodiment is designed for use by a layman.
  • the intelligent sentiment and emotion sensing unit 202 with the learning method embedded within the platform has the capabilities to collect, filter, classify, analyse and display a descriptive and predictive analytic dashboard for a given concept.
  • the system, with the above advantageous features, enables users to understand the public voice over the Internet more effectively and accurately, answering an urgent need in the industry.
  • the present embodiment provides an efficient and accurate method and system for sentiment classification of text, such as social media data, utilizing intelligent features and learning capabilities.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

A system for classification of text messages in accordance with sentiment and/or emotion expressed by the text messages, a method for classification of text messages in accordance with sentiment and/or emotion expressed by the text messages, and a method for handling ambivalence or hidden sarcasm in text messages are provided. The method for classification of text messages in accordance with sentiment and/or emotion expressed by the text messages includes decomposing each of the text messages into one or more portions and performing learning based scoring of each portion of each of the text messages while simultaneously performing non-learning based scoring of each portion of each of the text messages, so as to process each portion of each of the text messages in a single pass. The method further includes classifying each of the text messages in accordance with the sentiment and/or emotion expressed by the text message in response to an aggregation of a combination of the learning based scoring and the non-learning based scoring of each of the one or more portions of the text message.
PCT/SG2017/050172 2017-03-30 2017-03-30 Method and system of intelligent sentiment and emotion sensing with adaptive learning WO2018182501A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SG2017/050172 WO2018182501A1 (en) 2017-03-30 2017-03-30 Method and system of intelligent sentiment and emotion sensing with adaptive learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SG2017/050172 WO2018182501A1 (en) 2017-03-30 2017-03-30 Method and system of intelligent sentiment and emotion sensing with adaptive learning

Publications (1)

Publication Number Publication Date
WO2018182501A1 2018-10-04

Family

ID=63677935

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2017/050172 WO2018182501A1 (en) 2017-03-30 2017-03-30 Method and system of intelligent sentiment and emotion sensing with adaptive learning

Country Status (1)

Country Link
WO (1) WO2018182501A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200506657A (en) * 2003-08-11 2005-02-16 Univ Nat Cheng Kung Semantic emotion classifying system
US20080249764A1 (en) * 2007-03-01 2008-10-09 Microsoft Corporation Smart Sentiment Classifier for Product Reviews
CN102682130A (en) * 2012-05-17 2012-09-19 苏州大学 A text sentiment classification method and system
US20160189057A1 (en) * 2014-12-24 2016-06-30 Xurmo Technologies Pvt. Ltd. Computer implemented system and method for categorizing data
US20160351187A1 (en) * 2015-06-01 2016-12-01 Dell Software, Inc. Method and Apparatus to Extrapolate Sarcasm and Irony Using Multi-Dimensional Machine Learning Based Linguistic Analysis
US20170011029A1 (en) * 2013-05-09 2017-01-12 Moodwire, Inc. Hybrid human machine learning system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SIDDIQUA U.A. ET AL.: "Combining a Rule-based Classifier with Ensemble of Feature Sets and Machine Learning Techniques for Sentiment Analysis on Microblog", 19TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 18 December 2016 (2016-12-18), pages 304 - 309, XP033068358, [retrieved on 20170509] *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829054A (en) * 2019-01-17 2019-05-31 齐鲁工业大学 A text classification method and system
CN110516236A (en) * 2019-08-09 2019-11-29 安徽工程大学 A fine-grained sentiment collection method for short social texts
CN110516236B (en) * 2019-08-09 2022-10-28 安徽工程大学 A fine-grained sentiment collection method for short social texts
WO2021042160A1 (en) * 2019-09-02 2021-03-11 Ozecom Pty Ltd A text classification method
GB2603678A (en) * 2019-09-02 2022-08-10 Ozecom Pty Ltd A text classification method
CN111159405A (en) * 2019-12-27 2020-05-15 北京工业大学 Sarcasm detection method based on background knowledge
CN111159405B (en) * 2019-12-27 2023-09-12 北京工业大学 Sarcasm detection method based on background knowledge
WO2021134177A1 (en) * 2019-12-30 2021-07-08 深圳市优必选科技股份有限公司 Sentiment labelling method, apparatus and device for conversation content, and storage medium
CN112182209A (en) * 2020-09-24 2021-01-05 东北大学 GCN-based cross-domain sentiment analysis method under a lifelong learning framework
CN116882415A (en) * 2023-09-07 2023-10-13 湖南中周至尚信息技术有限公司 A text sentiment analysis method and system based on natural language processing
CN116882415B (en) * 2023-09-07 2023-11-24 湖南中周至尚信息技术有限公司 A text sentiment analysis method and system based on natural language processing

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17903746; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 17903746; Country of ref document: EP; Kind code of ref document: A1)