US11748573B2 - System and method to quantify subject-specific sentiment - Google Patents

Publication number
US11748573B2
Authority
US
United States
Prior art keywords
subject
text input
word
sentiment
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US17/122,712
Other versions
US20210216721A1 (en)
Inventor
Sitarama Brahmam GUNTURI
Pranavi SURA
Brajesh Singh
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tata Consultancy Services Ltd
Original Assignee
Tata Consultancy Services Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tata Consultancy Services Ltd
Assigned to TATA CONSULTANCY SERVICES LIMITED: assignment of assignors interest (see document for details). Assignors: GUNTURI, Sitarama Brahmam; SURA, Pranavi; SINGH, Brajesh
Publication of US20210216721A1
Application granted
Publication of US11748573B2
Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the disclosure herein generally relates to a field of sentiment analysis of a text input and, more particularly, a system and method for a quantitative measure of a subject-specific sentiment analysis of a text input.
  • VADER-based approaches are quantitative approaches that provide a quantitative measure of the sentiment of the text.
  • Machine learning-based approaches classify text messages into a predefined set of classes, such as strong positive, positive, neutral, negative, and strong negative. Neither approach deals with sentiment related to the individual subjects of the text.
  • Embodiments of the present disclosure provide technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
  • a processor-implemented method for a quantitative measure of a subject-specific sentiment analysis of a text input is provided. The text input comprises one or more subjects and one or more objects. It would be appreciated that the sentiment of each subject is quantified separately, in addition to an overall sentiment of the text input.
  • the method comprises one or more steps as follows.
  • at least one text input is received.
  • the received at least one text input comprises at least one subject and at least one object. It is to be noted that in the absence of at least one object in the received text input, a first adjective present in the received text input would be considered as an object of the received text input.
  • the received text input is tokenized using a predefined word delimiter. Each tokenized word is tagged based on a part-of-speech (POS) and a universal dependency tag. Further, at least one subject and at least one object are identified using a subject-verb-object (SVO) detection model. It would be appreciated that, where more than one subject is present in the received text input, each subject of the received text input is identified and tagged.
  • a tree of the universal dependency tagged words is prepared and analyzed for the identified subject to determine a token dependency of the identified subject using a predefined list of tuples of each identified word and a dependency tag of each tuple. At least one phrase corresponding to the identified subject is extracted and represented as a numerical vector. It is to be noted that where more than one subject is present in the received text input, an index will be prepared using the universal dependency tag tree for each identified subject and the corresponding at least one object of the received text input. Further, the identified subject is quantified using a pre-trained deep learning-based sentiment analyzer, and finally a sentiment score is recommended for the quantified subject, along with a sentiment score of the received text input, using a probability score of the deep learning-based sentiment analyzer. A predefined class score is also assigned to the quantified subject.
  • a system is configured for a quantitative measure of a subject-specific sentiment analysis of a text input. It would be appreciated that the sentiment of each subject is quantified separately, in addition to an overall sentiment of the text input.
  • the system comprises at least one memory storing a plurality of instructions and one or more hardware processors communicatively coupled with the at least one memory.
  • the one or more hardware processors are configured to execute one or more modules comprising a receiving module, a tokenization module, a tagging module, an identification module, an analyzing module, a quantification module, and a recommendation module.
  • the receiving module of the system is configured to receive at least one text input comprising at least one subject and at least one object. It is to be noted that in the absence of at least one object in the received text input, a first adjective present in the received text input would be considered as an object of the received text input.
  • the tokenization module of the system is configured to tokenize the received text input based on a predefined word delimiter. Further, each word of the tokenized text input is tagged at a tagging module based on a part-of-speech (POS) and a universal dependency tag.
  • the identification module of the system is configured to identify at least one subject and at least one object of the tokenized text input using a subject-verb-object (SVO) detection model. It would be appreciated that, where more than one subject is present in the received text input, each subject of the received text input is identified and tagged.
  • a universal dependency tag tree is prepared from the identified subjects.
  • the analyzing module of the system is configured to analyze the universal dependency tag tree to determine a token dependency of the identified subject using a predefined list of tuples of each identified word and a dependency tag of each tuple. At least one phrase corresponding to the identified subject is extracted and represented as a numerical vector. It is to be noted that where more than one subject is present in the received text input, an index will be prepared using the universal dependency tag tree for each identified subject and the corresponding at least one object of the received text input.
  • the quantification module is configured to quantify the identified subject using a deep learning-based sentiment analyzer and the recommendation module is configured to recommend a sentiment score for the quantified subject along with a sentiment score of the received text input using a probability score of the deep learning-based sentiment analyzer. Further, a predefined class score is assigned to the quantified subject.
  • a non-transitory computer readable medium stores one or more instructions which, when executed by a processor on a system, cause the processor to perform the method. The method comprises the same steps as the processor-implemented method described above: receiving the text input, tokenizing and tagging it, identifying each subject and object, analyzing the universal dependency tag tree, quantifying each identified subject, and recommending the sentiment scores.
  • FIG. 1 illustrates a system for a quantitative measure of a subject-specific sentiment analysis of an unstructured form of texts, in accordance with some embodiments of the present disclosure.
  • FIG. 2 is a flow diagram to illustrate a method for a quantitative measure of a subject-specific sentiment analysis of a text input, in accordance with some embodiments of the present disclosure.
  • FIG. 3 is a functional block diagram of the system to measure a sentiment score to a text input, in accordance with some embodiments of the present disclosure.
  • the embodiments herein provide a method and a system for a quantitative measure of a subject-specific sentiment from input text.
  • the system and method identify each subject and its related phrases, and the sentiment is measured quantitatively at the subject level as well as at the text input level.
  • referring to FIG. 1 through FIG. 3, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments, and these embodiments are described in the context of the following exemplary system and/or method.
  • FIG. 1 illustrates a system ( 100 ) for a quantitative measure of a subject-specific sentiment analysis of an unstructured form of texts.
  • the system ( 100 ) comprises at least one memory ( 102 ) with a plurality of instructions and one or more hardware processors ( 104 ) which are communicatively coupled with the at least one memory ( 102 ) to execute modules therein.
  • the hardware processor ( 104 ) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions.
  • the hardware processor ( 104 ) is configured to fetch and execute computer-readable instructions stored in the memory ( 102 ).
  • the system comprises a receiving module ( 106 ), a tokenization module ( 108 ), a tagging module ( 110 ), an identification module ( 112 ), an analyzing module ( 114 ), a quantification module ( 116 ), and a recommendation module ( 118 ).
  • the receiving module ( 106 ) of the system ( 100 ) is configured to receive at least one text input comprising at least one subject and at least one object word.
  • the text input is received from a predefined internet source of information or a predefined file in a form of natural language.
  • the predefined internet source of information includes a social media platform and a consumer forum. It is to be noted that in the absence of the at least one object in the received text input, a first adjective is considered as an object of the received text input.
  • the tokenization module ( 108 ) of the system ( 100 ) is configured to tokenize the received at least one text input based on a predefined word delimiter.
  • the predefined word delimiter includes a comma, a space, and a semicolon.
  • the space is used to delimit words, whereas the comma and the semicolon are used to separate the text input into segments.
  • the tagging module ( 110 ) of the system is configured to tag each word of the tokenized at least one text input based on a part-of-speech (POS) and a universal dependency tag.
  • a universal dependency tag tree is identified from each word tagged with the universal dependency tag.
  • the identification module ( 112 ) of the system ( 100 ) is configured to identify at least one subject and at least one object of the at least one tokenized text input using a subject-verb-object (SVO) detection model.
  • the identified subjects and objects are classified into noun chunks. It is to be noted that in the presence of more than one subject in the received text input, an index is prepared using the universal dependency tag tree for each identified subject and the corresponding at least one object of the received text input.
  • a list of noun chunks present in the text input is also identified using a subject-verb-object detection model.
  • the list of noun chunks is analyzed to identify a number of subjects and a number of objects in the text input. If the number of subjects is more than one, then the first subject word of the text input is marked with a subject tag from the list of noun chunks. Following the first subject word, a first object is identified from a token dependency list and marked with an object tag. However, if the identified first object word is followed by a word having an object tag or an adjective dependency tag, then the following word, rather than the first object word, is marked with the object tag of the text input.
  • a rule engine is calibrated to use the dependency tags and the noun phrases to extract clauses relevant to each identified subject. If the number of subjects in the noun chunk list is more than one, then the first word with a subject tag is taken from the noun chunk list and marked as a subject word. Further, the system identifies an object word from the noun chunk list, prepares an index of the corresponding subject and object from the token list, and gets a resultant clause. The same step is repeated for each identified subject in the noun chunk list.
  • an input text to the system is “I ordered the lemon raspberry ice cocktail, which was also immense”.
  • the input text is tokenized at tokenization module of the system and at least one subject and object are identified at the identification module.
  • one subject ‘I’ and one object ‘the lemon raspberry ice cocktail’ are identified for a quantitative measure of a subject-specific sentiment analysis.
  • an input text to the system is “the story is lame, not interesting and never really explains the sinister origins of the puppets”.
  • the input text comprises one subject ‘the story’ and multiple objects ‘the sinister origins’ and ‘the puppets’ for a quantitative measure of a subject-specific sentiment analysis.
  • a first adjective word is identified with an object tag.
  • an index will be prepared using the universal dependency tag tree for each identified subject and corresponding at least one object of the received text input.
  • an input to the system is “the cashew cream sauce was bland, and the vegetables were undercooked”.
  • the input text comprises multiple subjects without any object.
  • the subjects include ‘the cashew cream sauce’ and ‘the vegetables’. Therefore, a first adjective in the input text is considered as an object.
  • the first adjective, ‘bland’, is taken as an object of the input text.
  • the analyzing module ( 114 ) of the system ( 100 ) is configured to analyze the universal dependency tag tree for the identified subject to determine a token dependency of the identified subject using a dependency parser. At least one phrase corresponding to the identified at least one subject is extracted from the determined token dependency and represented as a numerical vector.
  • a dependency parser analyzes the grammatical structure of the text input, establishing relationships between a headword and the words that modify the headword.
  • the dependency parser supports a universal dependency scheme.
  • the universal dependency is a framework for a consistent annotation of grammar across different human languages. It would be appreciated that different dependency tags are present, which describe the relationship of a word with respect to its neighboring words.
  • the quantification module ( 116 ) of the system ( 100 ) is configured to quantify the identified at least one subject using a deep learning-based sentiment analyzer and a predefined class score of the at least one subject.
  • the deep learning-based sentiment analyzer is trained on a dataset using a convolutional neural network (CNN) model.
  • one or more noun phrases of each subject are converted to word embeddings and are given to a pre-trained CNN for sentiment determination.
  • the output of the CNN model is a list of probability scores, one for each class.
  • the probability scores, along with the pre-defined sentiment score range, are used to calculate the sentiment score of the input text.
  • the CNN model comprises an input layer, an embedding layer, a plurality of convolution layers, a pooling layer and a dense layer.
  • the input layer takes the sequence of tokens of the received input text as input to the CNN model.
  • the embedding layer is a look-up table, wherein each tokenized word of the received text input is mapped to a trainable feature vector.
  • the plurality of convolution layers is the core building block of the CNN model.
  • the plurality of convolution layers comprises a set of independent learnable filters. Each of the learnable filters is independently convolved with the embedding matrix, producing different feature maps.
  • the pooling layer progressively reduces the spatial size of the representation to reduce the number of parameters and the amount of computation in the network. The pooling layer operates on each feature map independently.
  • the recommendation module ( 118 ) of the system ( 100 ) is configured to recommend a sentiment score for the quantified subject and a sentiment score of the received text input using a probability score of the deep learning-based sentiment analyzer. Further, a predefined sentiment class is assigned to the quantified at least one subject and to the received text input using their respective recommended sentiment scores.
  • an input text to the rule engine of the system is “the cashew cream sauce was bland, and the vegetables were undercooked”.
  • the output of the rule engine is the first clause, “the cashew cream sauce was bland”, and the second clause, “the vegetables were undercooked”, of the input text.
  • the input to the CNN model is first clause “the cashew cream sauce was bland”.
  • the output of the CNN model will be a list of probability scores corresponding to each sentiment class [P_1, P_2, P_3, P_4, P_5]. Therefore, a sentiment score of the identified subject and a sentiment score of the received text input are calculated as:
  • $$\text{sentiment score} = \sum_{i=1}^{5} S_i \cdot P_i$$
  • where $S_i$ is a predefined class score from $\{-1.0, -0.5, 0, 0.5, 1.0\}$, and
  • $P_i$ is the probability score given by the CNN model at an output layer. Therefore, the sentiment score of the given clause is -0.98. It is repeated for the second clause and the sentiment score is -0.88.
  • once the sentiment score for each clause is quantified, the corresponding subject of each clause is classified as a positive or negative subject of the input text. Further, an overall sentiment score is calculated for the complete input text using the sentiment analyzer.
  • the overall sentiment score for the input text is -0.99.
  • a processor-implemented method for quantitative measure of subject wise sentiment analysis of unstructured form of texts.
  • the method comprises one or more steps as follows.
  • a text input is received at a receiving module ( 106 ) of the system ( 100 ).
  • the text input comprises at least one subject and at least one object word.
  • the text input is received from a predefined internet source of information or a predefined file in a form of natural language.
  • the predefined internet source of information includes a social media platform and a consumer forum. It is to be noted that in the absence of the at least one object in the received text input, a first adjective is considered as an object of the received text input.
  • the received at least one text input is tokenized at a tokenization module ( 108 ) of the system ( 100 ) using a predefined word delimiter model.
  • each word of the tokenized at least one text input is tagged based on a part-of-speech (POS) and a universal dependency tag. Further, a universal dependency tag tree is identified from each word tagged with the universal dependency tag.
  • the identified subjects and objects are classified into noun chunks.
  • At least one text input is tokenized using the predefined word delimiter and a list of tuples of each word of the text input is obtained. Further, a list of noun chunks present in the text input is also identified using a subject-verb-object detection model. The list of noun chunks is analyzed to identify a number of subjects and a number of objects in the text input. If the number of subjects is more than one, then the first subject word of the text input is marked with a subject tag from the list of noun chunks.
  • the system pre-processes the received text input using the word delimiter and quantifies it using a deep learning-based sentiment analyzer.
  • the system recommends a sentiment score to the received text input using a probability score of the deep learning-based sentiment analyzer.
  • a tree of the universal dependency tagged words is analyzed at an analyzing module ( 114 ) of the system ( 100 ) for the identified at least one subject to determine a token dependency of the identified at least one subject using a dependency parser. At least one phrase corresponding to the identified subject is extracted and represented as a numerical vector.
  • the identified at least one subject is quantified at a quantification module ( 116 ) of the system ( 100 ) using a deep learning-based sentiment analyzer.
  • the deep learning-based sentiment analyzer is trained on a dataset using a convolutional neural network (CNN) model.
  • one or more noun phrases of each subject are converted to word embeddings and are given to a pre-trained CNN for sentiment determination.
  • the output of the CNN model is a list of probability scores, one for each class. The probability scores, along with the pre-defined sentiment score range, are used to calculate the sentiment score of the input text.
  • at the last step ( 214 ), a sentiment score for the quantified subject, along with a sentiment score of the received text input, is recommended at a recommendation module ( 118 ) of the system ( 100 ) using the probability score of the deep learning-based sentiment analyzer. Further, a predefined sentiment class is assigned to the quantified at least one subject and to the received text input using their respective recommended sentiment scores.
  • the embodiments of the present disclosure herein address the unresolved problem of quantitatively measuring subject-wise sentiment in unstructured text.
  • Most of the existing solutions reported for measuring the sentiment of the text can be categorized as qualitative and quantitative approaches.
  • VADER-based approaches are quantitative approaches that provide a quantitative measure of the sentiment of the text.
  • Machine learning-based approaches classify text messages into a predefined set of classes, such as strong positive, positive, neutral, negative and strong negative. Neither approach deals with sentiment related to the individual subjects of the text.
  • the hardware device can be any kind of device, which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof.
  • the device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g.
  • the means can include both hardware means, and software means.
  • the method embodiments described herein could be implemented in hardware and software.
  • the device may also include software means.
  • the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
  • the embodiments herein can comprise hardware and software elements.
  • the embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc.
  • the functions performed by various components described herein may be implemented in other components or combinations of other components.
  • a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • a computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored.
  • a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein.
  • the term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
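The CNN layers listed in the definitions above (input, embedding, convolution, pooling, dense) can be pictured with a toy forward pass. The sketch below is pure Python with small random, untrained weights. The vocabulary, the dimensions, the filter widths, the ReLU activation, the max-over-time pooling and the softmax output are all illustrative assumptions rather than details taken from the disclosure, so the probabilities it prints carry no meaning.

```python
import math
import random

random.seed(0)
EMBED_DIM, NUM_CLASSES = 4, 5
FILTER_WIDTHS = (2, 3)  # assumed: one learnable filter per window width

def rand_matrix(rows, cols):
    """Small random weights standing in for trained parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy vocabulary; index 0 is used for unknown words.
vocab = {"<unk>": 0, "the": 1, "sauce": 2, "was": 3, "bland": 4}
embedding = rand_matrix(len(vocab), EMBED_DIM)         # embedding look-up table
filters = [rand_matrix(w, EMBED_DIM) for w in FILTER_WIDTHS]
dense = rand_matrix(NUM_CLASSES, len(FILTER_WIDTHS))   # dense layer weights

def forward(tokens):
    """Return one probability per class; needs at least max(FILTER_WIDTHS) tokens."""
    # Input and embedding layers: map each token to its feature vector.
    vectors = [embedding[vocab.get(t, 0)] for t in tokens]
    pooled = []
    for filt in filters:
        width = len(filt)
        # Convolution: slide the filter over consecutive token windows (with ReLU).
        feature_map = [
            max(0.0, sum(filt[j][k] * vectors[i + j][k]
                         for j in range(width) for k in range(EMBED_DIM)))
            for i in range(len(vectors) - width + 1)
        ]
        # Pooling: keep the strongest response of each feature map independently.
        pooled.append(max(feature_map))
    # Dense layer followed by a softmax over the five classes.
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in dense]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = forward(["the", "sauce", "was", "bland"])
print([round(p, 3) for p in probs])
```

A trained model would replace the random matrices with learned weights; the shape of the output (five probabilities summing to one) is the part that feeds the sentiment score calculation.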

Abstract

This disclosure relates to a system and method for quantitative measure of subject specific sentiment analysis of a text input. The text input comprises subjects and objects. The text input is tokenized, and each word of the tokenized text input is tagged based on a part-of-speech (POS) and a universal dependency tag. A universal dependency tag tree is prepared based on dependency tags. Further, the subjects and objects are identified using a subject-verb-object (SVO) detection. The universal dependency tree is analyzed for each identified subject to determine a token dependency of the subject. The identified subject is quantified using a deep learning-based sentiment analyzer and finally a sentiment score is recommended for the subject using a probability score and a class score is assigned to the subject.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
This U.S. patent application claims priority under 35 U.S.C. § 119 to Indian application No. 201921052179, filed on Dec. 16, 2019. The entire content of the abovementioned application is incorporated herein by reference.
TECHNICAL FIELD
The disclosure herein generally relates to a field of sentiment analysis of a text input and, more particularly, a system and method for a quantitative measure of a subject-specific sentiment analysis of a text input.
BACKGROUND
The amount of data generated each year is growing exponentially because of the digitalization of almost every industry. The majority of the available data is unstructured, in the form of text and images. Various natural language processing (NLP) techniques have been used to process the unstructured data and extract useful information from it. Over the past few years, many deep learning-based approaches have been introduced to provide efficient solutions to NLP problems such as text classification, sentiment analysis, text summarization, topic discovery and so on.
Existing solutions reported in the literature to measure the sentiment of the text can be categorized as qualitative and quantitative approaches. VADER-based approaches are quantitative approaches that provide a quantitative measure of the sentiment of the text. Machine learning-based approaches classify text messages into a predefined set of classes, such as strong positive, positive, neutral, negative, and strong negative. Neither approach deals with sentiment related to the individual subjects of the text.
SUMMARY
Embodiments of the present disclosure provide technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
In one aspect, a processor-implemented method for a quantitative measure of a subject-specific sentiment analysis of a text input is provided. Herein, the text input comprises one or more subjects and one or more objects. It would be appreciated that the sentiment of each subject is quantified separately, in addition to an overall sentiment of the text input.
The method comprises one or more steps as follows. Herein, at least one text input is received. The received at least one text input comprises at least one subject and at least one object. It is to be noted that in the absence of at least one object in the received text input, a first adjective present in the received text input would be considered as an object of the received text input. The received text input is tokenized using a predefined word delimiter. Each tokenized word is tagged based on a part-of-speech (POS) and a universal dependency tag. Further, at least one subject and at least one object are identified using a subject-verb-object (SVO) detection model. It would be appreciated that, where more than one subject is present in the received text input, each subject of the received text input is identified and tagged.
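As a rough illustration of the delimiter-based tokenization step, the following Python sketch assumes the predefined word delimiters are a space, a comma and a semicolon, with the comma and semicolon separating segments of the input and the space separating words. The function name `tokenize` and the nested-list output are illustrative assumptions; POS and dependency tagging would normally be delegated to an NLP library and are not shown.

```python
import re

def tokenize(text_input: str) -> list:
    """Split the input into segments on ',' or ';', then into words on whitespace."""
    segments = re.split(r"[;,]", text_input)
    # Drop empty segments (for example after a trailing comma) before splitting words.
    return [segment.split() for segment in segments if segment.strip()]

tokens = tokenize("the cashew cream sauce was bland, and the vegetables were undercooked")
for segment in tokens:
    print(segment)
```

Each inner list holds the words of one segment, which keeps segment boundaries available to later steps such as clause extraction.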
Further, a tree of the universal dependency tagged words is prepared and analyzed for identified subject to determine a token dependency of the identified subject using a predefined list of tuples of each identified word and a dependence tag of each tuple. Extracting at least one phrase corresponding to the identified subject and represent the extracted at least one phrase in a numerical vector. It is to be noted that wherein more than one subject is present in the received text input, an index will be prepared using the universal dependency tag tree for each identified subject and corresponding at least one object of the received text input. Further, the identified subject is quantified using a pre-trained deep learning-based sentiment analyzer and finally recommending a sentiment score for the quantified subject along with a sentiment score of the received text input using a probability score of the deep learning-based sentiment analyzer. A predefined class score is also assigned to the quantified subject.
In another aspect, a system is configured for a quantitative measure of a subject-specific sentiment analysis of a text input. It would be appreciated that the sentiment of each subject is quantified separately then an overall sentiment of the text input. The system comprising at least one memory storing a plurality of instructions and one or more hardware processors communicatively coupled with at least one memory. The one or more hardware processors are configured to execute one or more modules comprises of a receiving module, a tokenization module, a tagging module, an identification module, an analyzing module, a quantification module, and a recommendation module.
The receiving module of the system is configured to receive at least one text input comprising at least one subject and at least one object. It is to be noted that in absence of at least one object in the received text input, a first adjective present in the received text input would be considered as an object of the received text input. The tokenization module of the system is configured to tokenize the received text input based on a predefined word delimiter. Further, each word of the tokenized text input is tagged at a tagging module based on a part-of-speech (POS) and a universal dependency tag. The identification module of the system is configured to identify at least one subject and at least one object of the tokenized text input using a subject-verb-object (SVO) detection model. It would be appreciated that, wherein more than one subject present in the received text, each subject of the received text input is identified and tagged. A universal dependency tag tree is prepared from the identified subjects.
Further, the analyzing module of the system is configured to analyze the universal dependency tag tree to determine token dependency of the identified subject using a predefined list of tuples of each identified word and a dependence tag of each tuple. At least one phrase corresponding to the identified subject is extracted and represented in a numerical vector. It is to be noted that wherein more than one subject is present in the received text input, an index will be prepared using the universal dependency tag tree for each identified subject and corresponding at least one object of the received text input.
The quantification module is configured to quantify the identified subject using a deep learning-based sentiment analyzer and the recommendation module is configured to recommend a sentiment score for the quantified subject along with a sentiment score of the received text input using a probability score of the deep learning-based sentiment analyzer. Further, a predefined class score is assigned to the quantified subject.
In yet another embodiment, a non-transitory computer readable medium is provided, storing one or more instructions which, when executed by a processor on a system, cause the processor to perform a method. The method comprises one or more steps as follows. At least one text input is received. The received at least one text input comprises at least one subject and at least one object. It is to be noted that, in the absence of at least one object in the received text input, a first adjective present in the received text input is considered as an object of the received text input. The received text input is tokenized using a predefined word delimiter. Each tokenized word is tagged based on a part-of-speech (POS) and a universal dependency tag. Further, at least one subject and at least one object are identified using a subject-verb-object (SVO) detection model. It would be appreciated that, where more than one subject is present in the received text input, each subject of the received text input is identified and tagged.
Further, a tree of the universal dependency tagged words is prepared and analyzed for the identified subject to determine a token dependency of the identified subject using a predefined list of tuples of each identified word and a dependency tag of each tuple. At least one phrase corresponding to the identified subject is extracted and represented as a numerical vector. It is to be noted that, where more than one subject is present in the received text input, an index is prepared using the universal dependency tag tree for each identified subject and the corresponding at least one object of the received text input. Further, the identified subject is quantified using a pre-trained deep learning-based sentiment analyzer, and finally a sentiment score for the quantified subject is recommended, along with a sentiment score of the received text input, using a probability score of the deep learning-based sentiment analyzer. A predefined class score is also assigned to the quantified subject.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 illustrates a system for a quantitative measure of a subject-specific sentiment analysis of an unstructured form of texts, in accordance with some embodiments of the present disclosure.
FIG. 2 is a flow diagram to illustrate a method for a quantitative measure of a subject-specific sentiment analysis of a text input, in accordance with some embodiments of the present disclosure.
FIG. 3 is a functional block diagram of the system to measure a sentiment score to a text input, in accordance with some embodiments of the present disclosure.
It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems and devices embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes, which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments.
The embodiments herein provide a method and a system for a quantitative measure of a subject-specific sentiment from input text. Herein, the system and method identify each subject and its related phrases, and the sentiment is quantitatively measured at the subject level as well as at the text input level.
Referring now to the drawings, and more particularly to FIG. 1 through FIG. 3 , where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 illustrates a system (100) for a quantitative measure of a subject-specific sentiment analysis of an unstructured form of texts. In the preferred embodiment, the system (100) comprises at least one memory (102) with a plurality of instructions and one or more hardware processors (104) which are communicatively coupled with the at least one memory (102) to execute modules therein.
The hardware processor (104) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the hardware processor (104) is configured to fetch and execute computer-readable instructions stored in the memory (102). Further, the system comprises a receiving module (106), a tokenization module (108), a tagging module (110), an identification module (112), an analyzing module (114), a quantification module (116), and a recommendation module (118).
In the preferred embodiment of the disclosure, the receiving module (106) of the system (100) is configured to receive at least one text input comprising at least one subject and at least one object word. Herein, the text input is received from a predefined internet source of information or a predefined file in a form of natural language. The predefined internet source of information includes a social media platform, and a consumer forum. It is to be noted that in absence of the at least one object in the received text input, a first adjective is considered as an object of the received text input.
In the preferred embodiment of the disclosure, the tokenization module (108) of the system (100) is configured to tokenize the received at least one text input based on a predefined word delimiter. The predefined word delimiter includes a comma, a space, and a semicolon. The space is used to delimit words whereas the comma and semicolon are used to separate text input.
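The two delimiter roles described above can be illustrated with a minimal Python sketch. It assumes a regular-expression split, which the disclosure does not specify, and the function name `tokenize` is hypothetical.

```python
import re

def tokenize(text):
    """Split text into segments on commas/semicolons, then into words on spaces."""
    segments = re.split(r"[,;]", text)
    # Skip empty segments (for example, after a trailing comma).
    return [segment.split() for segment in segments if segment.strip()]

segments = tokenize(
    "the cashew cream sauce was bland, and the vegetables were undercooked"
)
for words in segments:
    print(words)
```

Here the comma separates the input into two segments, and the space delimits the words inside each segment.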
In the preferred embodiment of the disclosure, the tagging module (110) of the system is configured to tag each word of the tokenized at least one text input based on a part-of-speech (POS) and a universal dependency tag. A universal dependency tag tree is identified from the words tagged with the universal dependency tag.
In the preferred embodiment of the disclosure, the identification module (112) of the system (100) is configured to identify at least one subject and at least one object of the at least one tokenized text input using a subject-verb-object (SVO) detection model. The identified subject and objects are classified in noun chunks. It is to be noted that in presence of more than one subject in the received text input, an index is prepared using the universal dependency tag tree for each identified subject and corresponding at least one object of the received text input.
In one aspect, the text input is tokenized using the predefined word delimiter and a list of tuples of each word of the text input is obtained. Further, a list of noun chunks present in the text input is also identified using a subject-verb-object detection model. The list of noun chunks is analyzed to identify a number of subjects and a number of objects in the text input. If the number of subjects is more than one, then the first subject word of the text input is marked with a subject tag from the list of noun chunks. Following the first subject word, a first object is identified from a token dependency list and marked with an object tag. However, if the identified first object word is followed in the text input by a word having an object tag or an adjective dependency tag, then the following word replaces the first object word as the object-tagged word of the text input.
In another aspect, after getting the dependency tags and noun phrases from the input text, a rule engine is calibrated to use the dependency tags and the noun phrases to extract clauses relevant to each identified subject. If the number of subjects in the noun chunk list is more than one, then the first word with a subject tag is taken from the noun chunk list and marked as a subject word. Further, the system identifies an object word from the noun chunk list, prepares an index of the corresponding subject and object from the token list, and obtains the resultant clause. The same step is repeated for each identified subject in the noun chunk list.
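The clause extraction performed by the rule engine can be sketched as follows. This is only an illustration under stated assumptions: the dependency labels (`nsubj`, `acomp`, and so on) and the noun chunk spans are assigned by hand rather than produced by a parser, the function name `extract_clauses` is hypothetical, and taking the last object-like word before the next subject is one possible reading of the object replacement rule described above.

```python
# Hand-tagged (word, dependency tag) tuples; a real system would obtain these
# from a dependency parser. The label names are assumptions.
tokens = [
    ("the", "det"), ("cashew", "compound"), ("cream", "compound"),
    ("sauce", "nsubj"), ("was", "ROOT"), ("bland", "acomp"),
    ("and", "cc"), ("the", "det"), ("vegetables", "nsubj"),
    ("were", "conj"), ("undercooked", "acomp"),
]
# Noun chunks as (start, end) index spans, end exclusive (also hand-assigned).
noun_chunks = [(0, 4), (7, 9)]

OBJECT_TAGS = {"dobj", "obj", "pobj"}
ADJECTIVE_TAGS = {"acomp", "amod"}
CLAUSE_END_TAGS = OBJECT_TAGS | ADJECTIVE_TAGS

def extract_clauses(tokens, noun_chunks):
    """Return one clause per subject, spanning subject chunk to its object."""
    subject_chunks = [
        (start, end) for start, end in noun_chunks
        if any(tokens[i][1] == "nsubj" for i in range(start, end))
    ]
    clauses = []
    for n, (start, end) in enumerate(subject_chunks):
        # Search for the object only up to the next subject chunk.
        if n + 1 < len(subject_chunks):
            limit = subject_chunks[n + 1][0]
        else:
            limit = len(tokens)
        object_index = None
        for i in range(end, limit):
            if tokens[i][1] in CLAUSE_END_TAGS:
                object_index = i  # a later object-like word replaces an earlier one
        if object_index is not None:
            words = [word for word, _ in tokens[start:object_index + 1]]
            clauses.append(" ".join(words))
    return clauses

clauses = extract_clauses(tokens, noun_chunks)
print(clauses)
```

The index pair (subject chunk start, object position) plays the role of the subject and object index prepared from the token list.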
In one example, the input text to the system is “I ordered the lemon raspberry ice cocktail, which was also incredible”. The input text is tokenized at the tokenization module of the system, and at least one subject and object are identified at the identification module. Herein, one subject ‘I’ and one object ‘the lemon raspberry ice cocktail’ are identified for a quantitative measure of a subject-specific sentiment analysis.
In another example, the input text to the system is “the story is lame, not interesting and never really explains the sinister origins of the puppets”. The input text comprises one subject ‘the story’ and multiple objects ‘the sinister origins’ and ‘the puppets’ for a quantitative measure of a subject-specific sentiment analysis.
In another embodiment, where the received text input does not contain a word with an object tag, a first adjective word is assigned the object tag. Further, where more than one subject is present in the received text input, an index is prepared using the universal dependency tag tree for each identified subject and the corresponding at least one object of the received text input.
In another example, the input to the system is “the cashew cream sauce was bland, and the vegetables were undercooked”. The input text comprises multiple subjects without any object. The subjects are ‘the cashew cream sauce’ and ‘the vegetables’. Therefore, a first adjective in the input text is considered as an object. Herein, the first adjective, ‘bland’, is taken as an object of the input text.
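The fallback to the first adjective can be sketched in Python as below. The tags are hand-assigned for illustration (a real system would take them from the tagging module), and the names `pick_object` and `OBJECT_DEPS` are hypothetical.

```python
OBJECT_DEPS = {"dobj", "obj", "pobj"}  # assumed labels for object dependencies

def pick_object(tagged):
    """tagged: list of (word, pos, dep). Return the object word, else the first adjective."""
    for word, _pos, dep in tagged:
        if dep in OBJECT_DEPS:
            return word
    # No object-tagged word found: fall back to the first adjective.
    for word, pos, _dep in tagged:
        if pos == "ADJ":
            return word
    return None

no_object = [
    ("the", "DET", "det"), ("cashew", "NOUN", "compound"),
    ("cream", "NOUN", "compound"), ("sauce", "NOUN", "nsubj"),
    ("was", "AUX", "ROOT"), ("bland", "ADJ", "acomp"),
    ("and", "CCONJ", "cc"), ("the", "DET", "det"),
    ("vegetables", "NOUN", "nsubj"), ("were", "AUX", "conj"),
    ("undercooked", "ADJ", "acomp"),
]
with_object = [
    ("I", "PRON", "nsubj"), ("ordered", "VERB", "ROOT"),
    ("the", "DET", "det"), ("cocktail", "NOUN", "dobj"),
]

print(pick_object(no_object))    # bland
print(pick_object(with_object))  # cocktail
```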
In the preferred embodiment of the disclosure, the analyzing module (114) of the system (100) is configured to analyze the universal dependency tag tree for the identified subject to determine a token dependency of the identified subject using a dependency parser. At least one phrase corresponding to the identified at least one subject is extracted from the determined token dependency and represented as a numerical vector.
It would be appreciated that a dependency parser analyzes the grammatical structure of the text input, establishing a relationship between a headword and the words that modify the headword. The dependency parser supports a universal dependency scheme. The universal dependency scheme is a framework for a consistent annotation of grammar across different human languages. It would be appreciated that different dependency tags are present, which describe the relationship of a word with respect to its neighboring words.
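The headword and modifier relationship can be made concrete with a small tree sketch. It assumes each token stores the index of its headword (the root pointing to itself); the `Token` class, the `subtree` helper, and the hand-assigned heads and labels are illustrative and are not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    dep: str   # dependency tag (hand-assigned here)
    head: int  # index of the headword; the root points to itself

def subtree(tokens, root):
    """Return sorted indices of `root` and all tokens that depend on it,
    directly or transitively."""
    members = {root}
    changed = True
    while changed:
        changed = False
        for i, token in enumerate(tokens):
            if i not in members and token.head in members:
                members.add(i)
                changed = True
    return sorted(members)

sentence = [
    Token("the", "det", 3), Token("cashew", "compound", 3),
    Token("cream", "compound", 3), Token("sauce", "nsubj", 4),
    Token("was", "ROOT", 4), Token("bland", "acomp", 4),
]

subject_index = next(i for i, t in enumerate(sentence) if t.dep == "nsubj")
phrase = " ".join(sentence[i].text for i in subtree(sentence, subject_index))
print(phrase)  # the cashew cream sauce
```

Collecting the subtree of the subject's headword yields the phrase attached to that subject, which is the kind of unit later passed on for vectorization.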
In the preferred embodiment of the disclosure, the quantification module (116) of the system (100) is configured to quantify the identified at least one subject using a deep learning-based sentiment analyzer and a predefined class score of the at least one subject. The deep learning-based sentiment analyzer is trained on a dataset using a convolutional neural network (CNN) model. Herein, one or more noun phrases of each subject are converted to word embeddings and are given to a pre-trained CNN for sentiment determination. The output of the CNN model is a list of probability scores, one for each class. The probability scores, along with the predefined sentiment score range, are used to calculate the sentiment score of the input text.
It is to be noted that the CNN model comprises an input layer, an embedding layer, a plurality of convolution layers, a pooling layer and a dense layer. The input layer takes the sequence of tokens of the received input text as input to the CNN model. The embedding layer is a look-up table, wherein each tokenized word of the received text input is mapped to a trainable feature vector. The plurality of convolution layers is the core building block of the CNN model and comprises a set of independent learnable filters. Each of the learnable filters is independently convolved with the embedding matrix, producing different feature maps. The pooling layer progressively reduces the spatial size of the representation to reduce the number of parameters and computation in the network, and operates on each feature map independently. Finally, after several convolutional and max-pooling layers, the high-level reasoning in the neural network is done through a fully connected layer. Neurons in the fully connected layer have connections to all activations in the max-pooling layer, as seen in regular neural networks. The fully connected layer generates a number of outputs equal to the number of classes a user defines.
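The data flow through these layers can be traced with a small forward-pass sketch in plain Python. It is not the trained analyzer: the weights are random and untrained, and the vocabulary, embedding size, filter widths, ReLU activation and softmax output are all assumptions chosen for illustration.

```python
import math
import random

random.seed(0)
EMBED_DIM, NUM_CLASSES = 4, 5
FILTER_WIDTHS = [2, 3]  # one filter per width, for brevity
VOCAB = {"<unk>": 0, "the": 1, "sauce": 2, "was": 3, "bland": 4}

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

embedding = rand_matrix(len(VOCAB), EMBED_DIM)                # look-up table
filters = [rand_matrix(w, EMBED_DIM) for w in FILTER_WIDTHS]  # learnable filters
dense_w = rand_matrix(NUM_CLASSES, len(filters))
dense_b = [0.0] * NUM_CLASSES

def convolve(matrix, filt):
    """Slide one filter along the token axis; apply ReLU to each window's dot product."""
    width = len(filt)
    feature_map = []
    for start in range(len(matrix) - width + 1):
        total = sum(matrix[start + r][c] * filt[r][c]
                    for r in range(width) for c in range(EMBED_DIM))
        feature_map.append(max(0.0, total))
    return feature_map

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(words):
    """Inputs shorter than the widest filter would need padding (not handled here)."""
    ids = [VOCAB.get(w, VOCAB["<unk>"]) for w in words]   # input layer
    matrix = [embedding[i] for i in ids]                  # embedding layer
    pooled = [max(convolve(matrix, f)) for f in filters]  # convolution + max pooling
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(dense_w, dense_b)]        # dense layer
    return softmax(logits)                                # one probability per class

probs = forward(["the", "sauce", "was", "bland"])
print(probs)
```

Each filter produces its own feature map and is pooled independently, so the dense layer sees one value per filter and emits as many outputs as there are classes.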
In the preferred embodiment of the disclosure, the recommendation module (118) of the system (100) is configured to recommend a sentiment score for the quantified subject and a sentiment score of the received text input using a probability score of the deep learning-based sentiment analyzer. Further, a predefined sentiment class is assigned to the quantified at least one subject using the recommended sentiment score, and to the received text input using its sentiment score.
In one example, the input text to the rule engine of the system is “the cashew cream sauce was bland, and the vegetables were undercooked”. The output of the rule engine is a first clause, “the cashew cream sauce was bland”, and a second clause, “the vegetables were undercooked”. Further, the input to the CNN model is the first clause, “the cashew cream sauce was bland”. The output of the CNN model is a list of probability scores corresponding to each sentiment class, [P1, P2, P3, P4, P5]. From these, a sentiment score of the identified subject and a sentiment score of the received text input are calculated as:
$$\sum_{i=1}^{5} S_i \cdot P_i$$
wherein Si is a predefined class score from the set {−1.0, −0.5, 0, 0.5, 1.0} and Pi is the probability score given by the CNN model at an output layer. In this example, the sentiment score of the first clause is −0.98. The calculation is repeated for the second clause, for which the sentiment score is −0.88. Once the sentiment score for each clause is quantified, the corresponding subject of each clause is classified as a positive or negative subject of the input text. Further, an overall sentiment score is calculated for the complete input text using the sentiment analyzer. Herein, the overall sentiment score for the input text is −0.99.
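The weighted sum can be checked with a short Python sketch. The class scores are those given above; the probability list is hypothetical, chosen only so that the result matches the −0.98 reported for the first clause, since the disclosure does not state the actual CNN outputs.

```python
CLASS_SCORES = [-1.0, -0.5, 0.0, 0.5, 1.0]  # predefined class scores S_i

def sentiment_score(probabilities):
    """Return the sum of S_i * P_i over the five sentiment classes."""
    if len(probabilities) != len(CLASS_SCORES):
        raise ValueError("expected one probability per sentiment class")
    return sum(s * p for s, p in zip(CLASS_SCORES, probabilities))

# Hypothetical probabilities [P1..P5], not taken from the disclosure.
probs = [0.96, 0.04, 0.0, 0.0, 0.0]
score = sentiment_score(probs)
print(round(score, 2))
```

Because the probabilities sum to one and the class scores lie in [−1.0, 1.0], the resulting score always falls in the same range.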
Referring to FIG. 2, a processor-implemented method (200) for a quantitative measure of subject-wise sentiment analysis of unstructured text is illustrated. The method comprises one or more steps as follows.
Initially, at the step (202), a text input is received at the receiving module (106) of the system (100). The text input comprises at least one subject and at least one object word. Herein, the text input is received from a predefined internet source of information or a predefined file in a form of natural language. The predefined internet source of information includes a social media platform and a consumer forum. It is to be noted that, in the absence of the at least one object in the received text input, a first adjective is considered as an object of the received text input.
In the preferred embodiment of the disclosure, at the next step (204), the received at least one text input is tokenized at a tokenization module (108) of the system (100) using a predefined word delimiter.
In the preferred embodiment of the disclosure, at the next step (206), each word of the tokenized at least one text input is tagged based on a part-of-speech (POS) and a universal dependency tag. Further, a universal dependency tag tree is identified from the words tagged with the universal dependency tag.
In the preferred embodiment of the disclosure, at the next step (208), at least one subject and at least one object are identified at an identification module (112) of the system (100) using a subject-verb-object (SVO) detection model. The identified subject and objects are classified in noun chunks.
Referring to FIG. 3, at least one text input is tokenized using the predefined word delimiter and a list of tuples of each word of the text input is obtained. Further, a list of noun chunks present in the text input is also identified using the subject-verb-object detection model. The list of noun chunks is analyzed to identify a number of subjects and a number of objects in the text input. If the number of subjects is more than one, then the first subject word of the text input is marked with a subject tag from the list of noun chunks.
Furthermore, to measure the sentiment score of the text input as a whole, separately from the sentiment score of each identified subject present in the text input, the system pre-processes the received text input using the word delimiter and quantifies it using a deep learning-based sentiment analyzer. The system recommends a sentiment score for the received text input using a probability score of the deep learning-based sentiment analyzer.
In the preferred embodiment of the disclosure, at the next step (210), a tree of the universal dependency tagged words is analyzed at an analyzing module (114) of the system (100) for the identified at least one subject to determine a token dependency of the identified at least one subject using a dependency parser. At least one phrase corresponding to the identified subject is extracted and represented as a numerical vector.
In the preferred embodiment of the disclosure, at the next step (212), the identified at least one subject is quantified at a quantification module (116) of the system (100) using a deep learning-based sentiment analyzer. The deep learning-based sentiment analyzer is trained on a dataset using a convolutional neural network (CNN) model. Herein, one or more noun phrases of each subject are converted to word embeddings and are given to a pre-trained CNN for sentiment determination. The output of the CNN model is a list of probability scores, one for each class. The probability scores, along with the predefined sentiment score range, are used to calculate the sentiment score of the input text.
In the preferred embodiment of the disclosure, at the last step (214), a sentiment score for the quantified subject is recommended, along with a sentiment score of the received text input, at a recommendation module (118) of the system (100) using the probability score of the deep learning-based sentiment analyzer. Further, a predefined sentiment class is assigned to the quantified at least one subject using the recommended sentiment score, and to the received text input using its sentiment score.
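The disclosure does not state how a sentiment score is mapped to a predefined sentiment class. One possible reading, sketched below purely as an assumption, is to pick the class whose predefined class score is nearest to the computed score. The pairing of class names with class scores and the helper name `assign_class` are hypothetical.

```python
# Assumed pairing of predefined class scores with sentiment class names.
CLASSES = {
    -1.0: "strong negative",
    -0.5: "negative",
    0.0: "neutral",
    0.5: "positive",
    1.0: "strong positive",
}

def assign_class(score):
    """Return the class whose predefined score is closest to `score`."""
    nearest = min(CLASSES, key=lambda class_score: abs(class_score - score))
    return CLASSES[nearest]

# Subject-level scores and overall score from the worked example in the text.
for value in (-0.98, -0.88, -0.99):
    print(value, assign_class(value))
```

Under this assumed mapping, both subjects of the example sentence and the sentence as a whole fall in the same class.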
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
The embodiments of the present disclosure herein address the unresolved problem associated with a quantitative measure of subject-wise sentiment analysis of unstructured text. Most of the existing solutions reported to measure the sentiment of a text can be categorized as qualitative and quantitative approaches. VADER-based approaches are quantitative approaches that provide a quantitative measure of the sentiment of the text. Machine learning-based approaches classify text messages into a predefined set of classes, such as strong positive, positive, neutral, negative and strong negative. Neither of these approaches deals with the sentiment related to the individual subjects of the text.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device, which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development would change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

Claims (11)

What is claimed is:
1. A processor-implemented method (200) comprising:
receiving (202), via one or more hardware processors, at least one text input having at least one subject and at least one object;
tokenizing (204), via the one or more hardware processors, the received at least one text input using a predefined word delimiter;
tagging (206), via the one or more hardware processors, each word of the tokenized at least one text input based on a part-of-speech (POS) and a universal dependency tag, wherein a universal dependency tag tree is identified from the each tagged word with the universal dependency tag;
identifying (208), via the one or more hardware processors, at least one subject and at least one object from the tagged each word of the at least one text input using a subject-verb-object (SVO) detection model, wherein the identified at least one subject and the at least one object is classified in noun chunks;
analyzing, via the one or more hardware processors, the noun chunks to identify a number of subjects and a number of objects in the at least one text input, wherein a first subject word of the at least one text input is marked with a subject tag when the number of subjects is more than one and a first object word is marked with an object tag, and wherein, if in the at least one text input the first object word is followed with a word having an object, then the first object word with the object tag is changed with the following word as an object tag of the at least one text input;
analyzing (210), via the one or more hardware processors, the universal dependency tag tree for the identified at least one subject to determine a token dependency of the identified at least one subject using a predefined list of tuples of each word and a universal dependence tag of each tuple using a dependency parser, wherein at least one phrase is extracted corresponding to the identified at least one subject from the determined token dependency and the extracted at least one phrase is represented in a numerical vector, and wherein the dependency parser analyzes a grammatical structure of the at least one text input, establishing a relationship between a headword and a word, which modify the headword and the dependency parser supports a universal dependency scheme where the universal dependency is a framework for a consistent annotation of grammar across different languages, and varied dependency tags are present describing relationship of each word with neighboring words, wherein after getting dependency tags and noun phrases from the input text, a rule engine is calibrated to use the dependency tags and the noun phrases to extract clauses relevant to the identified at least one subject;
quantifying (212), via the one or more hardware processors, the identified at least one subject using a pre-trained deep learning-based sentiment analyzer and a predefined class score of the at least one subject, wherein a probability score is obtained against the identified at least one subject, wherein the deep learning-based sentiment analyzer is pre-trained on a dataset using a convolutional neural network (CNN) model and one or more noun phrases of each subject are converted to word embedding and are given to the pre-trained deep learning-based sentiment analyzer for sentiment determination, wherein sentiment of the each subject in the at least one text input is quantified separately and an overall sentiment of the at least one text input is quantified, wherein the sentiment is quantitatively measured at a subject level along with at a text input level, wherein input to the CNN model is the extracted at least one phrase corresponding to the identified at least one subject and output of the CNN model is a list of probability scores for each sentiment class, wherein the CNN model comprises an input layer, an embedding layer, a plurality of convolution layers, a pooling layer and a dense layer, wherein the input layer takes a sequence of tokens of the received at least one text input as input to the CNN model, wherein the embedding layer is a look-up table, wherein each tokenized word of the received at least one text input is mapped to a trainable feature vector, wherein the plurality of convolution layers comprises of a set of independent learnable filters and each of the learnable filters is independently convolved with an embedding matrix producing different feature maps, and wherein the pooling layer operates on each feature map independently;
calculating a sentiment score to the quantified subject using an equation:
$$\sum_{i=1}^{5} S_i \cdot P_i$$
wherein Si is the predefined class score ranging from −1.0 to 1.0 and Pi is the probability score given by the CNN model at an output layer, and wherein the sentiment score is computed for the clause and the sentiment score calculation is repeated for subsequent clause extracted corresponding to the identified at least one subject and once the sentiment score for each phrase is quantified, the corresponding subject of each phrase is classified as a positive subject or a negative subject of the at least one text input; and
recommending (214), via the one or more hardware processors, the sentiment score to the quantified subject along with a sentiment score to the received text input based on the quantified at least one subject and the probability score of the identified at least one subject.
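The score calculation recited above can be illustrated with a minimal sketch. It assumes five sentiment classes whose predefined scores are evenly spaced between −1.0 and 1.0; the claim fixes only the range, so the individual values in `CLASS_SCORES`, the function names, and the zero threshold in `polarity` are hypothetical choices made for this example.

```python
from typing import Sequence

# Hypothetical class scores for five sentiment classes, evenly spaced in
# [-1.0, 1.0]. The claim states only the range, not the individual values.
CLASS_SCORES = (-1.0, -0.5, 0.0, 0.5, 1.0)


def sentiment_score(probs: Sequence[float],
                    class_scores: Sequence[float] = CLASS_SCORES) -> float:
    """Return the weighted sum of S_i * P_i for one extracted clause."""
    if len(probs) != len(class_scores):
        raise ValueError("need one probability per sentiment class")
    return sum(s * p for s, p in zip(class_scores, probs))


def polarity(score: float) -> str:
    """Label a subject by the sign of its score.

    The threshold of 0 is an assumption; a score of exactly 0 is treated
    as negative only to keep this example two-way.
    """
    return "positive" if score > 0 else "negative"


# Example probabilities, shaped like the per-class output of the CNN model.
clause_probs = [0.05, 0.05, 0.10, 0.30, 0.50]
score = sentiment_score(clause_probs)
print(round(score, 3), polarity(score))
```

The same function would be called once per clause extracted for a subject, which matches the repetition described in the claim.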
2. The processor-implemented method (200) of claim 1, wherein an index is prepared in presence of more than one subject in the received text input, using the universal dependency tag tree for each identified subject and corresponding at least one object of the received text input.
3. The processor-implemented method (200) of claim 1, wherein a first adjective is considered as an object of the received text input in absence of at least one object in the received text input.
4. The processor-implemented method (200) of claim 1, wherein the noun chunks comprise the identified at least one subject and at least one object of the received text input.
5. The processor-implemented method (200) of claim 1, wherein a predefined sentiment class is assigned using the recommended sentiment score to the quantified at least one subject and the sentiment score to the received text input.
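The fallback in claim 3, where the first adjective stands in for a missing object, might be sketched as follows. The token layout `(word, POS tag, dependency tag)`, the tag names in `OBJECT_DEPS`, and the function `pick_object` are assumptions for illustration; the tags follow common Universal Dependencies usage and are not taken from the claims.

```python
from typing import List, Optional, Tuple

# (word, POS tag, dependency tag). Tag names are assumed for this sketch.
Token = Tuple[str, str, str]
OBJECT_DEPS = {"obj", "dobj", "iobj", "pobj"}


def pick_object(tokens: List[Token]) -> Optional[str]:
    """Return the first object word, else the first adjective, else None."""
    for word, _pos, dep in tokens:
        if dep in OBJECT_DEPS:
            return word
    # No object found: fall back to the first adjective in the input.
    for word, pos, _dep in tokens:
        if pos == "ADJ":
            return word
    return None


with_object = [("She", "PRON", "nsubj"), ("loves", "VERB", "ROOT"),
               ("pizza", "NOUN", "obj")]
no_object = [("Service", "NOUN", "nsubj"), ("was", "AUX", "cop"),
             ("slow", "ADJ", "ROOT")]
print(pick_object(with_object))
print(pick_object(no_object))
```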
6. A system (100) comprising:
at least one memory (102) storing a plurality of instructions;
one or more hardware processors (104) communicatively coupled with the at least one memory (102), wherein the one or more hardware processors (104) are configured to:
receive at least one text input having at least one subject and at least one object;
tokenize the received at least one text input using a predefined word delimiter;
tag each word of the tokenized at least one text input based on a part-of-speech (POS) and a universal dependency tag, wherein a universal dependency tag tree is identified from the each tagged word with the universal dependency tag;
identify at least one subject and at least one object from the tagged each word of the at least one text input using a subject-verb-object (SVO) detection model, wherein the identified at least one subject and the at least one object is classified in noun chunks;
analyze the noun chunks to identify a number of subjects and a number of objects in the at least one text input, wherein a first subject word of the at least one text input is marked with a subject tag when the number of subjects is more than one and a first object word is marked with an object tag, and wherein, if in the at least one text input the first object word is followed with a word having an object, then the first object word with the object tag is changed with the following word as an object tag of the at least one text input;
analyze the universal dependency tag tree for the identified at least one subject to determine a token dependency of the identified at least one subject using a predefined list of tuples of each word and a universal dependence tag of each tuple using a dependency parser, wherein at least one phrase is extracted corresponding to the identified at least one subject from the determined token dependency and the extracted at least one phrase is represented in a numerical vector, and wherein the dependency parser analyzes a grammatical structure of the at least one text input, establishing a relationship between a headword and a word, which modify the headword and the dependency parser supports a universal dependency scheme where the universal dependency is a framework for a consistent annotation of grammar across different languages, and varied dependency tags are present describing relationship of each word with neighboring words, wherein after getting dependency tags and noun phrases from the input text, a rule engine is calibrated to use the dependency tags and the noun phrases to extract clauses relevant to the identified at least one subject;
quantify the identified at least one subject using a pre-trained deep learning-based sentiment analyzer and a predefined class score of the at least one subject, wherein a list of probabilities is obtained against the identified at least one subject, wherein the deep learning-based sentiment analyzer is pre-trained on a dataset using a convolutional neural network (CNN) model and one or more noun phrases of each subject are converted to word embedding and are given to the pre-trained deep learning-based sentiment analyzer for sentiment determination, wherein sentiment of the each subject in the at least one text input is quantified separately and an overall sentiment of the at least one text input is quantified, wherein the sentiment is quantitatively measured at a subject level along with at a text input level, wherein input to the CNN model is the extracted at least one phrase corresponding to the identified at least one subject and output of the CNN model is a list of probability scores for each sentiment class, wherein the CNN model comprises an input layer, an embedding layer, a plurality of convolution layers, a pooling layer and a dense layer, wherein the input layer takes a sequence of tokens of the received at least one text input as input to the CNN model, wherein the embedding layer is a look-up table, wherein each tokenized word of the received at least one text input is mapped to a trainable feature vector, wherein the plurality of convolution layers comprises of a set of independent learnable filters and each of the learnable filters is independently convolved with an embedding matrix producing different feature maps, and wherein the pooling layer operates on each feature map independently;
calculate a sentiment score to the quantified subject using an equation:
$$\sum_{i=1}^{5} S_i \cdot P_i$$
wherein Si is the predefined class score ranging from −1.0 to 1.0 and Pi is the probability score given by the CNN model at an output layer, and wherein the sentiment score is computed for the clause and the sentiment score calculation is repeated for subsequent clause extracted corresponding to the identified at least one subject and once the sentiment score for each phrase is quantified, the corresponding subject of each phrase is classified as a positive subject or a negative subject of the at least one text input; and
recommend the sentiment score to the quantified subject along with a sentiment score to the received text input based on the quantified at least one subject and the probability score of the identified at least one subject.
7. The system (100) of claim 6, wherein an index is prepared in presence of more than one subject in the received text input, using the universal dependency tag tree for each identified subject and corresponding at least one object of the received text input.
8. The system (100) of claim 6, wherein a first adjective is considered as an object of the received text input in absence of at least one object in the received text input.
9. The system (100) of claim 6, wherein a predefined sentiment class is assigned using the recommended sentiment score to the quantified subject and the sentiment score to the received text.
10. The system (100) of claim 6, wherein the noun chunks comprise the identified at least one subject and at least one object of the received text input.
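The rule-engine step, which uses dependency tags to extract the clause relevant to each subject, could look roughly like the sketch below. The claims do not disclose the actual rules, so the single rule used here (follow the subject's headword, skip coordinated clauses that have their own subject, drop connectives and punctuation) is a hypothetical stand-in. The parse is written by hand in place of a dependency parser's output, and the `Tok` type and tag names are likewise assumptions.

```python
from typing import Dict, List, NamedTuple


class Tok(NamedTuple):
    word: str
    dep: str   # dependency tag (UD-style names, assumed)
    head: int  # index of the headword; the root points to itself


def clauses_by_subject(tokens: List[Tok]) -> Dict[str, str]:
    """Map each subject word to the clause governed by its headword."""
    children: Dict[int, List[int]] = {i: [] for i in range(len(tokens))}
    for i, t in enumerate(tokens):
        if t.head != i:
            children[t.head].append(i)

    def has_own_subject(i: int) -> bool:
        return any(tokens[c].dep == "nsubj" for c in children[i])

    def collect(i: int, out: List[int]) -> None:
        out.append(i)
        for c in children[i]:
            dep = tokens[c].dep
            if dep in ("cc", "punct"):
                continue  # drop connectives and punctuation
            if dep == "conj" and has_own_subject(c):
                continue  # coordinated clause belongs to another subject
            collect(c, out)

    result: Dict[str, str] = {}
    for t in tokens:
        if t.dep == "nsubj":
            idx: List[int] = []
            collect(t.head, idx)
            result[t.word] = " ".join(tokens[j].word for j in sorted(idx))
    return result


# Hand-written parse of "The camera is great but the battery is terrible."
parsed = [
    Tok("The", "det", 1), Tok("camera", "nsubj", 3), Tok("is", "cop", 3),
    Tok("great", "ROOT", 3), Tok("but", "cc", 8), Tok("the", "det", 6),
    Tok("battery", "nsubj", 8), Tok("is", "cop", 8),
    Tok("terrible", "conj", 3), Tok(".", "punct", 3),
]
for subject, clause in clauses_by_subject(parsed).items():
    print(subject, "->", clause)
```

Each extracted clause would then be converted to a numerical vector and scored separately, so that two subjects in one sentence can receive opposite sentiment.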
11. A non-transitory computer readable medium storing one or more instructions which, when executed by a processor on a system, cause the processor to perform a method comprising:
receiving, via one or more hardware processors, at least one text input having at least one subject and at least one object;
tokenizing, via the one or more hardware processors, the received at least one text input using a predefined word delimiter;
tagging, via the one or more hardware processors, each word of the tokenized at least one text input based on a part-of-speech (POS) and a universal dependency tag, wherein a universal dependency tag tree is identified from the each tagged word with the universal dependency tag;
identifying, via the one or more hardware processors, at least one subject and at least one object from the tagged each word of the at least one text input using a subject-verb-object (SVO) detection model, wherein the identified at least one subject and the at least one object is classified in noun chunks;
analyzing, via the one or more hardware processors, the noun chunks to identify a number of subjects and a number of objects in the at least one text input, wherein a first subject word of the at least one text input is marked with a subject tag when the number of subjects is more than one and a first object word is marked with an object tag, and wherein, if in the at least one text input the first object word is followed with a word having an object, then the first object word with the object tag is changed with the following word as an object tag of the at least one text input;
analyzing, via the one or more hardware processors, the universal dependency tag tree for the identified at least one subject to determine a token dependency of the identified at least one subject using a predefined list of tuples of each word and a universal dependence tag of each tuple using a dependency parser, wherein at least one phrase is extracted corresponding to the identified at least one subject from the determined token dependency and the extracted at least one phrase is represented in a numerical vector, and wherein the dependency parser analyzes a grammatical structure of the at least one text input, establishing a relationship between a headword and a word, which modify the headword and the dependency parser supports a universal dependency scheme where the universal dependency is a framework for a consistent annotation of grammar across different languages, and varied dependency tags are present describing relationship of each word with neighboring words, wherein after getting dependency tags and noun phrases from the input text, a rule engine is calibrated to use the dependency tags and the noun phrases to extract clauses relevant to the identified at least one subject;
quantifying, via the one or more hardware processors, the identified at least one subject using a pre-trained deep learning-based sentiment analyzer and a predefined class score of the at least one subject, wherein a probability score is obtained against the identified at least one subject, wherein the deep learning-based sentiment analyzer is pre-trained on a dataset using a convolutional neural network (CNN) model and one or more noun phrases of each subject are converted to word embedding and are given to the pre-trained deep learning-based sentiment analyzer for sentiment determination, wherein sentiment of the each subject in the at least one text input is quantified separately and an overall sentiment of the at least one text input is quantified, wherein the sentiment is quantitatively measured at a subject level along with at a text input level, wherein input to the CNN model is the extracted at least one phrase corresponding to the identified at least one subject and output of the CNN model is a list of probability scores for each sentiment class, wherein the CNN model comprises an input layer, an embedding layer, a plurality of convolution layers, a pooling layer and a dense layer, wherein the input layer takes a sequence of tokens of the received at least one text input as input to the CNN model, wherein the embedding layer is a look-up table, wherein each tokenized word of the received at least one text input is mapped to a trainable feature vector, wherein the plurality of convolution layers comprises of a set of independent learnable filters and each of the learnable filters is independently convolved with an embedding matrix producing different feature maps, and wherein the pooling layer operates on each feature map independently;
calculating a sentiment score to the quantified subject using an equation:
$$\sum_{i=1}^{5} S_i \cdot P_i$$
wherein Si is the predefined class score ranging from −1.0 to 1.0 and Pi is the probability score given by the CNN model at an output layer, and wherein the sentiment score is computed for the clause and the sentiment score calculation is repeated for subsequent clause extracted corresponding to the identified at least one subject and once the sentiment score for each phrase is quantified, the corresponding subject of each phrase is classified as a positive subject or a negative subject of the at least one text input; and
recommending, via the one or more hardware processors, the sentiment score to the quantified subject along with a sentiment score to the received text input based on the quantified at least one subject and the probability score of the identified at least one subject.
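The layer sequence recited for the CNN model (embedding look-up, independent convolution filters, per-feature-map pooling, dense layer, probability output) can be traced in a small forward-pass sketch. This is not the patented model: the weights are random and untrained, and the sizes in `EMBED_DIM`, `KERNEL_WIDTHS`, and `NUM_CLASSES`, the ReLU activation, the choice of max pooling, and the softmax output are all assumptions made so the example runs with the standard library alone.

```python
import math
import random
from typing import Dict, List

random.seed(0)  # untrained, random weights: illustration only

EMBED_DIM = 4           # hypothetical sizes, not taken from the patent
KERNEL_WIDTHS = (2, 3)  # one filter per width, for brevity
NUM_CLASSES = 5


def rand_vec(n: int) -> List[float]:
    return [random.uniform(-0.5, 0.5) for _ in range(n)]


class TinyTextCNN:
    def __init__(self, vocab: List[str]) -> None:
        # Embedding layer: a look-up table from token to feature vector.
        self.embed: Dict[str, List[float]] = {w: rand_vec(EMBED_DIM) for w in vocab}
        self.unk = rand_vec(EMBED_DIM)  # vector for out-of-vocabulary tokens
        # Independent filters, each a (width x EMBED_DIM) matrix.
        self.filters = [[rand_vec(EMBED_DIM) for _ in range(k)]
                        for k in KERNEL_WIDTHS]
        # Dense layer: one weight row per sentiment class.
        self.dense = [rand_vec(len(KERNEL_WIDTHS)) for _ in range(NUM_CLASSES)]

    def feature_map(self, x: List[List[float]],
                    filt: List[List[float]]) -> List[float]:
        k = len(filt)
        x = x + [[0.0] * EMBED_DIM] * max(0, k - len(x))  # zero-pad short inputs
        fmap = []
        for start in range(len(x) - k + 1):
            window = x[start:start + k]
            s = sum(w * v for fr, xr in zip(filt, window)
                    for w, v in zip(fr, xr))
            fmap.append(max(0.0, s))  # ReLU (assumed activation)
        return fmap

    def predict(self, tokens: List[str]) -> List[float]:
        x = [self.embed.get(t, self.unk) for t in tokens]
        # Pooling acts on each feature map independently (max pooling assumed).
        pooled = [max(self.feature_map(x, f)) for f in self.filters]
        logits = [sum(w * p for w, p in zip(row, pooled)) for row in self.dense]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]


model = TinyTextCNN(vocab=["the", "battery", "is", "terrible", "great"])
probs = model.predict("the battery is terrible".split())
print([round(p, 3) for p in probs])
```

The returned list has one probability per sentiment class, which is the form of input the score equation in the claims expects.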
US17/122,712 2019-12-16 2020-12-15 System and method to quantify subject-specific sentiment Active 2041-05-17 US11748573B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201921052179 2019-12-16
IN201921052179 2019-12-16

Publications (2)

Publication Number Publication Date
US20210216721A1 US20210216721A1 (en) 2021-07-15
US11748573B2 US11748573B2 (en) 2023-09-05

Family

ID=73834305

Family Applications (1)

Application Number Title Priority Date Filing Date
US17/122,712 Active 2041-05-17 US11748573B2 (en) 2019-12-16 2020-12-15 System and method to quantify subject-specific sentiment

Country Status (2)

Country Link
US (1) US11748573B2 (en)
EP (1) EP3839763A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11556718B2 (en) * 2021-05-01 2023-01-17 International Business Machines Corporation Altering messaging using sentiment analysis

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140129213A1 (en) * 2012-11-07 2014-05-08 International Business Machines Corporation Svo-based taxonomy-driven text analytics
US20150186790A1 (en) * 2013-12-31 2015-07-02 Soshoma Inc. Systems and Methods for Automatic Understanding of Consumer Evaluations of Product Attributes from Consumer-Generated Reviews
CN106528533A (en) 2016-11-08 2017-03-22 浙江理工大学 Dynamic sentiment word and special adjunct word-based text sentiment analysis method
CN106599933A (en) 2016-12-26 2017-04-26 哈尔滨工业大学 Text emotion classification method based on the joint deep learning model
CN106776581A (en) 2017-02-21 2017-05-31 浙江工商大学 Subjective texts sentiment analysis method based on deep learning
CN107862087A (en) 2017-12-01 2018-03-30 广州简亦迅信息科技有限公司 Sentiment analysis method, apparatus and storage medium based on big data and deep learning
US10002129B1 (en) * 2017-02-15 2018-06-19 Wipro Limited System and method for extracting information from unstructured text
CN108733644A (en) 2018-04-09 2018-11-02 平安科技(深圳)有限公司 A kind of text emotion analysis method, computer readable storage medium and terminal device
US10460028B1 (en) * 2019-04-26 2019-10-29 Babylon Partners Limited Syntactic graph traversal for recognition of inferred clauses within natural language inputs
US10467344B1 (en) * 2018-08-02 2019-11-05 Sas Institute Inc. Human language analyzer for detecting clauses, clause types, and clause relationships
US20200004816A1 (en) * 2018-06-28 2020-01-02 Language Logic d.b.a. Ascribe Systems and methods for automatically detecting sentiments and assigning and analyzing quantitate values to the sentiments expressed in text
US20200159833A1 (en) * 2018-11-21 2020-05-21 Accenture Global Solutions Limited Natural language processing based sign language generation
US20200410054A1 (en) * 2019-06-27 2020-12-31 Conduent Business Services, Llc Neural network systems and methods for target identification from text
US11531998B2 (en) * 2017-08-30 2022-12-20 Qualtrics, Llc Providing a conversational digital survey by generating digital survey questions based on digital survey responses

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10282467B2 (en) * 2014-06-26 2019-05-07 International Business Machines Corporation Mining product aspects from opinion text

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Explosion AI, "spaCy 101: Everything you need to know", Archived May 1, 2019, Explosion AI. (Year: 2019). *

Also Published As

Publication number Publication date
US20210216721A1 (en) 2021-07-15
EP3839763A1 (en) 2021-06-23

Similar Documents

Publication Publication Date Title
CN110008311B (en) Product information safety risk monitoring method based on semantic analysis
US10565533B2 (en) Systems and methods for similarity and context measures for trademark and service mark analysis and repository searches
US9058308B2 (en) System and method for identifying text in legal documents for preparation of headnotes
Faria et al. A domain-independent process for automatic ontology population from text
WO2018028077A1 (en) Deep learning based method and device for chinese semantics analysis
CN110209805B (en) Text classification method, apparatus, storage medium and computer device
US20070016863A1 (en) Method and apparatus for extracting and structuring domain terms
US11232358B1 (en) Task specific processing of regulatory content
EP3057003A1 (en) Device for collecting contradictory expression and computer program for same
US9632998B2 (en) Claim polarity identification
CN112000802A (en) Software defect positioning method based on similarity integration
US11748573B2 (en) System and method to quantify subject-specific sentiment
US11625536B2 (en) System and method for identification and profiling adverse events
Zanuz et al. Fostering judiciary applications with new fine-tuned models for legal named entity recognition in portuguese
Kramer et al. Improvement of a naive Bayes sentiment classifier using MRS-based features
US20220336111A1 (en) System and method for medical literature monitoring of adverse drug reactions
Reddy et al. Classification of user’s review using modified logistic regression technique
CN115269833A (en) Event information extraction method and system based on deep semantics and multitask learning
Pallavi et al. HITS@ FIRE task 2015: Twitter based Named Entity Recognizer for Indian Languages.
Naik et al. An adaptable scheme to enhance the sentiment classification of Telugu language
Marques-Lucena et al. Framework for customers’ sentiment analysis
Lai et al. An unsupervised approach to discover media frames
Hirsch et al. Detecting non-natural language artifacts for de-noising bug reports
CN113761094A (en) Construction method, system, device and storage medium of geological disaster affair map
KR20200088164A (en) Methods for performing sentiment analysis of messages in social network service based on part of speech feature and sentiment analysis apparatus for performing the same

Legal Events

Date Code Title Description
AS Assignment

Owner name: TATA CONSULTANCY SERVICES LIMITED, INDIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GUNTURI, SITARAMA BRAHMAM;SURA, PRANAVI;SINGH, BRAJESH;SIGNING DATES FROM 20191206 TO 20191210;REEL/FRAME:054765/0774

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STCF Information on status: patent grant

Free format text: PATENTED CASE