US20130073480A1 - Real time cross correlation of intensity and sentiment from social media messages - Google Patents
Real time cross correlation of intensity and sentiment from social media messages
- Publication number
- US20130073480A1 US13/427,833
- Authority
- US
- United States
- Prior art keywords
- sentiment
- time series
- social media
- series
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/06—Asset management; Financial planning or analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Definitions
- the present invention relates to a method and system using social media for real-time event driven trading of equities, commodities and other traded assets.
- Sentiment analysis applies various analytical techniques in identifying subjective information from different information sources. Sentiment analysis, therefore, attempts to ascertain the feelings, thoughts, attitude, opinion, etc. of a speaker or a writer with respect to a topic.
- the first approach in particular, a so-called “bag of words” approach, attempts to apply a positive/negative document classifier based on occurrence frequencies of the various words in a document. Applying this approach, various learning methods can be used to select or weight different parts of the text used in the classification process.
- This approach fails to process the sentiment with respect to assets (for example, equities or commodities) in short digital messages such as tweets sent via the online social networking service Twitter.
- Semantic orientation automatically classifies words into two classes, “good” and “bad”, and then computes an overall good/bad score for the text.
- This method does not take into consideration the sentiment conveyed by parts of speech other than adjectives, including verbs, for example, to bounce, to crash, nouns, for example, a put, a call, and phrases, for example, ascending triangle, black Friday, head-and-shoulders.
- an object of the present invention to provide a method for finding patterns in a target real-valued time series by utilizing sentiment and frequency derived from a stream of social media messages, wherein the target represents a quantifiable property of an asset being tracked.
- the method includes identifying a target, which is a sampled real-valued time series; generating a sentiment time series, $s_s$, relating to an asset; generating a frequency time series, $s_f$, relating to an asset; and determining a pattern based upon the sentiment time series and the frequency time series.
- It is another object of the present invention to provide a method wherein the interpolation method $I$ is a function of a time series $s_s$ and of a time $t$ that is $C^1$-piecewise continuous with respect to $t$, and such that if there exists a point $(t, v)$ in $s_s$, $I(s_s, t) = v$.
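- As a minimal sketch of one interpolation method that satisfies the stated condition, piecewise-linear interpolation passes exactly through every sample (t, v) of the series. The function name `interpolate`, the endpoint-holding behaviour outside the sampled range, and the sample values are illustrative assumptions, not taken from the disclosure.

```python
from bisect import bisect_left


def interpolate(series, t):
    """Piecewise-linear interpolation I(s, t) over a sampled time series.

    `series` is a non-empty list of (time, value) pairs sorted by strictly
    increasing time. If a point (t, v) is in the series, the result is v.
    Outside the sampled range the nearest endpoint value is held
    (an assumption made for this sketch).
    """
    times = [point[0] for point in series]
    i = bisect_left(times, t)
    if i < len(series) and times[i] == t:
        return series[i][1]              # exact sample: I(s, t) = v
    if i == 0:
        return series[0][1]              # before the first sample
    if i == len(series):
        return series[-1][1]             # after the last sample
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical sentiment samples (time, value)
sentiment = [(0.0, 1.0), (10.0, 3.0), (20.0, -2.0)]
print(interpolate(sentiment, 10.0))  # value at a sample point
print(interpolate(sentiment, 5.0))   # value halfway between two samples
```

Interpolating this way lets an irregularly sampled sentiment or frequency series be evaluated at the same instants as the target series before the two are compared.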
- FIG. 1 is a schematic overview of the present system.
- FIG. 2 is a representation of the graphical user interface in accordance with the present invention.
- FIG. 3 is a partial view of the reaction indicator.
- FIG. 4 is a graphical depiction showing the correlation of frequency and sentiment.
- FIG. 5 is a screen shot showing the ingest and processing of various assets.
- FIGS. 6A and 6B are screen shots when a moving spherical graphic object is clicked in the graphical user interface.
- FIG. 7 is a screen shot showing various moving spherical graphic objects shrinking and growing based on social media intensity thereof.
- a method and system using social media for event-driven trading are disclosed.
- the present method and system 10 use social media for the real-time evaluation of publicly traded assets, in particular, equities and commodities, using information generated through social media interactions.
- equities social comments transmitted using the social networking service Twitter
- an “asset” is considered to be a resource with economic value that an individual, corporation or country owns or controls with the expectation that it will provide future benefit.
- Assets include, but are not limited to, investments in equities, options, derivatives, commodities, bonds, futures, currencies, etc. It should further be appreciated that “equities” are stocks or any other securities representing an ownership interest.
- the present method and system 10 with reference to the stock market, although the application of the present invention could be extended to commodities and other asset based markets.
- the present system 10 is able to effectively predict swings in asset prices for effective and profitable trading thereof.
- the present method and system 10 provide a sentiment calculator 22 that employs natural language processing in evaluating social media interactions by anticipating the sentiment of traders relating to specific equities and commodities in terms of the polarity of the sentiment and the strength of the sentiment.
- the data generated by the sentiment calculator 22 is applied to a reaction indicator 31 in the form of a graphical user interface 30 that combines sentiment and frequency (which is indicative of the intensity of the sentiment) data relating to the assets.
- sentiment and frequency are fully appreciated, the present system 10 and method provide a mechanism for cross-correlating the sentiment and intensity data (the perceived strength of the sentiment being expressed by the social media) with the actual fluctuations occurring with the price of assets.
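- The cross-correlation of a sentiment (or intensity) series with a price series can be sketched as a lagged Pearson correlation over equally sampled values. This is only an illustration under stated assumptions: the function names `cross_correlation` and `best_lag`, the restriction to non-negative lags, and the sample data are hypothetical and are not the disclosed implementation.

```python
from math import sqrt


def cross_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag].

    Both series are assumed to be sampled at the same regular interval.
    Only non-negative lags are handled in this sketch.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative in this sketch")
    pairs = list(zip(x[: len(x) - lag], y[lag:]))
    n = len(pairs)
    if n < 2:
        raise ValueError("not enough overlapping samples")
    mean_x = sum(p for p, _ in pairs) / n
    mean_y = sum(q for _, q in pairs) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in pairs)
    var_x = sum((p - mean_x) ** 2 for p, _ in pairs)
    var_y = sum((q - mean_y) ** 2 for _, q in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0                       # a constant series carries no signal
    return cov / sqrt(var_x * var_y)


def best_lag(x, y, max_lag):
    """Lag in 0..max_lag at which y follows x most closely."""
    return max(range(max_lag + 1), key=lambda k: cross_correlation(x, y, k))


# Hypothetical data: the price series repeats the sentiment series two steps later.
sentiment = [1, 3, 2, 5, 4, 6, 2, 1]
price = [0, 0, 1, 3, 2, 5, 4, 6]
print(best_lag(sentiment, price, max_lag=4))
```

A lag with a high correlation would suggest that movements in the social media signal tend to precede movements in the asset price by that many sampling intervals.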
- the present system 10 provides for the processing of social media messages generating data for the real-time evaluation of publicly traded assets, for example, stocks.
- the system 10 includes an ingest component 11 for ingesting the social media messages; a filter module 14 eliminating expressions not considered useful language from social media messages; a natural language processor (NLP) 16 processing filtered social media messages; a sentiment calculator 22 applying rules to the filtered and NLP processed social media messages so as to compute a representation of values associated with the filtered and NLP processed social media messages; and a graphical user interface 30 displaying the values generated by the sentiment calculator 22 .
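- The component chain described above (ingest, filter, NLP, sentiment calculator, display) can be pictured as a sequence of stages applied to each message. The sketch below only illustrates that wiring: the `Message` type, the `run_pipeline` function and every stage body are hypothetical placeholders, and the sentiment stage in particular does not implement the rules of the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Message:
    raw: str                                   # message as ingested
    text: str = ""                             # message after filtering
    tokens: List[str] = field(default_factory=list)
    sentiment: int = 0                         # value produced by the calculator


Stage = Callable[[Message], Message]


def run_pipeline(raw_messages: Iterable[str], stages: List[Stage]) -> List[Message]:
    """Pass every ingested message through the stages in order."""
    results = []
    for raw in raw_messages:
        msg = Message(raw=raw)
        for stage in stages:
            msg = stage(msg)
        results.append(msg)
    return results


def filter_stage(msg: Message) -> Message:
    # Placeholder filter: drop whitespace-separated items that look like URLs.
    msg.text = " ".join(w for w in msg.raw.split() if not w.startswith("http"))
    return msg


def nlp_stage(msg: Message) -> Message:
    # Placeholder NLP: lower-case whitespace tokenization only.
    msg.tokens = msg.text.lower().split()
    return msg


def sentiment_stage(msg: Message) -> Message:
    # Placeholder: a real calculator would apply compositional sentiment rules.
    msg.sentiment = 0
    return msg


out = run_pipeline(
    ["AAPL rallied today http://example.com"],
    [filter_stage, nlp_stage, sentiment_stage],
)
print(out[0].tokens)
```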
- the ingest component 11 consumes, acquires or gathers a wide range of social media messages 12 and immediately filters the messages as will be explained below in greater detail.
- the ingest component 11 is a data acquisition module.
- the ingest component 11 allows the system 10 to automatically import raw social media messages, for example, tweets from Twitter or other social media sites.
- the data, that is, the raw social media messages, is acquired on the basis of a predefined set of keywords or combination of keywords the system 10 has been programmed to look for.
- the filtered social media messages are then subjected to natural language processing via NLP module 16 based upon lexical databases 18 , 20 of both stock specific sentiment terminology (Stock-Lex 18 ) and general, non-stock specific, sentiment terminology (Sent-Lex 20 ).
- the filtered and NLP processed social media messages are next processed by the sentiment calculator 22 and inference engine 24 .
- the sentiment calculator 22 and inference engine 24 apply information from databases 26 , 28 respectively relating to the knowledge of the stock market world and the knowledge of the world.
- the results of the sentiment calculator 22 and inference engine 24 are then presented to the user via a reaction indicator 31 in the form of a graphical user interface upon a computer monitor which displays sentiment per asset information.
- sentiment calculation is part of the present system 10 for event-driven trading using social media messages.
- the system 10 ingests content (that is, social media messages such as tweets) from one or multiple social media sources based on user-specified criteria.
- the meaning of the information conveyed by the social media messages is determined using a natural language processing (NLP) module 16 .
- the system 10 calculates “sentiment” and presents metrics relating thereto in real-time.
- FIG. 5 shows social media messages, for example, “tweets”, with annotations relating to the sentiment scoring for the individual tweets.
- sentiment calculations in accordance with the present invention may be used to anticipate the reaction of the traders before they act.
- the sentiment calculator 22 of the present system 10 analyzes social media messages to calculate the sentiment with respect to events pertaining to objects.
- Objects relate to assets being traded, via situations having a bearing on public sentiment and relating to the value of the asset being traded (preferably on an exchange).
- object(s) refers to anything related to an asset that can be publicly traded and monitored.
- an “iphone” and the stock symbol “AAPL” are objects which relate to the asset Apple Inc., which can be publicly traded.
- the sentiment calculator 22 represents one module of the present multilayered system 10 for processing short and noisy messages such as tweets, as depicted in the schematic shown in FIG. 1 .
- Proper operation of the sentiment calculator 22, that is, sentiment calculations, requires that a filter module 14 configure input text into formats for use by the subsequent processing modules of the pipeline making up the present invention.
- the filter module 14 is composed of a set of rules (using regular expressions) created to transform the ingested social media messages into expressions without noise. Noise is considered to be elements in the message which are not part of natural language, such as hash tags, URLs, etc. Therefore, the filter module 14 functions to bring tweets as close as possible to expressions in natural language by eliminating expressions that are not considered part of current language usage.
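- A few regular-expression rules of the kind described above can be sketched as follows. The specific patterns, the choice to keep hashtags that occur inside the sentence, and the function name `filter_message` are assumptions made for illustration; the actual rule set of the filter module is not reproduced in this text.

```python
import re

URL = re.compile(r"https?://\S+")                  # web addresses
LEADING_TAGS = re.compile(r"^(?:\s*#\w+)+\s*")     # hashtags at the start of the message
TRAILING_TAGS = re.compile(r"(?:\s*#\w+)+\s*$")    # hashtags at the end of the message
WHITESPACE = re.compile(r"\s+")


def filter_message(text):
    """Remove URLs and peripheral hashtags, then collapse whitespace."""
    text = URL.sub(" ", text)
    text = LEADING_TAGS.sub("", text)
    text = TRAILING_TAGS.sub("", text)
    return WHITESPACE.sub(" ", text).strip()


print(filter_message("#stocks AAPL beats estimates http://t.co/abc #earnings #tech"))
```

Hashtags in the middle of a message are left in place here because they may function as ordinary words in the sentence; only those at the periphery are treated as noise.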
- the filter module 14 eliminates URLs and hash tags at the periphery of the tweet, normalizes symbols and abbreviations, for example “q1
- Sentiment calculations in accordance with the present invention require that a Part of Speech (POS) Tagger 33 assign lexical categories to each of the filtered social media messages as they are broken into a stream of text, words, phrases, symbols, or other meaningful elements called tokens, that is, tokenized messages. Sentiment calculations in accordance with the present invention also require that a partial parser (PARS) 32 recover the structure of the main constituents/syntactic structures (a lemmatizer deriving the canonical form of lexical items, that is, a single or a group of words conveying a single meaning, to enable the lexical lookups) of the filtered social media messages.
- MORPH 34 is a lemmatizer which reduces the spelling of words to their lexical root or base/lemma form.
- the base form for a verb is the simple infinitive.
- the base form for a noun is the singular form.
- the plural “mice” is a form of the lemma “mouse.”
- MORPH 34 uses a list of numerous such rules to reduce an ingested and non-filtered word to its base form.
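- The idea of reducing a word to its base form through a list of rules can be sketched with a small irregular-form table and a few suffix rules. This toy example is not MORPH 34: the tables `IRREGULAR` and `SUFFIX_RULES`, the minimum stem length, and the function name `lemmatize` are all hypothetical, and the rule list is deliberately tiny (for instance, it has no general rule for "-ed" forms).

```python
# Irregular forms are looked up directly (hypothetical, incomplete table).
IRREGULAR = {"mice": "mouse", "fell": "fall", "rose": "rise"}

# (suffix, replacement) pairs tried in order; the first match wins.
SUFFIX_RULES = [("ied", "y"), ("ies", "y"), ("ing", ""), ("s", "")]

MIN_STEM = 3  # do not strip a suffix if fewer than 3 letters would remain


def lemmatize(word):
    """Return a base form for `word` using the table and suffix rules above."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    for suffix, replacement in SUFFIX_RULES:
        stem = w[: len(w) - len(suffix)]
        if w.endswith(suffix) and len(stem) >= MIN_STEM:
            return stem + replacement
    return w


for form in ["mice", "rallied", "falling", "stocks", "gold"]:
    print(form, "->", lemmatize(form))
```

Mapping inflected forms to a single lemma is what allows one lexicon entry to cover all the surface forms of a word during the lexical lookups.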
- “MorphAdorner: A Java Library for the Morphological Adornment of English Language Texts”, Version 1.0, Apr. 30, 2009, Copyright © 2007, 2009 by Northwestern University, is incorporated herein by reference.
- the data composed of the filtered and NLP processed social media messages is supplied to the sentiment calculator 22 that calculates sentiment compositionally in the syntactic context.
- the process of sentiment calculation also employs an inference engine 24 that fine-tunes sentiment calculations using knowledge of the world. This process for sentiment calculation enables sentiment to be calculated on the basis of a set of rules deriving the polarity of stock events and their strength.
- the fact that tweets are constrained to 140 characters means that messages sent via Twitter begin to resemble programming languages such as Fortran (which originally had a constraint of 72 characters per line).
- the primary effect of this constraint is a limitation on the freedom available to the author of a tweet as he or she attempts to convey a specific message.
- compiling tweets (akin to compiling a programming language) and achieving very high levels of accuracy in deriving sentiment whilst minimizing resource consumption and interpretation times. It therefore becomes feasible to ingest and process potentially millions of messages per hour using Common Off The Shelf (COTS) computers.
- the technical advantage of the present system 10 relative to other known technologies is that the present system 10 is based on natural language processing techniques rather than machine learning techniques (for example, Naive Bayes, maximum entropy classification, and support vector machines), as described for example in Pang and Lee: Bo Pang and Lillian Lee, 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), 79-86.
- the rule-based method that is used in accordance with the present invention avoids the shortcomings of the statistical method because it processes the social media messages directly instead of classifying social media messages on the basis of probabilistic algorithms.
- Another advantage lies in the innovative contribution of the inference engine 24, which helps to reduce uncertainty and brings further support to decision making.
- classification algorithms such as Naïve Bayes classifiers are not used in the implementation of the present invention.
- the present invention solves the problem by processing the actual content of the social media messages as they are formulated. It does not calculate the number of positive adjectives in a social media message, or in a set of social media messages, to compute sentiment, contrary to common practice.
- the sentiment calculator 22 is a module of the multilayered architecture employed in accordance with the present system 10 , which can be customized for different domains, including, for example, finance, security and pharmaceutics. In the disclosed application in the stock market exchange, the sentiment calculator 22 calculates sentiments from stock exchange-related social media messages 12 in order to predict stock movements before human traders can act.
- the innovation brought about by the sentiment calculation in accordance with the present invention is the event driven approach to sentiment mining.
- Unstructured incoming social media messages 12 are processed in order to extract sentiment about pre-specified assets, as they participate in ongoing events.
- the sentiment calculator 22 performs event-driven sentiment calculus.
- The event-driven approach to sentiment mining as applied in accordance with the present invention can be represented in accordance with equation (1), where M stands for Modifier, Ev stands for Event, and x, . . . , z stand for the participants of the event.
- the asset the sentiment is about is a participant of the event.
- This relational approach to sentiment mining contrasts with the statistical keyword search approach, classifying messages on the basis of the number of positive or negative qualifiers.
- the statistical keyword search approach fails to provide sentiment-per-asset values.
- the present invention takes an event to be a change in the relation between the participants of the event.
- the participants of an event are: names, organizations, locations, expressions of time, quantities, monetary values, percentages, etc.
- the present system 10 includes name entity recognition capacities and syntax-semantic capacities to provide the articulation of events and their participants.
- the interpretation of syntactic structure is generally compositional: that is, the interpretation of the whole is a function of the interpretation of the parts. However, part of the semantics conveyed by natural language is non-compositional and idiosyncratic. The idiosyncratic meanings are listed in lexicons, assuming both generic (Sent-Lex 20) and domain-specific (Stock-Lex 18) lexicons.
- “sentiment” about an asset participating in an event is considered in accordance with the present invention to be the orientation (that is, the polarity in opinions expressed regarding the asset) and the strength of the opinions on that asset that deviates from the normal state.
- a sentiment is the expression of a psychological state relative to an event (whether that event be static or dynamic).
- since social media messages sent via the social networking site Twitter are limited to 140 characters, lexical items, emoticons and other diacritics found in such messages cannot express the richness of thought and sentiments conveyed by traditional written natural language without further processing.
- the present system and methodology focus on the properties of natural language employed in the social media messages to calculate the sentiment with respect to given objects in ongoing stock events referred to in social media exchanges.
- sentiment is represented by an integer combining a polarity value (polarity positive +, negative −, neutral n) and a strength value ranging within a pre-defined scale.
- using data generated by the filtering and natural language processing of the social media messages, the sentiment calculator 22 yields an integer that combines the polarity and the strength values of each pair of expressions relating an asset to an event, as explained below in greater detail.
- Polarity is a value (that is, positive, negative or neutral) that is part of the lexical specification of words and phrases. These values will compose according to the Polarity rules, provided below.
- Strength is an integer that is also part of the lexical specification of the words and phrases. The values for strength in accordance with a preferred embodiment of the present invention range from 1 to 3 (1 is low and 3 is high). These numbers will be added in the processing of messages according to the Strength rules, provided below. However, it is appreciated that values for strength could range from 1 to 5 or higher.
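- One possible way to hold a polarity and a strength together, and to collapse them into a single integer, is sketched below. The text states that strengths are added; everything else here is an assumption for illustration: the signed-integer encoding (neutral mapped to 0), the `Sentiment` type, the `compose` function, and especially `sign_product`, which is a hypothetical stand-in and not the Polarity rules of the disclosure.

```python
from dataclasses import dataclass

POLARITY_SIGN = {"+": 1, "-": -1, "n": 0}   # assumed encoding of polarity


@dataclass(frozen=True)
class Sentiment:
    polarity: str    # "+", "-" or "n"
    strength: int    # 1..3 for a single lexical item in the preferred embodiment

    def as_integer(self):
        """Collapse polarity and strength into one signed integer (assumption)."""
        return POLARITY_SIGN[self.polarity] * self.strength


def sign_product(p, q):
    """Hypothetical polarity rule: neutral defers; like signs give '+', unlike '-'."""
    if p == "n":
        return q
    if q == "n":
        return p
    return "+" if p == q else "-"


def compose(a, b, polarity_rule=sign_product):
    """Combine two sentiment-marked items: strengths add, polarity follows the rule."""
    return Sentiment(polarity_rule(a.polarity, b.polarity), a.strength + b.strength)


# e.g. a negative modifier (strength 2) applied to a negative event (strength 1)
print(compose(Sentiment("-", 2), Sentiment("-", 1)).as_integer())
# a positive item (strength 1) combined with a negative one (strength 2)
print(compose(Sentiment("+", 1), Sentiment("-", 2)).as_integer())
```

Passing the polarity rule in as a parameter keeps the sketch independent of any particular rule table, so a different set of rules can be substituted without changing the strength arithmetic.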
- the sentiment calculator 22 is embedded as part of the present overall system 10 that ingests social media messages from multiple social media sources based on user-specified criteria.
- the social media messages go through a filtering layer/module 14 purging the messages of noise (URLs, hashtags, etc.).
- the results of the filtering of social media messages are tokens which are assigned part of speech tags according to the lexical and contextual properties of the lexical items based upon NLP module 16 .
- a parser 32 then recovers the major constituents/syntactic structures of the tokenized messages.
- the sentiment calculator 22 takes the annotated partial parses as its input and yields a sentiment-per-asset on the basis of the sentiment values of the lexical items and the sentiment logic, calculating the sentiment of constituents on the basis of the sentiment values of their parts.
- the sentiment calculator 22 interacts with the inference engine 24 to determine the sentiment with respect to the knowledge of the world.
- the sentiment calculator 22 derives sentiment in terms of polarity and strength with respect to objects (for example, assets as referenced by tickers and commodity names) as they participate in ongoing stock events described by the ingested social media messages 12, for example, tweets.
- the generic representation in equation (1) as noted above can thus be instantiated by equation (2) for this application.
- stock-market specific lexical items and phrases are qualified in the Stock-Lex 18 , and the sentiment calculator 22 applies to pairs of sentiment-marked lexical items compositionally in their syntactic configuration.
- the sentiment calculator 22 is a module of the pipeline making up the present system 10 .
- the components of this system 10 process incoming social media messages, and yield a sentiment-per-object/asset for each ingested incoming social media message in real-time.
- the sentiment calculator 22 calculates the sentiment-per-asset for each incoming social media message ingested by the system 10 .
- FIG. 1 represents the three main components of the system: Ingest 11 (the social media messages 12), Process 15 (the social media messages using the filter 14, NLP 16 and sentiment calculator 22), and Display 30 (the results of the processing step on a reaction indicator 31 in the form of a graphical user interface 30). It also identifies the specific NLP components/modules (POS 33, PARS 32, MORPH 34, Stock-Lex 18 and Sent-Lex 20) processing social media messages 12 from the ingest component 11 to the sentiment calculator 22 and its interaction with the inference engine 24 (which includes databases relating to Knowledge of the Stock Market world 23 and Knowledge of the world 25).
- the architecture of the system 10 is shown in FIG. 1 .
- the following explains the main features of each component of this architecture, where the lexicon, the part of speech tagger (POS) 33 and the parser (PARS) 32 can be parameterized to process different languages.
- Simplex and complex keywords are used for ingesting the social media messages 12 , according to the requirements of the stock traders.
- the hardware used for ingest are standard off the shelf computers gathering and processing social media messages using the pre-determined keywords.
- the techniques used for collecting social media messages must take into consideration the requirements of stock traders, see Section 1.1, and must also enable the collection of social media messages with respect to specific assets, as described in Section 1.2.
- the set of keywords for specific assets is defined in terms of generic categories that can be parameterized according to the finance domain.
- different strategies for ingesting a large number of relevant social media messages are used. For example, the following strategies may be employed:
- strategy (2) is used for Crude Oil, where only one keyword is used, “oil”, and a very large exclude list includes expressions such as “Soya oil”, “olive oil”, etc.
- strategy (3) is preferred.
- Strategy (3) employs a large include list made up of both unary and binary expressions including the word “gold”, the object “X”, and another word, a predicate, as in “gold industry”, “gold news”, “gold investor”, “gold invest”, “gold investment”, “gold plunge”, “gold raise”, “gold plunged”, “gold raised”, “gold plunging”, “gold raising”, “gold decline”, “gold declined”, “gold declining”, “gold rally”, “gold rallied”, “gold rallying”, “gold fall”, “gold falling”, “gold fell”, etc.
- a large exclude list is still necessary to exclude for example jewelry items and colors.
- Strategy (3) can be used for other commodities by substituting names of other commodities for the variable in (3) and keeping constant the set of predicates. Thus, a very similar set of keywords may apply to other commodities.
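- The include/exclude keyword strategy can be sketched as below. The predicates are a subset of those quoted above for “gold”; the exclude entries, the simple substring matching, and the names `include_list` and `is_relevant` are assumptions made for illustration only.

```python
# Subset of the predicates quoted in the text; kept constant across commodities.
PREDICATES = [
    "industry", "news", "investor", "investment",
    "plunge", "plunged", "rally", "rallied",
    "decline", "declined", "fall", "fell",
]


def include_list(commodity, predicates=PREDICATES):
    """Build binary keyword expressions by substituting a commodity name for X."""
    return [f"{commodity} {predicate}" for predicate in predicates]


def is_relevant(message, include, exclude):
    """Keep a message if it matches an include term and no exclude term."""
    text = message.lower()
    if any(term in text for term in exclude):
        return False
    return any(term in text for term in include)


gold_include = include_list("gold")
gold_exclude = ["gold ring", "gold necklace"]     # hypothetical jewelry exclusions

print(is_relevant("Gold plunged after the report", gold_include, gold_exclude))
print(is_relevant("Bought a gold ring, then read gold news", gold_include, gold_exclude))
print(include_list("silver")[:3])
```

The single-keyword strategy described for crude oil corresponds to calling `is_relevant` with an include list of one term and a much longer exclude list.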
- This technique using refined keyword strategy is used in conjunction with the methods described above to come up with a sufficiently large number of social media messages, and a high degree of correlation between derived sentiment and price movement, thereby meeting the two requirements of sentiment-price correlation and sufficient volume.
- the refined keyword strategy for ingesting relevant social media messages increases the volume of ingested social media messages that will be fed into the other components of the system, described in the following paragraphs.
- the ingested social media messages may include messages in a language other than English.
- a language identifier/detector 19 is therefore employed to identify the language of an incoming message and assign it a code.
- the ingested social media message (4) will be assigned the code (5), which stands for English.
- Language identification is a prerequisite for the NLP processing in accordance with the present invention, as the overt syntactic properties vary between languages, as well as the form and content of the lexical items. It is thus necessary to ensure that the social media messages processed by the NLP module 16 will be English messages, or whatever language the system 10 was parameterized for.
- the filter module 14 is a pre-NLP processing module that brings social media messages 12 as close as possible to expressions in natural language by eliminating expressions that are not part of current use of language. For example, the filter module 14 eliminates URLs and hash tags at the periphery of the tweet, normalizes symbols and abbreviations, for example q1
- the filter module 14 also performs sentence detection on the basis of typographic cues. This is a necessary step in the pre-NLP processing, since social media messages may include more than one sentence, see (6). As the NLP processing and the sentiment calculus are sentence bound, sentence boundary delimitation is necessary. For example, the filter module 14 applies to (7), replaces the URL by a period, converts capitals into lower case, and yields (8):
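The filtering steps named above (URL replaced by a period, lower-casing, sentence detection on typographic cues) can be sketched as follows. The regular expressions and the function name `filter_message` are assumptions made for illustration; the source does not specify how the filter module 14 is implemented, and the input message is invented.

```python
import re

# Assumed URL pattern; any preceding whitespace is absorbed so the period attaches
# to the previous word.
URL = re.compile(r"\s*https?://\S+")


def filter_message(message):
    """Replace URLs with a period, lower-case the text, and split it into sentences."""
    text = URL.sub(".", message).lower()
    # Typographic cue: '.', '!' or '?' followed by whitespace ends a sentence.
    parts = re.split(r"(?<=[.!?])\s+", text)
    # Drop pieces with no word characters (for example a lone period).
    return [p.strip() for p in parts if re.search(r"\w", p)]


sentences = filter_message("Oil is up today! Gold fell http://t.co/abc")
```

Each returned sentence can then be processed independently, which matches the sentence-bound nature of the later stages.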
- the Sent-Lex 20 is a hand-crafted sentiment-based repository, or database, of the most frequent lexical items and phrases collected from the ingested social media messages, as well as from specialized vocabularies, that are indicative of sentiment.
- the lexical items and phrases vary according to the domain of application, e.g. finance, security, pharmaceutics, etc. Words that are not sentiment bearing, such as definite articles and auxiliaries, are not part of the Sent-Lex 20 .
- sentiment is associated to event denoting verbs and nouns, as well as with sentiment-bearing modifiers of events or of participants of the events.
- the lexical specifications are designed to be parameterized to specific domains of application.
- the generic format of the lexical entry includes the lexical item, followed by fields of lexical specifications.
- the first field specifies the category of the item, the second field specifies its polarity, the third field specifies its lexical strength, and the fourth field specifies the polarity of the semantic arguments of the lexical items and phrases, if applicable.
- the lexical items and phrases and their features are stored in a lexical database, that is, the Sent-Lex 20 .
- Each of the lexical items and phrases maintained in the Sent-Lex 20 is associated with a category tag, an inherent polarity value, an inherent strength value, and for some items, polarity and strength values are also associated to designated argument structure variables as in (11).
- the variable y in that verb's argument structure is associated with a positive value, as in (12); this is not the case for other verbs such as announce and report.
- Google is associated with a positive sentiment.
- the categories, nominal (NN), verbal (VB), adjectival (JJ), adverbial (RB) and their sub-categories, are intrinsically associated with polarity (+, −, n) and strength (1, 2, 3). Furthermore, the lexical specifications differentiate degree modifiers, such as very, too and much, from modifiers such as good and better. Degree-intensifiers contribute their own lexical value, and add an extra value 1 to the category they modify, see (14) below for examples.
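The four-field lexical entry described above can be pictured as a small record type. This is a sketch under stated assumptions: the class name `LexEntry`, the field names, and the sample entries are hypothetical and are not taken from the Sent-Lex 20 itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexEntry:
    item: str                           # the lexical item or phrase
    category: str                       # field 1: e.g. "NN", "VB", "JJ", "RB"
    polarity: str                       # field 2: "+", "-" or "n" (neutral)
    strength: int                       # field 3: 1 (lowest) to 3 (highest)
    arg_polarity: Optional[str] = None  # field 4: polarity of a semantic argument, if any

    def __post_init__(self):
        # Enforce the value ranges stated in the text.
        if self.polarity not in {"+", "-", "n"}:
            raise ValueError(f"unknown polarity: {self.polarity!r}")
        if not 1 <= self.strength <= 3:
            raise ValueError(f"strength out of range: {self.strength}")


# Hypothetical entries, for illustration only.
SENT_LEX = {
    "upgrade": LexEntry("upgrade", "VB", "+", 2, arg_polarity="+"),
    "weak": LexEntry("weak", "JJ", "-", 1),
}
```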
- the Stock-Lex 18 is the stock-based lexical repository, or database, consisting of the most frequent lexical items and phrases used in the ingested social media messages that relate to stock-based knowledge, as well as most frequent items used in stock exchange and financial news wire such as the Financial Post (or other commodity exchange system depending upon the application to which the present system 10 is applied).
- the Stock-Lex 18 thus includes a restricted set of stock-specific lexical items and phrases, associated with their domain specific polarity and strength values.
- the polarity values are: positive, negative and neutral.
- the lexical strength associated to the lexical items and phrases ranges from 1 to 3, where 1 is the lowest value and 3 is the highest value, see (15) for examples.
- the stock-specific lexical items and phrases are part of the major lexical and phrasal categories, nominal, verbal, adjectival. Only event denoting nominal and verbal expressions are part of the Stock-Lex 18 , and only stock specific adjectival and adverbial modifiers are part of the Stock-Lex 18 .
- Stock objects (tickers, company names, product names, etc.) have a neutral polarity and have no associated strength value.
- the sentiment calculator 22 derives the sentiment with respect to specific stock objects.
- the Stock-Lex 18 is a repository of the most frequent sentiment-bearing nouns, verbs, adjectives and adverbs used in social media stock market-related exchanges. Each lexical item is associated with a part of speech (POS), a polarity and a strength. The Stock-Lex 18 is handcrafted and contributes to the invention by providing sentiment specifications for event denoting items and their dependents.
- the innovation is two-fold: i) it specifies sentiment values for other categories than adjectives, contrary to common practice; ii) it specifies sentiment value for event denoting lexical items and their dependents, thus providing the lexical information used by the sentiment calculator 22 for the compositional calculus of the sentiment-per-asset.
- each incoming filtered social media message is broken into a stream of text, words, phrases, symbols, or other meaningful elements called tokens, that is, tokenized messages and each token is assigned a Part Of Speech (POS) by the POS tagger 33 .
- the Brill Tagger, a known methodology for performing part-of-speech tagging, is used, as it is sensitive to the lexical and distributional properties of lexical items and phrases in natural languages. The Brill Tagger is an “error-driven transformation-based tagger”. It is error-driven in the sense that it relies on supervised learning, and transformation-based in the sense that a tag is assigned to each word and then changed using a set of predefined rules.
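The transformation-based idea can be illustrated with a toy tagger: each word first receives its most likely tag, and contextual rules then rewrite tags. This is a simplified sketch, not the Brill Tagger itself; the lexicon and the single rule are hand-written and hypothetical, whereas a real Brill tagger learns its rules from an annotated corpus.

```python
# Hypothetical toy lexicon: the most likely tag for each known word.
LEXICON = {"to": "TO", "the": "DT", "upgrade": "NN", "analysts": "NNS",
           "plan": "VBP", "strong": "JJ"}

# Each rule is (from_tag, to_tag, required_previous_tag).
RULES = [("NN", "VB", "TO")]  # a noun directly after "to" is retagged as a verb


def brill_tag(tokens, lexicon=LEXICON, rules=RULES, default="NN"):
    """Assign initial tags from the lexicon, then apply transformation rules in order."""
    tags = [lexicon.get(tok.lower(), default) for tok in tokens]
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return list(zip(tokens, tags))


tagged = brill_tag("analysts plan to upgrade AAPL".split())
```

Here "upgrade" starts as a noun and is retagged as a verb because it follows "to", which is the kind of event-denoting distinction the later sentiment stages depend on.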
- the POS tagger 33 is necessary in accordance with the present system 10 to identify the lexical items that contribute to the sentiment calculus, namely adjectives (JJ), adverbs (RB), as well as event denoting verbs (e.g., to upgrade) and nouns (e.g., an upgrade).
- the POS Tagger 33 applies to the ingested filtered social media messages, tokenizes the string and assigns parts of speech to the tokens on the basis of a set of lexical and contextual rules, accounting for the distribution of categories in natural language texts.
- Brill Tagger applies to (18) and derives the annotated tokenized string in (19).
- the tokenized and POS annotated messages resulting from the POS tagger 33 are fed to a partial parser 32 that recovers the main syntactic constituents of the social media messages.
- the partial parser 32 employs a Cass parser, Abney's cascaded FST (Finite State Transducer), to recover the main syntactic constituents on the basis of the tokenized and POS annotated representations of social media messages, as illustrated in (20).
- Partial parsing is designed for use with large amounts of noisy text. Robustness and speed are primary design considerations. Not all NLP applications require a complete syntactic analysis. Partial parsing is used in information retrieval as well as information extraction applications, such as facts and sentiment mining, where finding simple nominal and verbal constituents is enough. A full parser provides more information than needed, and is poorly suited to input in which expected information is missing, as is generally the case in social media messages, where syntactic reductions and truncation are necessary to convey meaning within limited character constraints, for example, 140 characters when considering tweets using Twitter.
- the leaves of the parse tree are associated with their sentiment values via access to Stock-Lex 18 and the sentiment calculator 22 applies to the resulting semantically annotated tree.
- the main properties of the calculator are described in the following section.
- a sentiment is an integer, which can be either positive or negative, computed on the basis of the application of the rules of the sentiment calculus to pairs of lexical items in their local syntactic context; for example, nouns (that is, nominal lexical items) representing assets and nouns/verbs/adjectives (that is, nominal, verbal or adjectival lexical items) representing sentiment in the form of polarity and strength.
- the computed sentiment value ranges within a pre-established scale.
- the sentiment calculator 22 uses social media messages for the real-time evaluation of publicly traded equities and commodities wherein a sentiment is a positive or negative integer computed based upon pairs of lexical items in local syntactic context. In its most basic components the sentiment calculator employs a mechanism for determining lexical polarity in social media messages and a mechanism for determining a strength value of lexical items and phrases used in social media messages.
- the sentiment calculus employed by the sentiment calculator 22 applies to the output of the annotated Cass tree produced by the partial parser 32 . It compositionally derives the sentiment associated to entities in the event denoted by the expression they are part of.
- the sentiment logic is a compositional calculus deriving the sentiment value of a relation on the basis of the sentiment values of its parts.
- the sentiment logic calculates sentiment values per asset with respect to stock market events described by the incoming social media messages. Namely, it calculates the sentiment with respect to given assets, as they occur in stock events.
- the social media messages relating to an asset are gathered by a set of keywords used for ingesting the social media messages.
- the sentiment calculus is based on the lexical polarity and strength value of the lexical items and phrases defined in the Stock-Lex 18 and how they are syntactically organized in the Cass tree.
- the maximal local domain for the application of the calculus is the sentence; the minimal local domain is the smallest constituents including the keywords standing for the asset.
- the sentiment calculus applies locally to the constituents including the asset within the sentences of the message.
- the Cass parser derives the syntactic constituents of the sentences, including the adjectival (cx), as well as the nominal (nx) and the verbal (vx) constituents.
- a head of a constituent is a lexical item, such as a verb, e.g., hit, or a noun, e.g., acquisition, that makes the constituent it is part of verbal (vx) or nominal (nx).
- a head selects a complement, which is a syntactic constituent such as a nominal phrase, e.g. the market in hit the market, and AAPL in the acquisition of AAPL.
- a modifier is an adjective or an adverb that modifies another constituent, a nominal constituent in the first case and a verbal constituent in the other case, e.g., strong market and strongly hit the market.
- the subject-predicate relation is the relation between a subject, generally a nominal constituent, and a predicate, generally a verbal constituent, e.g., in the sentence AAPL hits the market, AAPL is the subject and hits the market is the predicate.
- the sentiment calculus includes separate rules for calculating the polarity and the strength. They have the generic form of dyadic operators (Op (arg1, arg2)), and their specific form is dependent on the relation between arg1 and arg2, as well as the lexical polarity and strength values of the lexical items and phrases specified in the Stock-Lex 18 .
- the rule applies locally in syntactic constituents/domains, e.g., nx, vx, cx, etc. It derives the strength of constituents on the basis of the strength of their parts and how they are syntactically related by the application of an arithmetic operation to the pair of arguments depending on the nature of the syntax-semantic relation and the polarity of the constituents.
- the strength rules apply to the lexical items and phrases in the three universal syntactic relations, and the strength is calculated on the basis of elemental arithmetic operations.
- the Strength rules include the following:
- social media messages may include more than one sentence, may talk about more than one asset, more than one stock event, and they may express more than one sentiment.
- the sentiment calculator 22 is sentence bound. Moreover it calculates sentiment in the local syntax-semantic domain of an asset. Thus, it ensures that the specific sentiment with respect to a given asset conveyed by a message is calculated. It applies iteratively in the local domain of the constituent including the asset (keyword, set of keywords), e.g. OIL, or GOLD, and the expression of a stock event (e.g., “lose”, “gain”, “sell”, “buy”) or a sentiment (e.g., “high”, “low”).
- the following trace for the tweet (26) illustrates the application of the sentiment calculator 22 that calculates sentiment-per-asset in the local domain of the targeted asset: Oil.
- the calculus assigned the value +3 to Oil, discarding the value of the computation for Canadian dollar, which is −5.
- This example shows that every step of the computation by the modules of the system provides the structure for the application of the sentiment calculus.
- This calculus applies in local syntactic domains and provides an integer that represents the sentiment (polarity and strength) with respect to designated assets.
- Inference engine 24 is part of expert systems, which are designed to process a problem expressing an uncertainty with respect to a decision, and to provide a decision, or a set of decisions reducing the uncertainty. Inference engine 24 attempts to provide an answer to a problem, or clarify uncertainties where normally one or more human experts would need to be consulted.
- the inference engine 24 of the present system 10 is part of the pipeline and provides a mechanism to sharpen the accuracy of the sentiment computed, by bringing both knowledge of the stock market world 23 and knowledge of the world 25 into the computation.
- the inference engine 24 includes a data structure, and a set of inference rules (if X then Y) relating facts to sentiments. This knowledge interacts with the domain-specific knowledge stored in the lexicon and used by the sentiment calculator 22 .
- the inference engine 24 includes a data structure, a knowledge base that uses some knowledge representation structure to capture the knowledge of a specific domain, for example a relational table relating entities in knowledge domains, and a set of inference rules applying to the entities in the relational table and drawing consequences.
- inference rules use reasoning, which more closely resembles human reasoning.
- the knowledge base consists of a relational table relating stock entities (tickers, company names, products, etc.), stock events (e.g., upgrade, downgrade) and facts, extracted from news wire.
- the rules of the inference engine 24 apply to the elements of the relational table and infer sentiment values.
- the knowledge base includes (28) below, and the inference rules (29) below, stating that if the price of gas (at the pump) is below $3 then the sentiment value is positive, +2, and if the price of gas is above $3 then the sentiment value is negative, −2.
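The rule stated above can be written as a small if-then function applied to rows of a relational table. This is a minimal sketch: the table layout, the names `KNOWLEDGE_BASE`, `gas_price_rule` and `infer`, and the sample price are hypothetical, and returning 0 when the price equals the threshold is an assumption, since the source does not cover that case.

```python
# Hypothetical relational table: one row per (entity, fact, value).
KNOWLEDGE_BASE = [{"entity": "gas", "fact": "pump_price_usd", "value": 3.45}]


def gas_price_rule(row, threshold=3.0):
    """If the pump price is below the threshold then +2, if above then -2."""
    if row["fact"] != "pump_price_usd":
        return None  # the rule does not apply to this row
    if row["value"] < threshold:
        return 2
    if row["value"] > threshold:
        return -2
    return 0  # assumption: the source does not say what happens at exactly $3


def infer(knowledge_base, rules):
    """Apply every rule to every row and collect the inferred sentiment per entity."""
    results = {}
    for row in knowledge_base:
        for rule in rules:
            value = rule(row)
            if value is not None:
                results[row["entity"]] = value
    return results


inferred = infer(KNOWLEDGE_BASE, [gas_price_rule])
```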
- This real world knowledge varies according to time and place.
- the sentiment calculator 22 alone would not derive the negative sentiment associated with the second sentence in (27). While the sentiment calculator 22 assigns the value neutral to questions, the inference engine 24 assigns the sentiment value of −2.
- the inference engine 24 ensures that the sentiment is grounded in the real world. It contributes to the innovative technology, which leads to both simplify and sharpen decision taking in stock market transactions.
- Sentiment calculations in accordance with the present system 10 are a result of the pipeline, or multilayered architecture, embodied by the present invention that ingests social media messages, identifies the language of the social media messages, and filters out elements that are not part of the natural language for which the system 10 has been parameterized (here English).
- the POS tagger 33 and the partial parser 32 modules of the NLP processor 16 assign parts of speech to the tokens of sentences, and recover the structure they are part of.
- the sentiment calculus of the sentiment calculator 22 applies to the annotated structures and derives the sentiment value per asset based on the sentiment value of the event they are part of.
- the inference engine 24 reduces uncertainty by relying on a relational database including knowledge of the world information and a set of inference rules.
- the present sentiment calculation system includes a computer-implemented mechanism for obtaining and converting ingested unstructured social media messages regarding a plurality of objects/assets being tracked into a sentiment value for each object/asset.
- the sentiment value includes a polarity value and strength value derived from a natural language processing algorithm containing a database of lexical items and phrases related to the objects being tracked.
- the precise sentiment value per object is derived by the compositional calculus based on the sentiment values of lexical items (and phrases) and their syntactic organization.
- the contextual sentiment value is based on the inference engine 24 deriving a sentiment value with respect to knowledge of the world.
- the interaction of the sentiment calculus and the inference engine 24 yields accurate sentiment in real-time.
- the cognitive-based sentiment calculus relates conceptual processing to natural language processing algorithms.
- the data generated by the sentiment calculator 22 is applied to a graphical user interface 30 that combines sentiment and intensity data relating to the assets.
- the graphical user interface 30 includes moving graphic objects displayed upon a monitor that depict social media market sentiment; a timeline slider object 46 ; and a vertical bar chart object 44 .
- the graphical user interface 30 provides for the visualization of graphic objects in the form of moving spheres 40 where the sphere size and color depict social media market sentiment.
- the moving spherical graphic objects 40 shrink and grow based on intensity changes.
- the sphere color changes based on social media sentiment polarity.
- the center sphere 40 a represents the weighted sentiment average. Clicking one of the moving spherical graphic objects 40 results in the display of a chart 42 (see FIGS. 6A & 6B ) graphing (based on what the trader selects) all or a choice of price, volume, social media frequency, social media sentiment, cross-correlation and a variety of price and sentiment derived technical indicators.
- Sphere updates are based on a configurable polling time.
- the graphical user interface 30 contains a time slider 46 to go back to a point in time and replay history.
- a vertical bar chart 44 graphs the social media sentiment when the graphical user interface 30 is in full screen mode.
- the purpose of the reaction indicator 31 is to provide a mechanism wherein hundreds of assets can be tracked, but only those that are “interesting” based on preprogrammed parameters will float to the surface and draw the viewer's attention.
- the reaction indicator 31 provides a graphical user interface 30 displaying three graphical areas of objects, moving spherical graphic objects 40 , a timeline slider object 46 and a vertical bar chart object 44 .
- the moving graphic objects may take shapes other than spheres, such as squares. Referring to FIG. 2 , the spherical moving graphical objects are represented at 40 , the timeline slider object at 46 and the vertical bar chart object at 44 .
- the reaction indicator polls a data stream containing mathematically computed values for social media intensity, social media sentiment, social media frequency, social media weighted average frequency and social media weighted average sentiment, auto-refreshing the moving spherical graphic objects 40 and the vertical bar chart object 44 based on a configurable polling time. Intensity is defined as the ratio of short-term frequency to long-term frequency.
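The intensity ratio defined above can be sketched as follows. The source does not say how the short-term and long-term frequencies are measured; this sketch assumes each is a message count over a trailing window divided by the window length, and the function names and sample timestamps are hypothetical.

```python
def frequency(times, now, window):
    """Messages per unit time within the trailing window [now - window, now]."""
    count = sum(1 for t in times if now - window <= t <= now)
    return count / window


def intensity(times, now, short_window, long_window):
    """Ratio of short-term frequency to long-term frequency."""
    long_f = frequency(times, now, long_window)
    if long_f == 0:
        return 0.0  # assumption: no long-term activity means no intensity
    return frequency(times, now, short_window) / long_f


# Hypothetical message arrival times, in seconds.
arrivals = [0, 10, 20, 30, 40, 50, 55, 58, 59, 60]
burst = intensity(arrivals, now=60, short_window=10, long_window=60)
```

A value above 1 indicates that messages are arriving faster in the recent window than over the longer baseline.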
- the mathematical computations for the data stream are calculated by an algorithm discussed herein in detail in a section related to cross correlation. The calculations are based upon information obtained from a multilayer pipeline architecture previously discussed.
- the moving spherical graphic objects 40 shrink and grow based on the social media intensity attribute and are sized relative to each other taking into consideration the stage size and browser screen resolution.
- the color of the moving spherical graphic object 40 is based on social media sentiment polarity where polarity is defined as negative, neutral or positive.
- Each of the moving spherical graphic objects 40 displays a label, social media sentiment and social media frequency.
- the center sphere 40 a object visualizes a weighted average of all sphere objects based on weights assigned to the spheres.
- the weighted sphere object is represented at 40 a .
- the weighted average sphere size is static relative to the other sphere objects, which shrink and grow, and displays weighted average social media sentiment and weighted average social media frequency, if sphere weights have been assigned. If sphere weights have not been assigned, the weighted average sphere object does not display any data.
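The value shown by the center sphere 40 a can be sketched as an ordinary weighted mean. The function name `weighted_average`, the dictionary layout and the sample values are assumptions; returning `None` models the case above where no weights have been assigned and the sphere displays no data.

```python
def weighted_average(values, weights):
    """Weighted mean of per-asset values, or None when no weights are assigned."""
    if not weights:
        return None
    keys = [k for k in values if k in weights]
    total = sum(weights[k] for k in keys)
    if total == 0:
        return None
    return sum(values[k] * weights[k] for k in keys) / total


# Hypothetical per-asset sentiment values and weights.
sentiment = {"AAPL": 2, "GOOG": -1}
weights = {"AAPL": 0.75, "GOOG": 0.25}
center_value = weighted_average(sentiment, weights)
```

The same function can be applied to per-asset frequency to obtain the weighted average frequency.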
- the weighted average sphere object does not change color to reflect social media sentiment polarity.
- An example where weights may play a role is in the instance where the visualization represents an Exchange Traded Fund (ETF).
- ETF holds assets such as stocks, commodities or bonds.
- the assets would be represented in the spheres.
- the weight assigned to each asset would represent that asset's percentage of the ETF, the ETF being an amalgamation of all the assets.
- the timeline slider object 46 visualizes a timeline where the date and time on the left represent the earliest date and time where data exists for the collection of moving spherical graphic objects 40 .
- the date and time on the right represent the current date and time.
- Moving to various points on the timeline slider object 46 moves the moving spherical graphic objects 40 and the vertical bar chart object 44 to a point in time, pausing the real-time display, then replaying history. From the historical point in time selected, the moving spherical graphic object 40 and the vertical bar chart object 44 will poll the data stream coming from the sentiment calculator 22 for social media intensity, social media sentiment, social media frequency, social media weighted average frequency and social media weighted average sentiment from the point in time selected, then rerun history as if it were happening in real time. Referring to FIG. 2 , the timeline slider object is represented at 46 .
- the vertical bar chart object 44 utilizes the same data stream as the moving spherical graphic objects 40 to graph social media frequency, using the same color scheme as the spherical objects. Referring to FIG. 2 , the vertical bar chart object is represented at 44 .
- Clicking on a moving spherical graphic object 40 will launch a chart, graphing price, volume, social media sentiment, social media frequency, and cross-correlation auto refreshing based on a configurable time, e.g. every second as seen in the screen shots depicted in FIGS. 6A and 6B .
- Each of the moving spherical graphic objects 40 displays a symbol, such as an exclamation mark within the sphere, preferably in the center, when an alert has been triggered.
- an alert is triggered when the sentiment and intensity variables cross certain thresholds; the related moving spherical graphic object then displays an exclamation mark, signaling a potential trading opportunity. For example, an alert is triggered when the sentiment and intensity for a given asset A exceed a preprogrammed value indicating sell.
- An exclamation mark will be displayed in the center of sphere A alerting the operator to take action.
- the operator shall have the option of directly executing a trade via a combination of keyclicks.
- the operator can program the reaction indicator 31 to automatically place a trade.
- the operator can program the reaction indicator 31 to send an alert via e-mail or text message.
- the reaction indicator 31 comprises a plurality of moving graphic objects 40 which change size and color based upon social media market sentiment, intensity and frequency captured and correlated in real-time from a stream of online social media messages related to a market segment.
- the moving spherical graphic objects 40 shrink or grow in size based upon the social media intensity attributed to each moving spherical graphic object 40 and the moving spherical graphic objects 40 change color based upon whether the social media sentiment attributed to each moving spherical graphic object is positive, negative or neutral.
- the reaction indicator 31 also provides a weighted average of all displayed moving spherical graphic objects 40 , based on weights assigned to the objects prior to capturing social media streams; this weighted average is displayed among the plurality of displayed objects.
- the present system and method provides a mechanism for cross-correlating the sentiment and intensity data with the actual fluctuations in asset prices.
- the present invention provides two methods to find patterns in a target real-valued time series by utilizing two other real-valued time series derived from a stream of social-media messages (Twitter for instance): sentiment and frequency.
- the series used to find patterns in the target are called predictive.
- the patterns can be depicted graphically on charts, together with the time series, to be used as a decision making tool.
- the patterns can also serve as the input to an automated trading system to generate trading signals.
- the curves are a depiction of the sentiment time series for the target (thick curve labeled $s_s$) and the sentiment-frequency time series (thin curve $s_f$). The calculation of the sentiment-frequency series will be described later.
- the method of the present invention finds patterns in a target real-valued time series by utilizing sentiment and frequency derived from a stream of social-media messages, wherein the target represents a quantifiable property of an asset being tracked.
- the method includes identifying a target, which is a sampled real-valued time series; generating a sentiment time series, $s_s$ (which is plotted); generating a frequency time series, $s_f$ (which is plotted); and determining a pattern based upon the sentiment time series and the frequency time series.
- a real-valued time series is defined as a sequence of pairs (time (t), value (s)), also called points, ordered by increasing time.
- a simple time series could look like this: [(12:36,27),(13:03,37),(16:34,88)].
- $T_s = F(\mathbb{R} \times \mathbb{R})$, that is, the set of finite subsets of $\mathbb{R} \times \mathbb{R}$, whose elements are endowed with the total order $\le\, : ((t,s),(t',s')) \mapsto (t \le t') \in \{\text{true}, \text{false}\}$.
- each series $s \in T_s$ is naturally mapped to the vector $V(s) \in (\mathbb{R} \times \mathbb{R})^{\#(s)}$ such that $v_i$ is the $i$-th element of $s$.
- the vector of first components will be denoted by $V_1(s)$ and the vector of second components by $V_2(s)$.
- $V(s) = [(12{:}36, 27), (13{:}03, 37), (16{:}34, 88)]$
- $V_1(s) = [12{:}36, 13{:}03, 16{:}34]$
- $V_2(s) = [27, 37, 88]$.
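The mapping from a finite set of points to the vectors above can be reproduced in a few lines. This is an illustrative sketch: the set uses the example values from the text, and representing times as zero-padded "HH:MM" strings (so that string order equals time order) is an assumption made for brevity.

```python
# A time series as an unordered finite set of (time, value) points.
series = {("16:34", 88), ("12:36", 27), ("13:03", 37)}


def V(s):
    """Points of the series ordered by increasing time."""
    return sorted(s, key=lambda point: point[0])


def V1(s):
    """Vector of first components (times)."""
    return [t for t, _ in V(s)]


def V2(s):
    """Vector of second components (values)."""
    return [v for _, v in V(s)]
```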
- a semantic distinction is drawn between pulsated time series where points represent a punctual event (i.e., sequence of Diracs), such as the arrival of a message, and sampled time series that represent a discretization of a function that's defined at all times, such as the market price. It is thus natural to interpolate points of a sampled time series to try and recover the original function it was sampled from.
- the target is an arbitrary sampled real-valued time series.
- the algorithm has been applied with prices as target.
- the sentiment time series $s_s$ is generated by the Natural Language Processing (NLP) module 16 . It is a pulsated time series. For each message in the input stream, the sentiment time series contains a pair whose time is the time when the message was posted, and whose value is the result of the NLP processor 16 . This value is called sentiment.
- the frequency time series $s_f$ depends on two parameters: the sentiment time series and a positive number $w$ representing a time called window size. It is a pulsated time series. For each point $(t, s)$ in the sentiment series, the frequency series contains a point $(t, f)$ where $f$ is the number of points in the sentiment series in the time range $[t - w, t]$, divided by $w$. This number $f$ is called frequency.
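The definition of the frequency series translates directly into code. This is a minimal sketch with times as plain numbers; the function name and the sample series are hypothetical, and the quadratic scan is chosen for clarity rather than speed.

```python
def frequency_series(sentiment_series, w):
    """For each point (t, s), emit (t, f), where f is the number of points in [t - w, t] divided by w."""
    points = sorted(sentiment_series, key=lambda p: p[0])
    result = []
    for t, _ in points:
        count = sum(1 for u, _ in points if t - w <= u <= t)
        result.append((t, count / w))
    return result


# Hypothetical sentiment series: (time, sentiment).
s_s = [(0, 1), (1, -1), (5, 2)]
s_f = frequency_series(s_s, w=2)
```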
- a pattern P is defined as a cross-correlation $c$ in $[-1, 1]$, a positive window size $w$, a time lag $l$, and a time $t_s$. These numbers are interpreted as “the predictive series over $[t_s - w, t_s]$ correlates to the target series over $[t_s - w + l, t_s + l]$ with a cross-correlation of $c$”.
- a pattern is thus an element of $[-1, 1] \times \mathbb{R}^{+} \times \mathbb{R} \times \mathbb{R}$.
- when the lag is positive, the pattern is said to be predictive.
- the cross-correlation determines the relevance of the pattern: the higher it is, the more relevant the pattern is considered.
- the method is called the sentiment-frequency method. It uses the sentiment to create a sentiment-frequency series, and correlates the latter to the target using a plain statistical cross-correlation. It then identifies patterns by finding the optimal time lag.
- the system first creates an average sentiment series $s_a$ such that for every point $(t, s)$ in the sentiment time series $s_s$ there is a point $(t, a)$ in the average sentiment series, where $a$ is the arithmetic average of all the sentiments in the time range, or interval, $[t - w, t]$.
- $s_{sf} \ni (t, f\,a)$
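The construction of the sentiment-frequency series can be sketched as below. One point is an assumption and is labeled as such in the code: the value at time $t$ is taken to be the product of the frequency $f$ and the average sentiment $a$ over $[t - w, t]$, which is one reading of the formula above. The function name and sample data are hypothetical.

```python
def sentiment_frequency_series(sentiment_series, w):
    """For each point time t, combine frequency f and average sentiment a over [t - w, t]."""
    points = sorted(sentiment_series, key=lambda p: p[0])
    result = []
    for t, _ in points:
        window = [s for u, s in points if t - w <= u <= t]
        a = sum(window) / len(window)   # arithmetic average of sentiments in the window
        f = len(window) / w             # frequency, as in the frequency series
        result.append((t, f * a))       # assumption: "f a" is read as the product f * a
    return result


s_sf = sentiment_frequency_series([(0, 1), (1, -1), (5, 2)], w=2)
```

The window always contains at least the current point, so the average is always defined.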
- the series correlator produces a set of patterns based on a real-valued pulsated time series $s_p$, a real-valued sampled time series $s_s$, an interpolation method $I$ for $s_s$, and a window size $w$.
- Interpolation is a classical subject and it will not be described here. Common interpolation methods are linear or cubic splines.
- for any time $t$ and lag $l$, the system defines the vector $E_s(s_s, s_p, t, l)$ so that for every $(t_p, p)$ in $s_p$ with $t_p$ in $[t - w, t]$, $E_s(s_s, s_p, t, l)$ contains the interpolated point $I(s_s, t_p + l)$.
- the system also defines the vector $E_p(s_p, t)$ so that for every $(t_p, p)$ in $s_p$ with $t_p$ in $[t - w, t]$, $E_p(s_p, t)$ contains the point $p$.
- the cross-correlation $CC(s_s, s_p, t, l)$ is defined as the scalar product of $E_p(s_p, t)$ and $E_s(s_s, s_p, t, l)$ divided by the product of their norms.
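The cross-correlation defined above can be sketched as follows. The choice of linear interpolation, the clamping at the ends of the sampled series, the return value of 0.0 for an empty or zero-norm window, and the function names are assumptions made for illustration. The window size $w$ is passed explicitly.

```python
import math
from bisect import bisect_right


def interpolate(series, t):
    """Linear interpolation of a time-sorted (time, value) series, clamped at both ends."""
    times = [p[0] for p in series]
    i = bisect_right(times, t)
    if i == 0:
        return series[0][1]
    if i == len(series):
        return series[-1][1]
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def cross_correlation(s_s, s_p, t, lag, w):
    """Scalar product of E_p and E_s divided by the product of their norms."""
    window = [(tp, p) for tp, p in s_p if t - w <= tp <= t]
    e_p = [p for _, p in window]                              # pulsated values in [t - w, t]
    e_s = [interpolate(s_s, tp + lag) for tp, _ in window]    # sampled series at shifted times
    norm = math.sqrt(sum(x * x for x in e_p)) * math.sqrt(sum(x * x for x in e_s))
    if norm == 0:
        return 0.0  # assumption: undefined cases are reported as no correlation
    return sum(a * b for a, b in zip(e_p, e_s)) / norm


# Hypothetical data: a sampled series equal to its time, and a short pulsated series.
target = [(0, 0), (10, 10)]
predictive = [(1, 1), (2, 2), (3, 3)]
c0 = cross_correlation(target, predictive, t=3, lag=0, w=3)
```

Because the vectors are not mean-centered, this quantity is the cosine of the angle between them, as the definition above states.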
- CC(s s ,s p ,t,l) has a finite set of local maximums. There are many methods to find local maximums. One possible method is to use a gradient method on points spread evenly on the time interval that the series covers.
- the local maximums of CC t : l ↦ CC(s s ,s p ,t,l) simply move linearly with t when no point of s p leaves or enters [t − w,t].
- the set of local maximums of CC t , taken over those t for which t or (t − w) is the time of a point in s p , is a finite set that completely represents the set of local maximums of CC t for all t.
- For every w, the system computes a finite set of times t and lags l and a cross-correlation c for each of them. This defines a finite set of patterns (c, w, t, l) which the system orders by relevance.
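- The pattern search described above can be sketched as follows. This is an assumption-laden simplification: instead of the gradient method mentioned in the text, it scans a caller-supplied grid of lags and keeps grid points that are local maximums; the cross-correlation is passed in as a callable `cc(t, lag)`; and the names `Pattern` and `find_patterns` are hypothetical.

```python
from typing import Callable, List, NamedTuple, Sequence


class Pattern(NamedTuple):
    c: float    # cross-correlation, in [-1, 1]
    w: float    # window size
    t: float    # time t_s
    lag: float  # time lag l; a positive lag makes the pattern predictive


def find_patterns(
    cc: Callable[[float, float], float],
    w: float,
    times: Sequence[float],
    lags: Sequence[float],
) -> List[Pattern]:
    """Return grid-local maximums of lag -> cc(t, lag), ordered by relevance."""
    patterns = []
    for t in times:
        values = [cc(t, lag) for lag in lags]
        # Interior grid points only: a point is kept when it is strictly
        # above its left neighbour and not below its right neighbour.
        for i in range(1, len(lags) - 1):
            if values[i] > values[i - 1] and values[i] >= values[i + 1]:
                patterns.append(Pattern(values[i], w, t, lags[i]))
    # Higher cross-correlation means higher relevance.
    patterns.sort(key=lambda p: p.c, reverse=True)
    return patterns
```

- A grid scan can miss maximums that fall between grid points, so in practice the grid spacing would need to be chosen with the sampling density of the series in mind.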
- the system for sentiment, intensity cross-correlation provides for time-based cross-correlation between the real-time sentiment value and frequency of a message stream relative to an object and a quantifiable property of that object.
- the time correlation relates patterns in the sentiment and frequency to patterns in the object property.
- the cross-correlation system further includes graphical depictions showing relations identified by the patterns between the object property and the sentiment, frequency, and any quantity derived from them.
- the cross-correlation system also includes event prediction of future up and down movement of the object property based upon the aforementioned patterns, as well as trading signals generated on and trading strategies based on the aforementioned patterns.
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Tourism & Hospitality (AREA)
- Data Mining & Analysis (AREA)
- Technology Law (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method finds patterns in a target real-valued time series by utilizing sentiment and frequency derived from a stream of social media messages, wherein the target represents a quantifiable property of an asset being tracked. The method includes identifying a target, which is a sampled real-valued time series; generating a sentiment time series, ss, relating to an asset; generating a frequency time series, sf, relating to an asset; and determining a pattern based upon the sentiment time series and the frequency time series.
Description
- This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/466,067, entitled “METHOD AND SYSTEM USING SOCIAL MEDIA FOR REAL-TIME EVENT DRIVEN TRADING”, filed Mar. 22, 2011.
- 1. Field of the Invention
- The present invention relates to a method and system using social media for real-time event driven trading of equities, commodities and other traded assets.
- 2. Description of the Related Art
- Sentiment analysis applies various analytical techniques in identifying subjective information from different information sources. Sentiment analysis, therefore, attempts to ascertain the feelings, thoughts, attitude, opinion, etc. of a speaker or a writer with respect to a topic.
- Most work on sentiment analysis has relied on two main approaches. The first, the so-called “bag of words” approach, applies a positive/negative document classifier based on the occurrence frequencies of the various words in a document. With this approach, various learning methods can be used to select or weight different parts of the text used in the classification process. This approach fails to process the sentiment with respect to assets (for example, equities or commodities) in short digital messages such as tweets sent via the online social networking service Twitter.
- The second approach is “semantic orientation.” Semantic orientation automatically classifies words into two classes, “good” and “bad”, and then computes an overall good/bad score for the text. This method does not take into consideration the sentiment conveyed by parts of speech other than adjectives, including verbs, for example, to bounce, to crash, nouns, for example, a put, a call, and phrases, for example, ascending triangle, black Friday, head-and-shoulders.
- Both methods fail to determine the sentiment with respect to specific assets in short digital messages such as tweets sent via the online social networking service Twitter. Their main pitfall is that they fail to process the sentiment in the syntax-semantic context of the message.
- It is, therefore, an object of the present invention to provide a method for finding patterns in a target real-valued time series by utilizing sentiment and frequency derived from a stream of social media messages, wherein the target represents a quantifiable property of an asset being tracked. The method includes identifying a target, which is a sampled real-valued time series; generating a sentiment time series, ss, relating to an asset; generating a frequency time series, sf, relating to an asset; and determining a pattern based upon the sentiment time series and the frequency time series.
- It is also an object of the present invention to provide a method wherein sentiment is an expression of a psychological state relative to an event.
- It is another object of the present invention to provide a method wherein frequency represents the volume of social media messages about the asset.
- It is a further object of the present invention to provide a method wherein the step of generating a sentiment time series is performed by language processing and is derived based upon pairs of lexical items in local syntactic context found in a volume of social media messages.
- It is also an object of the present invention to provide a method wherein the step of generating a sentiment time series includes the creation of an average sentiment series, sa, such that for every point (t,s) in the sentiment time series, ss, there is a point (t, a) in an average sentiment series where “a” is the arithmetic average of all the sentiments in a time range [t−w, t].
- It is another object of the present invention to provide a method wherein the step of generating a sentiment time series includes the creation of a sentiment-frequency series, ssf, to contain a point (t, vsf) for every (t, a) in the sentiment time series, ss, and (t, f) in the frequency time series, sf, where vsf = f^a (= e^(a ln(f))).
- It is a further object of the present invention to provide a method wherein the frequency time series, sf, is dependent upon the sentiment time series, ss, and a positive number w representing a time called window size.
- It is also an object of the present invention to provide a method wherein for each point (t, s) in the sentiment time series, ss, the frequency time series, sf, contains a point (t, f) where f is the number of points in the sentiment time series, ss, in the time range [t−w, t], divided by w.
- It is another object of the present invention to provide a method wherein the number f is called frequency and
-
- It is a further object of the present invention to provide a method wherein the pattern P is a cross-correlation c in [−1,1], a positive window size w, a time lag l, and a time ts, and these numbers are interpreted as a predictive series over [ts−w, ts] correlating to the target series over [ts−w+l, ts+l] with a cross-correlation of c.
- It is also an object of the present invention to provide a method wherein the step of determining a pattern employs a sentiment-frequency method that uses sentiment to create a sentiment-frequency series, ssf, and correlates it to the target using a plain statistical cross-correlation.
- It is another object of the present invention to provide a method wherein the step of determining a pattern includes the step of identifying an optimal time lag.
- It is a further object of the present invention to provide a method wherein correlating two time-series using a plain statistical cross-correlation and finding the optimal lag is achieved with a series correlator.
- It is also an object of the present invention to provide a method wherein the series correlator produces a set of patterns based on a real-valued pulsated time series sp, a real-valued sampled time series, ss, an interpolation method I for ss, and a window size w.
- It is another object of the present invention to provide a method wherein the interpolation method I, is a function of a time series ss and of a time t that is C1-piecewise continuous with respect to t, and such that if there exists a point (t, v) in ss, I(ss, t)=v.
- Other objects and advantages of the present invention will become apparent from the following detailed description when viewed in conjunction with the accompanying drawings, which set forth certain embodiments of the invention.
- FIG. 1 is a schematic overview of the present system.
- FIG. 2 is a representation of the graphical user interface in accordance with the present invention.
- FIG. 3 is a partial view of the reaction indicator.
- FIG. 4 is a graphical depiction showing the correlation of frequency and sentiment.
- FIG. 5 is a screen shot showing the ingest and processing of various assets.
- FIGS. 6A and 6B are screen shots when a moving spherical graphic object is clicked in the graphical user interface.
- FIG. 7 is a screen shot showing various moving spherical graphic objects shrinking and growing based on social media intensity thereof.
- The detailed embodiment of the present invention is disclosed herein. It should be understood, however, that the disclosed embodiment is merely exemplary of the invention, which may be embodied in various forms. Therefore, the details disclosed herein are not to be interpreted as limiting, but merely as a basis for teaching one skilled in the art how to make and/or use the invention.
- In accordance with the present invention, and with reference to FIGS. 1 to 7, a method and system using social media for event-driven trading are disclosed. The present method and system 10 use social media for the real-time evaluation of publicly traded assets, in particular, equities and commodities, using information generated through social media interactions. For example, and with reference to FIG. 5, a series of “tweets” (social comments transmitted using the social networking service Twitter) are shown. As used herein, an “asset” is considered to be a resource with economic value that an individual, corporation or country owns or controls with the expectation that it will provide future benefit. Assets include, but are not limited to, investments in equities, options, derivatives, commodities, bonds, futures, currencies, etc. It should further be appreciated that “equities” are stocks or any other securities representing an ownership interest.
- It is appreciated that the following discloses the present method and system 10 with reference to the stock market, although the application of the present invention could be extended to commodities and other asset-based markets. By monitoring publicly available social media information, the present system 10 is able to effectively predict swings in asset prices for effective and profitable trading thereof.
- As will be appreciated based upon the following disclosure, the present method and system 10 provide a sentiment calculator 22 that employs natural language processing in evaluating social media interactions by anticipating the sentiment of traders relating to specific equities and commodities in terms of the polarity of the sentiment and the strength of the sentiment. The data generated by the sentiment calculator 22 is applied to a reaction indicator 31 in the form of a graphical user interface 30 that combines sentiment and frequency (which is indicative of the intensity of the sentiment) data relating to the assets. Once sentiment and frequency are fully appreciated, the present system 10 and method provide a mechanism for cross-correlating the sentiment and intensity data (the perceived strength of the sentiment being expressed by the social media) with the actual fluctuations occurring with the price of assets.
- Briefly, and in accordance with a preferred implementation of the present invention, the present system 10 provides for the processing of social media messages generating data for the real-time evaluation of publicly traded assets, for example, stocks. The system 10 includes an ingest component 11 for ingesting the social media messages; a filter module 14 eliminating expressions not considered useful language from social media messages; a natural language processor (NLP) 16 processing filtered social media messages; a sentiment calculator 22 applying rules to the filtered and NLP processed social media messages so as to compute a representation of values associated with the filtered and NLP processed social media messages; and a graphical user interface 30 displaying the values generated by the sentiment calculator 22.
- With reference to FIG. 1, the ingest component 11 consumes, acquires or gathers a wide range of social media messages 12 and immediately filters the messages as will be explained below in greater detail. The ingest component 11 is a data acquisition module. The ingest component 11 allows the system 10 to automatically import raw social media messages, for example, tweets from Twitter or other social media sites. The data, that is, the raw social media messages, is acquired on the basis of a predefined set of keywords or combination of keywords the system 10 has been programmed to look for. The filtered social media messages are then subjected to natural language processing via NLP module 16 based upon lexical databases 18, 20, and are then passed to the sentiment calculator 22 and inference engine 24. The sentiment calculator 22 and inference engine 24 apply information from databases 26, 28 respectively relating to the knowledge of the stock market world and the knowledge of the world. The results of the sentiment calculator 22 and inference engine 24 are then presented to the user via a reaction indicator 31 in the form of a graphical user interface upon a computer monitor which displays sentiment per asset information.
- As discussed above, sentiment calculation is part of the present system 10 for event-driven trading using social media messages. As described above, the system 10 ingests content (that is, social media messages such as tweets) from one or multiple social media sources based on user-specified criteria. The meaning of the information conveyed by the social media messages is determined using a natural language processing (NLP) module 16. The system 10 then calculates “sentiment” and presents metrics relating thereto in real-time.
- FIG. 5 shows social media messages, for example, “tweets”, with annotations relating to the sentiment scoring for the individual tweets. In this way, sentiment calculations in accordance with the present invention may be used to anticipate the reaction of the traders before they act.
- In accordance with a preferred embodiment, the sentiment calculator 22 of the present system 10 analyzes social media messages to calculate the sentiment with respect to events pertaining to objects. Objects relate to assets being traded, via situations having a bearing on public sentiment and relating to the value of the asset being traded (preferably on an exchange). It should be appreciated that “object(s)” refers to anything related to an asset that can be publicly traded and monitored. For example, an “iphone” and the stock symbol “AAPL” are objects which relate to the asset Apple Inc., which can be publicly traded.
- The sentiment calculator 22 represents one module of the present multilayered system 10 for processing short and noisy messages such as tweets, as depicted in the schematic shown in FIG. 1. Proper operation of the sentiment calculator 22, that is, sentiment calculations, requires that a filter module 14 configure input text into formats for use by the subsequent processing modules of the pipeline making up the present invention. The filter module 14 is composed of a set of rules (using regular expressions) created to transform the ingested social media messages into expressions without noise. Noise is considered to be elements in the message which are not part of natural language, such as hash tags, URLs, etc. Therefore, the filter module 14 functions to bring tweets as close as possible to expressions in natural language by eliminating expressions that are not considered part of current language usage. For example, the filter module 14 eliminates URLs and hash tags at the periphery of the tweet, and normalizes symbols and abbreviations, for example “q1|1q|1st quarter|” is replaced by “first quarter”. Basically, the filter module 14 eliminates noisy elements from the data being ingested, such as URLs and hash tags, so that it may be further processed by the NLP module 16.
- Once filtered, the ingested social media messages are then sent to NLP module 16 for further processing. Sentiment calculations in accordance with the present invention require that a Part of Speech (POS) Tagger 33 assign lexical categories to each of the filtered social media messages as they are broken into a stream of text, words, phrases, symbols, or other meaningful elements called tokens, that is, tokenized messages. Sentiment calculations in accordance with the present invention also require that a partial parser (PARS) 32 recover the structure of the main constituents/syntactic structures (a lemmatizer deriving the canonical form of lexical items, that is, a single or a group of words conveying a single meaning, to enable the lexical lookups) of the filtered social media messages.
- The system also employs MORPH 34, which is a lemmatizer which reduces the spelling of words to their lexical root or base/lemma form. In English, the base form for a verb is the simple infinitive. For example, the gerund “striking” and the past form “struck” are both forms of the lemma “(to) strike”. The base form for a noun is the singular form. For example, the plural “mice” is a form of the lemma “mouse.” Most English spellings can be lemmatized using regular rules of English grammar, as long as the word class is known. MORPH 34 uses a list of numerous such rules to reduce an ingested and non-filtered word to its base form. In accordance with a preferred embodiment the application MorphAdorner is utilized, the documentation of which, “MorphAdorner, A Java Library for the Morphological Adornment of English Language Texts”, Version 1.0. Apr. 30, 2009, Copyright© 2007, 2009 by Northwestern University, is incorporated herein by reference.
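- To make the regular-expression filtering step concrete, the following is a minimal sketch of such a rule set. It is an illustration only: the actual rules of the filter module 14 are not disclosed in this passage, and the pattern names, the function `filter_message`, and the exact patterns (for example, treating only leading and trailing hash tags as peripheral) are assumptions.

```python
import re

# Hypothetical rules, loosely modeled on the examples given in the text.
URL = re.compile(r"https?://\S+")
LEADING_TAGS = re.compile(r"^(?:\s*#\w+)+\s*")    # hash tags at the start
TRAILING_TAGS = re.compile(r"(?:\s*#\w+)+\s*$")   # hash tags at the end
FIRST_QUARTER = re.compile(r"\b(?:q1|1q|1st quarter)\b", re.IGNORECASE)


def filter_message(text: str) -> str:
    """Strip URLs and peripheral hash tags, then normalize abbreviations."""
    text = URL.sub(" ", text)
    text = LEADING_TAGS.sub("", text)
    text = TRAILING_TAGS.sub("", text)
    text = FIRST_QUARTER.sub("first quarter", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

- In this sketch a hash tag in the middle of a message is left in place, since the text only mentions removing hash tags at the periphery of the tweet.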
- Finally, the data composed of the filtered and NLP processed social media messages is supplied to the sentiment calculator 22 that calculates sentiment compositionally in the syntactic context. The process of sentiment calculation also employs an inference engine 24 that fine-tunes sentiment calculations using knowledge of the world. This process for sentiment calculation enables sentiment to be calculated on the basis of a set of rules deriving the polarity of stock events and their strength.
- The problem of identifying the sentiment of social media messages on asset markets can be detailed as follows:
- i) the social media messages are short, for example a tweet using Twitter is limited to 140 characters;
- ii) the social media messages lack several constituents that are normally part of English sentences;
- iii) the social media messages are noisy, they include characters and expressions that are not part of English sentences;
- iv) the social media messages may be in a language other than English;
- v) in some cases, the social media messages are not complete English sentences and truncated messages are observed;
- vi) reported information, such as headlines, which do not directly convey sentiment, as well as social media messages conveying sentiments are also part of the ingest; consequently sentiments cannot be differentiated from facts;
- vii) the knowledge of the asset markets world includes constant as well as contingent knowledge; and
- viii) the sentiment is thus a function of the natural language expressions used in the social media messages in conjunction with the knowledge of these expressions as they are used in asset market exchanges.
- The fact that tweets are constrained to 140 characters means that messages sent via Twitter begin to resemble programming languages such as Fortran (which originally had a constraint of 72 characters per line). The primary effect of this constraint is a limitation on the freedom available to the author of a tweet as he or she attempts to convey a specific message. This means that it is now possible to envision compiling tweets (akin to compiling a programming language) and achieving very high levels of accuracy in deriving sentiment whilst minimizing resource consumption and interpretation times. It therefore becomes feasible to ingest and process potentially millions of messages per hour using Commercial Off The Shelf (COTS) computers.
- The technical advantage of the present system 10 relative to other known technologies is that the present system 10 is based on natural language processing techniques rather than machine learning techniques (for example, Naive Bayes, maximum entropy classification, and support vector machines), as described, for example, in Pang and Lee (Bo Pang and Lillian Lee, 2002, “Thumbs up? Sentiment Classification using Machine Learning Techniques,” Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), 79-86).
- As will be appreciated based upon the following disclosure, the rule-based method that is used in accordance with the present invention avoids the shortcomings of the statistical method because it processes the social media messages directly instead of classifying social media messages on the basis of probabilistic algorithms. Another advantage lies in the innovative contribution of the inference engine 24, which helps to reduce uncertainty and brings further support to decision making.
- Because of their limitations, classification algorithms, such as Naïve Bayes classifiers, are not used in the implementation of the present invention. The present invention solves the problem by processing the actual content of the social media messages as they are formulated. It does not calculate the number of positive adjectives in a social media message, or in a set of social media messages, to compute sentiment, contrary to common practice.
- The features of the present invention that provide a solution or benefit are the following:
- i) the ingestion of social media messages on the basis of targeted keywords according to the requirements of the stock traders (as discussed below in more detail);
- ii) the filtering of items that are not part of natural language (English);
- iii) the tagging of the items in the filtered messages with part-of-speech tags;
- iv) the recovery of the syntactic structure associated with each social media message;
- v) the application of the sentiment calculus rules to the output of the syntactic structure on the basis of the sentiment value of the lexical items and how they are syntactically combined;
- vi) the stock-specific lexical items and phrases of the major lexical categories (event denoting Nouns and Verbs, stock-market specific Adjectives and Adverbs) are associated with lexical sentiment values; and
- vii) the sentiment calculator applies to pairs of lexical items and their syntactic structures/constituents relating sentiment-marked lexical items in the syntactic configuration where they occur, ensuring the computation of an accurate sentiment-per-asset value.
- The sentiment calculator 22 is a module of the multilayered architecture employed in accordance with the present system 10, which can be customized for different domains, including, for example, finance, security and pharmaceutics. In the disclosed application in the stock market exchange, the sentiment calculator 22 calculates sentiments from stock exchange-related social media messages 12 in order to predict stock movements before human traders can act.
- The innovation brought about by the sentiment calculation in accordance with the present invention is the event driven approach to sentiment mining. Unstructured incoming social media messages 12 are processed in order to extract sentiment about pre-specified assets, as they participate in ongoing events. As will be explained below, the sentiment calculator 22 performs event-driven sentiment calculus.
- The event-driven approach to sentiment mining as applied in accordance with the present invention can be represented in accordance with equation (1), where M stands for Modifier, Ev stands for Event, and x, . . . , z stand for the participants of the event. The asset the sentiment is about is a participant of the event.
- (M(Ev(x, . . . , z)))  (1)
- This relational approach to sentiment mining contrasts with the statistical keyword search approach, classifying messages on the basis of the number of positive or negative qualifiers. The statistical keyword search approach fails to provide sentiment-per-asset values.
- The present invention takes an event to be a change in the relation between the participants of the event. The participants of an event are: names, organizations, locations, expressions of time, quantities, monetary values, percentages, etc. The present system 10 includes named entity recognition capacities and syntax-semantic capacities to provide the articulation of events and their participants. The interpretation of syntactic structure is generally compositional: that is, the interpretation of the whole is a function of the interpretation of the parts. However, part of the semantics conveyed by natural language is non-compositional and idiosyncratic. The idiosyncratic meanings are listed in lexicons, assuming both generic (Sent-Lex 20) and domain-specific (Stock-Lex 18) lexicons.
- As briefly discussed above, “sentiment” about an asset participating in an event is considered in accordance with the present invention to be the orientation (that is, the polarity in opinions expressed regarding the asset) and the strength of the opinions on that asset that deviates from the normal state. A sentiment is the expression of a psychological state relative to an event (whether that event be static or dynamic). Considering that social media messages sent via the social networking site Twitter are limited to 140 characters, lexical items, emoticons and other diacritics found in such messages cannot express the richness of thought and sentiments conveyed by traditional written natural language without further processing. The present system and methodology focus on the properties of natural language employed in the social media messages to calculate the sentiment with respect to given objects in ongoing stock events referred to in social media exchanges.
- In accordance with the present invention, sentiment is represented by an integer combining a polarity value (polarity positive +, negative −, neutral n) and a strength value ranging within a pre-defined scale. Using data generated by the filtering and natural language processing of the social media messages, the sentiment calculator 22 yields an integer that combines the polarity and the strength values of each pair of expressions relating an asset to an event as explained below in greater detail.
- Polarity is a value (that is, positive, negative or neutral) that is part of the lexical specification of words and phrases. These values will compose according to the Polarity rules, provided below. Strength is an integer that is also part of the lexical specification of the words and phrases. The values for strength in accordance with a preferred embodiment of the present invention range from 1 to 3 (1 is low and 3 is high). These numbers will be added in the processing of messages according to the Strength rules, provided below. However, it is appreciated that values for strength could range from 1 to 5 or higher.
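- One possible way to hold the lexical polarity and strength values and combine them into a single integer is sketched below. The encoding is an assumption made for illustration (the sign carries the polarity, the magnitude carries the strength, and neutral maps to 0); the passage does not state how the two values are combined, and the Polarity and Strength composition rules are not reproduced here. The names `LexicalSentiment`, `POLARITY_SIGN` and `as_integer` are hypothetical.

```python
from dataclasses import dataclass

# Assumed mapping from the polarity symbols in the text to a sign.
POLARITY_SIGN = {"+": 1, "-": -1, "n": 0}


@dataclass(frozen=True)
class LexicalSentiment:
    polarity: str          # "+", "-" or "n"
    strength: int          # 1 (low) .. max_strength (high)
    max_strength: int = 3  # the text notes the scale could be 1 to 5 or higher

    def __post_init__(self):
        if self.polarity not in POLARITY_SIGN:
            raise ValueError(f"unknown polarity: {self.polarity!r}")
        if not 1 <= self.strength <= self.max_strength:
            raise ValueError(f"strength out of range: {self.strength}")

    def as_integer(self) -> int:
        """Signed integer: sign from polarity, magnitude from strength."""
        return POLARITY_SIGN[self.polarity] * self.strength
```

- Under this assumed encoding a strongly negative lexical item such as a strength-3 negative verb would be stored as −3, while any neutral item would be stored as 0 regardless of its strength.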
- The sentiment calculator 22 is embedded as part of the present overall system 10 that ingests social media messages from multiple social media sources based on user-specified criteria. As discussed above, the social media messages go through a filtering layer/module 14 purging the messages of noise (URLs, hashtags, etc.). The results of the filtering of social media messages are tokens which are assigned part of speech tags according to the lexical and contextual properties of the lexical items based upon NLP module 16. A parser 32 then recovers the major constituents/syntactic structures of the tokenized messages. The sentiment calculator 22 takes the annotated partial parses as its input and yields a sentiment-per-asset on the basis of the sentiment values of the lexical items and the sentiment logic, calculating the sentiment of constituents on the basis of the sentiment values of their parts. The sentiment calculator 22 interacts with the inference engine 24 to determine the sentiment with respect to the knowledge of the world.
- For example, and considering the present system 10 as applied in the stock-exchange domain, the sentiment calculator 22 derives sentiment in terms of polarity and strength with respect to objects (for example, assets as referenced by tickers and commodity names) as they participate in ongoing stock events described by the ingested social media messages 12, for example, tweets. The generic representation in equation (1) as noted above can thus be instantiated by equation (2) for this application.
- (M(stock-event(stock-object x, . . . , stock-object z)))  (2)
- As will be appreciated based upon the following disclosure, stock-market specific lexical items and phrases are qualified in the Stock-Lex 18, and the sentiment calculator 22 applies to pairs of sentiment-marked lexical items compositionally in their syntactic configuration.
- As discussed above, the sentiment calculator 22 is a module of the pipeline making up the present system 10. The components of this system 10 process incoming social media messages, and yield a sentiment-per-object/asset for each ingested incoming social media message in real-time. The sentiment calculator 22 calculates the sentiment-per-asset for each incoming social media message ingested by the system 10.
- FIG. 1 represents the three main components of the system: Ingest 11 (the social media messages 12), Process 15 (the social media messages using the filter 14, NLP 16 and sentiment calculator 22), and Display 30 (the results of the processing step on a reaction indicator 31 in the form of a graphical user interface 30). It also identifies the specific NLP components/modules (POS 33, PARS 32, MORPH 34, Stock-Lex 18 and Sent-Lex 20) processing social media messages 12 from the ingest component 11 to the sentiment calculator 22 and its interaction with the inference engine 24 (which includes databases relating to Knowledge of the Stock Market world 23 and Knowledge of the world 25).
- As discussed above, the architecture of the system 10 is shown in FIG. 1. The following explains the main features of each component of this architecture, where the lexicon, the part of speech tagger (POS) 33 and the parser (PARS) 32 can be parameterized to process different languages. Thus, in addition to the fact that this system 10 can calculate sentiment in different domains, it can also process sentiment cross-linguistically.
- 1. The Ingest Component
- Simplex and complex keywords are used for ingesting the
social media messages 12, according to the requirements of the stock traders. The hardware used for ingest consists of standard off-the-shelf computers that gather and process social media messages using the pre-determined keywords. The techniques used for collecting social media messages must take into consideration the requirements of stock traders (see Section 1.1) and must enable the collection of social media messages with respect to specific assets (see Section 1.2). - 1.1 The Requirements of Stock Traders
- From a stock trader's perspective, there must be a measurable and significant correlation between sentiment (as manifested in social media messages) and price movement. The correlation can be positive or negative. For example, there is usually a strong positive correlation between the performance of the financial sector and the S&P 500, and a negative correlation between volatility and the S&P 500. There also needs to be enough social media volume, for example tweet volume, to provide confidence that the aggregate sentiment has enough mass to move the asset price. In many cases, collecting all tweets pertaining to a single stock symbol will NOT meet the volume threshold needed for a reliable correlation between sentiment and price. This can be mitigated by identifying assets whose prices have been correlated over an extended period of time, ingesting and processing the tweets that pertain to all of these price-correlated assets, and then using the sentiment derived from this aggregation of tweets to trade each individual asset.
- 1.2 Collecting Social Media Messages Regarding Specific Assets
- The set of keywords for specific assets is defined in terms of generic categories that can be parameterized according to the finance domain. Depending on the nature of the asset, different strategies for ingesting a large number of relevant social media messages are used. For example, the following strategies may be employed:
-
- (2) single keyword and exclude list of irrelevant combinations; or
- (3) binary keyword template of the form: object “X”+predicate
- Strategy (2) is used, for example, for Crude Oil, where only one keyword, “oil”, is used together with a very large exclude list that includes expressions such as “Soya oil”, “olive oil”, etc. In the case of commodities such as Gold, strategy (3) is preferred. Strategy (3) employs a large include list made up of both unary and binary expressions combining the word “gold” (the object “X”) with another word (a predicate), as in “gold industry”, “gold news”, “gold investor”, “gold invest”, “gold investment”, “gold plunge”, “gold raise”, “gold plunged”, “gold raised”, “gold plunging”, “gold raising”, “gold decline”, “gold declined”, “gold declining”, “gold rally”, “gold rallied”, “gold rallying”, “gold fall”, “gold falls”, “gold falling”, “gold fell”, etc. A large exclude list is still necessary to exclude, for example, jewelry items and colors. Strategy (3) can be used for other commodities by substituting the names of other commodities for the variable in (3) and keeping the set of predicates constant. Thus, a very similar set of keywords may apply to other commodities.
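- Strategies (2) and (3) can be sketched in a few lines of Python. The sketch is illustrative only: the function names and the heavily abbreviated keyword and exclude lists are assumptions made for this example, not the lists used by the system 10, and simple substring matching stands in for whatever matching the ingest component actually performs.

```python
import re

def matches_single_keyword(message, keyword, exclude):
    """Strategy (2): one keyword plus an exclude list of irrelevant combinations."""
    text = message.lower()
    # Reject the message if any excluded expression occurs in it.
    if any(expr in text for expr in exclude):
        return False
    # Otherwise accept it when the keyword occurs as a whole word.
    return re.search(r"\b" + re.escape(keyword) + r"\b", text) is not None

def build_binary_keywords(obj, predicates):
    """Strategy (3): instantiate the template object "X" + predicate."""
    return [f"{obj} {pred}" for pred in predicates]

def matches_binary_keywords(message, include, exclude):
    """Accept a message with at least one include expression and no exclude expression."""
    text = message.lower()
    if any(expr in text for expr in exclude):
        return False
    return any(expr in text for expr in include)

# Hypothetical, abbreviated lists (assumptions for illustration).
OIL_EXCLUDE = ["soya oil", "olive oil"]
PREDICATES = ["news", "investor", "plunged", "rally", "fell"]
GOLD_INCLUDE = build_binary_keywords("gold", PREDICATES)
GOLD_EXCLUDE = ["gold ring", "gold necklace"]
```

Because the predicate list is kept separate from the object, calling `build_binary_keywords("silver", PREDICATES)` reuses the same predicates for another commodity, which mirrors the substitution described for strategy (3).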
- This refined keyword strategy is used in conjunction with the methods described above to obtain a sufficiently large number of social media messages and a high degree of correlation between derived sentiment and price movement, thereby meeting the two requirements of sentiment-price correlation and sufficient volume.
- The refined keyword strategy for ingesting relevant social media messages increases the volume of ingested social media messages that will be fed into the other components of the system, described in the following paragraphs.
- 2. The Language Identifier Module
- The ingested social media messages may include messages in a language other than English. A language identifier/
detector 19 is therefore employed to identify the language of an incoming message and assign it a code. For example, the ingested social media message (4) will be assigned the code (5), which stands for English. -
- (4) Gold Rises but Lags as the Dollar Drops Sharply http://bit.ly/da88XX
- (5) en
- Language identification is a prerequisite for the NLP processing in accordance with the present invention, as the overt syntactic properties vary between languages, as well as the form and content of the lexical items. It is thus necessary to ensure that the social media messages processed by the
NLP module 16 are in English, or in whatever language the system 10 was parameterized for. - 3. The Filter Module
- The
filter module 14 is a pre-NLP processing module that brings social media messages 12 as close as possible to expressions in natural language by eliminating expressions that are not part of the current use of language. For example, the filter module 14 eliminates URLs and hash tags at the periphery of the tweet and normalizes symbols and abbreviations; for example, q1|1q|1st quarter is replaced by “first quarter”. - The
filter module 14 also performs sentence detection on the basis of typographic cues. This is a necessary step in the pre-NLP processing, since social media messages may include more than one sentence, see (6). As the NLP processing and the sentiment calculus are sentence bound, sentence boundary delimitation is necessary. For example, the filter module 14 applies to (7), replaces the URL by a period, converts capitals into lower case, and yields (8): -
- (6) Crude Oil Is Unchanged as US Stocks Decline, China's Processing Surges: The Cisco announcement sent stocks lower . . . _http://bit.ly/dyU40L
- (7) Gold Rises but Lags as the Dollar Drops Sharply http://bit.ly/da88XX
- (8) Gold rises but lags as the dollar drops sharply.
Thus, the filter module 14 takes an ingested social media message as its input and transforms it into a less noisy English expression, which is then subject to NLP processing.
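- The transformation from (7) to (8) can be approximated by the following minimal sketch. It covers only URL replacement, the single abbreviation mentioned above, and case normalization; the function name `filter_message`, the regular expressions, and the step that restores the sentence-initial capital (needed to reproduce (8) exactly) are assumptions for illustration, not the actual filter module 14.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Abbreviation variants taken from the example in the text: q1 | 1q | 1st quarter.
FIRST_QUARTER_RE = re.compile(r"\b(?:q1|1q|1st quarter)\b", re.IGNORECASE)

def filter_message(message):
    """Return a less noisy, period-terminated version of a message."""
    text = URL_RE.sub(".", message)            # replace each URL by a period
    text = FIRST_QUARTER_RE.sub("first quarter", text)
    text = text.lower()                        # convert capitals into lower case
    text = re.sub(r"\s+\.", ".", text)         # remove whitespace before a period
    text = re.sub(r"\.{2,}", ".", text)        # collapse runs of periods
    text = text.strip()
    # Assumption: the first letter is re-capitalized, as in example (8).
    return text[:1].upper() + text[1:]

print(filter_message("Gold Rises but Lags as the Dollar Drops Sharply http://bit.ly/da88XX"))
```

Applied to message (7), the sketch prints the string shown in (8).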
- 4. The NLP Processor
- 4.1 The Sent-Lex (Sentiment-Lex)
- The Sent-Lex 20 is a hand-crafted sentiment-based repository, or database, of the most frequent lexical items and phrases collected from the ingested social media messages, as well as from specialized vocabularies, that are indicative of sentiment. The lexical items and phrases vary according to the domain of application, e.g. finance, security, pharmaceutics, etc. Words that are not sentiment bearing, such as definite articles and auxiliaries, are not part of the Sent-Lex 20. In the present event-driven approach to sentiment mining, sentiment is associated with event-denoting verbs and nouns, as well as with sentiment-bearing modifiers of events or of participants of the events. - The lexical specifications are designed to be parameterized to specific domains of application. The generic format of the lexical entry includes the lexical item, followed by fields of lexical specifications. The first field specifies the category of the item, the second field specifies its polarity, the third field specifies its lexical strength, and the fourth field specifies the polarity and strength of the semantic arguments of the lexical items and phrases, if applicable.
-
- (10) Lexical item, category, polarity, strength, argument's polarity and strength
- Thus, the lexical items and phrases and their features are stored in a lexical database, that is, the Sent-Lex 20. Each of the lexical items and phrases maintained in the Sent-Lex 20 is associated with a category tag, an inherent polarity value and an inherent strength value; for some items, polarity and strength values are also associated with designated argument structure variables, as in (11). For example, in the case of the verb acquire, the acquired object (the variable y in that verb's argument structure) is associated with a positive value, as in (12). This is not the case for other verbs, such as announce and report. Thus, in (13), Google is associated with a positive sentiment. -
- (11) Categorial tag: NN, VB, RB, . . . .
- Polarity values: +, −, n
- Strength values: 1, 2, 3, where 1 is min. and 3 is max.
- Argument structure values associated to the argument variables: (x, y, z, w)
- (12) acquire (x, y), where y: +2
- (13) Apple acquired Google, where Google: +2
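- The entry format in (10) and the argument annotation in (12) and (13) can be pictured with a small data structure. This is a sketch under stated assumptions: the class name `LexEntry`, the field names, and the helper `argument_sentiment` are hypothetical, and the inherent polarity and strength given to acquire are placeholders, since the text only specifies the +2 value attached to its argument y.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class LexEntry:
    item: str        # lexical item or phrase
    category: str    # categorial tag: NN, VB, JJ, RB, ...
    polarity: str    # "+", "-", or "n"
    strength: int    # 1 (min.) to 3 (max.)
    # Optional values attached to argument variables, e.g. {"y": ("+", 2)}.
    arg_values: Dict[str, Tuple[str, int]] = field(default_factory=dict)

def argument_sentiment(entry, bindings):
    """Map each bound argument (e.g. y -> "Google") to the value the entry assigns it."""
    return {
        bindings[var]: value
        for var, value in entry.arg_values.items()
        if var in bindings
    }

# Inherent polarity "n" and strength 1 are placeholders (not given in the text);
# only the +2 on the argument y comes from example (12).
ACQUIRE = LexEntry("acquire", "VB", "n", 1, {"y": ("+", 2)})

# Example (13): Apple acquired Google.
print(argument_sentiment(ACQUIRE, {"x": "Apple", "y": "Google"}))
```

Keeping the argument values in a separate mapping leaves entries such as announce and report, which assign nothing to their arguments, with an empty mapping.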
- The categories, nominal (NN), verbal (VB), adjectival (JJ), adverbial (RB) and their sub-categories, are intrinsically associated with polarity (+, −, n) and strength (1, 2, 3). Furthermore, the lexical specifications differentiate degree modifiers, such as very, too and much, from modifiers such as good and better. Degree intensifiers contribute their own lexical value and add an extra value of 1 to the category they modify; see (14) below for examples. -
- (14) Sample of the JJ/RB database:
-
Item        Tag   Polarity   Strength   Intensifier
Several     JJ    n          1
impressive  JJ    +          2
More        JJR   n          2
Most        JJS   n          3
Good        JJ    +          1
Better      JJR   +          2
Best        JJS   +          3
Very        RB*   n          3          1
Weak        JJ    −          1
Weaker      JJR   −          2
Weakest     JJS   −          3
So          RB*   n          1          1
Too         RB*   n          1          1
- 4.2 The Stock-Lex (Specific Stock Trading Lexicon)
- In the current application, the Stock-Lex 18 is the stock-based lexical repository, or database, consisting of the most frequent lexical items and phrases used in the ingested social media messages that relate to stock-based knowledge, as well as the most frequent items used in stock exchange and financial news wires such as the Financial Post (or in another commodity exchange system, depending upon the application to which the present system 10 is applied). The Stock-Lex 18 thus includes a restricted set of stock-specific lexical items and phrases, associated with their domain-specific polarity and strength values. The polarity values are: positive, negative and neutral. The lexical strength associated with the lexical items and phrases ranges from 1 to 3, where 1 is the lowest value and 3 is the highest value; see (15) for examples. -
- (15) decline, V, −, 2
- decrease, V, −, 2
- deleverage, V, +, 3
- detain, V, −, 1
- deteriorate, V, −, 2
- develop, V, +, 1
- die, V, −, 3
- dip, V, −, 2
- The stock-specific lexical items and phrases belong to the major lexical and phrasal categories: nominal, verbal, adjectival and adverbial. Only event-denoting nominal and verbal expressions are part of the Stock-Lex 18, and only stock-specific adjectival and adverbial modifiers are part of the Stock-Lex 18. - Stock objects (tickers, company names, product names, etc.) have a neutral polarity and have no associated strength value. The
sentiment calculator 22, as specified below, derives the sentiment with respect to specific stock objects. - The Stock-Lex 18 is a repository of the most frequent sentiment-bearing nouns, verbs, adjectives and adverbs used in social media stock market-related exchanges. Each lexical item is associated with a part of speech (POS), a polarity and a strength. The Stock-Lex 18 is handcrafted and contributes to the invention by providing sentiment specifications for event-denoting items and their dependents. The innovation is two-fold: i) it specifies sentiment values for categories other than adjectives, contrary to common practice; ii) it specifies sentiment values for event-denoting lexical items and their dependents, thus providing the lexical information used by the sentiment calculator 22 for the compositional calculus of the sentiment-per-asset. - 4.3 The POS Tagger
- The sentiment calculus applies to lexical items and phrases in their syntactic context. In order to derive the syntactic context for sentiment calculus, each incoming filtered social media message is broken into a stream of words, phrases, symbols, or other meaningful elements called tokens (that is, the message is tokenized), and each token is assigned a part of speech (POS) by the POS tagger 33. In accordance with a preferred embodiment, the Brill Tagger, a known method for performing part of speech tagging, is used, as it is sensitive to the lexical and distributional properties of lexical items and phrases in natural languages. The Brill Tagger is an “error-driven transformation-based tagger”: it is error-driven in the sense that it relies on supervised learning, and transformation-based in the sense that a tag is assigned to each word and then changed using a set of predefined rules. - The
POS tagger 33 is necessary in accordance with the present system 10 to identify the lexical items that contribute to the sentiment calculus, namely adjectives (JJ), adverbs (RB), as well as event-denoting verbs (e.g., to upgrade) and nouns (e.g., an upgrade). Thus, the POS identification of the elements of the event structure (16), (17) reduces the complexity of sentiment mining and contributes to the precision of the sentiment calculus. - Thus, the
POS Tagger 33 applies to the ingested filtered social media messages, tokenizes the string and assigns a part of speech to each token on the basis of a set of lexical and contextual rules, accounting for the distribution of categories in natural language texts. To illustrate, the Brill Tagger applies to (18) and derives the annotated tokenized string in (19). -
- (18) Gold Rises but Lags as the Dollar Drops Sharply.
- (19) Gold/NNP rises/VBZ but/CC lags/VBZ as/IN the/DT dollar/NN drops/VBZ sharply/RB ./.
- where, among the Brill tags, NNP stands for proper noun, VBZ stands for verb (third person singular present), CC stands for conjunction, IN stands for preposition, DT stands for determiner, NN stands for common noun, and RB stands for adverb.
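- An annotated string of the form shown in (19) is straightforward to consume downstream. The following sketch, with hypothetical function names, splits such a string into (token, tag) pairs and keeps the tokens whose tags make them candidates for lexicon lookup (adjectives, adverbs, verbs and common nouns). Selecting by tag alone is an assumption of the sketch; whether a verb or noun is actually event denoting is decided by the lexicon, not by the tag.

```python
def parse_tagged(tagged):
    """Split a 'token/TAG token/TAG ...' string into (token, tag) pairs."""
    pairs = []
    for chunk in tagged.split():
        # Split on the last slash so that './.' yields ('.', '.').
        token, _, tag = chunk.rpartition("/")
        pairs.append((token, tag))
    return pairs

def lookup_candidates(pairs):
    """Keep tokens tagged as adjectives, adverbs, verbs, or common nouns."""
    return [
        token
        for token, tag in pairs
        if tag.startswith(("JJ", "RB", "VB")) or tag in ("NN", "NNS")
    ]

TAGGED = ("Gold/NNP rises/VBZ but/CC lags/VBZ as/IN the/DT "
          "dollar/NN drops/VBZ sharply/RB ./.")
pairs = parse_tagged(TAGGED)
print(lookup_candidates(pairs))
```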
- The majority of operating sentiment mining systems detect sentiment only by mining adjectives with a positive value, e.g., good, great, excellent, or a negative value, e.g., bad, worse, terrible, and so on. However, other parts of speech also convey sentiment. This is the case of adverbs in the verbal domain, which modify the event (action or state) described by the verbal projection they modify, just as adjectives in the nominal domain modify the object denoted by the nominal projection. In this relational approach to sentiment mining as applied in accordance with the present invention, sets of POS are related to the elements of event structures; for example, in (16), (17) above, the M can be an adjective JJ or an adverb RB, and the event can be a noun NN or a verb VB. The identification of the POS of the tokens of the filtered social media messages reduces the complexity of sentiment mining and contributes to its efficiency.
- 4.4 The Parser
- The tokenized and POS annotated messages resulting from the
POS tagger 33 are fed to a partial parser 32 that recovers the main syntactic constituents of the social media messages. The partial parser 32 employs the Cass parser, Abney's cascaded FST (Finite State Transducer) parser, to recover the main syntactic constituents on the basis of the tokenized and POS annotated representations of social media messages, as illustrated in (20). -
(20) Gold/NNP rises/VBZ but/CC lags/VBZ as/IN the/DT dollar/NN drops/VBZ sharply/RB ./.
[c
  [c0
    [nx [name [nnp Gold]]]
    [vx [vbz rises]]]]
[cc but]
[vp
  [vx [vbz lags]]
  [pp [as as]
    [nx [dt the] [nn dollar]]]]
[vp
  [vx [vbz drops]]
  [rb sharply]]
[per .]
- Partial parsing is designed for use with large amounts of noisy text. Robustness and speed are primary design considerations. Not all NLP applications require a complete syntactic analysis. Partial parsing is used in information retrieval as well as in information extraction applications, such as fact and sentiment mining, where finding simple nominal and verbal constituents is enough. A full parser provides more information than needed and is less robust when expected information is missing, as is generally the case in social media messages, where syntactic reductions and truncations are necessary to convey meaning within limited character constraints, for example, 140 characters when considering tweets using Twitter.
- The leaves of the parse tree are associated with their sentiment values via access to the Stock-Lex 18, and the sentiment calculator 22 applies to the resulting semantically annotated tree. The main properties of the calculator are described in the following section. - 5. The Sentiment Calculator
- A sentiment is an integer, which can be either positive or negative, computed on the basis of the application of the rules of the sentiment calculus to pairs of lexical items in their local syntactic context; for example, nouns (that is, nominal lexical items) representing assets and nouns/verbs/adjectives (that is, nominal, verbal or adjectival lexical items) representing sentiment in the form of polarity and strength. The computed sentiment value ranges within a pre-established scale. In accordance with the present invention, the
sentiment calculator 22 uses social media messages for the real-time evaluation of publicly traded equities and commodities, wherein a sentiment is a positive or negative integer computed based upon pairs of lexical items in local syntactic context. At its most basic, the sentiment calculator 22 employs a mechanism for determining lexical polarity in social media messages and a mechanism for determining a strength value of lexical items and phrases used in social media messages. - The sentiment calculus employed by the
sentiment calculator 22 applies to the annotated Cass tree output by the partial parser 32. It compositionally derives the sentiment associated with entities in the event denoted by the expression they are part of. The sentiment logic is a compositional calculus deriving the sentiment value of a relation on the basis of the sentiment values of its parts. - In the specific domain of stock-market exchanges, the sentiment logic calculates sentiment values per asset with respect to stock market events described by the incoming social media messages. Namely, it calculates the sentiment with respect to given assets, as they occur in stock events.
- As discussed above, the social media messages relating to an asset are gathered by a set of keywords used for ingesting the social media messages. The sentiment calculus is based on the lexical polarity and strength values of the lexical items and phrases defined in the Stock-Lex 18 and on how they are syntactically organized in the Cass tree. The maximal local domain for the application of the calculus is the sentence; the minimal local domain is the smallest constituent including the keywords standing for the asset. The sentiment calculus applies locally to the constituents including the asset within the sentences of the message. The Cass parser derives the syntactic constituents of the sentences, including the adjectival (cx), as well as the nominal (nx) and the verbal (vx) constituents. - The polarity and strength rules apply to syntactic constituents in head-complement, modifier-modified, and subject-predicate relations, which are identified on Cass trees. These relations are defined as follows. A head of a constituent is a lexical item, such as a verb, e.g., hit, or a noun, e.g., acquisition, that makes the constituent it is part of verbal (vx) or nominal (nx). A head selects a complement, which is a syntactic constituent such as a nominal phrase, e.g., the market in hit the market, and AAPL in the acquisition of AAPL. A modifier is an adjective or an adverb that modifies another constituent, a nominal constituent in the first case and a verbal constituent in the other case, e.g., strong market and strongly hit the market. The subject-predicate relation is the relation between a subject, generally a nominal constituent, and a predicate, generally a verbal constituent; e.g., in the sentence AAPL hits the market, AAPL is the subject and hits the market is the predicate.
- The sentiment calculus includes separate rules for calculating the polarity and the strength. They have the generic form of dyadic operators (Op (arg1, arg2)), and their specific form is dependent on the relation between arg1 and arg2, as well as the lexical polarity and strength values of the lexical items and phrases specified in the Stock-
Lex 18. -
- Polarity (Pol): Pol (arg1, arg2), where arg1 is a head and arg2 is a dependent. The rule applies locally in syntactic constituents/domains, e.g., nx, vx, cx, etc. It derives the polarity of constituents on the basis of the polarity of their parts and how they are syntactically related. The polarity rules apply in three universal syntactic relations defined above (that is, head-complement, modification (modifier-modified), and predication (subject-predicate) relation), according to the polarity of the parts of the relations. The Polarity rules include the following:
- Pol ([x] [y])=Compose ([x], [y]) as specified by the following rules:
-
- (21) if (x is NEG) and (y is +), then Pol (y=−): NEG, + = − (e.g., no upgrade)
- if (x is NEG) and (y is −), then Pol (y=n): NEG, − = n (e.g., not bad)
- if (x is NEG) and (y is n), then Pol (y=n): NEG, n = n (e.g., no report)
- (22) if (Pol (x)=Pol (y)), then
- if (x is n) and (y is n), then Pol (y=n): n, n = n (e.g., average result)
- if (x is +) and (y is +), then Pol (x=+): +, + = + (e.g., announce an upgrade)
- if (x is −) and (y is −), then Pol (y=−): −, − = − (e.g., downgrade to sell)
- (23) if (Pol (x)≠Pol (y)), and
- if (x is +) and (y is n), then Pol (y=+): +, n = + (e.g., impressive report)
- if (x is +) and (y is −), then Pol (x=−): +, − = − (e.g., impressive downgrade)
- if (x is −) and (y is n), then Pol (y=−): −, n = − (e.g., weak report)
- if (x is −) and (y is +), then Pol (y=−): −, + = − (e.g., missed rally)
- if (x is n) and (y is +), then Pol (y=+): n, + = + (e.g., average upgrade)
- if (x is n) and (y is −), then Pol (y=−): n, − = − (e.g., average depreciation)
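- Rules (21) to (23) amount to a small lookup on the pair of polarities. The sketch below is one compact way to encode them; the function name `compose_polarity` and the use of the ASCII strings "+", "-", "n" and "NEG" are choices made for this example. The rules do not say what happens when the second argument is NEG, so the sketch does not handle that case.

```python
def compose_polarity(x, y):
    """Pol(x, y) for x, y in {"+", "-", "n"}; x may also be "NEG"."""
    if x == "NEG":
        # Rule (21): negation turns "+" into "-", and "-" or "n" into "n".
        return "-" if y == "+" else "n"
    if x == y:
        # Rule (22): equal polarities are preserved.
        return x
    if "-" in (x, y):
        # Rule (23): whenever the polarities differ, a negative part wins.
        return "-"
    # Rule (23): the remaining cases pair "+" with "n".
    return "+"

# Examples from the rule list above.
print(compose_polarity("NEG", "-"))  # not bad
print(compose_polarity("+", "-"))    # impressive downgrade
print(compose_polarity("n", "+"))    # average upgrade
```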
- Strength (Str):
- Str (arg1, arg2), where arg1 is a head and arg2 is a dependent. The rule applies locally in syntactic constituents/domains, e.g., nx, vx, cx, etc. It derives the strength of constituents on the basis of the strength of their parts and how they are syntactically related by the application of an arithmetic operation to the pair of arguments depending on the nature of the syntax-semantic relation and the polarity of the constituents. The strength rules apply to the lexical items and phrases in the three universal syntactic relations, and the strength is calculated on the basis of elemental arithmetic operations. The Strength rules include the following:
- Function (arg1, arg2), where arg1 is a head and arg2 is its dependent
Str ([x] [y])=Compose ([x], [y]) as specified by the following rule: -
- (24) if (x is the head (h)) and (y is the complement (o)), then Str (x)+Str (y)
- if (x is the head (h)) and (y is the modifier (m)), then Str (x)+Str (y)
Function (arg1, arg2), where arg1 is a modifier and arg2 is the modified
Str ([x] [y])=Compose ([x], [y]) as specified by the following rules:
- (25) if (x is JJ, RB) and (y is NN, VB), then Str (x)+Str (y)
- if (x is JJ, RB) and (y is JJ, RB), then Str (x)+Str (y)
- if (x is RB*) and (y is JJ, RB), then Str (x)+Str (y)
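- Since every case in (24) and (25) adds the two strengths, strength composition reduces to a sum folded over the parts of a constituent. The sketch below uses hypothetical function names and assumes that a part carrying no strength value (represented here as None) contributes nothing to the sum. The extra value of 1 contributed by degree intensifiers is not modeled.

```python
from functools import reduce

def compose_strength(x, y):
    """Str(x, y) = Str(x) + Str(y); a part without a strength contributes nothing."""
    if x is None and y is None:
        return None
    return (x or 0) + (y or 0)

def constituent_strength(parts):
    """Fold compose_strength over the strengths of a constituent's parts."""
    return reduce(compose_strength, parts, None)

# A head of strength 3 combined with a dependent of strength 2.
print(constituent_strength([3.0, 2.0]))
# A part without a strength combined with a part of strength 3.
print(constituent_strength([None, 3.0]))
```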
- It is appreciated that social media messages may include more than one sentence, may talk about more than one asset or more than one stock event, and may express more than one sentiment. If the sentiment values of all the lexical items and phrases of a social media message are computed blindly, the resulting value is general and not necessarily asset specific. The sentiment calculator 22 is sentence bound. Moreover, it calculates sentiment in the local syntax-semantic domain of an asset. Thus, it ensures that the specific sentiment with respect to a given asset conveyed by a message is calculated. It applies iteratively in the local domain of the constituent including the asset (keyword, set of keywords), e.g. OIL, or GOLD, and the expression of a stock event (e.g., “lose”, “gain”, “sell”, “buy”) or a sentiment (e.g., “high”, “low”). - The following trace for the tweet (26) illustrates the application of the
sentiment calculator 22, which calculates sentiment-per-asset in the local domain of the targeted asset: Oil. The calculus assigns the value +3 to Oil, discarding the value of the computation for Canadian dollar, which is −5. -
- (26) Canadian dollar falls for second week. Crude Oil prices raises.
-
[root {oil: Positive,3.0,null}
  [sen {_: Negative,5.0,null}
    [c {_: Negative,5.0,null}
      [c0 {_: Negative,3.0,null}
        [nx {_: Null,null,null}
          [jj [{_: Null,null,null}] (Canadian)]
          [nn [{_: Null,null,null}] (dollar)]
        ]
        [vx {_: Negative,3.0,null}
          [vbz [{_: Negative,3.0,null}] (falls)] <<<< {−}
        ]
      ]
      [pp {_: Neutral,2.0,null}
        [in [{_: Null,null,null}] (for)]
        [nx {_: Neutral,2.0,null}
          [jj [{_: Neutral,2.0,null}] (second)]
          [tunit [{_: Null,null,null}] (week)]
        ]
      ]
    ]
    [per [{_: Null,null,null}] (.)]
  ]
  [sen {oil: Positive,3.0,null}
    [c {oil: Positive,3.0,null}
      [c0 {oil: Positive,3.0,null}
        [nx {oil: Null,0.0,null}
          [jj [{_: Null,null,null}] (Crude)]
          [nn [{oil: Null,0.0,null}] (oil)] <<<< {K}
          [nns [{_: Null,null,null}] (prices)]
        ]
        [vx {_: Positive,3.0,null}
          [vbz [{_: Positive,3.0,null}] (raises)] <<<< {+}
        ]
      ]
    ]
  ]
]
- This example shows that every step of the computation by the modules of the system provides the structure for the application of the sentiment calculus. This calculus applies in local syntactic domains and provides an integer that represents the sentiment (polarity and strength) with respect to designated assets.
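- The difference between an asset-local value and a blind message-wide value can be reproduced with the sentence-level results of the trace for (26). The sketch below is a simplification under stated assumptions: it takes the per-sentence polarity and strength as already computed, uses the whole sentence as the local domain (the maximal domain named above, not the minimal constituent), and sums over sentences that mention the asset. The function names are hypothetical.

```python
import re

def signed(polarity, strength):
    """Turn a (polarity, strength) pair into a signed integer."""
    sign = {"+": 1, "-": -1, "n": 0}[polarity]
    return sign * strength

def sentiment_per_asset(sentences, keyword):
    """Sum the signed values of the sentences that mention the asset keyword.

    sentences: list of (text, polarity, strength) triples, one per sentence.
    Returns None when no sentence mentions the asset.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    values = [signed(p, s) for text, p, s in sentences if pattern.search(text)]
    return sum(values) if values else None

# Sentence-level results read off the trace for tweet (26).
TRACE = [
    ("Canadian dollar falls for second week.", "-", 5),
    ("Crude Oil prices raises.", "+", 3),
]

asset_value = sentiment_per_asset(TRACE, "oil")
blind_value = sum(signed(p, s) for _, p, s in TRACE)
print(asset_value, blind_value)
```

The asset-local value for oil is +3, while the blind sum over the whole message is negative, which is the distinction the trace is meant to show.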
- 6. The Inference Engine
-
Inference engines are part of expert systems, which are designed to process a problem expressing an uncertainty with respect to a decision, and to provide a decision, or a set of decisions, reducing the uncertainty. The inference engine 24 attempts to provide an answer to a problem, or to clarify uncertainties, where normally one or more human experts would need to be consulted. - The
inference engine 24 of the present system 10 is part of the pipeline and provides a mechanism to sharpen the accuracy of the sentiment computed, by bringing both knowledge of the stock market world 23 and knowledge of the world 25 into the computation. - The
inference engine 24 includes a data structure and a set of inference rules (if X then Y) relating facts to sentiments. This knowledge interacts with the domain-specific knowledge stored in the lexicon and used by the sentiment calculator 22. - The
inference engine 24 includes a data structure, namely a knowledge base that uses some knowledge representation structure to capture the knowledge of a specific domain (for example, a relational table relating entities in knowledge domains), and a set of inference rules applying to the entities in the relational table and drawing consequences. One advantage of inference rules over traditional programming is that inference rules use reasoning that more closely resembles human reasoning. In the specific application of stock-market trade, the knowledge base consists of a relational table relating stock entities (tickers, company names, products, etc.), stock events (e.g., upgrade, downgrade) and facts extracted from news wires. The rules of the inference engine 24 apply to the elements of the relational table and infer sentiment values. -
- (27) Damn you OPEC! Will this be the summer we finally see $5/gal at the pump??? I sure hope not. Kills any similar fun from last summer
- For example, the knowledge base includes (28) below, and the inference rules include (29) below, stating that if the price of gas (at the pump) is below $3 per gallon then the sentiment value is positive, +2, and if the price is above $3 per gallon then the sentiment value is negative, −2. This real world knowledge varies according to time and place.
-
- (28) OPEC, oil, $X/gal, locations
- (29) In “$X/gal” expressions, where X is a digit
- if X is below 3, then polarity=+ and strength is 2
- if X is above 3, then polarity=− and strength is 2
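- Rule (29) can be written as a single pattern match followed by a threshold comparison. The sketch below is illustrative only: the function name, the regular expression (which also accepts decimal prices, an assumption beyond the single digit mentioned in (29)), and the configurable `threshold` parameter are choices made for this example. A price exactly at the threshold is not covered by (29) and yields no inference here.

```python
import re

# Matches expressions of the form "$X/gal", e.g. "$5/gal" or "$2.75/gal".
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)/gal")

def infer_gas_price_sentiment(message, threshold=3.0):
    """Return (polarity, strength) inferred from a "$X/gal" expression, or None."""
    match = PRICE_RE.search(message)
    if match is None:
        return None
    price = float(match.group(1))
    if price < threshold:
        return ("+", 2)
    if price > threshold:
        return ("-", 2)
    return None  # price equal to the threshold: not covered by rule (29)

TWEET_27 = ("Damn you OPEC! Will this be the summer we finally see $5/gal "
            "at the pump??? I sure hope not.")
print(infer_gas_price_sentiment(TWEET_27))
```

Exposing the threshold as a parameter reflects the remark that this real world knowledge varies according to time and place.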
- The
sentiment calculator 22 alone would not derive the negative sentiment associated with the second sentence in (27). While the sentiment calculator 22 assigns the value neutral to questions, the inference engine 24 assigns the sentiment value of −2. - Thus, the
inference engine 24 ensures that the sentiment is grounded in the real world. It contributes to the innovative technology, which both simplifies and sharpens decision making in stock market transactions. - Sentiment calculations in accordance with the
present system 10 are a result of the pipeline, or multilayered architecture, embodied by the present invention, which ingests social media messages, identifies the language of the social media messages, and filters out elements that are not part of the natural language for which the system 10 has been parameterized (here English). The POS tagger 33 and the partial parser 32 modules of the NLP processor 16 assign parts of speech to the tokens of sentences, and recover the structure they are part of. The sentiment calculus of the sentiment calculator 22 applies to the annotated structures and derives the sentiment value per asset based on the sentiment value of the event they are part of. Finally, the inference engine 24 reduces uncertainty by relying on a relational database including knowledge of the world information and a set of inference rules. - The present sentiment calculation system includes a computer implemented mechanism for obtaining and converting ingested unstructured social media messages regarding a plurality of objects/assets being tracked into a sentiment value for each object/asset. The sentiment value includes a polarity value and a strength value derived from a natural language processing algorithm containing a database of lexical items and phrases related to the objects being tracked. The precise sentiment value per object is derived by the compositional calculus based on the sentiment values of lexical items (and phrases) and their syntactic organization. The contextual sentiment value is based on the
inference engine 24 deriving a sentiment value with respect to knowledge of the world. The interaction of the sentiment calculus and the inference engine 24 yields accurate sentiment in real time. The cognitive-based sentiment calculus relates conceptual processing to the natural language processing algorithm. - 7. Reaction Indicator
- As discussed above, the data generated by the
sentiment calculator 22 is applied to a graphical user interface 30 that combines sentiment and intensity data relating to the assets. The graphical user interface 30 includes moving graphic objects displayed upon a monitor that depict social media market sentiment; a timeline slider object 46; and a vertical bar chart object 44. - In accordance with the present invention, the
graphical user interface 30 provides for the visualization of graphic objects in the form of moving spheres 40, where the sphere size and color depict social media market sentiment. The moving spherical graphic objects 40 shrink and grow based on intensity changes. The sphere color changes based on social media sentiment polarity. The center sphere 40 a represents the weighted sentiment average. Clicking one of the moving spherical graphic objects 40 results in the display of a chart 42 (see FIGS. 6A & 6B) graphing, based on what the trader selects, all or a choice of price, volume, social media frequency, social media sentiment, cross-correlation and a variety of price and sentiment derived technical indicators. Sphere updates are based on a configurable polling time. - The
graphical user interface 30 contains a time slider 46 to go back to a point in time and replay history. A vertical bar chart 44 graphs the social media sentiment when the graphical user interface 30 is in full screen mode. - The purpose of the
reaction indicator 31 is to provide a mechanism wherein hundreds of assets can be tracked, but only those that are “interesting” based on preprogrammed parameters will float to the surface and draw the viewer's attention. - More particularly, and with reference to
FIGS. 2, 3, 4, 6 and 7, the reaction indicator 31 provides a graphical user interface 30 displaying three graphical areas of objects: moving spherical graphic objects 40, a timeline slider object 46 and a vertical bar chart object 44. It is noted the moving graphic objects may take shapes other than spheres, such as squares. Referring to FIG. 2, the spherical moving graphical objects are represented at 40, the timeline slider object at 46 and the vertical bar chart object at 44. - The reaction indicator polls a data stream containing mathematically computed values for social media intensity, social media sentiment, social media frequency, social media weighted average frequency and social media weighted average sentiment, auto refreshing the moving spherical
graphic objects 40 and the vertical bar chart object 44 based on a configurable polling time. Intensity is defined as the ratio of short-term frequency to long-term frequency. The mathematical computations for the data stream are calculated by an algorithm discussed herein in detail in a section related to cross correlation. The calculations are based upon information obtained from a multilayer pipeline architecture previously discussed. - Referring to
FIG. 7, the moving spherical graphic objects 40 shrink and grow based on the social media intensity attribute and are sized relative to each other taking into consideration the stage size and browser screen resolution. The color of the moving spherical graphic object 40 is based on social media sentiment polarity where polarity is defined as negative, neutral or positive. Each of the moving spherical graphic objects 40 displays a label, social media sentiment and social media frequency. - The
center sphere 40a object visualizes a weighted average of all sphere objects based on weights assigned to the spheres. Referring to FIG. 3, the weighted sphere object is represented at 40a. The weighted average sphere is static in size relative to the other sphere objects, which shrink and grow, and it displays weighted average social media sentiment and weighted average social media frequency if sphere weights have been assigned. If sphere weights have not been assigned, the weighted average sphere object does not display any data. The weighted average sphere object does not change color to reflect social media sentiment polarity. An example where weights may play a role is the instance where the visualization represents an Exchange Traded Fund (ETF). An ETF holds assets such as stocks, commodities or bonds. The assets would be represented in the spheres. The weight assigned to each asset would represent that asset's percentage in the ETF, the ETF being an amalgamation of all the assets. - The
timeline slider object 46 visualizes a timeline where the date and time on the left represent the earliest date and time where data exists for the collection of moving spherical graphic objects 40. The date and time on the right represent the current date and time. Moving to various points on the timeline slider object 46 moves the moving spherical graphic objects 40 and the vertical bar chart object 44 to a point in time, pausing the real-time display, then replaying history. From the historical point in time selected, the moving spherical graphic objects 40 and the vertical bar chart object 44 will poll the data stream coming from the sentiment calculator 22 for social media intensity, social media sentiment, social media frequency, social media weighted average frequency and social media weighted average sentiment from the point in time selected, then rerun history as if it were happening in real time. Referring to FIG. 2, the timeline slider object is represented at 46. - The vertical
bar chart object 44 utilizes the same data stream as the moving spherical graphic objects 40 to graph social media frequency, using the same color scheme as the spherical objects. Referring to FIG. 2, the vertical bar chart object is represented at 44. - Clicking on a moving spherical
graphic object 40 will launch a chart graphing price, volume, social media sentiment, social media frequency, and cross-correlation, auto refreshing based on a configurable time, e.g. every second, as seen in the screen shots depicted in FIGS. 6A and 6B. - Each of the moving spherical
graphic objects 40 displays a symbol, such as an exclamation mark within the sphere, preferably in the center, when an alert has been triggered. Specifically, a trigger results when sentiment and intensity variables cross certain thresholds, whereupon the related moving spherical graphic object displays an exclamation mark, signaling a potential trading opportunity. For example, when the sentiment and intensity for a given asset A exceed a preprogrammed value indicating sell, an exclamation mark will be displayed in the center of sphere A, alerting the operator to take action. The operator shall have the option of directly executing a trade via a combination of keyclicks. The operator can program the reaction indicator 31 to automatically place a trade. The operator can program the reaction indicator 31 to send an alert via e-mail or text message. - In summary, the
reaction indicator 31 comprises a plurality of moving graphic objects 40 which change size and color based upon social media market sentiment, intensity and frequency captured and correlated in real-time from a stream of online social media messages related to a market segment. The moving spherical graphic objects 40 shrink or grow in size based upon the social media intensity attributed to each moving spherical graphic object 40, and the moving spherical graphic objects 40 change color based upon whether the social media sentiment attributed to each moving spherical graphic object is positive, negative or neutral. The reaction indicator 31 also provides a weighted average of all displayed moving spherical graphic objects 40, based on weights assigned to the objects prior to capturing social media streams, which is displayed among the plurality of displayed objects. - As discussed above, once sentiment and intensity are fully appreciated, the present system and method provides a mechanism for cross-correlating the sentiment and intensity data with the actual fluctuations in asset prices. The present invention provides two methods to find patterns in a target real-valued time series by utilizing two other real-valued time series derived from a stream of social-media messages (Twitter for instance): sentiment and frequency.
-
- The target is arbitrary. It represents a quantifiable property of the asset that is being tracked. For instance, we have applied the algorithm using stocks and commodities as assets, and their market prices as targets.
- The sentiment, as defined previously, is relative to the asset underlying the target.
- The frequency represents the volume of messages about the asset. It is derived from the sentiment time series and a parameter called the window size.
- When supplied with a window size and applied in real time, those methods have a predictive value on the target. For this reason the series used to find patterns in the target, such as the sentiment series and the frequency series, are called predictive. As shown in
FIG. 4, the patterns can be depicted graphically on charts, together with the time series, to be used as a decision making tool. - The patterns can also serve as the input to an automated trading system to generate trading signals.
- In the example shown in
FIG. 4, the curves are a depiction of the sentiment time series for the target (thick curve labeled ss) and the sentiment-frequency time series (thin curve sf). The calculation of the sentiment-frequency series will be described later. - From a visual inspection of the picture it is easy to see that the target is reproducing the bell pattern the sentiment-frequency curve had earlier. This makes it possible to better predict the future move of the target. Looking at the sentiment time series ss for the target only, it seems the target is dropping sharply. However, using the pattern of the sentiment-frequency, one can anticipate that the target will soon experience a rather important rebound. This is the predictive value of the method. A visual inspection of
FIGS. 6A and 6B will reveal that sentiment, despite not being derived from price, can show extremely strong correlation to price, either as a leading indicator or a supporting indicator, both scenarios being extremely relevant and useful to stock traders. - As will be appreciated based upon the following disclosure, the method of the present invention finds patterns in a target real-valued time series by utilizing sentiment and frequency derived from a stream of social-media messages, wherein the target represents a quantifiable property of an asset being tracked. The method includes identifying a target, which is a sampled real-valued time series; generating a sentiment time series, ss (which is plotted); generating a frequency time series, sf (which is plotted); and determining a pattern based upon the sentiment time series and the frequency time series.
- A real-valued time series is defined as a sequence of pairs (time (t), value (s)), also called points, ordered by increasing time. A simple time series could look like this: [(12:36,27),(13:03,37),(16:34,88)].
-
-
- For example,
- $s = \{(12{:}36, 27),\ (13{:}03, 37),\ (16{:}34, 88)\}$
- $V(s) = [(12{:}36, 27),\ (13{:}03, 37),\ (16{:}34, 88)]$
- $V_1(s) = [12{:}36,\ 13{:}03,\ 16{:}34]$
- $V_2(s) = [27,\ 37,\ 88]$
- A semantic distinction is drawn between pulsated time series, where points represent a punctual event (i.e., a sequence of Diracs) such as the arrival of a message, and sampled time series, which represent a discretization of a function that is defined at all times, such as the market price. It is thus natural to interpolate points of a sampled time series to try to recover the original function it was sampled from.
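The example above can be mirrored in a few lines of code. This is only an illustrative sketch: the function names `V`, `V1` and `V2` follow the notation of the example, while `minutes` and the choice of storing times as minutes since midnight are assumptions made here for convenience, not something the disclosure specifies.

```python
def minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight (assumed time encoding)."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def V(series):
    """Points of the time series as a list ordered by increasing time."""
    return sorted(series, key=lambda point: point[0])


def V1(series):
    """Times of the ordered points."""
    return [t for t, _ in V(series)]


def V2(series):
    """Values of the ordered points."""
    return [v for _, v in V(series)]


# The set s from the example, deliberately written out of order.
s = {(minutes("16:34"), 88), (minutes("12:36"), 27), (minutes("13:03"), 37)}

print(V1(s))  # times in increasing order, as minutes since midnight
print(V2(s))  # [27, 37, 88]
```

Keeping the series as plain (time, value) pairs makes the pulsated and sampled cases share one representation; only the interpretation (and whether interpolation is meaningful) differs.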
- The target is an arbitrary sampled real-valued time series. The algorithm has been applied with prices as target.
- The sentiment time series ss is generated by the Natural Language Processing (NLP)
module 16. It is a pulsated time series. For each message in the input stream, the sentiment time series contains a pair whose time is the time when the message was posted, and whose value is the result of the NLP processor 16. This value is called sentiment. - The frequency time series sf depends on two parameters: the sentiment time series and a positive number w representing a time called window size. It is a pulsated time series. For each point (t, s) in the sentiment series, the frequency series contains a point (t, f) where f is the number of points in the sentiment series in the time range [t−w, t], divided by w. This number f is called frequency.
- Formally,
- $s_f = \{(t, f) \mid (t, s) \in s_s,\ f = |s_s \cap [t-w, t]| / w\}$
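The frequency definition above, together with the intensity ratio defined earlier for the reaction indicator (short-term frequency divided by long-term frequency), can be sketched as follows. The function names, the numeric time encoding and the particular short and long window lengths are assumptions chosen for illustration only.

```python
from bisect import bisect_left, bisect_right


def frequency_at(times, t, w):
    """Number of message times in [t - w, t], divided by w. `times` must be sorted."""
    count = bisect_right(times, t) - bisect_left(times, t - w)
    return count / w


def frequency_series(sentiment_series, w):
    """Build the frequency series: one point (t, f) per sentiment point (t, s)."""
    times = [t for t, _ in sentiment_series]
    return [(t, frequency_at(times, t, w)) for t in times]


def intensity(times, t, short_w, long_w):
    """Ratio of short-term to long-term frequency; 0.0 if the long window is empty."""
    long_f = frequency_at(times, t, long_w)
    return frequency_at(times, t, short_w) / long_f if long_f else 0.0


# Hypothetical sentiment series: (time, sentiment) pairs ordered by time.
ss = [(0, 0.4), (5, -0.2), (10, 0.9), (25, 0.1)]
sf = frequency_series(ss, w=10)
print(sf)
print(intensity([t for t, _ in ss], t=25, short_w=10, long_w=30))
```

Because the window [t−w, t] is closed at t, every point counts itself, so each frequency is strictly positive.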
- A pattern P is defined as a cross-correlation c in [−1,1], a positive window size w, a time lag l, and a time ts. These numbers are interpreted as “the predictive series over [ts−w, ts] correlates to the target series over [ts−w+l, ts+l] with a cross-correlation of c”.
-
- If the lag is positive, it is said to be predictive. The cross-correlation determines the relevance of the pattern: the higher it is, the more relevant the pattern is considered.
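A pattern as defined above is a small record of four numbers, and it may help to see it written as a data structure. The class and helper names below are hypothetical; the fields and the predictive and relevance rules come from the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    c: float    # cross-correlation, in [-1, 1]
    w: float    # positive window size
    lag: float  # time lag l
    ts: float   # time t_s

    def __post_init__(self):
        if not -1.0 <= self.c <= 1.0:
            raise ValueError("cross-correlation must lie in [-1, 1]")
        if self.w <= 0:
            raise ValueError("window size must be positive")

    @property
    def is_predictive(self):
        """A pattern is predictive when its lag is positive."""
        return self.lag > 0

    def predictive_window(self):
        """Interval [ts - w, ts] of the predictive series."""
        return (self.ts - self.w, self.ts)

    def target_window(self):
        """Interval [ts - w + l, ts + l] of the target series."""
        return (self.ts - self.w + self.lag, self.ts + self.lag)


def by_relevance(patterns):
    """Order patterns from most to least relevant (highest cross-correlation first)."""
    return sorted(patterns, key=lambda p: p.c, reverse=True)


p = Pattern(c=0.8, w=60, lag=15, ts=1000)
print(p.is_predictive, p.predictive_window(), p.target_window())
```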
- Pattern Identification Method
- The method is called the sentiment-frequency method. It uses the sentiment to create a sentiment-frequency series, and correlates the latter to the target using a plain statistical cross-correlation. It then identifies patterns by finding the optimal time lag.
- Correlating two time-series using a plain statistical cross-correlation and finding the optimal lag is an independent component. This component is called the series correlator and is described below.
- Sentiment-Frequency Method
- The system first creates an average sentiment series sa such that for every point (t,s) in the sentiment time series ss there is a point (t, a) in the average sentiment series where a is the arithmetic average of all the sentiments in the time range, or interval [t−w, t].
- Formally let,
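- $s_a = \{(t, A_w(t)) \mid t \in V_1(s_s \cap [t-w, t])\}$, where $A_w(t)$ denotes the arithmetic average of the sentiments in the interval $[t-w, t]$.
- The system then creates the sentiment-frequency series ssf to contain a point (t, vsf) for every (t, a) in the average sentiment series sa and (t, f) in the frequency time series sf, where $v_{sf} = f^{a}$ $(= e^{a \ln f})$.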
- Formally, it is defined as:
- $s_{sf} = \{(t, f^{a}) \mid (t, a) \in s_a,\ (t, f) \in s_f\}$
- Next the series correlator is applied to the sentiment-frequency series and the target.
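The construction of the average sentiment series and the sentiment-frequency series can be sketched as below. This is a straightforward, unoptimized reading of the definitions; the function names and the sample data are assumptions, and a production system would presumably maintain the windows incrementally rather than rescanning the series.

```python
def in_window(series, t, w):
    """Points of the series whose time lies in [t - w, t]."""
    return [(u, v) for u, v in series if t - w <= u <= t]


def average_sentiment_series(ss, w):
    """For every (t, s) in ss, emit (t, a) with a the mean sentiment over [t - w, t]."""
    sa = []
    for t, _ in ss:
        values = [v for _, v in in_window(ss, t, w)]
        sa.append((t, sum(values) / len(values)))
    return sa


def frequency_series(ss, w):
    """For every (t, s) in ss, emit (t, f) with f = (points in [t - w, t]) / w."""
    return [(t, len(in_window(ss, t, w)) / w) for t, _ in ss]


def sentiment_frequency_series(ss, w):
    """Combine both series point by point into (t, f ** a)."""
    sa = average_sentiment_series(ss, w)
    sf = frequency_series(ss, w)
    return [(t, f ** a) for (t, a), (_, f) in zip(sa, sf)]


# Hypothetical sentiment points: (time, sentiment).
ss = [(0, 1.0), (5, -1.0), (10, 1.0)]
ssf = sentiment_frequency_series(ss, w=10)
print(ssf)
```

Since each window contains at least its own point, f is always positive and f ** a is well defined for any real average sentiment a.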
- Series Correlator
- The series correlator produces a set of patterns based on a real-valued pulsated time series sp, a real-valued sampled time series ss, an interpolation method I for ss, and a window size w.
- The interpolation method I is a function of a time series ss and of a time t that is C1-piecewise continuous with respect to t, and such that if there exists a point (t,v) in ss, then I(ss, t)=v. Interpolation is a classical subject and will not be described here. Common interpolation methods are linear interpolation and cubic splines.
- Formally,
- $\forall (t, v) \in s_s,\ I(s_s, t) = v$
- For any time t and lag l, the system defines the vector Es(ss, sp, t, l) so that for every (tp, p) in sp with tp in [t−w, t], Es(ss, sp, t, l) contains the value I(ss, tp+l). Es(ss, sp, t, l) is called the interpolated vector.
- Formally,
- $E_s(s_s, s_p, t, l) = I\big(s_s,\ V_1(s_p \cap [t-w, t])_i + l\big)$
- The system also defines the vector Ep(sp, t) so that for every (tp, p) in sp with tp in [t−w, t], Ep(sp, t) contains the value p.
- Formally,
- $E_p(s_p, t) = V_2(s_p \cap [t-w, t])$
- The cross-correlation CC(ss, sp, t, l) is defined as the scalar product of Ep(sp, t) and Es(ss, sp, t, l) divided by the product of their norms.
- Formally,
-
-
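The cross-correlation just described (scalar product of the two vectors divided by the product of their norms) can be sketched as follows. Linear interpolation is used for I because the text names it as a common choice; clamping outside the sampled range and returning 0.0 for degenerate vectors are assumptions made here, and all function names are hypothetical.

```python
from bisect import bisect_right
from math import sqrt


def interpolate(series, t):
    """Linear interpolation I(s, t) of a sampled series sorted by time.
    Values are clamped outside the sampled range (an assumption of this sketch)."""
    times = [u for u, _ in series]
    values = [v for _, v in series]
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    j = bisect_right(times, t)  # times[j - 1] <= t < times[j]
    t0, t1 = times[j - 1], times[j]
    v0, v1 = values[j - 1], values[j]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def cross_correlation(target, predictive, t, lag, w):
    """CC: scalar product of E_p and E_s divided by the product of their norms."""
    window = [(u, p) for u, p in predictive if t - w <= u <= t]
    ep = [p for _, p in window]                              # E_p: predictive values
    es = [interpolate(target, u + lag) for u, _ in window]   # E_s: target at shifted times
    dot = sum(a * b for a, b in zip(ep, es))
    norm = sqrt(sum(a * a for a in ep)) * sqrt(sum(b * b for b in es))
    return dot / norm if norm else 0.0


# Hypothetical data: the target repeats the predictive shape one time unit later, scaled by 2.
predictive = [(0, 1.0), (1, 2.0), (2, 3.0)]
target = [(1, 2.0), (2, 4.0), (3, 6.0)]
print(cross_correlation(target, predictive, t=2, lag=1, w=2))
```

Note that this is the uncentered form stated in the text (no mean subtraction), so two series that are proportional over the window yield a value of 1 up to rounding.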
- From the definition above, the local maximums of the function CCt: l ↦ CC(ss, sp, t, l) simply move linearly with t as long as no point of sp leaves or enters [t−w, t]. Hence the set of local maximums of CCt, taken over the times t for which t or (t−w) is the time of a point in sp, is a finite set that completely represents the set of local maximums of CCt for all t.
- For every w, the system computes a finite set of times t and lags l and a cross-correlation c for each of them. This defines a finite set of patterns (c, w, t, l) which the system orders by relevance.
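One simple way to obtain such a set of patterns for a fixed time t and window size w is to evaluate the cross-correlation on a grid of candidate lags and keep the local maxima. The grid search is a simplification assumed for this sketch; the text instead exploits the fact that local maxima only change when points enter or leave the window. The function names and the stand-in correlation curve are hypothetical.

```python
def local_maxima_over_lags(cc_of_lag, lags):
    """Return (lag, c) pairs where c is a local maximum over the sampled lag grid.
    `cc_of_lag` is any callable mapping a lag to a cross-correlation value."""
    values = [cc_of_lag(lag) for lag in lags]
    peaks = []
    for i, c in enumerate(values):
        left = values[i - 1] if i > 0 else float("-inf")
        right = values[i + 1] if i + 1 < len(values) else float("-inf")
        if c > left and c >= right:
            peaks.append((lags[i], c))
    return peaks


def find_patterns(cc_of_lag, lags, w, t):
    """Build patterns (c, w, t, l) from the local maxima, most relevant first."""
    patterns = [(c, w, t, lag) for lag, c in local_maxima_over_lags(cc_of_lag, lags)]
    return sorted(patterns, key=lambda p: p[0], reverse=True)


# Stand-in correlation curve peaking at lag 3 (purely illustrative).
def example_cc(lag):
    return 1 - (lag - 3) ** 2 / 100


patterns = find_patterns(example_cc, lags=list(range(7)), w=10, t=100)
print(patterns)
```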
- Real-Time Target Prediction
- The system runs the previous algorithm for t=now. The system then chooses the pattern with the most relevant predictive lag and projects that the target will behave like the sentiment-frequency curve.
- When applied in real time, several optimizations are made:
-
- Non-predictive lags can be ignored (we don't have data on the target in the future)
- The system only computes new local maximums when a new point arrives in the sentiment series.
- Updating the cross-correlation series can be optimized, not all the scalar products have to be recomputed.
- The system can reuse the local maximums we had already identified to find the new ones.
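The selection and projection step can be sketched as follows, assuming patterns are given as (c, w, t, l) tuples as in the text. The `project` helper is an assumption about what "behave like the sentiment-frequency curve" could mean in code: it shifts the windowed predictive points forward by the lag, giving the expected shape of the target, not its values in price units.

```python
def most_relevant_predictive(patterns):
    """Pick the pattern (c, w, t, lag) with the highest c among those with lag > 0.
    Returns None when no predictive pattern exists."""
    predictive = [p for p in patterns if p[3] > 0]
    return max(predictive, key=lambda p: p[0]) if predictive else None


def project(predictive_series, pattern):
    """Shift the predictive points in [t - w, t] forward by the lag.
    The values stay in sentiment-frequency units (shape only, an assumption)."""
    _, w, t, lag = pattern
    return [(u + lag, v) for u, v in predictive_series if t - w <= u <= t]


# Hypothetical patterns at t = 100 with window size 10.
patterns = [(0.9, 10, 100, -5), (0.7, 10, 100, 4), (0.3, 10, 100, 8)]
best = most_relevant_predictive(patterns)
print(best)
print(project([(89, 1.0), (95, 2.0), (100, 3.0)], best))
```

The most correlated pattern overall (c = 0.9) is skipped here because its lag is negative, which matches the first optimization in the list: non-predictive lags can be ignored.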
- In summary, the system for sentiment, intensity cross-correlation provides for time-based cross-correlation between the real-time sentiment value and frequency of a message stream relative to an object and a quantifiable property of that object. The time correlation relates patterns in the sentiment and frequency to patterns in the object property. The cross-correlation system further includes graphical depictions showing relations identified by the patterns between the object property and the sentiment, frequency, and any quantity derived from them. The cross-correlation system also includes event prediction of future up and down movement of the object property based upon the aforementioned patterns, as well as trading signals generated on and trading strategies based on the aforementioned patterns.
- While the preferred embodiments have been shown and described, it will be understood that there is no intent to limit the invention by such disclosure; rather, it is intended to cover all modifications and alternate constructions falling within the spirit and scope of the invention.
Claims (15)
1. A method for finding patterns in a target real-valued time series by utilizing sentiment and frequency derived from a stream of social media messages, wherein the target represents a quantifiable property of an asset being tracked, comprising:
identifying a target, which is a sampled real-valued time series;
generating a sentiment time series, ss, relating to an asset;
generating a frequency time series, sf, relating to an asset;
determining a pattern based upon the sentiment time series and the frequency time series.
2. The method according to claim 1 , wherein sentiment is an expression of a psychological state relative to an event.
3. The method according to claim 1 , wherein frequency represents the volume of social media messages about the asset.
4. The method according to claim 1 , wherein the step of generating a sentiment time series is performed by language processing and is derived based upon pairs of lexical items in local syntactic context found in a volume of social media messages.
5. The method according to claim 4 , wherein the step of generating a sentiment time series includes the creation of an average sentiment series, sa, such that for every point (t,s) in the sentiment time series, ss, there is a point (t, a) in an average sentiment series where “a” is the arithmetic average of all the sentiments in a time range [t−w, t].
6. The method according to claim 5 , wherein the step of generating a sentiment time series includes the creation of a sentiment-frequency series, ssf, to contain a point (t,vsf) for every (t, a) in the sentiment time series, ss, and (t, f) in the frequency time series, sf, where $v_{sf} = f^{a}$ $(= e^{a \ln f})$.
7. The method according to claim 1 , wherein the frequency time series, sf, is dependent upon the sentiment time series, ss, and a positive number w representing a time called window size.
8. The method according to claim 7 , wherein for each point (t, s) in the sentiment time series, ss, the frequency time series, sf, contains a point (t, f) where f is the number of points in the sentiment time series, ss, in the time range [t−w, t], divided by w.
9. The method according to claim 8 , wherein the number f is called frequency and
10. The method according to claim 9 , wherein the pattern P is a cross-correlation c in [−1,1], a positive window size w, a time lag l, and a time ts, and these numbers are interpreted as a predictive series over [ts−w, ts] correlating to the target series over [ts−w+l, ts+l] with a cross-correlation of c.
11. The method according to claim 1 , wherein the step of determining a pattern employs a sentiment-frequency method that uses sentiment to create a sentiment-frequency series, sfs, and correlates to the target using a plain statistical cross-correlation.
12. The method according to claim 11 , wherein the step of determining a pattern includes the step of identifying an optimal time lag.
13. The method according to claim 12 , wherein correlating two time-series using a plain statistical cross-correlation and finding the optimal lag is achieved with a series correlator.
14. The method according to claim 13 , wherein the series correlator produces a set of patterns based on a real-valued pulsated time series sp, a real-valued sampled time series, ss, an interpolation method I for ss, and a window size w.
15. The method according to claim 11 , wherein the interpolation method I, is a function of a time series ss and of a time t that is C1-piecewise continuous with respect to t, and such that if there exists a point (t, v) in ss, I(ss, t)=v.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/427,833 US20130073480A1 (en) | 2011-03-22 | 2012-03-22 | Real time cross correlation of intensity and sentiment from social media messages |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201161466067P | 2011-03-22 | 2011-03-22 | |
US13/427,833 US20130073480A1 (en) | 2011-03-22 | 2012-03-22 | Real time cross correlation of intensity and sentiment from social media messages |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130073480A1 true US20130073480A1 (en) | 2013-03-21 |
Family
ID=46878142
Family Applications (5)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/427,828 Expired - Fee Related US8856056B2 (en) | 2011-03-22 | 2012-03-22 | Sentiment calculus for a method and system using social media for event-driven trading |
US13/427,819 Active 2033-10-02 US9940672B2 (en) | 2011-03-22 | 2012-03-22 | System for generating data from social media messages for the real-time evaluation of publicly traded assets |
US13/427,830 Abandoned US20120246054A1 (en) | 2011-03-22 | 2012-03-22 | Reaction indicator for sentiment of social media messages |
US13/427,833 Abandoned US20130073480A1 (en) | 2011-03-22 | 2012-03-22 | Real time cross correlation of intensity and sentiment from social media messages |
US15/904,819 Abandoned US20180182038A1 (en) | 2011-03-22 | 2018-02-26 | System for generating data from social media messages for the real-time evaluation of publicly traded assets |
Family Applications Before (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/427,828 Expired - Fee Related US8856056B2 (en) | 2011-03-22 | 2012-03-22 | Sentiment calculus for a method and system using social media for event-driven trading |
US13/427,819 Active 2033-10-02 US9940672B2 (en) | 2011-03-22 | 2012-03-22 | System for generating data from social media messages for the real-time evaluation of publicly traded assets |
US13/427,830 Abandoned US20120246054A1 (en) | 2011-03-22 | 2012-03-22 | Reaction indicator for sentiment of social media messages |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/904,819 Abandoned US20180182038A1 (en) | 2011-03-22 | 2018-02-26 | System for generating data from social media messages for the real-time evaluation of publicly traded assets |
Country Status (1)
Country | Link |
---|---|
US (5) | US8856056B2 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130018896A1 (en) * | 2011-07-13 | 2013-01-17 | Bluefin Labs, Inc. | Topic and Time Based Media Affinity Estimation |
US9104734B2 (en) | 2012-02-07 | 2015-08-11 | Social Market Analytics, Inc. | Systems and methods of detecting, measuring, and extracting signatures of signals embedded in social media data streams |
US9418389B2 (en) * | 2012-05-07 | 2016-08-16 | Nasdaq, Inc. | Social intelligence architecture using social media message queues |
US20170083817A1 (en) * | 2015-09-23 | 2017-03-23 | Isentium, Llc | Topic detection in a social media sentiment extraction system |
US10185996B2 (en) * | 2015-07-15 | 2019-01-22 | Foundation Of Soongsil University Industry Cooperation | Stock fluctuation prediction method and server |
US10440402B2 (en) | 2011-01-26 | 2019-10-08 | Afterlive.tv Inc | Method and system for generating highlights from scored data streams |
JP2020123401A (en) * | 2015-11-16 | 2020-08-13 | ウバープル カンパニー リミテッド | Method for displaying asset information |
Families Citing this family (117)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8565781B2 (en) | 2007-07-27 | 2013-10-22 | Intertrust Technologies Corporation | Content publishing systems and methods |
US10339541B2 (en) | 2009-08-19 | 2019-07-02 | Oracle International Corporation | Systems and methods for creating and inserting application media content into social media system displays |
US11620660B2 (en) | 2009-08-19 | 2023-04-04 | Oracle International Corporation | Systems and methods for creating and inserting application media content into social media system displays |
US20120011432A1 (en) | 2009-08-19 | 2012-01-12 | Vitrue, Inc. | Systems and methods for associating social media systems and web pages |
US9633399B2 (en) * | 2009-08-19 | 2017-04-25 | Oracle International Corporation | Method and system for implementing a cloud-based social media marketing method and system |
US9324112B2 (en) | 2010-11-09 | 2016-04-26 | Microsoft Technology Licensing, Llc | Ranking authors in social media systems |
US9286619B2 (en) | 2010-12-27 | 2016-03-15 | Microsoft Technology Licensing, Llc | System and method for generating social summaries |
WO2012116236A2 (en) | 2011-02-23 | 2012-08-30 | Nova Spivack | System and method for analyzing messages in a network or across networks |
US20130018892A1 (en) * | 2011-07-12 | 2013-01-17 | Castellanos Maria G | Visually Representing How a Sentiment Score is Computed |
US20130159219A1 (en) * | 2011-12-14 | 2013-06-20 | Microsoft Corporation | Predicting the Likelihood of Digital Communication Responses |
US8832092B2 (en) | 2012-02-17 | 2014-09-09 | Bottlenose, Inc. | Natural language processing optimized for micro content |
US20130263019A1 (en) * | 2012-03-30 | 2013-10-03 | Maria G. Castellanos | Analyzing social media |
US8620718B2 (en) * | 2012-04-06 | 2013-12-31 | Unmetric Inc. | Industry specific brand benchmarking system based on social media strength of a brand |
JP5607859B2 (en) * | 2012-04-25 | 2014-10-15 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Sentence classification method based on evaluation polarity, computer program, computer |
US20130297546A1 (en) * | 2012-05-07 | 2013-11-07 | The Nasdaq Omx Group, Inc. | Generating synthetic sentiment using multiple transactions and bias criteria |
US9374374B2 (en) * | 2012-06-19 | 2016-06-21 | SecureMySocial, Inc. | Systems and methods for securing social media for users and businesses and rewarding for enhancing security |
US9460473B2 (en) * | 2012-06-26 | 2016-10-04 | International Business Machines Corporation | Content-sensitive notification icons |
US9678948B2 (en) * | 2012-06-26 | 2017-06-13 | International Business Machines Corporation | Real-time message sentiment awareness |
US10165067B2 (en) * | 2012-06-29 | 2018-12-25 | Nuvi, Llc | Systems and methods for visualization of electronic social network content |
US9009126B2 (en) | 2012-07-31 | 2015-04-14 | Bottlenose, Inc. | Discovering and ranking trending links about topics |
US9539498B1 (en) | 2012-07-31 | 2017-01-10 | Niantic, Inc. | Mapping real world actions to a virtual world associated with a location-based game |
US9226106B1 (en) | 2012-07-31 | 2015-12-29 | Niantic, Inc. | Systems and methods for filtering communication within a location-based game |
US9604131B1 (en) | 2012-07-31 | 2017-03-28 | Niantic, Inc. | Systems and methods for verifying player proximity within a location-based game |
US9669293B1 (en) | 2012-07-31 | 2017-06-06 | Niantic, Inc. | Game data validation |
US9782668B1 (en) | 2012-07-31 | 2017-10-10 | Niantic, Inc. | Placement of virtual elements in a virtual world associated with a location-based parallel reality game |
US9128789B1 (en) | 2012-07-31 | 2015-09-08 | Google Inc. | Executing cross-cutting concerns for client-server remote procedure calls |
US9621635B1 (en) | 2012-07-31 | 2017-04-11 | Niantic, Inc. | Using side channels in remote procedure calls to return information in an interactive environment |
US9669296B1 (en) | 2012-07-31 | 2017-06-06 | Niantic, Inc. | Linking real world activities with a parallel reality game |
EP2885756A4 (en) * | 2012-08-15 | 2016-07-06 | Thomson Reuters Glo Resources | System and method for forming predictions using event-based sentiment analysis |
US9727925B2 (en) | 2012-09-09 | 2017-08-08 | Oracle International Corporation | Method and system for implementing semantic analysis of internal social network content |
US9852239B2 (en) * | 2012-09-24 | 2017-12-26 | Adobe Systems Incorporated | Method and apparatus for prediction of community reaction to a post |
US8968099B1 (en) | 2012-11-01 | 2015-03-03 | Google Inc. | System and method for transporting virtual objects in a parallel reality game |
CN103812906B (en) * | 2012-11-14 | 2015-03-18 | 腾讯科技(深圳)有限公司 | Website recommendation method and device and communication system |
US10395321B2 (en) * | 2012-11-30 | 2019-08-27 | Facebook, Inc. | Dynamic expressions for representing features in an online system |
US10671926B2 (en) | 2012-11-30 | 2020-06-02 | Servicenow, Inc. | Method and system for generating predictive models for scoring and prioritizing opportunities |
US9280739B2 (en) | 2012-11-30 | 2016-03-08 | Dxcontinuum Inc. | Computer implemented system for automating the generation of a business decision analytic model |
US10706359B2 (en) | 2012-11-30 | 2020-07-07 | Servicenow, Inc. | Method and system for generating predictive models for scoring and prioritizing leads |
US9460083B2 (en) | 2012-12-27 | 2016-10-04 | International Business Machines Corporation | Interactive dashboard based on real-time sentiment analysis for synchronous communication |
US9690775B2 (en) | 2012-12-27 | 2017-06-27 | International Business Machines Corporation | Real-time sentiment analysis for synchronous communication |
US9477704B1 (en) * | 2012-12-31 | 2016-10-25 | Teradata Us, Inc. | Sentiment expression analysis based on keyword hierarchy |
US9294576B2 (en) | 2013-01-02 | 2016-03-22 | Microsoft Technology Licensing, Llc | Social media impact assessment |
US8762302B1 (en) | 2013-02-22 | 2014-06-24 | Bottlenose, Inc. | System and method for revealing correlations between data streams |
WO2014138415A1 (en) * | 2013-03-06 | 2014-09-12 | Northwestern University | Linguistic expression of preferences in social media for prediction and recommendation |
US9432325B2 (en) | 2013-04-08 | 2016-08-30 | Avaya Inc. | Automatic negative question handling |
US20150317038A1 (en) * | 2014-05-05 | 2015-11-05 | Marty Mianji | Method and apparatus for organizing, stamping, and submitting pictorial data |
US9514133B1 (en) | 2013-06-25 | 2016-12-06 | Jpmorgan Chase Bank, N.A. | System and method for customized sentiment signal generation through machine learning based streaming text analytics |
US9268770B1 (en) | 2013-06-25 | 2016-02-23 | Jpmorgan Chase Bank, N.A. | System and method for research report guided proactive news analytics for streaming news and social media |
JP6150282B2 (en) * | 2013-06-27 | 2017-06-21 | 国立研究開発法人情報通信研究機構 | Non-factoid question answering system and computer program |
SG10201403898TA (en) * | 2013-07-05 | 2015-02-27 | Barrett Carter Keith | Computer-implemented intelligence tool |
US10463953B1 (en) | 2013-07-22 | 2019-11-05 | Niantic, Inc. | Detecting and preventing cheating in a location-based game |
US9262438B2 (en) * | 2013-08-06 | 2016-02-16 | International Business Machines Corporation | Geotagging unstructured text |
US9715492B2 (en) | 2013-09-11 | 2017-07-25 | Avaya Inc. | Unspoken sentiment |
US9545565B1 (en) | 2013-10-31 | 2017-01-17 | Niantic, Inc. | Regulating and scoring player interactions within a virtual world associated with a location-based parallel reality game |
US20150134402A1 (en) * | 2013-11-11 | 2015-05-14 | Yahoo! Inc. | System and method for network-oblivious community detection |
US10515631B2 (en) | 2013-12-17 | 2019-12-24 | Koninklijke Philips N.V. | System and method for assessing the cognitive style of a person |
US20150206243A1 (en) * | 2013-12-27 | 2015-07-23 | Martin Camins | Method and system for measuring financial asset predictions using social media |
US9241069B2 (en) | 2014-01-02 | 2016-01-19 | Avaya Inc. | Emergency greeting override by system administrator or routing to contact center |
US10346752B2 (en) * | 2014-04-17 | 2019-07-09 | International Business Machines Corporation | Correcting existing predictive model outputs with social media features over multiple time scales |
US20150309965A1 (en) * | 2014-04-28 | 2015-10-29 | Elwha Llc | Methods, systems, and devices for outcome prediction of text submission to network based on corpora analysis |
US20150310003A1 (en) * | 2014-04-28 | 2015-10-29 | Elwha Llc | Methods, systems, and devices for machines and machine states that manage relation data for modification of documents based on various corpora and/or modification data |
GB2526622A (en) * | 2014-05-30 | 2015-12-02 | Mastercard International Inc | Graphically rendering account data |
US11354340B2 (en) | 2014-06-05 | 2022-06-07 | International Business Machines Corporation | Time-based optimization of answer generation in a question and answer system |
US9785684B2 (en) | 2014-06-05 | 2017-10-10 | International Business Machines Corporation | Determining temporal categories for a domain of content for natural language processing |
US20150363796A1 (en) * | 2014-06-13 | 2015-12-17 | Thomson Licensing | System and method for filtering social media messages for presentation on digital signage systems |
US9646198B2 (en) | 2014-08-08 | 2017-05-09 | International Business Machines Corporation | Sentiment analysis in a video conference |
US9648061B2 (en) | 2014-08-08 | 2017-05-09 | International Business Machines Corporation | Sentiment analysis in a video conference |
US10922657B2 (en) | 2014-08-26 | 2021-02-16 | Oracle International Corporation | Using an employee database with social media connections to calculate job candidate reputation scores |
US9978362B2 (en) | 2014-09-02 | 2018-05-22 | Microsoft Technology Licensing, Llc | Facet recommendations from sentiment-bearing content |
US10706432B2 (en) * | 2014-09-17 | 2020-07-07 | [24]7.ai, Inc. | Method, apparatus and non-transitory medium for customizing speed of interaction and servicing on one or more interactions channels based on intention classifiers |
US10666664B2 (en) * | 2014-11-06 | 2020-05-26 | Pcms Holdings, Inc. | System and method of providing location-based privacy on social media |
US11599841B2 (en) | 2015-01-05 | 2023-03-07 | Saama Technologies Inc. | Data analysis using natural language processing to obtain insights relevant to an organization |
KR101634086B1 (en) * | 2015-01-19 | 2016-07-08 | 주식회사 엔씨소프트 | Method and computer system of analyzing communication situation based on emotion information |
US9805128B2 (en) | 2015-02-18 | 2017-10-31 | Xerox Corporation | Methods and systems for predicting psychological types |
US20160260166A1 (en) * | 2015-03-02 | 2016-09-08 | Trade Social, LLC | Identification, curation and trend monitoring for uncorrelated information sources |
US10521420B2 (en) | 2015-07-31 | 2019-12-31 | International Business Machines Corporation | Analyzing search queries to determine a user affinity and filter search results |
US10572206B2 (en) * | 2015-08-28 | 2020-02-25 | Vinuth Tulasi | System and method for minimizing screen space required for displaying auxiliary content |
US10187675B2 (en) * | 2015-10-12 | 2019-01-22 | The Nielsen Company (Us), Llc | Methods and apparatus to identify co-relationships between media using social media |
US10073794B2 (en) | 2015-10-16 | 2018-09-11 | Sprinklr, Inc. | Mobile application builder program and its functionality for application development, providing the user an improved search capability for an expanded generic search based on the user's search criteria |
US20170148097A1 (en) * | 2015-11-23 | 2017-05-25 | Indiana University Research And Technology Corporation | Systems and methods for deriving financial information from emotional content analysis |
US11004096B2 (en) | 2015-11-25 | 2021-05-11 | Sprinklr, Inc. | Buy intent estimation and its applications for social media data |
WO2017100361A1 (en) * | 2015-12-08 | 2017-06-15 | Dennehy Matthew T | System and method for tracking stock fluctuations |
TWI582683B (en) * | 2015-12-08 | 2017-05-11 | 宏碁股份有限公司 | Electronic device and the method for operation thereof |
CN106886513A (en) * | 2015-12-16 | 2017-06-23 | 宏碁股份有限公司 | Electronic device and its operating method |
US10042842B2 (en) * | 2016-02-24 | 2018-08-07 | Utopus Insights, Inc. | Theft detection via adaptive lexical similarity analysis of social media data streams |
US10133735B2 (en) | 2016-02-29 | 2018-11-20 | Rovi Guides, Inc. | Systems and methods for training a model to determine whether a query with multiple segments comprises multiple distinct commands or a combined command |
US10031967B2 (en) * | 2016-02-29 | 2018-07-24 | Rovi Guides, Inc. | Systems and methods for using a trained model for determining whether a query comprising multiple segments relates to an individual query or several queries |
US20170300823A1 (en) * | 2016-04-13 | 2017-10-19 | International Business Machines Corporation | Determining user influence by contextual relationship of isolated and non-isolated content |
US20180025545A1 (en) * | 2016-07-19 | 2018-01-25 | Pol-Lin Tai | Method for creating visualized effect for data |
CN106295702B (en) * | 2016-08-15 | 2019-10-25 | 西北工业大学 | Social platform user classification method based on analysis of individual affective behavior |
US20180053197A1 (en) * | 2016-08-18 | 2018-02-22 | International Business Machines Corporation | Normalizing user responses to events |
US10397326B2 (en) | 2017-01-11 | 2019-08-27 | Sprinklr, Inc. | IRC-Infoid data standardization for use in a plurality of mobile applications |
US10831796B2 (en) * | 2017-01-15 | 2020-11-10 | International Business Machines Corporation | Tone optimization for digital content |
US10489510B2 (en) * | 2017-04-20 | 2019-11-26 | Ford Motor Company | Sentiment analysis of product reviews from social media |
JP6959044B2 (en) * | 2017-06-23 | 2021-11-02 | 株式会社野村総合研究所 | Recording server, recording method and program |
US11934937B2 (en) * | 2017-07-10 | 2024-03-19 | Accenture Global Solutions Limited | System and method for detecting the occurrence of an event and determining a response to the event |
US10717005B2 (en) | 2017-07-22 | 2020-07-21 | Niantic, Inc. | Validating a player's real-world location using activity within a parallel reality game |
US10574601B2 (en) * | 2017-08-03 | 2020-02-25 | International Business Machines Corporation | Managing and displaying online messages along timelines |
CN107526831B (en) * | 2017-09-04 | 2020-03-31 | 华为技术有限公司 | Natural language processing method and device |
US11238535B1 (en) | 2017-09-14 | 2022-02-01 | Wells Fargo Bank, N.A. | Stock trading platform with social network sentiment |
US11164266B2 (en) * | 2017-10-27 | 2021-11-02 | International Business Machines Corporation | Protection of water providing entities from loss due to environmental events |
CN108009148B (en) * | 2017-11-16 | 2021-04-27 | 天津大学 | Text emotion classification representation method based on deep learning |
US10528660B2 (en) | 2017-12-02 | 2020-01-07 | International Business Machines Corporation | Leveraging word patterns in the language of popular influencers to predict popular trends |
US11238087B2 (en) * | 2017-12-21 | 2022-02-01 | Microsoft Technology Licensing, Llc | Social analytics based on viral mentions and threading |
US10380613B1 (en) | 2018-11-07 | 2019-08-13 | Capital One Services, Llc | System and method for analyzing cryptocurrency-related information using artificial intelligence |
US10789430B2 (en) | 2018-11-19 | 2020-09-29 | Genesys Telecommunications Laboratories, Inc. | Method and system for sentiment analysis |
CA3120977A1 (en) * | 2018-11-19 | 2020-05-28 | Genesys Telecommunications Laboratories, Inc. | Method and system for sentiment analysis |
CN109949076B (en) * | 2019-02-26 | 2022-02-18 | 北京首钢自动化信息技术有限公司 | Method for establishing hypersphere mapping model, information recommendation method and device |
US11461847B2 (en) * | 2019-03-21 | 2022-10-04 | The University Of Chicago | Applying a trained model to predict a future value using contextualized sentiment data |
US11573995B2 (en) * | 2019-09-10 | 2023-02-07 | International Business Machines Corporation | Analyzing the tone of textual data |
GB201915879D0 (en) * | 2019-10-31 | 2019-12-18 | Black Swan Data Ltd | Using social data to improve long term sales forecasting |
US12106061B2 (en) | 2020-04-29 | 2024-10-01 | Clarabridge, Inc. | Automated narratives of interactive communications |
US11546285B2 (en) * | 2020-04-29 | 2023-01-03 | Clarabridge, Inc. | Intelligent transaction scoring |
US11689487B1 (en) * | 2020-07-16 | 2023-06-27 | Kynami, Inc. | System and method for identifying and blocking trolls on social networks |
US10878505B1 (en) | 2020-07-31 | 2020-12-29 | Agblox, Inc. | Curated sentiment analysis in multi-layer, machine learning-based forecasting model using customized, commodity-specific neural networks |
US20220261818A1 (en) * | 2021-02-16 | 2022-08-18 | RepTrak Holdings, Inc. | System and method for determining and managing reputation of entities and industries through use of media data |
US20220383411A1 (en) * | 2021-06-01 | 2022-12-01 | Jpmorgan Chase Bank, N.A. | Method and system for assessing social media effects on market trends |
US11797517B2 (en) * | 2021-06-21 | 2023-10-24 | Yahoo Assets Llc | Public content validation and presentation method and apparatus |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080270116A1 (en) * | 2007-04-24 | 2008-10-30 | Namrata Godbole | Large-Scale Sentiment Analysis |
US20100325031A1 (en) * | 2009-06-18 | 2010-12-23 | Penson Worldwide, Inc. | Method and system for trading financial assets |
Family Cites Families (98)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6122628A (en) * | 1997-10-31 | 2000-09-19 | International Business Machines Corporation | Multidimensional data clustering and dimension reduction for indexing and searching |
US7216304B1 (en) * | 2000-01-05 | 2007-05-08 | Apple Inc. | Graphical user interface for computers having variable size icons |
JP3573688B2 (en) * | 2000-06-28 | 2004-10-06 | 松下電器産業株式会社 | Similar document search device and related keyword extraction device |
US7185065B1 (en) * | 2000-10-11 | 2007-02-27 | Buzzmetrics Ltd | System and method for scoring electronic messages |
US20070226640A1 (en) * | 2000-11-15 | 2007-09-27 | Holbrook David M | Apparatus and methods for organizing and/or presenting data |
US8285619B2 (en) | 2001-01-22 | 2012-10-09 | Fred Herz Patents, LLC | Stock market prediction using natural language processing |
US7159178B2 (en) * | 2001-02-20 | 2007-01-02 | Communispace Corp. | System for supporting a virtual community |
US7216073B2 (en) * | 2001-03-13 | 2007-05-08 | Intelligate, Ltd. | Dynamic natural language understanding |
US20080015871A1 (en) * | 2002-04-18 | 2008-01-17 | Jeff Scott Eder | Varr system |
JP2005010854A (en) * | 2003-06-16 | 2005-01-13 | Sony Computer Entertainment Inc | Information presenting method and system |
GB2403636A (en) * | 2003-07-02 | 2005-01-05 | Sony Uk Ltd | Information retrieval using an array of nodes |
US7213206B2 (en) * | 2003-09-09 | 2007-05-01 | Fogg Brian J | Relationship user interface |
US7865354B2 (en) * | 2003-12-05 | 2011-01-04 | International Business Machines Corporation | Extracting and grouping opinions from text documents |
JP4394517B2 (en) | 2004-05-12 | 2010-01-06 | 富士通株式会社 | Feature information extraction method, feature information extraction program, and feature information extraction device |
WO2006039566A2 (en) | 2004-09-30 | 2006-04-13 | Intelliseek, Inc. | Topical sentiments in electronically stored communications |
US20060242040A1 (en) * | 2005-04-20 | 2006-10-26 | Aim Holdings Llc | Method and system for conducting sentiment analysis for securities research |
US20070005477A1 (en) | 2005-06-24 | 2007-01-04 | Mcatamney Pauline | Interactive asset data visualization guide |
US20070005564A1 (en) * | 2005-06-29 | 2007-01-04 | Mark Zehner | Method and system for performing multi-dimensional searches |
US7502789B2 (en) * | 2005-12-15 | 2009-03-10 | Microsoft Corporation | Identifying important news reports from news home pages |
EP1989639A4 (en) * | 2006-02-28 | 2012-05-02 | Buzzlogic Inc | Social analytics system and method for analyzing conversations in social media |
US20070239590A1 (en) * | 2006-04-07 | 2007-10-11 | Lee Gang P | Two-step method and system for commodity trading |
US7882014B2 (en) * | 2006-04-28 | 2011-02-01 | Pipeline Financial Group, Inc. | Display of market impact in algorithmic trading engine |
US7720835B2 (en) * | 2006-05-05 | 2010-05-18 | Visible Technologies Llc | Systems and methods for consumer-generated media reputation management |
US9269068B2 (en) * | 2006-05-05 | 2016-02-23 | Visible Technologies Llc | Systems and methods for consumer-generated media reputation management |
US20090070683A1 (en) * | 2006-05-05 | 2009-03-12 | Miles Ward | Consumer-generated media influence and sentiment determination |
US7676518B2 (en) * | 2006-08-16 | 2010-03-09 | Sap Ag | Clustering for structured data |
US8862591B2 (en) * | 2006-08-22 | 2014-10-14 | Twitter, Inc. | System and method for evaluating sentiment |
US8271429B2 (en) * | 2006-09-11 | 2012-09-18 | Wiredset Llc | System and method for collecting and processing data |
US7730316B1 (en) * | 2006-09-22 | 2010-06-01 | Fatlens, Inc. | Method for document fingerprinting |
US8078450B2 (en) * | 2006-10-10 | 2011-12-13 | Abbyy Software Ltd. | Method and system for analyzing various languages and constructing language-independent semantic structures |
US7693773B2 (en) * | 2006-10-13 | 2010-04-06 | Morgan Stanley | Interactive user interface for displaying information related to publicly traded securities |
US20080104225A1 (en) * | 2006-11-01 | 2008-05-01 | Microsoft Corporation | Visualization application for mining of social networks |
US7930302B2 (en) | 2006-11-22 | 2011-04-19 | Intuit Inc. | Method and system for analyzing user-generated content |
GB0701202D0 (en) * | 2007-01-22 | 2007-02-28 | Wanzke Detlev | Data analysis |
US20080249764A1 (en) * | 2007-03-01 | 2008-10-09 | Microsoft Corporation | Smart Sentiment Classifier for Product Reviews |
US7966241B2 (en) | 2007-03-01 | 2011-06-21 | Reginald Nosegbe | Stock method for measuring and assigning precise meaning to market sentiment |
US7734641B2 (en) * | 2007-05-25 | 2010-06-08 | Peerset, Inc. | Recommendation systems and methods using interest correlation |
US7987188B2 (en) | 2007-08-23 | 2011-07-26 | Google Inc. | Domain-specific sentiment classification |
US8280885B2 (en) * | 2007-10-29 | 2012-10-02 | Cornell University | System and method for automatically summarizing fine-grained opinions in digital text |
US8781989B2 (en) * | 2008-01-14 | 2014-07-15 | Aptima, Inc. | Method and system to predict a data value |
US8010539B2 (en) | 2008-01-25 | 2011-08-30 | Google Inc. | Phrase based snippet generation |
JP2011516938A (en) * | 2008-02-22 | 2011-05-26 | ソーシャルレップ・エルエルシー | Systems and methods for measuring and managing distributed online conversations |
US8239189B2 (en) * | 2008-02-26 | 2012-08-07 | Siemens Enterprise Communications Gmbh & Co. Kg | Method and system for estimating a sentiment for an entity |
US8463594B2 (en) | 2008-03-21 | 2013-06-11 | Sauriel Llc | System and method for analyzing text using emotional intelligence factors |
WO2009151502A2 (en) * | 2008-04-08 | 2009-12-17 | Allgress, Inc. | Enterprise information security management software used to prove return on investment of security projects and activities using interactive graphs |
US8117207B2 (en) * | 2008-04-18 | 2012-02-14 | Biz360 Inc. | System and methods for evaluating feature opinions for products, services, and entities |
AU2009260033A1 (en) * | 2008-06-19 | 2009-12-23 | Wize Technologies, Inc. | System and method for aggregating and summarizing product/topic sentiment |
US8446412B2 (en) * | 2008-06-26 | 2013-05-21 | Microsoft Corporation | Static visualization of multiple-dimension data trends |
US8219916B2 (en) * | 2008-07-25 | 2012-07-10 | Yahoo! Inc. | Techniques for visual representation of user activity associated with an information resource |
US20100121707A1 (en) * | 2008-11-13 | 2010-05-13 | Buzzient, Inc. | Displaying analytic measurement of online social media content in a graphical user interface |
US8669994B2 (en) * | 2008-11-15 | 2014-03-11 | New Bis Safe Luxco S.A R.L | Data visualization methods |
US20100332465A1 (en) | 2008-12-16 | 2010-12-30 | Frizo Janssens | Method and system for monitoring online media and dynamically charting the results to facilitate human pattern detection |
US9076125B2 (en) * | 2009-02-27 | 2015-07-07 | Microsoft Technology Licensing, Llc | Visualization of participant relationships and sentiment for electronic messaging |
JP4852119B2 (en) * | 2009-03-25 | 2012-01-11 | 株式会社東芝 | Data display device, data display method, and data display program |
US8166032B2 (en) * | 2009-04-09 | 2012-04-24 | MarketChorus, Inc. | System and method for sentiment-based text classification and relevancy ranking |
JP5559306B2 (en) * | 2009-04-24 | 2014-07-23 | アルグレス・インコーポレイテッド | Enterprise information security management software for predictive modeling using interactive graphs |
US8504550B2 (en) * | 2009-05-15 | 2013-08-06 | Citizennet Inc. | Social network message categorization systems and methods |
US8346702B2 (en) * | 2009-05-22 | 2013-01-01 | Step 3 Systems, Inc. | System and method for automatically predicting the outcome of expert forecasts |
JP5879260B2 (en) * | 2009-06-09 | 2016-03-08 | イービーエイチ エンタープライズィーズ インコーポレイテッド | Method and apparatus for analyzing content of microblog message |
JP5795580B2 (en) * | 2009-07-16 | 2015-10-14 | ブルーフィン ラボズ インコーポレイテッド | Estimating and displaying social interests in time-based media |
US8386482B2 (en) * | 2009-09-02 | 2013-02-26 | Xurmo Technologies Private Limited | Method for personalizing information retrieval in a communication network |
US20110112995A1 (en) * | 2009-10-28 | 2011-05-12 | Industrial Technology Research Institute | Systems and methods for organizing collective social intelligence information using an organic object data model |
US20120137367A1 (en) * | 2009-11-06 | 2012-05-31 | Cataphora, Inc. | Continuous anomaly detection based on behavior modeling and heterogeneous information analysis |
US8281238B2 (en) * | 2009-11-10 | 2012-10-02 | Primal Fusion Inc. | System, method and computer program for creating and manipulating data structures using an interactive graphical interface |
US11132748B2 (en) * | 2009-12-01 | 2021-09-28 | Refinitiv Us Organization Llc | Method and apparatus for risk mining |
US20120316916A1 (en) * | 2009-12-01 | 2012-12-13 | Andrews Sarah L | Methods and systems for generating corporate green score using social media sourced data and sentiment analysis |
US20120296845A1 (en) * | 2009-12-01 | 2012-11-22 | Andrews Sarah L | Methods and systems for generating composite index using social media sourced data and sentiment analysis |
US8356025B2 (en) | 2009-12-09 | 2013-01-15 | International Business Machines Corporation | Systems and methods for detecting sentiment-based topics |
US9201863B2 (en) | 2009-12-24 | 2015-12-01 | Woodwire, Inc. | Sentiment analysis from social media content |
US8849649B2 (en) * | 2009-12-24 | 2014-09-30 | Metavana, Inc. | System and method for determining sentiment expressed in documents |
JP5284990B2 (en) * | 2010-01-08 | 2013-09-11 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Processing method for time series analysis of keywords, processing system and computer program |
US8402035B2 (en) | 2010-03-12 | 2013-03-19 | General Sentiment, Inc. | Methods and systems for determing media value |
WO2011119410A2 (en) | 2010-03-24 | 2011-09-29 | Taykey, Ltd. | A system and methods thereof for mining web based user generated content for creation of term taxonomies |
US8965835B2 (en) * | 2010-03-24 | 2015-02-24 | Taykey Ltd. | Method for analyzing sentiment trends based on term taxonomies of user generated content |
US9613139B2 (en) | 2010-03-24 | 2017-04-04 | Taykey Ltd. | System and methods thereof for real-time monitoring of a sentiment trend with respect of a desired phrase |
US20110246921A1 (en) | 2010-03-30 | 2011-10-06 | Microsoft Corporation | Visualizing sentiment of online content |
US8725494B2 (en) | 2010-03-31 | 2014-05-13 | Attivio, Inc. | Signal processing approach to sentiment analysis for entities in documents |
US8326880B2 (en) * | 2010-04-05 | 2012-12-04 | Microsoft Corporation | Summarizing streams of information |
US20110251977A1 (en) | 2010-04-13 | 2011-10-13 | Michal Cialowicz | Ad Hoc Document Parsing |
US20110258256A1 (en) | 2010-04-14 | 2011-10-20 | Bernardo Huberman | Predicting future outcomes |
US20110264531A1 (en) * | 2010-04-26 | 2011-10-27 | Yahoo! Inc. | Watching a user's online world |
US20110275046A1 (en) * | 2010-05-07 | 2011-11-10 | Andrew Grenville | Method and system for evaluating content |
CN102884530A (en) | 2010-05-16 | 2013-01-16 | 捷通国际有限公司 | Data collection, tracking, and analysis for multiple media including impact analysis and influence tracking |
US20110320542A1 (en) | 2010-06-28 | 2011-12-29 | Bank Of America Corporation | Analyzing Social Networking Information |
US9582908B2 (en) * | 2010-10-26 | 2017-02-28 | Inetco Systems Limited | Method and system for interactive visualization of hierarchical time series data |
US8955001B2 (en) * | 2011-07-06 | 2015-02-10 | Symphony Advanced Media | Mobile remote media control platform apparatuses and methods |
US9292602B2 (en) | 2010-12-14 | 2016-03-22 | Microsoft Technology Licensing, Llc | Interactive search results page |
US8706647B2 (en) * | 2010-12-17 | 2014-04-22 | University Of Southern California | Estimating value of user's social influence on other users of computer network system |
US8380607B2 (en) * | 2010-12-17 | 2013-02-19 | Indiana University Research And Technology Corporation | Predicting economic trends via network communication mood tracking |
US8805714B2 (en) * | 2011-01-20 | 2014-08-12 | Ipc Systems, Inc. | User interface displaying communication information |
WO2012116236A2 (en) * | 2011-02-23 | 2012-08-30 | Nova Spivack | System and method for analyzing messages in a network or across networks |
US8660581B2 (en) * | 2011-02-23 | 2014-02-25 | Digimarc Corporation | Mobile device indoor navigation |
US8650023B2 (en) * | 2011-03-21 | 2014-02-11 | Xerox Corporation | Customer review authoring assistant |
US8838438B2 (en) * | 2011-04-29 | 2014-09-16 | Cbs Interactive Inc. | System and method for determining sentiment from text content |
US9100669B2 (en) * | 2011-05-12 | 2015-08-04 | At&T Intellectual Property I, Lp | Method and apparatus for associating micro-blogs with media programs |
US20130018954A1 (en) * | 2011-07-15 | 2013-01-17 | Samsung Electronics Co., Ltd. | Situation-aware user sentiment social interest models |
US10165067B2 (en) * | 2012-06-29 | 2018-12-25 | Nuvi, Llc | Systems and methods for visualization of electronic social network content |
US20140207579A1 (en) * | 2013-01-18 | 2014-07-24 | Salesforce.Com, Inc. | Syndication of online message content using social media |
2012
- 2012-03-22 US13/427,828 (US8856056B2): Expired - Fee Related
- 2012-03-22 US13/427,819 (US9940672B2): Active
- 2012-03-22 US13/427,830 (US20120246054A1): Abandoned
- 2012-03-22 US13/427,833 (US20130073480A1): Abandoned

2018
- 2018-02-26 US15/904,819 (US20180182038A1): Abandoned
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11082722B2 (en) | 2011-01-26 | 2021-08-03 | Afterlive.tv Inc. | Method and system for generating highlights from scored data streams |
US10440402B2 (en) | 2011-01-26 | 2019-10-08 | Afterlive.tv Inc | Method and system for generating highlights from scored data streams |
US10769194B2 (en) | 2011-07-13 | 2020-09-08 | Bluefin Labs, Inc. | Topic and time based media affinity estimation |
US8600984B2 (en) * | 2011-07-13 | 2013-12-03 | Bluefin Labs, Inc. | Topic and time based media affinity estimation |
US9009130B2 (en) | 2011-07-13 | 2015-04-14 | Bluefin Labs, Inc. | Topic and time based media affinity estimation |
US20130018896A1 (en) * | 2011-07-13 | 2013-01-17 | Bluefin Labs, Inc. | Topic and Time Based Media Affinity Estimation |
US11301505B2 (en) | 2011-07-13 | 2022-04-12 | Bluefin Labs, Inc. | Topic and time based media affinity estimation |
US9753923B2 (en) | 2011-07-13 | 2017-09-05 | Bluefin Labs, Inc. | Topic and time based media affinity estimation |
US9104734B2 (en) | 2012-02-07 | 2015-08-11 | Social Market Analytics, Inc. | Systems and methods of detecting, measuring, and extracting signatures of signals embedded in social media data streams |
US10031909B2 (en) | 2012-02-07 | 2018-07-24 | Social Market Analytics, Inc. | Systems and methods of detecting, measuring, and extracting signatures of signals embedded in social media data streams |
US10846479B2 (en) | 2012-02-07 | 2020-11-24 | Social Market Analytics, Inc. | Systems and methods of detecting, measuring, and extracting signatures of signals embedded in social media data streams |
US9418389B2 (en) * | 2012-05-07 | 2016-08-16 | Nasdaq, Inc. | Social intelligence architecture using social media message queues |
US11086885B2 (en) * | 2012-05-07 | 2021-08-10 | Nasdaq, Inc. | Social intelligence architecture using social media message queues |
US20210349907A1 (en) * | 2012-05-07 | 2021-11-11 | Nasdaq, Inc. | Social intelligence architecture using social media message queues |
US11803557B2 (en) * | 2012-05-07 | 2023-10-31 | Nasdaq, Inc. | Social intelligence architecture using social media message queues |
US10185996B2 (en) * | 2015-07-15 | 2019-01-22 | Foundation Of Soongsil University Industry Cooperation | Stock fluctuation prediction method and server |
US20170083817A1 (en) * | 2015-09-23 | 2017-03-23 | Isentium, Llc | Topic detection in a social media sentiment extraction system |
JP2020123401A (en) * | 2015-11-16 | 2020-08-13 | ウバープル カンパニー リミテッド | Method for displaying asset information |
JP7021289B2 (en) | 2015-11-16 | 2022-02-16 | ウバープル カンパニー リミテッド | How to display asset information |
Also Published As
Publication number | Publication date |
---|---|
US20120246104A1 (en) | 2012-09-27 |
US20120246054A1 (en) | 2012-09-27 |
US20180182038A1 (en) | 2018-06-28 |
US8856056B2 (en) | 2014-10-07 |
US9940672B2 (en) | 2018-04-10 |
US20120324023A1 (en) | 2012-12-20 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20180182038A1 (en) | System for generating data from social media messages for the real-time evaluation of publicly traded assets | |
Derakhshan et al. | Sentiment analysis on stock social media for stock price movement prediction | |
US20170083817A1 (en) | Topic detection in a social media sentiment extraction system | |
CN110799981B (en) | Systems and methods for domain-independent aspect level emotion detection | |
Nassirtoussi et al. | Text mining of news-headlines for FOREX market prediction: A Multi-layer Dimension Reduction Algorithm with semantics and sentiment | |
Nassirtoussi et al. | Text mining for market prediction: A systematic review | |
Smailović et al. | Stream-based active learning for sentiment analysis in the financial domain | |
Lutz et al. | Sentence-level sentiment analysis of financial news using distributed text representations and multi-instance learning | |
Urolagin | Text mining of tweet for sentiment classification and association with stock prices | |
Kim et al. | Stock price prediction through sentiment analysis of corporate disclosures using distributed representation | |
CN113449108A (en) | Financial news stream burst detection method based on hierarchical clustering | |
Ao | Sentiment analysis based on financial tweets and market information | |
Zhao et al. | Dynamic impacts of online investor sentiment on international crude oil prices | |
Sun et al. | Financial fraud detection based on the part-of-speech features of textual risk disclosures in financial reports | |
Perikos et al. | Opinion mining and visualization of online users reviews: a case study in Booking.com | |
Dash | Information Extraction from Unstructured Big Data: A Case Study of Deep Natural Language Processing in Fintech | |
Kamal et al. | A Comprehensive Review on Summarizing Financial News Using Deep Learning | |
Wade | Transformers and tradition: using Generative AI and Deep Learning for financial markets prediction | |
Xu | Data Mining in Social Media for Stock Market Prediction | |
Gutiérrez et al. | Similarity analysis of federal reserve statements using document embeddings: the Great Recession vs. COVID-19 | |
Schlaubitz | Natural Language Processing in finance: analysis of sentiment and complexity of news and earnings reports of swiss SMEs and their relevance for stock returns | |
Lwanga | Stock market price prediction using sentiment analysis: a case study of Nairobi stock exchange market | |
Bhalla et al. | A Review of Various Sentiment Analysis Techniques, Methodologies and their Applications. | |
Venturini | Analysing the impact of ECB Communication on Financial Markets: A Text Mining approach | |
Albahli et al. | Opinion mining for stock trend prediction using deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: ISENTIUM TECHNOLOGIES, INC., CANADA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: ALBERTI, LIONEL; SASTRI, GAUTHAM; SIGNING DATES FROM 20120404 TO 20120517; REEL/FRAME: 028309/0968 |
| | AS | Assignment | Owner name: ISENTIUM, LLC, FLORIDA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: ISENTIUM TECHNOLOGIES INC.; REEL/FRAME: 033536/0088. Effective date: 20140812 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION |